From jwaters at openjdk.org Mon Jul 1 04:21:23 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Mon, 1 Jul 2024 04:21:23 GMT
Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
In-Reply-To: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
References: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
Message-ID: <kj0Pm-kX9z8Cdf7UnRFxzU0C2idwoYb9fTOzWEM3VQo=.283058fa-c75d-42f9-835e-8bf5747c2439@github.com>

On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The following build error has been reported when an old gcc is used:
> installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes]
> static void ALWAYSINLINE crash_with_sigfpe() {
>
> We can avoid it by not setting the mentioned attribute in case ubsan is not enabled.

Marked as reviewed by jwaters (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19937#pullrequestreview-2150364871

From rehn at openjdk.org Mon Jul 1 06:22:18 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 1 Jul 2024 06:22:18 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <RzKNgwshdkyAHmHaN6d32EqBHTSeIh6u8RKs7YhsuLc=.3cc0f5e1-fb1e-4c9d-ba23-1a7bc93d8ba3@github.com>

On Sun, 30 Jun 2024 14:02:00 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

Thank you, looks good!

-------------

Marked as reviewed by rehn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19960#pullrequestreview-2150512867

From mbaesken at openjdk.org Mon Jul 1 06:39:30 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 1 Jul 2024 06:39:30 GMT
Subject: Integrated: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
In-Reply-To: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
References: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
Message-ID: <NSOQCamAWJrvAKo4rnT1elhRksMwbhA4WWeZKaEoRKU=.10d03b3c-6a72-4e89-bccb-fc6c94f83e6f@github.com>

On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The following build error has been reported when an old gcc is used:
> installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes]
> static void ALWAYSINLINE crash_with_sigfpe() {
>
> We can avoid it by not setting the mentioned attribute in case ubsan is not enabled.

This pull request has now been integrated.

Changeset: 53242cdf
Author:    Matthias Baesken <mbaesken at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/53242cdf9ef17c502ebd541e84370e7c158639c1
Stats:     2 lines in 1 file changed: 2 ins; 0 del; 0 mod

8335283: Build failure due to 'no_sanitize' attribute directive ignored

Reviewed-by: shade, tschatzl, kbarrett, jwaters

-------------

PR: https://git.openjdk.org/jdk/pull/19937

From mbaesken at openjdk.org Mon Jul 1 06:39:30 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 1 Jul 2024 06:39:30 GMT
Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
In-Reply-To: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
References: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
Message-ID: <AMdyfaoRH6SfX4yU2FFq31zmaeAv1wzNSdd9bhsk9i4=.2407074b-782f-4ae4-b5db-d25c37552f6e@github.com>

On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The following build error has been reported when an old gcc is used:
> installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes]
> static void ALWAYSINLINE crash_with_sigfpe() {
>
> We can avoid it by not setting the mentioned attribute in case ubsan is not enabled.

Thanks for the reviews!
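
The approach described in this thread (emit the attribute only when UBSan is enabled) can be sketched as below. This is a minimal illustration, not the integrated patch: the macro names `HYPOTHETICAL_UBSAN_BUILD` and `SKETCH_NO_UBSAN` and the function `divide` are assumptions made up for the example, and the JDK build may use different names and conditions.

```cpp
#include <cassert>

// Assumption: the build system defines HYPOTHETICAL_UBSAN_BUILD only when
// UBSan is enabled. The real macro used by the JDK build may differ.
#if defined(HYPOTHETICAL_UBSAN_BUILD) && (defined(__clang__) || defined(__GNUC__))
#define SKETCH_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
// Expands to nothing, so a compiler that does not know the attribute
// never sees it and cannot warn about it under -Werror=attributes.
#define SKETCH_NO_UBSAN
#endif

// Hypothetical stand-in for a function that should not be instrumented.
SKETCH_NO_UBSAN static int divide(int numerator, int denominator) {
  return numerator / denominator;
}
```

In a build without UBSan the macro expands to nothing, so older compilers only ever see a plain function declaration.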
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19937#issuecomment-2199351560

From luhenry at openjdk.org Mon Jul 1 06:43:25 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 1 Jul 2024 06:43:25 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <BWV1qtKhP0MV1SrYotttrc0LqUNWLVWjUqwF5ZQQPj0=.c586f3d6-d2f3-4325-a2c8-9de67f67b6ec@github.com>

On Sun, 30 Jun 2024 14:02:00 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2348:

> 2346: __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT)));
> 2347:
> 2348: __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1);

There is no use of `temp1` afterwards; should we replace it with `x0`?

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2351:

> 2349: __ vle32_v(res, from);
> 2350: __ vmv_v_x(vzero, zr);
> 2351: generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4);

It would be great to add a quick comment mentioning the side effect on `key` of this function call.
Same at https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2355 and https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2359

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2362:

> 2360: generate_rev8_pack2(vtmp1, vtmp2);
> 2361:
> 2362: __ mv(temp2, 44);

You could replace `temp2` with `t0`/`t1`/`t2`.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2448:

> 2446: __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT)));
> 2447:
> 2448: __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1);

Same as for encrypt: there is no use of `temp1`, so could you replace it with `x0`?

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2459:

> 2457: generate_aesdecrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4);
> 2458:
> 2459: generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4);

Same as above, please add a comment on the side effect on `key`.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2466:

> 2464: generate_rev8_pack2(vtmp1, vtmp2);
> 2465:
> 2466: __ mv(temp2, 44);

Same as above, could you use `t0`/`t1`/`t2` instead?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660541398
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660542460
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660541850
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660543748
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660544520
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660544242

From duke at openjdk.org Mon Jul 1 07:47:20 2024
From: duke at openjdk.org (duke)
Date: Mon, 1 Jul 2024 07:47:20 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v18]
In-Reply-To: <XHMQAMOt-SyDN5AtlV53ZI39o9T6PtwbiSnnNaw94S0=.778ad97c-0e9f-41e5-b5e8-1ee24cbfb26e@github.com>
References: <M5khiX3nenq64pSpGHB2ClK0pABiaZb0y7YwKYFhgK4=.602e9a58-6792-4bdb-87cf-a95a4346937f@github.com> <XHMQAMOt-SyDN5AtlV53ZI39o9T6PtwbiSnnNaw94S0=.778ad97c-0e9f-41e5-b5e8-1ee24cbfb26e@github.com>
Message-ID: <5dDjN6TS_mEh79bvo2LIOx6cRlyM0kO_Lwcj-AKPH94=.08d1abbe-c4d3-4f03-9805-99c28cd0e47d@github.com>

On Sat, 29 Jun 2024 06:47:52 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

>> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732)
>> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for).
>>
>> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN.
>> Power10 has the "setbc" / "setbcr" instruction.
>>
>> A new function has been created for the conversion and uses "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one which supports both).
>>
>> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function.
>
> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:
>
>   default value correction

@suchismith1993
Your change (at version 0de46f43f5f7fa233fdd2154edf971941b16ab4a) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19886#issuecomment-2199463949

From sroy at openjdk.org Mon Jul 1 08:10:36 2024
From: sroy at openjdk.org (Suchismith Roy)
Date: Mon, 1 Jul 2024 08:10:36 GMT
Subject: Integrated: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1
In-Reply-To: <M5khiX3nenq64pSpGHB2ClK0pABiaZb0y7YwKYFhgK4=.602e9a58-6792-4bdb-87cf-a95a4346937f@github.com>
References: <M5khiX3nenq64pSpGHB2ClK0pABiaZb0y7YwKYFhgK4=.602e9a58-6792-4bdb-87cf-a95a4346937f@github.com>
Message-ID: <I1xSNawOD5e5syxs03Rx5xvB6TPWnbKwHBaI2Oc86PE=.28e6fe71-1b01-4dcf-818d-64a440200620@github.com>

On Tue, 25 Jun 2024 15:35:43 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732)
> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for).
>
> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN.
> Power10 has the "setbc" / "setbcr" instruction.
>
> A new function has been created for the conversion and uses "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one which supports both).
>
> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function.

This pull request has now been integrated.
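
The conversion that this change unifies (map any non-zero value to 1 and zero to 0, without a branch) can be illustrated in portable C++. This is only a sketch of one well-known branch-free formulation; the function names are hypothetical, and the instruction sequence the PPC64 port actually emits (including the Power10 `setbcr` path) may differ.

```cpp
#include <cassert>
#include <cstdint>

// For any x != 0, either x or its two's-complement negation has the top bit
// set, so OR-ing them and shifting the top bit down yields 1. For x == 0
// both operands are 0, so the result is 0. No branch is involved.
static inline uint32_t normalize_bool32(uint32_t x) {
  return (x | (0u - x)) >> 31;
}

// The same idea for 64-bit operands.
static inline uint64_t normalize_bool64(uint64_t x) {
  return (x | (uint64_t{0} - x)) >> 63;
}
```

Providing separate 32-bit and 64-bit variants mirrors the remark in the description that both operand widths need to be covered.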
Changeset: c7e9ebb4
Author:    Suchismith Roy <sroy at openjdk.org>
Committer: Martin Doerr <mdoerr at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c7e9ebb4cfff56b7a977eb2942f563f96b3336bd
Stats:     54 lines in 7 files changed: 33 ins; 11 del; 10 mod

8331732: [PPC64] Unify and optimize code which converts != 0 to 1

Reviewed-by: mdoerr, amitkumar

-------------

PR: https://git.openjdk.org/jdk/pull/19886

From aph at openjdk.org Mon Jul 1 08:49:25 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Jul 2024 08:49:25 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To: <exEarPhFo6CWfoL6H8P8Ydrdy5XYSfop67bRl9yuzro=.2c5f2707-f786-4991-8595-f1fb04626b3a@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> <NSOakR2gz_n5Nlsthxp9d3h-1PCriKR89bLyhsvOfJ4=.b3af7ba9-f3f2-47c0-98a9-6f106416a0c6@github.com> <CrgtP7a89N6V7kzI-0jMcIGPJCpRJXIBcKupHKIGSb0=.999fdf6e-8231-45e8-8d25-35dcfd8fa6df@github.com> <exEarPhFo6CWfoL6H8P8Ydrdy5XYSfop67bRl9yuzro=.2c5f2707-f786-4991-8595-f1fb04626b3a@github.com>
Message-ID: <YgXbsHRi1Zpf8q0vu1J-tricRWTkep-6Nz5pqxgWTj8=.1914f498-15ba-4380-93b8-57aed13cbe25@github.com>

On Sun, 30 Jun 2024 15:34:02 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3243:
>>
>>> 3241: // Get the first array index that can contain super_klass.
>>> 3242: if (bit != 0) {
>>> 3243: pop_count_long(r_array_index, r_array_index, Z_R1_scratch); // all the registers are hardcoded so should be fine
>>
>> This comment is also rather baffling. You seem to be concerned about something, but what? `pop_count_long` doesn't cause any particular risk, does it?
>
> For machines older than `Z15`, `pop_count_long` clobbers `Z_R1_scratch` register. That's why I added it there.

Better then just to say what matters: "NB: May clobber Z_R1_scratch" or "Clobbers Z_R1_scratch on older machines."
"Should be fine" is just confusing because "should" is tentative, where you need to be certain in comments.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660691384

From aph at openjdk.org Mon Jul 1 08:49:26 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Jul 2024 08:49:26 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4]
In-Reply-To: <M6NsxpbSTah5LZYXLqez-2Y0_rebkaoyIfp1W_if-Ko=.d47e435a-338f-463a-b810-29a63cd04510@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> <VUjXfo6HMhmcuR_vkWvbTRGqlbcX5E5W37WrkUxN9L8=.48ff6757-8d99-44f1-98fd-ce7d29c3c7db@github.com> <M6NsxpbSTah5LZYXLqez-2Y0_rebkaoyIfp1W_if-Ko=.d47e435a-338f-463a-b810-29a63cd04510@github.com>
Message-ID: <OLkjLuRRnaThpE2TIXSckXwFKkx94sNYK5fDQylLvvU=.57f5aedc-3e6f-4967-9b07-9693a7ee417c@github.com>

On Mon, 24 Jun 2024 14:02:15 GMT, Lutz Schmidt <lucy at openjdk.org> wrote:

>> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
>>  - rename: r_scratch to r_result in repne_scan method
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3275:
>
>> 3273: call_stub(StubRoutines::lookup_secondary_supers_table_slow_path_stub());
>> 3274:
>> 3275: z_bru(L_done); // pass whatever result we got from a slow path
>
> This one branch could be saved by using "load immediate on condition". But it's after slow path processing.

As @RealLucy says, this is after slow processing. It's not worth optimizing here.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660693537

From sgehwolf at openjdk.org Mon Jul 1 08:50:31 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Mon, 1 Jul 2024 08:50:31 GMT
Subject: Integrated: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container
In-Reply-To: <ElCWVtphfdgi651yIzNsfJS3C5ewhpvOH9ZXjqt3PFE=.de11a3f2-c2fd-4cc4-8c6d-e20f9c8de03d@github.com>
References: <ElCWVtphfdgi651yIzNsfJS3C5ewhpvOH9ZXjqt3PFE=.de11a3f2-c2fd-4cc4-8c6d-e20f9c8de03d@github.com>
Message-ID: <0hEDyLmsRgW-GR23fKnixv3_5edApaB_eoEQ2D_28NU=.32f148cd-e8a2-4bef-b8bc-44ec7cc0dbd0@github.com>

On Mon, 11 Mar 2024 16:55:36 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is being indicated with the following trace log line:
>
>
> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
>
>
> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example:
>
>
> java -XshowSettings:system --version
> Operating System Metrics:
>     Provider: cgroupv1
>     System not containerized.
> openjdk 23-internal 2024-09-17
> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk)
> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>
>
> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed.
>
> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system.
>
> Testing:
>
> - [x] GHA (risc-v failure seems infra related)
> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests)
> - [x] Some manual testing using cri-o
>
> Thoughts?

This pull request has now been integrated.
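
The property the description relies on (cgroup controllers are typically mounted read only inside containers) can be checked from a `/proc/self/mountinfo` line. The sketch below is a simplified illustration and not the HotSpot implementation: the function name `is_read_only_cgroup_mount` is hypothetical, and it only inspects the per-mount options field of a single line, following the field layout documented in proc(5).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns true if the given mountinfo line describes a cgroup or cgroup2
// filesystem whose per-mount options include "ro".
// Layout per proc(5): id parent major:minor root mountpoint options
//                     [optional fields...] - fstype source superoptions
static bool is_read_only_cgroup_mount(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> fields;
  std::string token;
  while (in >> token) {
    fields.push_back(token);
  }

  // The "-" separator follows the (possibly empty) optional fields, so it
  // can appear at index 6 or later.
  std::size_t sep = 0;
  for (std::size_t i = 6; i < fields.size(); ++i) {
    if (fields[i] == "-") {
      sep = i;
      break;
    }
  }
  if (sep == 0 || sep + 1 >= fields.size()) {
    return false;  // malformed or truncated line
  }

  const std::string& fstype = fields[sep + 1];
  if (fstype != "cgroup" && fstype != "cgroup2") {
    return false;
  }

  // Field 5 holds the comma-separated per-mount options.
  std::istringstream options(fields[5]);
  std::string option;
  while (std::getline(options, option, ',')) {
    if (option == "ro") {
      return true;
    }
  }
  return false;
}
```

A caller would read `/proc/self/mountinfo` line by line and apply the check to each entry; how the result is combined with limit detection is left out here, since the thread does not spell out those details.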
Changeset: 0a6ffa57
Author:    Severin Gehwolf <sgehwolf at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/0a6ffa57954ddf4f92205205a5a1bada813d127a
Stats:     411 lines in 20 files changed: 305 ins; 79 del; 27 mod

8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container

Reviewed-by: stuefe, iklam

-------------

PR: https://git.openjdk.org/jdk/pull/18201

From aph at openjdk.org Mon Jul 1 08:56:22 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Jul 2024 08:56:22 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v13]
In-Reply-To: <LeHv1fkxfIoNdN5KHLJzdI2RPwVd2ty54CdOVcr3Ahs=.789598fd-4c45-4bb3-b848-661ba1cfe5e6@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> <LeHv1fkxfIoNdN5KHLJzdI2RPwVd2ty54CdOVcr3Ahs=.789598fd-4c45-4bb3-b848-661ba1cfe5e6@github.com>
Message-ID: <c7ni2RO1HmDd-LclUqpqkiuXzXRk3SZ9a8_ZH67DlVw=.4da466e8-916d-4af3-a6ae-e2de177029bf@github.com>

On Sun, 30 Jun 2024 15:56:48 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>>
>> Benchmark                                   Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   removed unnecessary checks

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3228:

> 3226:
> 3227: // load 0 in r_result by default. In case search fails, r_result will be loaded
> 3228: // with value 1 (failure) at the end of this method.

Suggestion:

  // Initialize r_result with 0 (indicating success). If searching fails, r_result will be loaded
  // with 1 (failure) at the end of this method.
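
For readers following the bitmap and `pop_count_long` discussion in this review, the general idea of indexing a packed array through a 64-bit bitmap can be sketched in plain C++. This is a generic illustration under assumptions, not the code in `macroAssembler_s390.cpp`: the helper names are hypothetical, and the port's exact shift-and-count sequence and bit numbering may differ.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (Kernighan's method): clears the lowest set
// bit once per iteration, so the loop runs once per set bit.
static inline int popcount64(uint64_t value) {
  int count = 0;
  while (value != 0) {
    value &= value - 1;
    ++count;
  }
  return count;
}

// True if the bit for `slot` (0..63) is set in the bitmap.
static inline bool slot_occupied(uint64_t bitmap, unsigned slot) {
  return ((bitmap >> slot) & 1u) != 0;
}

// Hypothetical helper: index into a packed array that stores one element
// per set bit, in bit order. The index of the element for `slot` is the
// number of set bits strictly below that slot.
static inline int packed_index(uint64_t bitmap, unsigned slot) {
  const uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
  return popcount64(below);
}
```

With this layout a clear bit answers a negative lookup without touching the array at all, which is why the population count sits on the fast path.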
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3331:

> 3329: // wrapped around the end of the array.
> 3330:
> 3331: { // This is conventional linear probing, but instead of terminating,

Suggestion:

  { // This is conventional linear probing, but instead of terminating

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3344:

> 3342:
> 3343: // We should only reach here after having found a bit in the bitmap.
> 3344: // Invariant: array_length == popcount(bitmap)

It turns out this invariant isn't true. When `array_length` is >= 63 we set `SECONDARY_SUPERS_BITMAP_FULL`, for performance reasons.

Suggestion:

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660697339
PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660699570
PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660702697

From sgehwolf at openjdk.org Mon Jul 1 08:50:29 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Mon, 1 Jul 2024 08:50:29 GMT
Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v8]
In-Reply-To: <qwrLCWWzsXHeQy6jM21G7MSXxKroMi-rpUHhk-KCgfc=.ff4e4746-72b3-496f-bd57-4526858e2e31@github.com>
References: <ElCWVtphfdgi651yIzNsfJS3C5ewhpvOH9ZXjqt3PFE=.de11a3f2-c2fd-4cc4-8c6d-e20f9c8de03d@github.com> <qwrLCWWzsXHeQy6jM21G7MSXxKroMi-rpUHhk-KCgfc=.ff4e4746-72b3-496f-bd57-4526858e2e31@github.com>
Message-ID: <x8LUWka1p6cXsWbPjJWo0OmtwtBBrQ0OOMx86EbgGNg=.9aaa99cb-1026-4eec-987a-f4683a7bb6ca@github.com>

On Fri, 28 Jun 2024 15:41:48 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is being indicated with the following trace log line:
>>
>>
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
>>
>>
>> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example:
>>
>>
>> java -XshowSettings:system --version
>> Operating System Metrics:
>>     Provider: cgroupv1
>>     System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>>
>>
>> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed.
>>
>> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system.
>>
>> Testing:
>>
>> - [x] GHA (risc-v failure seems infra related)
>> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests)
>> - [x] Some manual testing using cri-o
>>
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:
>
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Refactor mount info matching to helper function
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Remove problem listing of PlainRead which is reworked here
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Add doc for mountinfo scanning.
>  - Unify naming of variables
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - ... and 8 more: https://git.openjdk.org/jdk/compare/486aa11e...1017da35

Thank you for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2199581201

From aboldtch at openjdk.org Mon Jul 1 09:27:50 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 1 Jul 2024 09:27:50 GMT
Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java
Message-ID: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>

TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.

Change it to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to be both more exact and less dependent on interactions with NMT.
-------------

Commit messages:
 - 8335397: Improve reliability of TestRecursiveMonitorChurn.java

Changes: https://git.openjdk.org/jdk/pull/19965/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335397
  Stats: 77 lines in 5 files changed: 28 ins; 31 del; 18 mod
  Patch: https://git.openjdk.org/jdk/pull/19965.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19965/head:pull/19965

PR: https://git.openjdk.org/jdk/pull/19965

From amitkumar at openjdk.org Mon Jul 1 09:31:56 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 09:31:56 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v14]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <PLAQttD3MSbh0TfJ-ae3bRziCW5anWeO_1Ajzy39mDs=.2068ea9a-e248-4a75-b0d8-a5486b069fe1@github.com>

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>
> Benchmark                                   Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61        avgt   15  27.763 ± 1.628  ns...

Amit Kumar has updated the pull request incrementally with three additional commits since the last revision:

 - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp

   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
 - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp

   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
 - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp

   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/98a8f5ad..6df0922f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=12-13

  Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From amitkumar at openjdk.org Mon Jul 1 09:52:48 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 09:52:48 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v15]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <B3QnMNkIaw59FI-SRMX1R31gCni7ibyWczmIKYZlD2Y=.c0001db2-1a44-497e-aa9f-4f62147b41f5@github.com>

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>
> Benchmark                                   Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61        avgt   15  27.763 ± 1.628  ns...
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  updates comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/6df0922f..6d05364f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=13-14

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From aboldtch at openjdk.org Mon Jul 1 10:27:23 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 1 Jul 2024 10:27:23 GMT
Subject: [jdk23] RFR: 8326820: Metadata artificially kept alive
In-Reply-To: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
References: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
Message-ID: <zwf_jzY4OB6ioEcaMEeNUv7XIVO6vHXpycxJuD7pkks=.e175bd1b-3475-4a7e-8f0b-774d81bd5c3d@github.com>

On Thu, 27 Jun 2024 14:30:43 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> Hi all,
>
> This pull request contains a backport of commit [5909d541](https://github.com/openjdk/jdk/commit/5909d54147355dd7da5786ff39ead4c15816705c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Axel Boldt-Christmas on 27 Jun 2024 and was reviewed by Erik Österlund, Stefan Karlsson and Coleen Phillimore.
>
> Thanks!

Thanks for the review. This has been running through the JDK 24 CI over the weekend. No issues found.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19929#issuecomment-2199778367

From aboldtch at openjdk.org Mon Jul 1 10:27:24 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 1 Jul 2024 10:27:24 GMT
Subject: [jdk23] Integrated: 8326820: Metadata artificially kept alive
In-Reply-To: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
References: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
Message-ID: <ms7RUHs4I44jaHAR2lbrj4vU-wIoT65NDeFUYb1zZxw=.0cd04f0d-83cb-4bcd-af2f-2f43a481ad11@github.com>

On Thu, 27 Jun 2024 14:30:43 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> Hi all,
>
> This pull request contains a backport of commit [5909d541](https://github.com/openjdk/jdk/commit/5909d54147355dd7da5786ff39ead4c15816705c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Axel Boldt-Christmas on 27 Jun 2024 and was reviewed by Erik Österlund, Stefan Karlsson and Coleen Phillimore.
>
> Thanks!

This pull request has now been integrated.

Changeset: e5fbc631 Author: Axel Boldt-Christmas <aboldtch at openjdk.org> URL: https://git.openjdk.org/jdk/commit/e5fbc631ca06b40a682149b0903221e190f592aa Stats: 80 lines in 6 files changed: 31 ins; 25 del; 24 mod 8326820: Metadata artificially kept alive Reviewed-by: stefank Backport-of: 5909d54147355dd7da5786ff39ead4c15816705c ------------- PR: https://git.openjdk.org/jdk/pull/19929 From rkennke at openjdk.org Mon Jul 1 12:17:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 1 Jul 2024 12:17:18 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> Message-ID: <Trc5NZuM5oyxoD_2DUhYNsGNudFbt-7m7A7jAD6FILQ=.9ffd6838-474c-4246-85f9-79769b59aecd@github.com> On Mon, 1 Jul 2024 09:21:13 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. > > Change to instead query the JVM for exact number of inflations via the Whitebox API. This allow us to both be more exact and less dependent on interactions with NMT. Looks good to me! Thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2151229563 From coleenp at openjdk.org Mon Jul 1 12:19:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 Jul 2024 12:19:21 GMT Subject: [jdk23] RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com> References: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com> Message-ID: <ssQ1swZKVUBp1gwCQHH8rnp6OFZ5w8mHeTiU7BYeZk0=.5db1bfe3-448b-414e-abe1-56aa14659e1e@github.com> On Fri, 28 Jun 2024 12:14:55 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > Clean backport of JDK-8333542. After this, we need a backport for JDK-8335134 to fix the test. Thank you Chris. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19938#issuecomment-2199990801 From coleenp at openjdk.org Mon Jul 1 12:19:22 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 Jul 2024 12:19:22 GMT Subject: [jdk23] Integrated: 8333542: Breakpoint in parallel code does not work In-Reply-To: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com> References: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com> Message-ID: <cp0Fet3QgVnMzTlRFx1CXA5UkTDNomotKUIDfhjWDiQ=.835e462f-8651-467a-af48-084dfa3f3f1c@github.com> On Fri, 28 Jun 2024 12:14:55 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > Clean backport of JDK-8333542. After this, we need a backport for JDK-8335134 to fix the test. This pull request has now been integrated. 
Changeset: 7040de19 Author: Coleen Phillimore <coleenp at openjdk.org> URL: https://git.openjdk.org/jdk/commit/7040de19bdb29a3abacf2a39b7c7c30a07c61135 Stats: 516 lines in 16 files changed: 339 ins; 129 del; 48 mod 8333542: Breakpoint in parallel code does not work Reviewed-by: cjplummer Backport-of: b3bf31a0a08da679ec2fd21613243fb17b1135a9 ------------- PR: https://git.openjdk.org/jdk/pull/19938 From ayang at openjdk.org Mon Jul 1 12:33:20 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 1 Jul 2024 12:33:20 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> Message-ID: <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com> On Fri, 28 Jun 2024 19:22:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> See JBS issue. >> >> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. >> >> The patch: >> - exposes os::available_memory via Whitebox >> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` >> >> I have some misgivings about this solution, though: >> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. >> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. 
I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) >> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. >> >> Despite my doubts, I think this is the best we can come up with if we want to have such a test. >> >> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Update TestAlwaysPreTouchBehavior.java test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 101: > 99: * @build jdk.test.whitebox.WhiteBox > 100: * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > 101: * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -Xmx64m gc.TestAlwaysPreTouchBehavior -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC Why the explicit `-Xmx64m`? As I understand this is essentially the launcher, whose heap-size is of little importance. Also, why does the launch require `WhiteBoxAPI`? test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 161: > 159: System.out.println("RSS: " + rss + " available: " + avail + " committed " + committed); > 160: > 161: if (args[0].equals("run")) { When will this branch be taken? I can't find where the `run` arg is specified. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660980458 PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660970079 From stuefe at openjdk.org Mon Jul 1 12:39:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 1 Jul 2024 12:39:20 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com> Message-ID: <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com> On Mon, 1 Jul 2024 12:30:31 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Why the explicit -Xmx64m? As I understand this is essentially the launcher, whose heap-size is of little importance. No particular reason, just don't like launchers to use large heaps. I can remove it. > Also, why does the launch require WhiteBoxAPI? Because the launcher needs to access hostAvailableMemory in order to decide before starting the test whether it makes sense to start the test. > test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 161: > >> 159: System.out.println("RSS: " + rss + " available: " + avail + " committed " + committed); >> 160: >> 161: if (args[0].equals("run")) { > > When will this branch be taken? I can't find where the `run` arg is specified. 
See line 141 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660988240 PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660985382 From pchilanomate at openjdk.org Mon Jul 1 12:45:27 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 1 Jul 2024 12:45:27 GMT Subject: RFR: 8329665: fatal error: memory leak: allocating without ResourceMark [v2] In-Reply-To: <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com> References: <riJhOUThyVVgaGs098dEOrDTvXnAOjxlK48aA-yE_2s=.a235716d-5221-46c0-bd0b-ef0cf7ab9ccf@github.com> <BytvXQLrd23w2e_K_YnaTbAI5-HpFSq_loPL3sA33NQ=.ffce43c4-d601-49e3-a013-d741baec9852@github.com> <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com> Message-ID: <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com> On Thu, 27 Jun 2024 13:56:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> take ResourceMark out of debug only > > I am a bit concerned about this fix. Introducing an RM into `frame::oops_interpreted_do` means we cannot assemble anything in RA in the closure code and keep the memory across the RM. But closure code is opaque to the iteration site. Do we have any safeguards against OopClosure using and retaining RA memory? (Because even if no closure does this today, this could sneak in easily) @tstuefe I added checks and I found there is one case in JFR code where we can allocate and retain memory from the resource area across the closure (https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleWriter.cpp#L462). It can be triggered by running tests in jdk/jfr/event/oldobject. 

There are a couple of options I can think of here:

1 - Fix this case and add a safeguard to prevent these allocations. Maybe have some ResetRM object before mask.iterate_oop to reset the nesting in the RA on construction and then restore it on destruction.
2 - Allocate the _bit_mask in the C heap instead. There is a comment in InterpreterOopMap::resource_copy() that this has a significant performance cost. But we are already allocating an OopMapCacheEntry from the C heap, and this allocation for the bit_mask should be a rare case, so I doubt this.
3 - Have the callers of frame::oops_interpreted_do() declare the ResourceMark instead. I wanted to avoid this because this is an implementation detail of InterpreterOopMap.
4 - Restore the code as it was before this change and add an object before NEW_RESOURCE_ARRAY(...) in InterpreterOopMap::resource_copy() that increments the nesting in the RA on construction and decrements it on destruction, i.e. basically mark this particular allocation as okay without a RM since we check for it in ~InterpreterOopMap.

What do you think? I don't actually see a particular reason to disallow these allocations, so I'm now not convinced about option 1.

------------- PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2200044049 From stuefe at openjdk.org Mon Jul 1 13:08:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 1 Jul 2024 13:08:26 GMT Subject: RFR: 8329665: fatal error: memory leak: allocating without ResourceMark [v2] In-Reply-To: <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com> References: <riJhOUThyVVgaGs098dEOrDTvXnAOjxlK48aA-yE_2s=.a235716d-5221-46c0-bd0b-ef0cf7ab9ccf@github.com> <BytvXQLrd23w2e_K_YnaTbAI5-HpFSq_loPL3sA33NQ=.ffce43c4-d601-49e3-a013-d741baec9852@github.com> <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com> <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com> Message-ID: <NR6CysHCjWX2AB1k8ccdDh5kHyNqKexk-oU5z88cNDE=.6f10c851-936c-4482-aefc-fddaefd213c6@github.com> On Mon, 1 Jul 2024 12:42:44 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> I am a bit concerned about this fix. Introducing an RM into `frame::oops_interpreted_do` means we cannot assemble anything in RA in the closure code and keep the memory across the RM. But closure code is opaque to the iteration site. Do we have any safeguards against OopClosure using and retaining RA memory? (Because even if no closure does this today, this could sneak in easily) > > @tstuefe I added checks and I found there is one case in JFR code where we can allocate and retain memory from the resource area across the closure (https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleWriter.cpp#L462). It can be triggered by running tests in jdk/jfr/event/oldobject. > > There are a couple of options I can think of here: > > 1 - Fix this case and add a safeguard to prevent this allocations. 
Maybe have some ResetRM object before mask.iterate_oop to reset the nesting in the RA on construction and then restore it on destruction. > 2 - Allocate the _bit_mask in the C heap instead. There is a comment in InterpreterOopMap::resource_copy() that this has a significant performance cost. But we are already allocating an OopMapCacheEntry from the C heap, and this allocation for the bit_mask should be a rare case so I doubt this. > 3 - Have the callers of frame::oops_interpreted_do() declare the ResourceMark instead. I wanted to avoid this because this is an implementation detail of InterpreterOopMap. > 4 - Restore the code as it was before this change and add an object before NEW_RESOURCE_ARRAY(...) in InterpreterOopMap::resource_copy() that increments the nesting in the RA on construction and decrements it on destruction, i.e basically mark this particular allocation as okay without a RM since we check for it in ~InterpreterOopMap. > > What do you think? I don't actually see a particular reason to disallow this allocations so I'm now not convinced on 1. Hi @pchilano > @tstuefe I added checks and I found there is one case in JFR code where we can allocate and retain memory from the resource area across the closure ( > > https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleWriter.cpp#L462 > ). It can be triggered by running tests in jdk/jfr/event/oldobject. > Yes, I was afraid of something like that. Please open a follow-up bug, and chain it to 8329665, since the fix has already been downported to JDK21. We should fix this before the October update. I added a little comment to 8329665 to ensure this does not get lost. > There are a couple of options I can think of here: > > 1 - Fix this case and add a safeguard to prevent this allocations. Maybe have some ResetRM object before mask.iterate_oop to reset the nesting in the RA on construction and then restore it on destruction. 

Safeguarding is not so easy. You could observe the RA state at the end and compare it with the beginning, but the code may use RM in a benign way.

> 2 - Allocate the _bit_mask in the C heap instead. There is a comment in InterpreterOopMap::resource_copy() that this has a significant performance cost. But we are already allocating an OopMapCacheEntry from the C heap, and this allocation for the bit_mask should be a rare case so I doubt this.

I was thinking along these lines.

The performance comment makes sense for lookup, but not so much here. Plus, I would wager this case is rare anyway since the InterpreterOopMap can hold what, (4*64)/2bit = 128 entries?

About the other comment (` // Due to the invariants above it's tricky to allocate a temporary OopMapCacheEntry on the stack`), that is weird. I don't get it. It is ages old (comes from the initial commit), so you may be more able than I to dig out the history behind it. I tried to just allocate an InterpreterOopMap entry on the stack. I did a small test, nothing bad happened.

So, if this comment is not up to date, you may pay for the C-heap allocation by removing this allocation :)

> 3 - Have the callers of frame::oops_interpreted_do() declare the ResourceMark instead. I wanted to avoid this because this is an implementation detail of InterpreterOopMap.

Yes, I don't like this either.

> 4 - Restore the code as it was before this change and add an object before NEW_RESOURCE_ARRAY(...) in InterpreterOopMap::resource_copy() that increments the nesting in the RA on construction and decrements it on destruction, i.e basically mark this particular allocation as okay without a RM since we check for it in ~InterpreterOopMap.

Not a fan, overly complicated.

>
> What do you think? I don't actually see a particular reason to disallow this allocations so I'm now not convinced on 1.

I prefer 2.

------------- PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2200093475 From ayang at openjdk.org Mon Jul 1 13:13:21 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 1 Jul 2024 13:13:21 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> Message-ID: <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com> On Fri, 28 Jun 2024 19:22:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> See JBS issue. >> >> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. >> >> The patch: >> - exposes os::available_memory via Whitebox >> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` >> >> I have some misgivings about this solution, though: >> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. >> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? 
Open to opinions) >> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. >> >> Despite my doubts, I think this is the best we can come up with if we want to have such a test. >> >> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Update TestAlwaysPreTouchBehavior.java Some readability suggestions. test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 164: > 162: if (rss < committed) { > 163: if (avail < requiredAvailableDuring) { > 164: throw new SkippedException("Not enough memory for this test (" + avail + ")"); This is essentially an early-return; why is this inside the `rss < committed` comparison? Does it work if it's lifted up? The structure I have in mind is like: if (avail < ....) { skip-test; } assert(rss >= committed, error-msg); ------------- Marked as reviewed by ayang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19803#pullrequestreview-2151337452 PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1661031216 From ayang at openjdk.org Mon Jul 1 13:13:22 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 1 Jul 2024 13:13:22 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com> <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com> Message-ID: <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com> On Mon, 1 Jul 2024 12:37:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > just don't like launchers to use large heaps. Could you add a comment at the start of this file explaining the test setup, launcher creating another VM + real test flags? There, the rational for small heap (64M) can be covered as well. Some text on these fields can also help understand this test. final static long expectedMaxNonHeapRSS = M * 256; final static long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS; final static long requiredAvailableDuring = expectedMaxNonHeapRSS; >> test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 161: >> >>> 159: System.out.println("RSS: " + rss + " available: " + avail + " committed " + committed); >>> 160: >>> 161: if (args[0].equals("run")) { >> >> When will this branch be taken? I can't find where the `run` arg is specified. > > See line 141 I see; thanks. Can you add a comment referencing `prepareOptions`? 
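
The three threshold fields quoted in this message, together with the check ordering suggested earlier in the review (skip first, then compare RSS against committed), can be sketched as below. This is only an illustration under stated assumptions, not the test's actual code: the class name `PreTouchCheckSketch`, the `Outcome` enum, the `evaluate` helper, and the value chosen for `heapsize` are hypothetical, and the real test throws `SkippedException` rather than returning a value.

```java
// Illustrative sketch only; not the code of TestAlwaysPreTouchBehavior.java.
final class PreTouchCheckSketch {
    static final long M = 1024L * 1024L;

    // Hypothetical placeholder: the thread does not state the real heap size.
    static final long heapsize = 128 * M;

    // Field names and formulas as quoted in the review comment.
    // Upper bound assumed for the testee VM's RSS outside the Java heap.
    static final long expectedMaxNonHeapRSS = M * 256;
    // Memory that should be available before the testee VM is launched.
    static final long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS;
    // Memory that should still be available while the testee VM is measured.
    static final long requiredAvailableDuring = expectedMaxNonHeapRSS;

    // Hypothetical stand-in for "throw SkippedException" / pass / fail.
    enum Outcome { SKIPPED, PASSED, FAILED }

    // Lifted ordering: decide whether the measurement is trustworthy first,
    // and only then compare RSS with the committed heap size.
    static Outcome evaluate(long rss, long committed, long avail) {
        if (avail < requiredAvailableDuring) {
            return Outcome.SKIPPED;
        }
        return (rss >= committed) ? Outcome.PASSED : Outcome.FAILED;
    }

    public static void main(String[] args) {
        long committed = heapsize;
        System.out.println(evaluate(committed, committed, 512 * M));
        System.out.println(evaluate(committed / 2, committed, 512 * M));
        System.out.println(evaluate(committed / 2, committed, 64 * M));
    }
}
```

One consequence of lifting the check is visible in `evaluate`: a run with little available memory is reported as skipped even when RSS already covers the committed size, whereas the nested form quoted above would let that run pass.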
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1661021604 PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1661022484 From pchilanomate at openjdk.org Mon Jul 1 13:54:28 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 1 Jul 2024 13:54:28 GMT Subject: RFR: 8329665: fatal error: memory leak: allocating without ResourceMark [v2] In-Reply-To: <NR6CysHCjWX2AB1k8ccdDh5kHyNqKexk-oU5z88cNDE=.6f10c851-936c-4482-aefc-fddaefd213c6@github.com> References: <riJhOUThyVVgaGs098dEOrDTvXnAOjxlK48aA-yE_2s=.a235716d-5221-46c0-bd0b-ef0cf7ab9ccf@github.com> <BytvXQLrd23w2e_K_YnaTbAI5-HpFSq_loPL3sA33NQ=.ffce43c4-d601-49e3-a013-d741baec9852@github.com> <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com> <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com> <NR6CysHCjWX2AB1k8ccdDh5kHyNqKexk-oU5z88cNDE=.6f10c851-936c-4482-aefc-fddaefd213c6@github.com> Message-ID: <0fgUhYcCEUgcOxiB3Ot_-sc86HQTVCgTbXknZH-_cL8=.8314c987-02c7-4541-a40f-c35c3abab3f3@github.com> On Mon, 1 Jul 2024 13:05:36 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > Yes, I was afraid of something like that. Please open a follow-up bug, and chain it to 8329665, since the fix has already been downported to JDK21. We should fix this before the October update. I added a little comment to 8329665 to ensure this does not get lost. > Filed https://bugs.openjdk.org/browse/JDK-8335409. > I was thinking along of this line. > > The performance comment makes sense for lookup, but not so much here. Plus, I would wager this case is rare anyway since the InterpreterOopMap can hold what, (4*64)/2bit = 128 entries? > Yes. I actually run tiers1-6 and found there are very few methods where we hit this code path where 128 entries are not enough so we need to allocate (~ less than 10). Most are tests to exercise this corner case. 
One of the only legit ones I found was TimeZoneNames_xx.java.getContents(). So it is a rare case. > About the other comment (` // Due to the invariants above it's tricky to allocate a temporary OopMapCacheEntry on the stack`), that is weird. I don't get it. It is ages old (comes from initial commit), so you may be more able than I to dig out the history behind it. I tried to just allocate an InterpreterOopMap entry on the stack. I did a small test, nothing bad happened. > > So, if this comment is not up to date, you may pay for the C-heap allocation by removing this allocation :) > Sounds we should be able to allocate in the stack. I'll check this. > > What do you think? I don't actually see a particular reason to disallow this allocations so I'm now not convinced on 1. > > I prefer 2. > Okay, I was thinking the same. I'll work on this approach. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2200217314 From aph at openjdk.org Mon Jul 1 14:11:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 1 Jul 2024 14:11:33 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v15] In-Reply-To: <B3QnMNkIaw59FI-SRMX1R31gCni7ibyWczmIKYZlD2Y=.c0001db2-1a44-497e-aa9f-4f62147b41f5@github.com> References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> <B3QnMNkIaw59FI-SRMX1R31gCni7ibyWczmIKYZlD2Y=.c0001db2-1a44-497e-aa9f-4f62147b41f5@github.com> Message-ID: <lGch3Yk2EtgNE8W8TTVqcc2oXzVMXzgfpL9rkLqtWsg=.d5b5d340-8de1-425d-b830-9b1be5e453b7@github.com> On Mon, 1 Jul 2024 09:52:48 GMT, Amit Kumar <amitkumar at openjdk.org> wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); 
But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op
>>
>> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op
>>
>> Benchmark Mode Cnt Score Error Units
>> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   updates comment

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3177:

> 3175:
> 3176: // z_brct above doesn't change CC.
> 3177: // If the search operation is unsuccessful, then it's a failure case.

Suggestion:

  // If we reach here, then the value in r_value is not present. Set r_result to 1.

"Failure" is not a good word to use here, because it's unclear what failed.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3362:

> 3360: z_bre(L_done); // success
> 3361:
> 3362: // look-ahead check (Bit 2), if bit-2 is also 0, we're done

Suggestion:

  // look-ahead check: if Bit 2 is 0, we're done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1661115340
PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1661108285

From amitkumar at openjdk.org Mon Jul 1 14:11:32 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 14:11:32 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v16]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <bqd-eFTpYldNYZ-qOu-R7ft6O_kI6kWnT-5YIUy762E=.bb6bf3cd-f223-4234-8f71-edad7822ba68@github.com>

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test  avgt  15  0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test  avgt  15  1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt  15  1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt  15  1.410 ± 0.017  ns/op
>
> Benchmark  Mode  Cnt  Score  Error  Units
> SecondarySupersLookup.testNegative00  avgt  15  1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01  avgt  15  2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02  avgt  15  2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03  avgt  15  3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04  avgt  15  3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05  avgt  15  4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06  avgt  15  4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07  avgt  15  5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08  avgt  15  6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09  avgt  15  6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10  avgt  15  7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16  avgt  15  9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20  avgt  15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30  avgt  15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32  avgt  15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40  avgt  15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50  avgt  15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55  avgt  15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56  avgt  15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57  avgt  15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58  avgt  15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59  avgt  15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60  avgt  15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61  avgt  15  27.763 ± 1.628 ns...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/cpu/s390/macroAssembler_s390.cpp

  Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/6d05364f..7b533b41

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=14-15

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From amitkumar at openjdk.org Mon Jul 1 14:14:50 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 14:14:50 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v17]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com>

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test  avgt  15  0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test  avgt  15  1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt  15  1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt  15  1.410 ± 0.017  ns/op
>
> Benchmark  Mode  Cnt  Score  Error  Units
> SecondarySupersLookup.testNegative00  avgt  15  1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01  avgt  15  2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02  avgt  15  2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03  avgt  15  3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04  avgt  15  3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05  avgt  15  4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06  avgt  15  4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07  avgt  15  5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08  avgt  15  6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09  avgt  15  6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10  avgt  15  7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16  avgt  15  9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20  avgt  15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30  avgt  15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32  avgt  15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40  avgt  15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50  avgt  15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55  avgt  15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56  avgt  15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57  avgt  15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58  avgt  15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59  avgt  15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60  avgt  15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61  avgt  15  27.763 ± 1.628 ns...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/cpu/s390/macroAssembler_s390.cpp

  Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/7b533b41..e935834b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=15-16

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From mli at openjdk.org Mon Jul 1 14:17:44 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Jul 2024 14:17:44 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding
Message-ID: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>

Hi,
Can you help to review the patch?

I'm also working on a base64 decode intrinsic, but there is some performance regression in some cases, and decode and encode are totally independent of each other, so I will send out the review of decode in another PR when I fix the performance regression in it.

Thanks.

## Test
Benchmarks were run on CanVM-K230.

I've tried several implementations, with the following vector groups:
* m2+m1
* m2
* m1

The best one is the combination of m2+m1; it has the best performance across all source sizes.

### this implementation (m2+m1)

Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876

### vector with only m2

Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m2 | Error | Units | -intrinsic/+intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.797 | 86.872 | 0.374 | ns/op | 0.9991366608
Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.971 | 94.203 | 1.918 | ns/op | 0.9975372334
Base64Encode.testBase64Encode | 3 | avgt | 10 | 122.074 | 123.978 | 1.009 | ns/op | 0.9846424366
Base64Encode.testBase64Encode | 6 | avgt | 10 | 138.999 | 138.344 | 2.175 | ns/op | 1.004734575
Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.857 | 157.494 | 1.036 | ns/op | 1.021353194
Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.511 | 154.998 | 1.727 | ns/op | 1.042019897
Base64Encode.testBase64Encode | 10 | avgt | 10 | 186.228 | 175.38 | 0.62 | ns/op | 1.061854259
Base64Encode.testBase64Encode | 48 | avgt | 10 | 408.461 | 349.558 | 15.377 | ns/op | 1.168507086
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3679.283 | 1103.717 | 3.911 | ns/op | 3.333538398
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7206.265 | 1988.927 | 224.732 | ns/op | 3.623192304
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135695.875 | 33930.292 | 97.85 | ns/op | 3.99925456

### vector with only m1

Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1 | Error | Units | -intrinsic/+intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.837 | 87.137 | 0.527 | ns/op | 0.9965571456
Base64Encode.testBase64Encode | 2 | avgt | 10 | 94.723 | 94.125 | 5.122 | ns/op | 1.006353254
Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.51 | 123.082 | 0.854 | ns/op | 0.9872280268
Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.045 | 137.175 | 0.201 | ns/op | 1.013632222
Base64Encode.testBase64Encode | 7 | avgt | 10 | 161.216 | 159.387 | 2.385 | ns/op | 1.011475214
Base64Encode.testBase64Encode | 9 | avgt | 10 | 160.541 | 154.19 | 1.665 | ns/op | 1.041189442
Base64Encode.testBase64Encode | 10 | avgt | 10 | 184.874 | 174.766 | 5.569 | ns/op | 1.057837337
Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.124 | 199.333 | 1.584 | ns/op | 2.032398047
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3659.335 | 1185.626 | 24.686 | ns/op | 3.086415952
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7239.269 | 2164.709 | 1022.367 | ns/op | 3.344222711
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135048.828 | 38248.645 | 319.978 | ns/op | 3.530813392

-------------

Commit messages:
 - clean code
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/19973/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8314125
  Stats: 245 lines in 3 files changed: 245 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From fjiang at openjdk.org Mon Jul 1 14:40:26 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Mon, 1 Jul 2024 14:40:26 GMT
Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not null
Message-ID: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>

Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods.
Currently, `encode_heap_oop` first checks whether the oop pointer is `null`.
We can skip the null check of the oop to reduce the unnecessary branch instruction when encoding non-null oop pointer into compressed form. Testing: - [x] Tier1~3 on linux-riscv64 with release build - [x] renaissance & dacapo benchmark suits for functionality ------------- Commit messages: - add encode_heap_oop_not_null for riscv Changes: https://git.openjdk.org/jdk/pull/19974/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19974&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335411 Stats: 62 lines in 3 files changed: 59 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19974.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19974/head:pull/19974 PR: https://git.openjdk.org/jdk/pull/19974 From sgehwolf at openjdk.org Mon Jul 1 14:43:58 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 1 Jul 2024 14:43:58 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v3] In-Reply-To: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> Message-ID: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). 
Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comments - 8333446: Add tests for hierarchical container support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/00b528ae..22141a48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=01-02 Stats: 26334 lines in 522 files changed: 18610 ins; 5830 del; 1894 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From duke at openjdk.org Mon Jul 1 14:45:24 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 1 Jul 2024 14:45:24 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com> References: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com> Message-ID: <7Yi0Fbxcg9ZlXCiU6j12NsOGgNsn9Y8IA8ad9dUV3ko=.7b3f4f37-3ec9-4152-8bf6-0b57853e073d@github.com> On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky <duke at openjdk.org> wrote: > Hello All, > > Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported. > > Thank you, > -Yuri Gaevsky > > **Correctness checks:** > hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4. . 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2200351378

From duke at openjdk.org Mon Jul 1 15:26:38 2024
From: duke at openjdk.org (duke)
Date: Mon, 1 Jul 2024 15:26:38 GMT
Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls [v2]
In-Reply-To: <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com> <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
Message-ID: <XgfLEbpFT_CgKaZwmScqPMcPyOLdEkl1pqachAXXTn4=.394d33c0-58b7-405f-ab52-fe3078351613@github.com>

On Wed, 29 Jun 2022 14:50:59 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> ## Problem
>> Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static Java methods and final Java methods have statically bound information about a callee during compilation. Such calls can share stubs to the interpreter.
>>
>> Each stub to the interpreter has a relocation record (accessed via `relocInfo`) which provides the address of the stub and the address of its owner. `relocInfo` has an offset which is an offset from the previously known relocatable address. The address of a stub is calculated as the address provided by the previous `relocInfo` plus the offset.
>>
>> Each Java call has:
>> - A relocation for a call site.
>> - A relocation for a stub to the interpreter.
>> - A stub to the interpreter.
>> - If far jumps are used (arm64 case):
>>   - A trampoline relocation.
>>   - A trampoline.
>>
>> We cannot avoid creating relocations. They are needed to support patching call sites.
>> With shared stubs there will be multiple relocations having the same stub address but different owners' addresses.
>> If we try to generate relocations as we go there will be a case which requires negative offsets:
>>
>>     reloc1 ---> 0x0: stub1
>>     reloc2 ---> 0x4: stub2 (reloc2.addr = reloc1.addr + reloc2.offset = 0x0 + 4)
>>     reloc3 ---> 0x0: stub1 (reloc3.addr = reloc2.addr + reloc3.offset = 0x4 - 4)
>>
>>
>> `CodeSection` does not support negative offsets. It [assumes](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L195) that the addresses relocations point at grow upward.
>> Negative offsets reduce the offset range by half. This can increase the number of filler records, the empty `relocInfo` records used to reduce offset values. Also negative offsets are only needed for `static_stub_type`, but the other 13 types don't need them.
>>
>> ## Solution
>> In this PR creation of stubs is done in two stages. First we collect requests for creating shared stubs: a callee `ciMethod*` and an offset of a call in `CodeBuffer` (see [src/hotspot/share/asm/codeBuffer.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8ae)). Then we have the finalisation phase (see [src/hotspot/share/ci/ciEnv.cpp](https://github.com/openjdk/jdk/pull/8816/files#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450)), where `CodeBuffer::finalize_stubs()` creates shared stubs in `CodeBuffer`: a stub and multiple relocations sharing it. The first relocation will ...
The pull request contains 20 additional commits since the last revision: > > - Merge branch 'master' into JDK-8280481C > - Use call offset instead of caller pc > - Simplify test > - Fix x86 build failures > - Remove UseSharedStubs and clarify shared stub use cases > - Make SharedStubToInterpRequest ResourceObj and set initial size of SharedStubToInterpRequests to 8 > - Update copyright year and add Unimplemented guards > - Set UseSharedStubs to true for X86 > - Set UseSharedStubs to true for AArch64 > - Fix x86 build failure > - ... and 10 more: https://git.openjdk.org/jdk/compare/073960fa...da3bfb5b @eastig Your change (at version da3bfb5b86dd272a0bf3919ea710e12b2fd66bcc) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-1173782159 From eastigeevich at openjdk.org Mon Jul 1 15:26:38 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 1 Jul 2024 15:26:38 GMT Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls In-Reply-To: <rl8hny_U1ThjuAiduYmmsoGqN343SqZv9v6b3M7Ugiw=.a03e5ac9-668b-4d94-a2d5-54dfdcc8eb3a@github.com> References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com> <t1Gigc1XLRETSXriG2Bw9-zZGbaJxxIj-Hda3sRBGf8=.60774236-39cc-4f39-b757-076d33af675b@github.com> <B8WuOWl_39ppfyZuz8fAoCmlwSPvDzLFF1ikhF-N0S8=.fd498ddb-7854-4616-802c-a0675dbb031c@github.com> <ONVSGWta7auQsl98toVizl8e6L9USdeZNzwdrg48gmQ=.d71dd705-43cb-43b7-a3f1-a18e40ec103f@github.com> <eCO5C4rdv0svuNfSPaRfva15HuqEzUTzJ08Fc2gMjF0=.df9bf477-2fb1-4172-8559-4f6b8cf26a52@github.com> <GeRhcXEA5_PBba-cRJ5sWHFvrrjgGUGO0cVv82tPn5g=.d8372a44-afa6-44dc-87be-8a305ea7b9e7@github.com> <SaHwtSn1Sh0Q4fvFbbrDhWCvRIXbk2z7ka13ezI9PyQ=.7ab81aa0-678e-4108-838c-5757ad1aa2ac@github.com> <rl8hny_U1ThjuAiduYmmsoGqN343SqZv9v6b3M7Ugiw=.a03e5ac9-668b-4d94-a2d5-54dfdcc8eb3a@github.com> Message-ID: 
<7L47Ho5J_hz_XkmE9M25CtLwtRdSJ0k3INrQQx6YZ0U=.b4fe9454-fdd5-4513-ab77-3eda4ad4ad09@github.com> On Wed, 12 Jun 2024 07:17:25 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >>> Hi @eastig , >>> I would like to recurring your experimental data and I would be very grateful if you could provide a small patch to help me get the result of `Saved bytes` and `Nmethods with shared stubs`. >>> Thank you! >> >> >> diff --git a/src/hotspot/share/asm/codeBuffer.inline.hpp b/src/hotspot/share/asm/codeBuffer.inline.hpp >> index 045cff13f25..9af26730cbd 100644 >> --- a/src/hotspot/share/asm/codeBuffer.inline.hpp >> +++ b/src/hotspot/share/asm/codeBuffer.inline.hpp >> @@ -45,6 +45,7 @@ bool emit_shared_stubs_to_interp(CodeBuffer* cb, SharedStubToInterpRequests* sha >> }; >> shared_stub_to_interp_requests->sort(by_shared_method); >> MacroAssembler masm(cb); >> + bool has_shared = false; >> for (int i = 0; i < shared_stub_to_interp_requests->length();) { >> address stub = masm.start_a_stub(CompiledStaticCall::to_interp_stub_size()); >> if (stub == NULL) { >> @@ -53,13 +54,22 @@ bool emit_shared_stubs_to_interp(CodeBuffer* cb, SharedStubToInterpRequests* sha >> } >> >> ciMethod* method = shared_stub_to_interp_requests->at(i).shared_method(); >> + int shared = 0; >> do { >> address caller_pc = cb->insts_begin() + shared_stub_to_interp_requests->at(i).call_offset(); >> masm.relocate(static_stub_Relocation::spec(caller_pc), relocate_format); >> ++i; >> + ++shared; >> } while (i < shared_stub_to_interp_requests->length() && shared_stub_to_interp_requests->at(i).shared_method() == method); >> masm.emit_static_call_stub(); >> masm.end_a_stub(); >> + if (UseNewCode && shared > 1) { >> + has_shared = true; >> + tty->print_cr("Saved: %d", (shared - 1) * CompiledStaticCall::to_interp_stub_size()); >> + } >> + } >> + if (has_shared) { >> + tty->print_cr("nm_has_shared"); >> } >> return true; >> } >> >> >> You will need to use `-XX:+UseNewCode` in your runs. 
>> `grep nm_has_shared run.log | wc -l` is a number of nmethods having a shared stub. >> `grep Saved: run.log | awk '{print $2}' | grep -o '[0-9]*' | paste -s -d+ - | bc` prints a number of saved bytes. > > @eastig as I understand, this optimization is about saving code cache memory. The sharing is within an nmethod, not across nmethods, correct? > I'm trying to prioritize an effort to adopt this optimization in Graal. In addition to the numbers you present for code cache bytes saved in the benchmarks, can you say anything about how much that is relative to the code cache used in the benchmarks? > > More context: https://github.com/openjdk/jdk/pull/19672 Hi @dougxc > @eastig as I understand, this optimization is about saving code cache memory. The sharing is within an nmethod, not across nmethods, correct? Yes, you are correct. Sharing across nmethods would need non-simple maintanance, maybe similar to the inline cache. I am not sure it is worth. > In addition to the numbers you present for code cache bytes saved in the benchmarks, can you say anything about how much that is relative to the code cache used in the benchmarks? In a service application with more than 50 000 nmethods I saw between 1% - 2% reduction in CodeCache usage. ------------- PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-2200450633 From mli at openjdk.org Mon Jul 1 15:33:29 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 1 Jul 2024 15:33:29 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2] In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> Message-ID: <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> > Hi, > Can you help to review the patch? 
>
> I'm also working on a base64 decode intrinsic, but there is some performance regression in some cases, and decode and encode are totally independent of each other, so I will send out the review of decode in another PR when I fix the performance regression in it.
>
> Thanks.
>
> ## Test
> Benchmarks were run on CanVM-K230.
>
> I've tried several implementations, with the following vector groups:
> * m2+m1
> * m2
> * m1
> The best one is the combination of m2+m1; it has the best performance across all source sizes.
>
> ### this implementation (m2+m1)
>
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>
> ### vector with only m2
> ...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  use pure scalar version when rvv is not supported

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19973/files
  - new: https://git.openjdk.org/jdk/pull/19973/files/fc32d9fa..cf732984

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=00-01

  Stats: 13 lines in 2 files changed: 4 ins; 7 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From mli at openjdk.org Mon Jul 1 15:38:18 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Jul 2024 15:38:18 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
Message-ID: <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com>

On Mon, 1 Jul 2024 15:33:29 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> >> I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. >> >> Thanks. >> >> ## Test >> benchmarks run on CanVM-K230 >> >> I've tried several implementations, respectively with vector group >> * m2+m1+scalar >> * m2+scalar >> * m1+scalar >> * pure scalar >> The best one is combination of m2+m1, it have best performance in all source size. >> >> this implementation (m2+m1) >> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"> >> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 >> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 >> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 >> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 >> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 >> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 >> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 >> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 >> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 
3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>>
>> </google-sheets-html-origin>
>>
>> vector with only m2
>> ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   use pure scalar version when rvv is not supported

With the pure scalar implementation, it also brings some performance improvement in all source sizes, so the intrinsic is now also enabled when RVV is not supported.

performance data

Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, scalar | Error | Units | Perf opt
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.75 | 0.38 | ns/op | 1
Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.71 | 93.824 | 1.954 | ns/op | 0.999
Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.824 | 123.487 | 0.559 | ns/op | 0.987
Base64Encode.testBase64Encode | 6 | avgt | 10 | 138.984 | 137.697 | 0.273 | ns/op | 1.009
Base64Encode.testBase64Encode | 7 | avgt | 10 | 161.243 | 157.696 | 0.875 | ns/op | 1.022
Base64Encode.testBase64Encode | 9 | avgt | 10 | 169.724 | 155.223 | 1.908 | ns/op | 1.093
Base64Encode.testBase64Encode | 10 | avgt | 10 | 185.92 | 176.339 | 5.875 | ns/op | 1.054
Base64Encode.testBase64Encode | 48 | avgt | 10 | 408.467 | 347.269 | 1.799 | ns/op | 1.176
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3665.34 | 2718.442 | 26.954 | ns/op | 1.348
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7022.025 | 5290.003 | 33.216 | ns/op | 1.327
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135819.7 | 101988.94 | 2209.887 | ns/op | 1.332

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2200477845

From dnsimon at openjdk.org Mon Jul 1 15:41:33 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 1 Jul 2024 15:41:33 GMT
Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls [v2]
In-Reply-To: <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com> <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
Message-ID: <vaLmP2lZkrQoRcSEfIHkELyRUoyFgSS7H9DtJ8uqf8c=.c4e50a71-602e-4f36-a3d7-b36ecf0a640c@github.com>

On Wed, 29 Jun 2022 14:50:59 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> ## Problem
>> Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static Java methods and final Java methods have statically bound information about a callee during compilation. Such calls can share stubs to the interpreter.
>>
>> Each stub to the interpreter has a relocation record (accessed via `relocInfo`) which provides the address of the stub and the address of its owner. `relocInfo` has an offset which is an offset from the previously known relocatable address. The address of a stub is calculated as the address provided by the previous `relocInfo` plus the offset.
>>
>> Each Java call has:
>> - A relocation for a call site.
>> - A relocation for a stub to the interpreter.
>> - A stub to the interpreter.
>> - If far jumps are used (arm64 case):
>>   - A trampoline relocation.
>>   - A trampoline.
>>
>> We cannot avoid creating relocations. They are needed to support patching call sites.
>> With shared stubs there will be multiple relocations having the same stub address but different owners' addresses.
>> If we try to generate relocations as we go there will be a case which requires negative offsets:
>>
>>     reloc1 ---> 0x0: stub1
>>     reloc2 ---> 0x4: stub2 (reloc2.addr = reloc1.addr + reloc2.offset = 0x0 + 4)
>>     reloc3 ---> 0x0: stub1 (reloc3.addr = reloc2.addr + reloc3.offset = 0x4 - 4)
>>
>>
>> `CodeSection` does not support negative offsets. It [assumes](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L195) that the addresses relocations point at grow upward.
>> Negative offsets reduce the offset range by half. This can increase the number of filler records, which are empty `relocInfo` records used to reduce offset values. Also, negative offsets are only needed for `static_stub_type`; the other 13 types don't need them.
>>
>> ## Solution
>> In this PR creation of stubs is done in two stages. First we collect requests for creating shared stubs: a callee `ciMethod*` and an offset of a call in `CodeBuffer` (see [src/hotspot/share/asm/codeBuffer.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8ae)). Then we have the finalisation phase (see [src/hotspot/share/ci/ciEnv.cpp](https://github.com/openjdk/jdk/pull/8816/files#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450)), where `CodeBuffer::finalize_stubs()` creates shared stubs in `CodeBuffer`: a stub and multiple relocations sharing it. The first relocation will ...
>
> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 20 additional commits since the last revision: > > - Merge branch 'master' into JDK-8280481C > - Use call offset instead of caller pc > - Simplify test > - Fix x86 build failures > - Remove UseSharedStubs and clarify shared stub use cases > - Make SharedStubToInterpRequest ResourceObj and set initial size of SharedStubToInterpRequests to 8 > - Update copyright year and add Unimplemented guards > - Set UseSharedStubs to true for X86 > - Set UseSharedStubs to true for AArch64 > - Fix x86 build failure > - ... and 10 more: https://git.openjdk.org/jdk/compare/134ea4b0...da3bfb5b Ok, thanks for the numbers. Was there any noticeable increase in throughput (or any other interesting metrics)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-2200483943 From eastigeevich at openjdk.org Mon Jul 1 15:45:32 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 1 Jul 2024 15:45:32 GMT Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls [v2] In-Reply-To: <vaLmP2lZkrQoRcSEfIHkELyRUoyFgSS7H9DtJ8uqf8c=.c4e50a71-602e-4f36-a3d7-b36ecf0a640c@github.com> References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com> <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com> <vaLmP2lZkrQoRcSEfIHkELyRUoyFgSS7H9DtJ8uqf8c=.c4e50a71-602e-4f36-a3d7-b36ecf0a640c@github.com> Message-ID: <JSLKkazCvQcLa9xigxiNCgM0gAAmlXjbcGhV9WMI8dQ=.79987e73-72b0-4ff5-99f4-52e2a30aa173@github.com> On Mon, 1 Jul 2024 15:39:05 GMT, Doug Simon <dnsimon at openjdk.org> wrote: > Was there any noticeable increase in throughput (or any other interesting metrics)? Nothing I can recall. I did not test it alone for performance. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-2200492049

From mli at openjdk.org Mon Jul 1 16:54:55 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Jul 2024 16:54:55 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9]
In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
Message-ID: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>

> Hi,
> Can you help to review the patch?
> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>
> Compared with the previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef artifacts (header or lib) at build or run time.
> Besides this change, the previous changes are also modified accordingly, e.g. removing some unnecessary files or changes, especially in the make dir of jdk.
>
> Besides the code changes, one important task is to handle the legal process.
>
> Thanks!
>
> ## Performance
> NOTE:
> * `Src` means the implementation in this pr, i.e. without a dependency on external sleef.
> * `Disabled` means intrinsics are disabled by `-XX:-UseVectorStubs`
> * `system_sleef` means the implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. building and running jdk with a dependency on external sleef.
>
> Basically, the perf data below shows that
> * this implementation has better performance than the previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294),
> * and both sleef versions have much better performance compared with the non-sleef version.
>
> |Benchmark                     |(size)|Src      |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disabled-Src)/Src|
> |------------------------------|------|---------|-----|------------|----------------------|---------|------------------|
> |3472:Double128Vector.ACOS     |1024  |8546.842 |ns/op|8516.007    |-0.004                |16799.273|0.966             |
> |3473:Double128Vector.ASIN     |1024  |6864.656 |ns/op|6987.328    |0.018                 |16602.442|1.419             |
> |3474:Double128Vector.ATAN     |1024  |11489.255|ns/op|12261.800   |0.067                 |26329.320|1.292             |
> |3475:Double128Vector.ATAN2    |1024  |16661.170|ns/op|17234.472   |0.034                 |42084.100|1.526             |
> |3476:Double128Vector.CBRT     |1024  |18999.387|ns/op|20298.458   |0.068                 |35998.688|0.895             |
> |3477:Double128Vector.COS      |1024  |14081.857|ns/op|14846.117   |0.054                 |24420.692|0.734             |
> |3478:Double128Vector.COSH     |1024  |12202.306|ns/op|12237.772   |0.003                 |21343.863|0.749             |
> |3479:Double128Vector.EXP      |1024  |4553.108 |ns/op|4777.638    |0.049                 |20155.903|3.427             |
> |3480:D...

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:

 - Merge branch 'master' into sleef-aarch64-integrate-source
 - merge master
 - sleef 3.6.1 for riscv
 - sleef 3.6.1
 - update header files for arm
 - add inline header file for riscv64
 - remove notes about sleef changes
 - fix performance issue
 - disable unused-function warnings; add log msg
 - minor
 - ...
and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863 ------------- Changes: https://git.openjdk.org/jdk/pull/18605/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=08 Stats: 21668 lines in 21 files changed: 21624 ins; 1 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/18605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605 PR: https://git.openjdk.org/jdk/pull/18605 From luhenry at openjdk.org Mon Jul 1 17:16:22 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 1 Jul 2024 17:16:22 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2] In-Reply-To: <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> Message-ID: <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com> On Mon, 1 Jul 2024 15:33:29 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> >> I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. >> >> Thanks. >> >> ## Test >> benchmarks run on CanVM-K230 >> >> I've tried several implementations, respectively with vector group >> * m2+m1+scalar >> * m2+scalar >> * m1+scalar >> * pure scalar >> The best one is combination of m2+m1, it have best performance in all source size. 
>> >> this implementation (m2+m1) >> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"> >> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 >> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 >> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 >> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 >> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 >> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 >> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 >> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 >> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 >> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 >> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 >> >> </google-sheets-html-origin> >> >> vector with only m2 >> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: st... 
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   use pure scalar version when rvv is not supported

Changes requested by luhenry (Committer).

src/hotspot/cpu/riscv/assembler_riscv.hpp line 1828:

> 1826: }
> 1827:
> 1828: // Vector Unit-Stride Instructions

Suggestion:

// Vector Unit-Stride Load Instructions

src/hotspot/cpu/riscv/assembler_riscv.hpp line 1831:

> 1829: INSN(vlseg3e8_v, 0b0000111, 0b000, 0b00000, 0b00, 0b0, g3);
> 1830:
> 1831: INSN(vsseg4e8_v, 0b0100111, 0b000, 0b00000, 0b00, 0b0, g4);

Suggestion:

// Vector Unit-Stride Store Instructions
INSN(vsseg4e8_v, 0b0100111, 0b000, 0b00000, 0b00, 0b0, g4);

src/hotspot/cpu/riscv/assembler_riscv.hpp line 1832:

> 1830:
> 1831: INSN(vsseg4e8_v, 0b0100111, 0b000, 0b00000, 0b00, 0b0, g4);
> 1832: #undef INSN

Blank line before the `#undef INSN`

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5115:

> 5113: * NOTE: each field will occupy a vector register group
> 5114: */
> 5115: void encodeVector(Register src, Register dst, Register codec, Register step,

Suggestion:

void generate_base64_encodeVector(Register src, Register dst, Register codec, Register step,

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5230:

> 5228:
> 5229: // vector version
> 5230: {

You should not even generate the vectorized code if `UseRVV` is false. You can then remove https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5225-R5227

Suggestion:

if (UseRVV) {

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5263:

> 5261:
> 5262: // scalar version
> 5263: __ BIND(ProcessScalar);

You can move that into the previous block at https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5260 as it's the only block where it's used.
------------- PR Review: https://git.openjdk.org/jdk/pull/19973#pullrequestreview-2151861929 PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661333608 PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661333775 PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661333402 PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661334220 PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661335985 PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661336651 From tschatzl at openjdk.org Tue Jul 2 07:47:58 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 Jul 2024 07:47:58 GMT Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1 Message-ID: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> Hi all, after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694) which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed. It's purely mechanical renaming without even further renaming of files etc. This change updates them. 
(Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot when reviewing this change incrementally)

Testing: tier1, tier4, tier5

Thanks,
  Thomas

-------------

Commit messages:
 - 8331385

Changes: https://git.openjdk.org/jdk/pull/19967/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19967&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8331385
Stats: 887 lines in 68 files changed: 163 ins; 165 del; 559 mod
Patch: https://git.openjdk.org/jdk/pull/19967.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19967/head:pull/19967

PR: https://git.openjdk.org/jdk/pull/19967

From fyang at openjdk.org Tue Jul 2 08:08:17 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 2 Jul 2024 08:08:17 GMT
Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not null
In-Reply-To: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
Message-ID: <lIVaw53Tr_xZacCfEuFiDJdmau6xpqBPQ33vqcAMlWg=.94875ebb-c758-4cd3-b2f6-2d6c9b5602cf@github.com>

On Mon, 1 Jul 2024 14:32:03 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

> Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods.
>
> Currently, `encode_heap_oop` first checks whether the oop pointer is `null`. We can skip the null check to avoid the unnecessary branch instruction when encoding a non-null oop pointer into compressed form.
>
>
> Testing:
> - [x] Tier1~3 on linux-riscv64 with release build
> - [x] renaissance & dacapo benchmark suites for functionality

Looks good. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19974#pullrequestreview-2153023485 From ayang at openjdk.org Tue Jul 2 10:24:18 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 2 Jul 2024 10:24:18 GMT Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1 In-Reply-To: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> Message-ID: <BbCLtLUIqyaA9lNeheVeZJV2fb49kWP2p5t8vRAJ1Uw=.6f72af7b-c5dd-4814-95b4-04e91f32b2c7@github.com> On Mon, 1 Jul 2024 09:35:00 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: > Hi all, > > after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694) which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed. > > It's purely mechanical renaming without even further renaming of files etc. > > This change updates them. > > (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot review this change incrementally) > > Testing: tier1, tier4, tier5 > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19967#pullrequestreview-2153390622 From mli at openjdk.org Tue Jul 2 13:53:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 2 Jul 2024 13:53:33 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v3] In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> Message-ID: <xvE-5_bUzUxvsTKVf5g470H8_CDVwIIla5D8hrU3vBI=.0e946b91-eb68-4239-b99a-4f9d3a9282c6@github.com> > Hi, > Can you help to review the patch? 
> > I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. > > Thanks. > > ## Test > benchmarks run on CanVM-K230 > > I've tried several implementations, respectively with vector group > * m2+m1+scalar > * m2+scalar > * m1+scalar > * pure scalar > The best one is combination of m2+m1, it have best performance in all source size. > > this implementation (m2+m1) > <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"> > Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic > -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 > Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 > Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 > Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 > Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 > Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 > Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 > Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 > Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 > 
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 > Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 > > </google-sheets-html-origin> > > vector with only m2 > <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: ... Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19973/files - new: https://git.openjdk.org/jdk/pull/19973/files/cf732984..264b354b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=01-02 Stats: 11 lines in 2 files changed: 2 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973 PR: https://git.openjdk.org/jdk/pull/19973 From mli at openjdk.org Tue Jul 2 13:53:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 2 Jul 2024 13:53:34 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2] In-Reply-To: <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com> Message-ID: <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com> On Mon, 1 Jul 2024 17:13:07 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Hamlin Li has updated the pull 
request incrementally with one additional commit since the last revision:
>>
>>   use pure scalar version when rvv is not supported
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5230:
>
>> 5228:
>> 5229: // vector version
>> 5230: {
>
> You should not even generate the vectorized code if `UseRVV` is false. You can then remove https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5225-R5227
>
> Suggestion:
>
> if (UseRVV) {

Good catch, thanks!

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5263:
>
>> 5261:
>> 5262: // scalar version
>> 5263: __ BIND(ProcessScalar);
>
> You can move that into the previous block at https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5260 as it's the only block where it's used.

I think that block is for the vector version only. Or maybe I misunderstood you?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662575854
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662577899

From luhenry at openjdk.org Tue Jul 2 13:57:23 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 2 Jul 2024 13:57:23 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com> <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com>
Message-ID: <ZdaJ73eas0VRALPh57TZi61e7gS1V6nzsMiXZCRKOmk=.f62e2422-15a7-4f81-a86f-04262d7ae3c3@github.com>

On Tue, 2 Jul 2024 13:51:02 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5263:
>>
>>> 5261:
>>> 5262: // scalar version
>>> 5263: __ BIND(ProcessScalar);
>>
>> You can move that into the previous block at https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5260 as it's the only block where it's used.
>
> I think that block is for the vector version only. Or maybe I misunderstood you?

That `ProcessScalar` Label is only ever jumped to if we're in the `UseRVV` block, so please move that `__ BIND(ProcessScalar)` to L5256, inside the `if (UseRVV) { ... }` block.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662583734

From mli at openjdk.org Tue Jul 2 14:11:20 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Jul 2024 14:11:20 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <ZdaJ73eas0VRALPh57TZi61e7gS1V6nzsMiXZCRKOmk=.f62e2422-15a7-4f81-a86f-04262d7ae3c3@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com> <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com> <ZdaJ73eas0VRALPh57TZi61e7gS1V6nzsMiXZCRKOmk=.f62e2422-15a7-4f81-a86f-04262d7ae3c3@github.com>
Message-ID: <aS4fTbrCKE8KdlaRhcxzlrtcXb4-tTIOiHuVgppdz4s=.a231d455-5295-4e8c-8b54-04b9708ee981@github.com>

On Tue, 2 Jul 2024 13:54:43 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> I think that block is for the vector version only. Or maybe I misunderstood you?
>
> That `ProcessScalar` Label is only ever jumped to if we're in the `UseRVV` block, so please move that `__ BIND(ProcessScalar)` to L5256, inside the `if (UseRVV) { ... }` block.

I see, you're right!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662607335 From mli at openjdk.org Tue Jul 2 14:16:35 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 2 Jul 2024 14:16:35 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4] In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> Message-ID: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com> > Hi, > Can you help to review the patch? > > I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. > > Thanks. > > ## Test > benchmarks run on CanVM-K230 > > I've tried several implementations, respectively with vector group > * m2+m1+scalar > * m2+scalar > * m1+scalar > * pure scalar > The best one is combination of m2+m1, it have best performance in all source size. 
>
> this implementation (m2+m1)
>
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>
> vector with only m2
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: ...
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: move label ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19973/files - new: https://git.openjdk.org/jdk/pull/19973/files/264b354b..8645a6a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=02-03 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973 PR: https://git.openjdk.org/jdk/pull/19973 From coleenp at openjdk.org Tue Jul 2 14:52:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 14:52:19 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> Message-ID: <ckYjpnygwhRaEcnSITi5UXcp2vi4OUPriVJ0VzSDYfk=.f10b36bd-33e4-428b-8cd4-22506455701c@github.com> On Mon, 1 Jul 2024 09:21:13 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. > > Change to instead query the JVM for exact number of inflations via the Whitebox API. This allow us to both be more exact and less dependent on interactions with NMT. This is an improvement. test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 58: > 56: public static void main(String[] args) { > 57: if (WB.getIntVMFlag("LockingMode") == LM_MONITOR) { > 58: throw new SkippedException("LM_MONITOR always infaltes. 
Invalid test."); typo: inflates test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 85: > 83: long reserved = Long.parseLong(m.group(1)); > 84: long committed = Long.parseLong(m.group(2)); > 85: System.out.println(">>>>> " + line + ": " + reserved + " - " + committed); Oh so it just measures how much memory we use for ObjectMonitors? yes, this doesn't seem very reliable. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2154070059 PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1662670355 PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1662683507 From coleenp at openjdk.org Tue Jul 2 15:02:18 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 15:02:18 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 In-Reply-To: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> Message-ID: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com> On Sat, 29 Jun 2024 19:58:23 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is created to optimize the layout of Klass in hotspot, after JDK-8180450 the layout of Klsss seems broken, there are 3 holes, they are caused by alignment issue introduced by the 1 byte ```_hash_slot```. 
> > > (gdb) ptype /ox Klass > /* offset | size */ type = class Klass : public Metadata { > public: > static const uint KLASS_KIND_COUNT; > protected: > /* 0x000c | 0x0004 */ jint _layout_helper; > /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; > /* 0x0014 | 0x0004 */ jint _modifier_flags; > /* 0x0018 | 0x0004 */ juint _super_check_offset; > /* XXX 4-byte hole */ > /* 0x0020 | 0x0008 */ class Symbol *_name; > /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; > /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; > /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; > /* 0x0078 | 0x0008 */ class OopHandle { > private: > /* 0x0078 | 0x0008 */ class oop *_obj; > > /* total size (bytes): 8 */ > } _java_mirror; > /* 0x0080 | 0x0008 */ class Klass *_super; > /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; > /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; > /* 0x0098 | 0x0008 */ class Klass *_next_link; > /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; > /* 0x00a8 | 0x0008 */ uintx _bitmap; > /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; > /* XXX 3-byte hole */ > /* 0x00b4 | 0x0004 */ int _vtable_len; > /* 0x00b8 | 0x0004 */ class AccessFlags { > private: > /* 0x00b8 | 0x0004 */ jint _flags; > > /* total size (bytes): 4 */ > } _access_flags; > /* XXX 4-byte hole */ > /* 0x00c0 | 0x0008 */ traceid _trace_id; > private: > /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; > /* 0x00ca | 0x0002 */ u2 _shared_class_flags; > /* 0x00cc | 0x0004 */ int _archived_mirror_index; > public: > static const int SECONDARY_SUPERS_TABLE_SIZE; > static const int SECONDARY_SUPERS_TABLE_MASK; > static const... I have a couple of questions about this and a request. Thanks. src/hotspot/share/oops/klass.hpp line 166: > 164: uintx _bitmap; > 165: > 166: static uint8_t compute_hash_slot(Symbol* s); We don't usually put functions in with nonstatic member declarations. 
Since you moved hash_slot, can you move this function to the first private section where hash_insert is? Where you moved hash_slot looks fine. Doesn't look like it will have negative cache effects, unless it needs to be on a cache line with _bitmap? src/hotspot/share/oops/klass.hpp line 176: > 174: JFR_ONLY(DEFINE_TRACE_ID_FIELD;) > 175: uint8_t _hash_slot; > 176: DEFINE_PAD_MINUS_SIZE(1, 4, sizeof(uint8_t)); //3 bytes padding after 1 byte _hash_slot for better layout How does this help? Doesn't the compiler add this padding? ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2154109976 PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662698438 PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662701347 From sgehwolf at openjdk.org Tue Jul 2 15:17:28 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 2 Jul 2024 15:17:28 GMT Subject: RFR: 8322475: Extend printing for System.map [v6] In-Reply-To: <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com> References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com> Message-ID: <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com> On Thu, 20 Jun 2024 09:31:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is an expansion on the new `System.map` command introduced with JDK-8318636. >> >> We now print valuable information per memory region, such as: >> >> - the actual resident set size >> - the actual number of huge pages >> - the actual used page size >> - the THP state of the region (was advised, is eligible, uses THP, ...) >> - whether the region is shared >> - whether the region had been committed (backed by swap) >> - whether the region has been swapped out. 
>> >> Example output: >> >> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt) >> >> >> from to size rss hugetlb pgsz prot notes vm info/file >> 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') >> 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - feedback johan > - fix merge errors > - Merge branch 'master' into System.maps-more-info > - copyrights > - Merge branch 'master' into System.maps-more-info > - fix merge issue > - Merge branch 'master' into System.maps-more-info > - fix whitespace issue > - wip > - exhuming > - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de This seems fine. Mostly nits. src/hotspot/os/linux/procMapsParser.hpp line 66: > 64: from = to = nullptr; > 65: prot[0] = filename[0] = '\0'; > 66: kernelpagesize = rss = private_hugetlb = anonhugepages = swap = 0; `private_hugetlb` and `shared_hugetlb` missing in reset. Intentional? 
src/hotspot/share/nmt/memMapPrinter.cpp line 262: > 260: print_thread_details_for_supposed_stack_address(vma_from, vma_to, _out); > 261: } > 262: num_printed ++; Style: No space before `++`. test/hotspot/jtreg/serviceability/dcmd/vm/SystemDumpMapTest.java line 31: > 29: > 30: import java.io.*; > 31: import java.lang.StringBuilder; Nit: `java.lang.*` are imported by default. I don't see it used, so maybe a left over? test/hotspot/jtreg/serviceability/dcmd/vm/SystemMapTestBase.java line 53: > 51: regexBase_committed + "\\[stack\\]", > 52: // we should see the hs-perf data file, and it should appear as shared as well as committed > 53: regexBase_shared_and_committed + "hsperfdata_.*" Suggestion: Should the test run with `-XX:+UsePerfData` since it's expecting this file. It's default on, but that might change. ------------- PR Review: https://git.openjdk.org/jdk/pull/17158#pullrequestreview-2154058332 PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662723988 PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662661856 PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662705054 PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662700550 From eosterlund at openjdk.org Tue Jul 2 15:48:45 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 2 Jul 2024 15:48:45 GMT Subject: RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers Message-ID: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. 
When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code. We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, any code modification made to the nmethod on another thread prior to disarming it is also observed by the instruction fetcher. However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check whether the nmethod is disarmed or armed by loading the guard value (from the immediate) as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions without enforcing a cross modifying code fence. This is synchronous code modification. There is a questionable optimization in the stub slow path entry (which we only reach because the nmethod was observed to be armed by the instruction fetcher): it checks "just one more time" whether the nmethod concurrently got disarmed, and if so exits without a cross modification fence. This opportunistic optimization is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. It also breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
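The contract described above can be sketched roughly as follows. This is only an illustrative model, not the HotSpot code: `guard_value`, `cross_modify_fence_placeholder` and `slow_path_entry` are made-up names, the armed/disarmed encoding is assumed, and the "fence" is a counter standing in for a real instruction-stream serializing operation.

```cpp
#include <atomic>

// Illustrative stand-ins only; none of these names are HotSpot APIs.
static std::atomic<int> guard_value{1};  // assumed encoding: 1 = armed, 0 = disarmed
static int fences_issued = 0;            // counts calls to the placeholder fence

// Placeholder for a cross modifying code fence. A real implementation would
// execute an instruction-stream serializing operation; this stub only counts.
static void cross_modify_fence_placeholder() { ++fences_issued; }

// Slow-path entry: the guard is read as data here, so the fence is issued
// unconditionally before returning to possibly modified instructions.
// There is deliberately no "already disarmed, return early" shortcut.
static void slow_path_entry() {
  if (guard_value.load(std::memory_order_acquire) != 0) {
    // ... the nmethod would be fixed up here (omitted in this sketch) ...
    guard_value.store(0, std::memory_order_release);  // disarm
  }
  cross_modify_fence_placeholder();
}
```

Calling `slow_path_entry()` twice in this model leaves `fences_issued` at 2: the second call finds the guard already disarmed and still fences, which is the behaviour the synchronous contract asks for.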
This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly. ------------- Commit messages: - 8334890: Missing unconditional cross modifying fence in nmethod entry barriers Changes: https://git.openjdk.org/jdk/pull/19990/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19990&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334890 Stats: 21 lines in 1 file changed: 1 ins; 17 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19990.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19990/head:pull/19990 PR: https://git.openjdk.org/jdk/pull/19990 From xpeng at openjdk.org Tue Jul 2 16:53:22 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Jul 2024 16:53:22 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 In-Reply-To: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com> Message-ID: <19BMqaE29h4XHAYBm2QkjfdUQpInAxgrdWFHnP_JcmA=.505b9844-9656-4c14-83d5-fa9acf6efa2d@github.com> On Tue, 2 Jul 2024 14:58:34 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Hi all, >> This PR is created to optimize the layout of Klass in hotspot, after JDK-8180450 the layout of Klsss seems broken, there are 3 holes, they are caused by alignment issue introduced by the 1 byte ```_hash_slot```. 
>> >> >> (gdb) ptype /ox Klass >> /* offset | size */ type = class Klass : public Metadata { >> public: >> static const uint KLASS_KIND_COUNT; >> protected: >> /* 0x000c | 0x0004 */ jint _layout_helper; >> /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; >> /* 0x0014 | 0x0004 */ jint _modifier_flags; >> /* 0x0018 | 0x0004 */ juint _super_check_offset; >> /* XXX 4-byte hole */ >> /* 0x0020 | 0x0008 */ class Symbol *_name; >> /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; >> /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; >> /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; >> /* 0x0078 | 0x0008 */ class OopHandle { >> private: >> /* 0x0078 | 0x0008 */ class oop *_obj; >> >> /* total size (bytes): 8 */ >> } _java_mirror; >> /* 0x0080 | 0x0008 */ class Klass *_super; >> /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; >> /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; >> /* 0x0098 | 0x0008 */ class Klass *_next_link; >> /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; >> /* 0x00a8 | 0x0008 */ uintx _bitmap; >> /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; >> /* XXX 3-byte hole */ >> /* 0x00b4 | 0x0004 */ int _vtable_len; >> /* 0x00b8 | 0x0004 */ class AccessFlags { >> private: >> /* 0x00b8 | 0x0004 */ jint _flags; >> >> /* total size (bytes): 4 */ >> } _access_flags; >> /* XXX 4-byte hole */ >> /* 0x00c0 | 0x0008 */ traceid _trace_id; >> private: >> /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; >> /* 0x00ca | 0x0002 */ u2 _shared_class_flags; >> /* 0x00cc | 0x0004 */ int _archived_mirror_index; >> public: >> static const int SECONDARY_SUPERS_TABLE_SIZE; >> ... > > src/hotspot/share/oops/klass.hpp line 176: > >> 174: JFR_ONLY(DEFINE_TRACE_ID_FIELD;) >> 175: uint8_t _hash_slot; >> 176: DEFINE_PAD_MINUS_SIZE(1, 4, sizeof(uint8_t)); //3 bytes padding after 1 byte _hash_slot for better layout > > How does this help? Doesn't the compiler add this padding? 
The compiler doesn't add that padding in one place; instead there will be two smaller holes after _hash_slot. Here is what I got from gdb:

    /* 0x00b8 | 0x0001 */ uint8_t _hash_slot;
    private:
    /* XXX 1-byte hole */
    /* 0x00ba | 0x0002 */ s2 _shared_class_path_index;
    /* 0x00bc | 0x0002 */ u2 _shared_class_flags;
    /* XXX 2-byte hole */
    /* 0x00c0 | 0x0004 */ int _archived_mirror_index;

I think we can remove it to avoid confusion; it doesn't really change the cache line. Here is the output from ```pahole``` with cacheline boundaries:

With padding:

    protected:
    jint _layout_helper; /* 8 4 */
    const enum KlassKind _kind; /* 12 4 */
    jint _modifier_flags; /* 16 4 */
    juint _super_check_offset; /* 20 4 */
    class Symbol * _name; /* 24 8 */
    class Klass * _secondary_super_cache; /* 32 8 */
    class Array<Klass*> * _secondary_supers; /* 40 8 */
    class Klass * _primary_supers[8]; /* 48 64 */
    /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
    class OopHandle _java_mirror; /* 112 8 */
    class Klass * _super; /* 120 8 */
    /* --- cacheline 2 boundary (128 bytes) --- */
    volatile class Klass * _subklass; /* 128 8 */
    volatile class Klass * _next_sibling; /* 136 8 */
    class Klass * _next_link; /* 144 8 */
    class ClassLoaderData * _class_loader_data; /* 152 8 */
    uintx _bitmap; /* 160 8 */
    int _vtable_len; /* 168 4 */
    class AccessFlags _access_flags; /* 172 4 */
    traceid _trace_id; /* 176 8 */
    uint8_t _hash_slot; /* 184 1 */
    char _pad_buf1[3]; /* 185 3 */
    s2 _shared_class_path_index; /* 188 2 */
    u2 _shared_class_flags; /* 190 2 */
    /* --- cacheline 3 boundary (192 bytes) --- */
    int _archived_mirror_index; /* 192 4 */

w/o padding:

    protected:
    jint _layout_helper; /* 8 4 */
    const enum KlassKind _kind; /* 12 4 */
    jint _modifier_flags; /* 16 4 */
    juint _super_check_offset; /* 20 4 */
    class Symbol * _name; /* 24 8 */
    class Klass * _secondary_super_cache; /* 32 8 */
    class Array<Klass*> * _secondary_supers; /* 40 8 */
    class Klass * _primary_supers[8]; /* 48 64 */
    /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
    class OopHandle _java_mirror; /* 112 8 */
    class Klass * _super; /* 120 8 */
    /* --- cacheline 2 boundary (128 bytes) --- */
    volatile class Klass * _subklass; /* 128 8 */
    volatile class Klass * _next_sibling; /* 136 8 */
    class Klass * _next_link; /* 144 8 */
    class ClassLoaderData * _class_loader_data; /* 152 8 */
    uintx _bitmap; /* 160 8 */
    int _vtable_len; /* 168 4 */
    class AccessFlags _access_flags; /* 172 4 */
    traceid _trace_id; /* 176 8 */
    uint8_t _hash_slot; /* 184 1 */
    /* XXX 1 byte hole, try to pack */
    s2 _shared_class_path_index; /* 186 2 */
    u2 _shared_class_flags; /* 188 2 */
    /* XXX 2 bytes hole, try to pack */
    /* --- cacheline 3 boundary (192 bytes) --- */
    int _archived_mirror_index; /* 192 4 */

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662866957
From xpeng at openjdk.org Tue Jul 2 17:07:19 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 17:07:19 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450
In-Reply-To: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
Message-ID: <IUdBaJ9KRr0pUDrYzKzQUDRqn1IxiLeJKuFqPEIuJKY=.1de59e3e-4e95-44ae-ba8c-9792a2e71632@github.com>

On Tue, 2 Jul 2024 14:56:56 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Hi all,
>> This PR is created to optimize the layout of Klass in hotspot, after JDK-8180450 the layout of Klsss seems broken, there are 3 holes, they are caused by alignment issue introduced by the 1 byte ```_hash_slot```.
>> >> >> (gdb) ptype /ox Klass >> /* offset | size */ type = class Klass : public Metadata { >> public: >> static const uint KLASS_KIND_COUNT; >> protected: >> /* 0x000c | 0x0004 */ jint _layout_helper; >> /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; >> /* 0x0014 | 0x0004 */ jint _modifier_flags; >> /* 0x0018 | 0x0004 */ juint _super_check_offset; >> /* XXX 4-byte hole */ >> /* 0x0020 | 0x0008 */ class Symbol *_name; >> /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; >> /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; >> /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; >> /* 0x0078 | 0x0008 */ class OopHandle { >> private: >> /* 0x0078 | 0x0008 */ class oop *_obj; >> >> /* total size (bytes): 8 */ >> } _java_mirror; >> /* 0x0080 | 0x0008 */ class Klass *_super; >> /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; >> /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; >> /* 0x0098 | 0x0008 */ class Klass *_next_link; >> /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; >> /* 0x00a8 | 0x0008 */ uintx _bitmap; >> /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; >> /* XXX 3-byte hole */ >> /* 0x00b4 | 0x0004 */ int _vtable_len; >> /* 0x00b8 | 0x0004 */ class AccessFlags { >> private: >> /* 0x00b8 | 0x0004 */ jint _flags; >> >> /* total size (bytes): 4 */ >> } _access_flags; >> /* XXX 4-byte hole */ >> /* 0x00c0 | 0x0008 */ traceid _trace_id; >> private: >> /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; >> /* 0x00ca | 0x0002 */ u2 _shared_class_flags; >> /* 0x00cc | 0x0004 */ int _archived_mirror_index; >> public: >> static const int SECONDARY_SUPERS_TABLE_SIZE; >> ... > > src/hotspot/share/oops/klass.hpp line 166: > >> 164: uintx _bitmap; >> 165: >> 166: static uint8_t compute_hash_slot(Symbol* s); > > We don't usually put functions in with nonstatic member declarations. Since you moved hash_slot, can you move this function to the first private section where hash_insert is? 
>
> Where you moved hash_slot looks fine. Doesn't look like it will have negative cache effects, unless it needs to be on a cache line with _bitmap?

Thanks Coleen! Good catch!
I have checked the cacheline boundaries with pahole; both fields are in the same cache line after the move, so that shouldn't be a concern.
But yes, they are related fields and should stay together. I also checked the layout after moving both _bitmap and _hash_slot, and it looks good. I'll update the PR:

    protected:
    jint _layout_helper; /* 8 4 */
    const enum KlassKind _kind; /* 12 4 */
    jint _modifier_flags; /* 16 4 */
    juint _super_check_offset; /* 20 4 */
    class Symbol * _name; /* 24 8 */
    class Klass * _secondary_super_cache; /* 32 8 */
    class Array<Klass*> * _secondary_supers; /* 40 8 */
    class Klass * _primary_supers[8]; /* 48 64 */
    /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
    class OopHandle _java_mirror; /* 112 8 */
    class Klass * _super; /* 120 8 */
    /* --- cacheline 2 boundary (128 bytes) --- */
    volatile class Klass * _subklass; /* 128 8 */
    volatile class Klass * _next_sibling; /* 136 8 */
    class Klass * _next_link; /* 144 8 */
    class ClassLoaderData * _class_loader_data; /* 152 8 */
    int _vtable_len; /* 160 4 */
    class AccessFlags _access_flags; /* 164 4 */
    traceid _trace_id; /* 168 8 */
    uintx _bitmap; /* 176 8 */
    uint8_t _hash_slot; /* 184 1 */
    /* XXX 1 byte hole, try to pack */
    s2 _shared_class_path_index; /* 186 2 */
    u2 _shared_class_flags; /* 188 2 */
    /* XXX 2 bytes hole, try to pack */
    /* --- cacheline 3 boundary (192 bytes) --- */
    int _archived_mirror_index; /* 192 4 */

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662884573
From xpeng at openjdk.org Tue Jul 2 17:25:53 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 17:25:53 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
References: 
<u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> Message-ID: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> > Hi all, > This PR is created to optimize the layout of Klass in hotspot, after JDK-8180450 the layout of Klsss seems broken, there are 3 holes, they are caused by alignment issue introduced by the 1 byte ```_hash_slot```. > > > (gdb) ptype /ox Klass > /* offset | size */ type = class Klass : public Metadata { > public: > static const uint KLASS_KIND_COUNT; > protected: > /* 0x000c | 0x0004 */ jint _layout_helper; > /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; > /* 0x0014 | 0x0004 */ jint _modifier_flags; > /* 0x0018 | 0x0004 */ juint _super_check_offset; > /* XXX 4-byte hole */ > /* 0x0020 | 0x0008 */ class Symbol *_name; > /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; > /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; > /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; > /* 0x0078 | 0x0008 */ class OopHandle { > private: > /* 0x0078 | 0x0008 */ class oop *_obj; > > /* total size (bytes): 8 */ > } _java_mirror; > /* 0x0080 | 0x0008 */ class Klass *_super; > /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; > /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; > /* 0x0098 | 0x0008 */ class Klass *_next_link; > /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; > /* 0x00a8 | 0x0008 */ uintx _bitmap; > /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; > /* XXX 3-byte hole */ > /* 0x00b4 | 0x0004 */ int _vtable_len; > /* 0x00b8 | 0x0004 */ class AccessFlags { > private: > /* 0x00b8 | 0x0004 */ jint _flags; > > /* total size (bytes): 4 */ > } _access_flags; > /* XXX 4-byte hole */ > /* 0x00c0 | 0x0008 */ traceid _trace_id; > private: > /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; > /* 0x00ca | 0x0002 */ u2 _shared_class_flags; > /* 0x00cc | 0x0004 */ int _archived_mirror_index; > public: > static const 
int SECONDARY_SUPERS_TABLE_SIZE; > static const int SECONDARY_SUPERS_TABLE_MASK; > static const... Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Move both _bitmap and _hash_slot together ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19958/files - new: https://git.openjdk.org/jdk/pull/19958/files/00fbff70..ce1560c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19958&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19958&range=00-01 Stats: 10 lines in 1 file changed: 4 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19958/head:pull/19958 PR: https://git.openjdk.org/jdk/pull/19958 From xpeng at openjdk.org Tue Jul 2 17:25:53 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Jul 2024 17:25:53 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <IUdBaJ9KRr0pUDrYzKzQUDRqn1IxiLeJKuFqPEIuJKY=.1de59e3e-4e95-44ae-ba8c-9792a2e71632@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com> <IUdBaJ9KRr0pUDrYzKzQUDRqn1IxiLeJKuFqPEIuJKY=.1de59e3e-4e95-44ae-ba8c-9792a2e71632@github.com> Message-ID: <WbcFyXAITuG0372YiVLYfb5qaln_c89apoobqzP04Co=.1aa6e5f1-2022-4b3e-81c7-4332982b3e3b@github.com> On Tue, 2 Jul 2024 17:04:40 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: >> src/hotspot/share/oops/klass.hpp line 166: >> >>> 164: uintx _bitmap; >>> 165: >>> 166: static uint8_t compute_hash_slot(Symbol* s); >> >> We don't usually put functions in with nonstatic member declarations. Since you moved hash_slot, can you move this function to the first private section where hash_insert is? >> >> Where you moved hash_slot looks fine. 
Doesn't look like it will have negative cache effects, unless it needs to be on a cache line with _bitmap? > > Thanks Coleen! Good catch! > I have checked the cacheline boundaries with pahole, both are in same cache line after moving, that shouldn't be a concern. > But yes, they are related fields and should be stay together. Also checked the layer after moving both _bitmap and _hash_slot, it looks good, I'll update the PR: > > protected: > > jint _layout_helper; /* 8 4 */ > const enum KlassKind _kind; /* 12 4 */ > jint _modifier_flags; /* 16 4 */ > juint _super_check_offset; /* 20 4 */ > class Symbol * _name; /* 24 8 */ > class Klass * _secondary_super_cache; /* 32 8 */ > class Array<Klass*> * _secondary_supers; /* 40 8 */ > class Klass * _primary_supers[8]; /* 48 64 */ > /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */ > class OopHandle _java_mirror; /* 112 8 */ > class Klass * _super; /* 120 8 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > volatile class Klass * _subklass; /* 128 8 */ > volatile class Klass * _next_sibling; /* 136 8 */ > class Klass * _next_link; /* 144 8 */ > class ClassLoaderData * _class_loader_data; /* 152 8 */ > int _vtable_len; /* 160 4 */ > class AccessFlags _access_flags; /* 164 4 */ > traceid _trace_id; /* 168 8 */ > uintx _bitmap; /* 176 8 */ > uint8_t _hash_slot; /* 184 1 */ > > /* XXX 1 byte hole, try to pack */ > > s2 _shared_class_path_index; /* 186 2 */ > u2 _shared_class_flags; /* 188 2 */ > > /* XXX 2 bytes hole, try to pack */ > > /* --- cacheline 3 boundary (192 bytes) --- */ > int _archived_mirror_index; /* 192 4 */ I have updated PR to reflect what we have discussed, thanks! >> src/hotspot/share/oops/klass.hpp line 176: >> >>> 174: JFR_ONLY(DEFINE_TRACE_ID_FIELD;) >>> 175: uint8_t _hash_slot; >>> 176: DEFINE_PAD_MINUS_SIZE(1, 4, sizeof(uint8_t)); //3 bytes padding after 1 byte _hash_slot for better layout >> >> How does this help? Doesn't the compiler add this padding? 
>
> The compiler doesn't seem to add that padding; instead there will be two smaller holes after _hash_slot. Here is what I got from gdb:
>
> /* 0x00b8 | 0x0001 */ uint8_t _hash_slot;
> private:
> /* XXX 1-byte hole */
> /* 0x00ba | 0x0002 */ s2 _shared_class_path_index;
> /* 0x00bc | 0x0002 */ u2 _shared_class_flags;
> /* XXX 2-byte hole */
> /* 0x00c0 | 0x0004 */ int _archived_mirror_index;
>
>
> I think we can remove it to avoid confusion; it doesn't really change the cache line. Here is the output from ```pahole``` with cache line boundaries:
> With padding:
>
> protected:
>
>   jint _layout_helper; /* 8 4 */
>   const enum KlassKind _kind; /* 12 4 */
>   jint _modifier_flags; /* 16 4 */
>   juint _super_check_offset; /* 20 4 */
>   class Symbol * _name; /* 24 8 */
>   class Klass * _secondary_super_cache; /* 32 8 */
>   class Array<Klass*> * _secondary_supers; /* 40 8 */
>   class Klass * _primary_supers[8]; /* 48 64 */
>   /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
>   class OopHandle _java_mirror; /* 112 8 */
>   class Klass * _super; /* 120 8 */
>   /* --- cacheline 2 boundary (128 bytes) --- */
>   volatile class Klass * _subklass; /* 128 8 */
>   volatile class Klass * _next_sibling; /* 136 8 */
>   class Klass * _next_link; /* 144 8 */
>   class ClassLoaderData * _class_loader_data; /* 152 8 */
>   uintx _bitmap; /* 160 8 */
>   int _vtable_len; /* 168 4 */
>   class AccessFlags _access_flags; /* 172 4 */
>   traceid _trace_id; /* 176 8 */
>   uint8_t _hash_slot; /* 184 1 */
>   char _pad_buf1[3]; /* 185 3 */
>   s2 _shared_class_path_index; /* 188 2 */
>   u2 _shared_class_flags; /* 190 2 */
>   /* --- cacheline 3 boundary (192 bytes) --- */
>   int _archived_mirror_index; /* 192 4 */
>
>
> w/o padding:
>
> protected:
>
>   jint _layout_helper; /* 8 4 */
>   const enum KlassKind _kind; /* 12 ...

I have removed the padding from the code, thanks.
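A minimal sketch of the padding point discussed above, not HotSpot code: the struct and field names are hypothetical stand-ins that only mirror the last few Klass fields, and the offsets assume a mainstream ABI in which int16_t is 2-byte aligned and int32_t is 4-byte aligned. It illustrates that the compiler already leaves alignment holes (1 byte, then 2 bytes), so an explicit 3-byte pad changes neither the offset of the final int nor the total size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the tail of Klass WITHOUT explicit padding.
// Offsets in the comments assume 2-byte alignment for int16_t/uint16_t
// and 4-byte alignment for int32_t.
struct TailWithoutPad {
  uint8_t  hash_slot;                // offset 0
  // 1-byte hole inserted by the compiler to align the next member
  int16_t  shared_class_path_index;  // offset 2
  uint16_t shared_class_flags;       // offset 4
  // 2-byte hole inserted by the compiler to align the next member
  int32_t  archived_mirror_index;    // offset 8
};

// Hypothetical stand-in for the same fields WITH an explicit 3-byte pad.
struct TailWithPad {
  uint8_t  hash_slot;                // offset 0
  char     pad_buf[3];               // offsets 1..3, explicit padding
  int16_t  shared_class_path_index;  // offset 4
  uint16_t shared_class_flags;       // offset 6
  int32_t  archived_mirror_index;    // offset 8
};

// Bytes of implicit padding in TailWithoutPad: total size minus the
// summed sizes of its four members (1 + 2 + 2 + 4).
constexpr std::size_t implicit_hole_bytes() {
  return sizeof(TailWithoutPad) - (sizeof(uint8_t) + sizeof(int16_t) +
                                   sizeof(uint16_t) + sizeof(int32_t));
}

// The final int lands at the same offset either way, and the total size
// is unchanged, so the explicit pad only moves the two 2-byte fields.
static_assert(offsetof(TailWithoutPad, archived_mirror_index) ==
              offsetof(TailWithPad, archived_mirror_index),
              "explicit pad does not move the trailing int");
static_assert(sizeof(TailWithoutPad) == sizeof(TailWithPad),
              "explicit pad does not change the total size");
```

On such an ABI both static_asserts hold at compile time, which is consistent with the gdb and pahole output quoted above: _archived_mirror_index stays at the same offset with or without the explicit pad.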
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662905230 PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662907038 From stuefe at openjdk.org Tue Jul 2 17:57:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 2 Jul 2024 17:57:20 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> Message-ID: <_kYyW9f7DYRGXmfBruv4r5ywF8jzc4wu3qRHRljEuwQ=.6a594a8b-4f9a-431b-aeea-9bcc639c61ac@github.com> On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: >> Hi all, >> This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```.
>> >> >> (gdb) ptype /ox Klass >> /* offset | size */ type = class Klass : public Metadata { >> public: >> static const uint KLASS_KIND_COUNT; >> protected: >> /* 0x000c | 0x0004 */ jint _layout_helper; >> /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; >> /* 0x0014 | 0x0004 */ jint _modifier_flags; >> /* 0x0018 | 0x0004 */ juint _super_check_offset; >> /* XXX 4-byte hole */ >> /* 0x0020 | 0x0008 */ class Symbol *_name; >> /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; >> /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; >> /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; >> /* 0x0078 | 0x0008 */ class OopHandle { >> private: >> /* 0x0078 | 0x0008 */ class oop *_obj; >> >> /* total size (bytes): 8 */ >> } _java_mirror; >> /* 0x0080 | 0x0008 */ class Klass *_super; >> /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; >> /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; >> /* 0x0098 | 0x0008 */ class Klass *_next_link; >> /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; >> /* 0x00a8 | 0x0008 */ uintx _bitmap; >> /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; >> /* XXX 3-byte hole */ >> /* 0x00b4 | 0x0004 */ int _vtable_len; >> /* 0x00b8 | 0x0004 */ class AccessFlags { >> private: >> /* 0x00b8 | 0x0004 */ jint _flags; >> >> /* total size (bytes): 4 */ >> } _access_flags; >> /* XXX 4-byte hole */ >> /* 0x00c0 | 0x0008 */ traceid _trace_id; >> private: >> /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; >> /* 0x00ca | 0x0002 */ u2 _shared_class_flags; >> /* 0x00cc | 0x0004 */ int _archived_mirror_index; >> public: >> static const int SECONDARY_SUPERS_TABLE_SIZE; >> ... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Move both _bitmap and _hash_slot together Ok. > I have checked the cacheline boundaries with pahole, both are in same cache line after moving, that shouldn't be a concern. 
Note that in the current class space, Klass usually does not start at a cache line boundary. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2154515609 From aboldtch at openjdk.org Tue Jul 2 19:18:47 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 2 Jul 2024 19:18:47 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v2] In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> Message-ID: <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> > TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. > > Change to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to be both more exact and less dependent on interactions with NMT.
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19965/files - new: https://git.openjdk.org/jdk/pull/19965/files/81fd7c07..44279523 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19965/head:pull/19965 PR: https://git.openjdk.org/jdk/pull/19965 From aboldtch at openjdk.org Tue Jul 2 19:18:47 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 2 Jul 2024 19:18:47 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v2] In-Reply-To: <ckYjpnygwhRaEcnSITi5UXcp2vi4OUPriVJ0VzSDYfk=.f10b36bd-33e4-428b-8cd4-22506455701c@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <ckYjpnygwhRaEcnSITi5UXcp2vi4OUPriVJ0VzSDYfk=.f10b36bd-33e4-428b-8cd4-22506455701c@github.com> Message-ID: <_kIYNEJtKJMbKPqcKTEaBqhgM2BFTfJygHgLtOSm6FQ=.e0f441ae-48c7-4399-a3a9-b9e61cdb7851@github.com> On Tue, 2 Jul 2024 14:42:44 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java > > test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 58: > >> 56: public static void main(String[] args) { >> 57: if (WB.getIntVMFlag("LockingMode") == LM_MONITOR) { >> 58: throw new SkippedException("LM_MONITOR always infaltes. Invalid test."); > > typo: inflates Suggestion: throw new SkippedException("LM_MONITOR always inflates. 
Invalid test."); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663037132 From coleenp at openjdk.org Tue Jul 2 19:38:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 19:38:19 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v2] In-Reply-To: <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> Message-ID: <n4hjYeBK9hcom7ZZ22E4dcLnI0-lJKYBC9JgPHukoM0=.e2bc985f-9b34-4bf4-a9a1-2417cae95258@github.com> On Tue, 2 Jul 2024 19:18:47 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. >> >> Change to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to be both more exact and less dependent on interactions with NMT. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java Marked as reviewed by coleenp (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2154700442 From xpeng at openjdk.org Tue Jul 2 20:01:18 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Jul 2024 20:01:18 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com> Message-ID: <g7-ZlhxQqYHzjnC-ow9H1yNFQjJUJ_lAh7_dVPCAsqQ=.288fa0ce-060d-480f-9476-d3f0dd0a9a96@github.com> On Tue, 2 Jul 2024 14:59:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > I have a couple of questions about this and a request. Thanks. Thanks so much for the suggestions! Do you have other concerns on the new revision? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204279047 From xpeng at openjdk.org Tue Jul 2 20:07:19 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Jul 2024 20:07:19 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <_kYyW9f7DYRGXmfBruv4r5ywF8jzc4wu3qRHRljEuwQ=.6a594a8b-4f9a-431b-aeea-9bcc639c61ac@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> <_kYyW9f7DYRGXmfBruv4r5ywF8jzc4wu3qRHRljEuwQ=.6a594a8b-4f9a-431b-aeea-9bcc639c61ac@github.com> Message-ID: <hNRmjIpcI6y_332uNIdN3t6RGbTyDCGxzq4abLJkvDM=.6051c3ad-bfb7-43be-b65a-d0972419beb2@github.com> On Tue, 2 Jul 2024 17:55:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > Ok. > > > I have checked the cacheline boundaries with pahole, both are in same cache line after moving, that shouldn't be a concern. 
> > Note that in the current class space, Klass usually does not start at a cache line boundary. Thank you, Thomas, for the review and the reminder! The whole point is to compact the layout of Klass, which is instantiated often at runtime; a more compact layout reduces its cache line footprint, although the improvement won't be significant. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204288859 From coleenp at openjdk.org Tue Jul 2 20:17:18 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 20:17:18 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> Message-ID: <nthulxS9SuMBePGssqh-woC7odTQaU-mxNSg_ceycEU=.2c456585-b528-4131-b68a-fa4c049304ed@github.com> On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: >> Hi all, >> This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```.
>> >> >> (gdb) ptype /ox Klass >> /* offset | size */ type = class Klass : public Metadata { >> public: >> static const uint KLASS_KIND_COUNT; >> protected: >> /* 0x000c | 0x0004 */ jint _layout_helper; >> /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; >> /* 0x0014 | 0x0004 */ jint _modifier_flags; >> /* 0x0018 | 0x0004 */ juint _super_check_offset; >> /* XXX 4-byte hole */ >> /* 0x0020 | 0x0008 */ class Symbol *_name; >> /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; >> /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; >> /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; >> /* 0x0078 | 0x0008 */ class OopHandle { >> private: >> /* 0x0078 | 0x0008 */ class oop *_obj; >> >> /* total size (bytes): 8 */ >> } _java_mirror; >> /* 0x0080 | 0x0008 */ class Klass *_super; >> /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; >> /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; >> /* 0x0098 | 0x0008 */ class Klass *_next_link; >> /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; >> /* 0x00a8 | 0x0008 */ uintx _bitmap; >> /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; >> /* XXX 3-byte hole */ >> /* 0x00b4 | 0x0004 */ int _vtable_len; >> /* 0x00b8 | 0x0004 */ class AccessFlags { >> private: >> /* 0x00b8 | 0x0004 */ jint _flags; >> >> /* total size (bytes): 4 */ >> } _access_flags; >> /* XXX 4-byte hole */ >> /* 0x00c0 | 0x0008 */ traceid _trace_id; >> private: >> /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; >> /* 0x00ca | 0x0002 */ u2 _shared_class_flags; >> /* 0x00cc | 0x0004 */ int _archived_mirror_index; >> public: >> static const int SECONDARY_SUPERS_TABLE_SIZE; >> ... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Move both _bitmap and _hash_slot together Yes, this looks good. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2154783065 From xpeng at openjdk.org Tue Jul 2 20:30:20 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Jul 2024 20:30:20 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <nthulxS9SuMBePGssqh-woC7odTQaU-mxNSg_ceycEU=.2c456585-b528-4131-b68a-fa4c049304ed@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> <nthulxS9SuMBePGssqh-woC7odTQaU-mxNSg_ceycEU=.2c456585-b528-4131-b68a-fa4c049304ed@github.com> Message-ID: <SeA81tLNOKSwArcddDfEu629HcJ5akwsPDAytrIWkHI=.31268549-d6c0-4615-931e-af96135b2513@github.com> On Tue, 2 Jul 2024 20:14:25 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > Yes, this looks good. Thank you, appreciate it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204339711 From duke at openjdk.org Tue Jul 2 20:30:21 2024 From: duke at openjdk.org (duke) Date: Tue, 2 Jul 2024 20:30:21 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> Message-ID: <-3uGfF3w8n4Tb0_C15jtfdLDD1DB6Z0rF_pu_zA2g4I=.fd01fa84-607f-4705-86f1-e24f91df3378@github.com> On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: >> Hi all, >> This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```.
>> >> >> (gdb) ptype /ox Klass >> /* offset | size */ type = class Klass : public Metadata { >> public: >> static const uint KLASS_KIND_COUNT; >> protected: >> /* 0x000c | 0x0004 */ jint _layout_helper; >> /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; >> /* 0x0014 | 0x0004 */ jint _modifier_flags; >> /* 0x0018 | 0x0004 */ juint _super_check_offset; >> /* XXX 4-byte hole */ >> /* 0x0020 | 0x0008 */ class Symbol *_name; >> /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; >> /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; >> /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; >> /* 0x0078 | 0x0008 */ class OopHandle { >> private: >> /* 0x0078 | 0x0008 */ class oop *_obj; >> >> /* total size (bytes): 8 */ >> } _java_mirror; >> /* 0x0080 | 0x0008 */ class Klass *_super; >> /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; >> /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; >> /* 0x0098 | 0x0008 */ class Klass *_next_link; >> /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; >> /* 0x00a8 | 0x0008 */ uintx _bitmap; >> /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; >> /* XXX 3-byte hole */ >> /* 0x00b4 | 0x0004 */ int _vtable_len; >> /* 0x00b8 | 0x0004 */ class AccessFlags { >> private: >> /* 0x00b8 | 0x0004 */ jint _flags; >> >> /* total size (bytes): 4 */ >> } _access_flags; >> /* XXX 4-byte hole */ >> /* 0x00c0 | 0x0008 */ traceid _trace_id; >> private: >> /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; >> /* 0x00ca | 0x0002 */ u2 _shared_class_flags; >> /* 0x00cc | 0x0004 */ int _archived_mirror_index; >> public: >> static const int SECONDARY_SUPERS_TABLE_SIZE; >> ... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Move both _bitmap and _hash_slot together @pengxiaolong Your change (at version ce1560c1fcdca237d594cdbd9e0ea59572964509) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204347799 From dholmes at openjdk.org Wed Jul 3 02:21:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 02:21:21 GMT Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1 In-Reply-To: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> Message-ID: <SPBNsHY3noBQGgTadY2dtKkM29h59fcloTH9mIMRmzs=.f6dadafb-e5f5-455d-b2a3-04be5554d1d4@github.com> On Mon, 1 Jul 2024 09:35:00 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: > Hi all, > > after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694) which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed. > > It's purely mechanical renaming without even further renaming of files etc. > > This change updates them. > > (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot review this change incrementally) > > Testing: tier1, tier4, tier5 > > Thanks, > Thomas Seems okay. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19967#pullrequestreview-2155206936 From dholmes at openjdk.org Wed Jul 3 02:58:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 02:58:30 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2] In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com> Message-ID: <tJD6UEGEi8QQfFNbG1dJSZOv0ZYCaRp0up08CXFsW2w=.8783a858-2be5-4b1b-82e3-0ed8e25a418c@github.com> On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: >> Hi all, >> This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```. >> >> >> (gdb) ptype /ox Klass >> /* offset | size */ type = class Klass : public Metadata { >> public: >> static const uint KLASS_KIND_COUNT; >> protected: >> /* 0x000c | 0x0004 */ jint _layout_helper; >> /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; >> /* 0x0014 | 0x0004 */ jint _modifier_flags; >> /* 0x0018 | 0x0004 */ juint _super_check_offset; >> /* XXX 4-byte hole */ >> /* 0x0020 | 0x0008 */ class Symbol *_name; >> /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; >> /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; >> /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; >> /* 0x0078 | 0x0008 */ class OopHandle { >> private: >> /* 0x0078 | 0x0008 */ class oop *_obj; >> >> /* total size (bytes): 8 */ >> } _java_mirror; >> /* 0x0080 | 0x0008 */ class Klass *_super; >> /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; >> /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; >> /* 0x0098 | 0x0008 */ class Klass *_next_link; >> /* 0x00a0 |
0x0008 */ class ClassLoaderData *_class_loader_data; >> /* 0x00a8 | 0x0008 */ uintx _bitmap; >> /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; >> /* XXX 3-byte hole */ >> /* 0x00b4 | 0x0004 */ int _vtable_len; >> /* 0x00b8 | 0x0004 */ class AccessFlags { >> private: >> /* 0x00b8 | 0x0004 */ jint _flags; >> >> /* total size (bytes): 4 */ >> } _access_flags; >> /* XXX 4-byte hole */ >> /* 0x00c0 | 0x0008 */ traceid _trace_id; >> private: >> /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; >> /* 0x00ca | 0x0002 */ u2 _shared_class_flags; >> /* 0x00cc | 0x0004 */ int _archived_mirror_index; >> public: >> static const int SECONDARY_SUPERS_TABLE_SIZE; >> ... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Move both _bitmap and _hash_slot together Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2155233872 From xpeng at openjdk.org Wed Jul 3 02:58:31 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Jul 2024 02:58:31 GMT Subject: Integrated: 8334220: Optimize Klass layout after JDK-8180450 In-Reply-To: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com> Message-ID: <T1pobsG9ZcZgm696zN1p5ibrItIG4w8jOz4ntvucjMQ=.98c761df-fc19-4bc4-9510-8214f57ed18d@github.com> On Sat, 29 Jun 2024 19:58:23 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```.
> > > (gdb) ptype /ox Klass > /* offset | size */ type = class Klass : public Metadata { > public: > static const uint KLASS_KIND_COUNT; > protected: > /* 0x000c | 0x0004 */ jint _layout_helper; > /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; > /* 0x0014 | 0x0004 */ jint _modifier_flags; > /* 0x0018 | 0x0004 */ juint _super_check_offset; > /* XXX 4-byte hole */ > /* 0x0020 | 0x0008 */ class Symbol *_name; > /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; > /* 0x0030 | 0x0008 */ class Array<Klass*> *_secondary_supers; > /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; > /* 0x0078 | 0x0008 */ class OopHandle { > private: > /* 0x0078 | 0x0008 */ class oop *_obj; > > /* total size (bytes): 8 */ > } _java_mirror; > /* 0x0080 | 0x0008 */ class Klass *_super; > /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; > /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; > /* 0x0098 | 0x0008 */ class Klass *_next_link; > /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; > /* 0x00a8 | 0x0008 */ uintx _bitmap; > /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; > /* XXX 3-byte hole */ > /* 0x00b4 | 0x0004 */ int _vtable_len; > /* 0x00b8 | 0x0004 */ class AccessFlags { > private: > /* 0x00b8 | 0x0004 */ jint _flags; > > /* total size (bytes): 4 */ > } _access_flags; > /* XXX 4-byte hole */ > /* 0x00c0 | 0x0008 */ traceid _trace_id; > private: > /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; > /* 0x00ca | 0x0002 */ u2 _shared_class_flags; > /* 0x00cc | 0x0004 */ int _archived_mirror_index; > public: > static const int SECONDARY_SUPERS_TABLE_SIZE; > static const int SECONDARY_SUPERS_TABLE_MASK; > static const... This pull request has now been integrated. 
Changeset: f9b4ea13 Author: Xiaolong Peng <xpeng at openjdk.org> Committer: David Holmes <dholmes at openjdk.org> URL: https://git.openjdk.org/jdk/commit/f9b4ea13e693da268c9aee27dee49f9c7f798bb1 Stats: 10 lines in 1 file changed: 5 ins; 5 del; 0 mod 8334220: Optimize Klass layout after JDK-8180450 Reviewed-by: coleenp, stuefe, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/19958 From dholmes at openjdk.org Wed Jul 3 03:11:20 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 03:11:20 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v3] In-Reply-To: <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com> Message-ID: <ZM5LyIYOqAkOZ_wDJS12HFBR_ZsHqE4QLzQN2ygLiQk=.d4680604-5eaa-4edb-b786-5a45b5c09eb4@github.com> On Thu, 27 Jun 2024 08:05:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. 
As usual with hex dumps, ascii follows hex printout >> >> Example: >> >> >> >> 118 0x00000000000001c0: 204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820 OpenJDK 64-Bit Server VM (fastde >> 119 0x00000000000001e0: 692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f bug 24-internal-adhoc.thomas.sou >> 120 0x0000000000000200: 726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820 rce) for linux-amd64 JRE (24-int >> 121 0x0000000000000220: 64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965 ernal-adhoc.thomas.source), buil >> 122 0x0000000000000240: 323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a t on 2024-06-20T10:45:12Z with g >> 123 0x0000000000000260: 2e352e3031206363 0000000000000030 0000000000000000 0000000000000000 cc 10.5.0_______________________ >> 124 0x0000000000000280: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ >> 125 0x00000000000002a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ >> >> >> The patch does that. >> >> Small unrelated changes: >> >> - I rewrote and extended the gtests, testing now a real-life printout containing a mixture or readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved. >> >> - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t). >> >> - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. 
os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar. >> >> ---- >> >> Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-e... > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > exclude test for AIX Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19835#pullrequestreview-2155244941 From dholmes at openjdk.org Wed Jul 3 03:11:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 03:11:21 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2] In-Reply-To: <nnF0zgFpLnwB7QpwiUq39YhjDQN6J9HiV4Yb6B-vhAo=.b746b5b7-81c7-4cd5-a6d1-d29c8dd0137b@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <Is5Y-4I-mPUfwsGgPw81goRaHtXMORucAqKXdxnufD0=.c13bf441-4678-4801-b874-7a610286822e@github.com> <cRYxoxtDjMenRgLdZEDcYtEzxv6JliX_sTBjbmNsw3U=.abf947a0-b1a9-4846-b06f-b62796aa04c4@github.com> <nnF0zgFpLnwB7QpwiUq39YhjDQN6J9HiV4Yb6B-vhAo=.b746b5b7-81c7-4cd5-a6d1-d29c8dd0137b@github.com> Message-ID: <l--dAczzsXMRswqo_9XmPlWtjR_TMK_LuLzcE0e2QAI=.7d3c03a8-eddb-4d10-9e04-15159e339623@github.com> On Thu, 27 Jun 2024 07:27:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> src/hotspot/share/runtime/os.cpp line 949: >> >>> 947: uintptr_t i = (uintptr_t)SafeFetchN((intptr_t*)p, errval); >>> 948: if (i == errval) { >>> 949: i = (uintptr_t)SafeFetchN((intptr_t*)p, ~errval); >> >> Pre-existing but if the initial fetch fails why do we think the second one can succeed ??? > > There is a one-in-2^(32|64) chance that the numerical value of errval happened to be in memory.
By reading twice, with a different errval, we diminish the chance of mistaking a successful read for an error. Ouch! So all the other places where we use SafeFetchN only once are potentially broken? Or is this a special case where any value in memory could theoretically be valid? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1663414798 From kbarrett at openjdk.org Wed Jul 3 05:00:26 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jul 2024 05:00:26 GMT Subject: RFR: 8335591: Fix -Wzero-as-null-pointer-constant warnings in ConcurrentHashTable Message-ID: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com> Please review this trivial change to ConcurrentHashTable. Initialization of and assignments to the _invisible_epoch member are changed from a value of 0 to nullptr. This removes some -Wzero-as-null-pointer-constant warnings when building with that enabled. Testing: mach5 tier1 ------------- Commit messages: - fix _invisible_epoch usage in ConcurrentHashTable Changes: https://git.openjdk.org/jdk/pull/19996/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19996&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335591 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19996/head:pull/19996 PR: https://git.openjdk.org/jdk/pull/19996 From dholmes at openjdk.org Wed Jul 3 05:21:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 05:21:21 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: <wng_QvnzLUne1f_oobHsr4DaaPLhq7jcptPWBcAJHUo=.2eac54ed-49f7-4564-91c7-6f0ec20d0237@github.com> On Tue, 25 Jun 2024
18:52:20 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > Hi all, > > This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia The 24-hour rule is mentioned here: https://openjdk.org/guide/#life-of-a-pr 6. Allow enough time for review In general all PRs should be open for at least 24 hours to allow for reviewers in all time zones to get a chance to see it. It may actually happen that even 24 hours isn't enough. Take into account weekends, holidays, and vacation times throughout the world and you'll realize that a change that requires more than just a trivial review may have to be open for a while. In some areas [trivial] changes are allowed to be pushed without the 24 hour delay. Ask your reviewers if you think this applies to your change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2205119100 From stuefe at openjdk.org Wed Jul 3 05:23:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 Jul 2024 05:23:19 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2] In-Reply-To: <l--dAczzsXMRswqo_9XmPlWtjR_TMK_LuLzcE0e2QAI=.7d3c03a8-eddb-4d10-9e04-15159e339623@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <Is5Y-4I-mPUfwsGgPw81goRaHtXMORucAqKXdxnufD0=.c13bf441-4678-4801-b874-7a610286822e@github.com> <cRYxoxtDjMenRgLdZEDcYtEzxv6JliX_sTBjbmNsw3U=.abf947a0-b1a9-4846-b06f-b62796aa04c4@github.com> <nnF0zgFpLnwB7QpwiUq39YhjDQN6J9HiV4Yb6B-vhAo=.b746b5b7-81c7-4cd5-a6d1-d29c8dd0137b@github.com> <l--dAczzsXMRswqo_9XmPlWtjR_TMK_LuLzcE0e2QAI=.7d3c03a8-eddb-4d10-9e04-15159e339623@github.com> Message-ID: <YWfftjtRvw97c5a5c7pSAcwX-Ck369idnO2J58_lzOs=.e97cbd26-de71-4eb4-844c-8d06b14974f8@github.com> On Wed, 3 Jul 2024 03:07:23 GMT, David Holmes <dholmes
at openjdk.org> wrote: >> There is a one-in-2^(32|64) chance the errval numerical value happened to be in memory. By reading twice, with different errval, we diminish the chance of mistaking a successful read for an error. > > Ouch! So all the other places we only use SafeFetchN once are potentially broken? Or is this a special case where any value in memory could theoretically be valid? Well, maybe I am just overly careful. After all, the chance is infinitesimally small. You are probably in the clear unless you use 0 or some other frequent pattern. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1663506578 From duke at openjdk.org Wed Jul 3 06:11:19 2024 From: duke at openjdk.org (Liming Liu) Date: Wed, 3 Jul 2024 06:11:19 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> Message-ID: <rnPsj7tJBhXHpNZ_Cubn3LdzsXi_u8vjFjWIlTiE9-s=.acda4cb0-7fe3-40ce-aac4-7fb5d7deef1a@github.com> On Fri, 28 Jun 2024 19:22:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> See JBS issue. >> >> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. >> >> The patch: >> - exposes os::available_memory via Whitebox >> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` >> >> I have some misgivings about this solution, though: >> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load.
>> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) >> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. >> >> Despite my doubts, I think this is the best we can come up with if we want to have such a test. >> >> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Update TestAlwaysPreTouchBehavior.java Could you please confirm whether it is related to JDK-8335167? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2205168688 From rehn at openjdk.org Wed Jul 3 06:34:20 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 3 Jul 2024 06:34:20 GMT Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not null In-Reply-To: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com> References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com> Message-ID: <Twjo9tLARYiTFf4DlqQzJuwAvbxBDsUJGQ-XYZndDrU=.a9291118-99bb-4430-8d2b-8fcff9f4a003@github.com> On Mon, 1 Jul 2024 14:32:03 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: > Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods. > > Currently, `encode_heap_oop` will check if the oop pointer is `null` at first. We can skip the null check of the oop to reduce the unnecessary branch instruction when encoding non-null oop pointer into compressed form. > > > Testing: > - [x] Tier1~3 on linux-riscv64 with release build > - [x] renaissance & dacapo benchmark suits for functionality Nice, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19974#pullrequestreview-2155566884 From rehn at openjdk.org Wed Jul 3 06:37:51 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 3 Jul 2024 06:37:51 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v20] In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> Message-ID: <FMu4_SfI8mQvWR4HQwIssjRHArOwX1MQf7qDHiSAb2w=.807d5c7d-fd9c-4fdd-b7c2-f8c6323f440a@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. 
> But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL <trampo> > Stubs: > AUIPC > LD > JALR > <DEST> > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. > > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > <DEST> > > An experimental option for turning trampolines back on exists. > > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 29 commits: - Merge branch 'master' into 8332689 - Rename lc - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Comments - Missed in merge-fixes, minor revert - Merge branch 'master' into 8332689 - Minor review comments - Merge branch 'master' into 8332689 - To be pushed - ... and 19 more: https://git.openjdk.org/jdk/compare/77a7078b...6fd73a66 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=19 Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From chagedorn at openjdk.org Wed Jul 3 06:59:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Jul 2024 06:59:18 GMT Subject: RFR: 8335591: Fix -Wzero-as-null-pointer-constant warnings in ConcurrentHashTable In-Reply-To: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com> References: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com> Message-ID: <aYI3OWb3zWuLPK8CJHq20UWxO55mtnqhxnsAYNIBpE8=.d2df1e20-52f5-4900-9026-064e3bd8d6e0@github.com> On Wed, 3 Jul 2024 04:54:59 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this trivial change to ConcurrentHashTable. Initialization of > and assignments to the _invisible_epoch member are changed from a value of 0 > to nullptr. This removes some -Wzero-as-null-pointer-constant warnings when > building with that enabled. > > Testing: mach5 tier1 Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19996#pullrequestreview-2155609135 From dholmes at openjdk.org Wed Jul 3 07:14:19 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 07:14:19 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v2] In-Reply-To: <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> Message-ID: <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com> On Tue, 2 Jul 2024 19:18:47 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. >> >> Change to instead query the JVM for exact number of inflations via the Whitebox API. This allow us to both be more exact and less dependent on interactions with NMT. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java This seems much more reliable if we are testing the absence of excessive inflation. One query below and one typo, but approved. Thanks src/hotspot/share/prims/whitebox.cpp line 1858: > 1856: > 1857: WB_ENTRY(jlong, WB_getInUseMonitorCount(JNIEnv* env, jobject wb)) > 1858: return (jlong) WhiteBox::get_in_use_monitor_count(); Why the indirection? 
test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 72: > 70: if (pre_monitor_count != post_monitor_count) { > 71: final long monitor_count_change = post_monitor_count - pre_monitor_count; > 72: System.out.println("Unexpected change in mointor count: " + monitor_count_change); typo: mointor ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2155627165 PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663630628 PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663636766 From dholmes at openjdk.org Wed Jul 3 07:18:20 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 07:18:20 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v2] In-Reply-To: <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com> Message-ID: <x-cPdnQy6lxMx3JNPo4F6Yc6c9XJ4RSxzK3D-XmDcas=.e34c0733-4353-4ca1-9fae-15f42bcf1905@github.com> On Wed, 3 Jul 2024 07:06:17 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java > > src/hotspot/share/prims/whitebox.cpp line 1858: > >> 1856: >> 1857: WB_ENTRY(jlong, WB_getInUseMonitorCount(JNIEnv* env, jobject wb)) >> 1858: return (jlong) WhiteBox::get_in_use_monitor_count(); > > Why the indirection? Ah now I see. We need the member function to make it a friend. 
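The friend-access point raised in this exchange can be sketched in isolation. The following is a minimal illustration only, not HotSpot code: `MonitorList`, `WhiteBoxLike`, and both `wb_get_in_use_monitor_count_*` functions are hypothetical stand-ins, and the real classes and the JNI entry signature differ. It shows a private counter being read either through a static member of a befriended class (the indirection asked about) or, as an alternative, through a free function that is befriended directly.

```cpp
#include <cassert>
#include <cstddef>

class WhiteBoxLike;                           // hypothetical stand-in for the test-support class
long wb_get_in_use_monitor_count_direct();    // hypothetical stand-in for the entry function

// Hypothetical stand-in for the VM-internal owner of the counter.
class MonitorList {
  static std::size_t _count;                          // private: not readable from outside
  friend class WhiteBoxLike;                          // option 1: befriend a class
  friend long wb_get_in_use_monitor_count_direct();   // option 2: befriend the free function
public:
  static void add() { ++_count; }
};
std::size_t MonitorList::_count = 0;

// Option 1: the befriended class exposes the value through a static member.
class WhiteBoxLike {
public:
  static std::size_t get_in_use_monitor_count() { return MonitorList::_count; }
};

// The entry function is not a friend itself, so it forwards to the member.
long wb_get_in_use_monitor_count_via_member() {
  return static_cast<long>(WhiteBoxLike::get_in_use_monitor_count());
}

// Option 2: the entry function is a friend and reads the counter directly.
long wb_get_in_use_monitor_count_direct() {
  return static_cast<long>(MonitorList::_count);
}
```

Both variants return the same value; the choice is mostly about how many friend declarations the owning class has to carry.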
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663646717 From aboldtch at openjdk.org Wed Jul 3 07:25:48 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 3 Jul 2024 07:25:48 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v3] In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> Message-ID: <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com> > TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. > > Change to instead query the JVM for exact number of inflations via the Whitebox API. This allow us to both be more exact and less dependent on interactions with NMT. Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19965/files - new: https://git.openjdk.org/jdk/pull/19965/files/44279523..071869f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19965/head:pull/19965 PR: https://git.openjdk.org/jdk/pull/19965 From aboldtch at openjdk.org Wed Jul 3 07:25:48 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 3 Jul 2024 07:25:48 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v2] In-Reply-To: <x-cPdnQy6lxMx3JNPo4F6Yc6c9XJ4RSxzK3D-XmDcas=.e34c0733-4353-4ca1-9fae-15f42bcf1905@github.com> References: 
<T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com> <x-cPdnQy6lxMx3JNPo4F6Yc6c9XJ4RSxzK3D-XmDcas=.e34c0733-4353-4ca1-9fae-15f42bcf1905@github.com> Message-ID: <jTghiE5P2NvKjeHntSSZssb7LDAlNZrmyEvfCxFmDiA=.93f464dd-a3ca-4e12-b414-65212925e43d@github.com> On Wed, 3 Jul 2024 07:16:08 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/prims/whitebox.cpp line 1858: >> >>> 1856: >>> 1857: WB_ENTRY(jlong, WB_getInUseMonitorCount(JNIEnv* env, jobject wb)) >>> 1858: return (jlong) WhiteBox::get_in_use_monitor_count(); >> >> Why the indirection? > > Ah now I see. We need the member function to make it a friend. Yeah. But your comment made me realise that I could have friended the `friend jlong WB_getInUseMonitorCount(JNIEnv* env, jobject wb)` function directly. But I think this is fine as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663656789 From aboldtch at openjdk.org Wed Jul 3 07:25:49 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 3 Jul 2024 07:25:49 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v2] In-Reply-To: <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com> <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com> Message-ID: <vYAJxtUCQUW-rWQymIj7Zw_vjdEzBCLNfQnIeT83vrc=.04726dc5-c88d-41bb-9070-c0a9949c34fa@github.com> On Wed, 3 Jul 2024 07:09:55 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java > > test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 72: > >> 70: if (pre_monitor_count != post_monitor_count) { >> 71: final long monitor_count_change = post_monitor_count - pre_monitor_count; >> 72: System.out.println("Unexpected change in mointor count: " + monitor_count_change); > > typo: mointor Suggestion: System.out.println("Unexpected change in monitor count: " + monitor_count_change); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663653892 From stuefe at openjdk.org Wed Jul 3 07:31:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 Jul 2024 07:31:21 GMT Subject: RFR: 8322475: Extend printing for System.map [v6] In-Reply-To: <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com> References: 
<xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com> <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com> Message-ID: <b_GyqOzspgbgo4cCtunljWBJT-bI7-BDG3YajeXv7W8=.3c3a5c61-8317-40e3-b139-1145ad3532c4@github.com> On Tue, 2 Jul 2024 15:11:13 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - feedback johan >> - fix merge errors >> - Merge branch 'master' into System.maps-more-info >> - copyrights >> - Merge branch 'master' into System.maps-more-info >> - fix merge issue >> - Merge branch 'master' into System.maps-more-info >> - fix whitespace issue >> - wip >> - exhuming >> - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de > > src/hotspot/os/linux/procMapsParser.hpp line 66: > >> 64: from = to = nullptr; >> 65: prot[0] = filename[0] = '\0'; >> 66: kernelpagesize = rss = private_hugetlb = anonhugepages = swap = 0; > > `private_hugetlb` and `shared_hugetlb` missing in reset. Intentional? No :( good catch. That is why I prefer memsetting, but folks don't like that. > test/hotspot/jtreg/serviceability/dcmd/vm/SystemDumpMapTest.java line 31: > >> 29: >> 30: import java.io.*; >> 31: import java.lang.StringBuilder; > > Nit: `java.lang.*` are imported by default. I don't see it used, so maybe a left over? Yep, a bunch of those are not needed anymore. > test/hotspot/jtreg/serviceability/dcmd/vm/SystemMapTestBase.java line 53: > >> 51: regexBase_committed + "\\[stack\\]", >> 52: // we should see the hs-perf data file, and it should appear as shared as well as committed >> 53: regexBase_shared_and_committed + "hsperfdata_.*" > > Suggestion: Should the test run with `-XX:+UsePerfData` since it's expecting this file. 
It's default on, but that might change. That is a good point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1663668311 PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1663664444 PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1663662914 From stuefe at openjdk.org Wed Jul 3 07:55:48 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 Jul 2024 07:55:48 GMT Subject: RFR: 8322475: Extend printing for System.map [v7] In-Reply-To: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> Message-ID: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com> > This is an expansion on the new `System.map` command introduced with JDK-8318636. > > We now print valuable information per memory region, such as: > > - the actual resident set size > - the actual number of huge pages > - the actual used page size > - the THP state of the region (was advised, is eligible, uses THP, ...) > - whether the region is shared > - whether the region had been committed (backed by swap) > - whether the region has been swapped out. 
> > Example output: > > [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt) > > > from to size rss hugetlb pgsz prot notes vm info/file > 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage > 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage > 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') > 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(Code... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - no stack on 32-bit, scan for vDSO lib instead - feedback severin - Merge branch 'master' into System.maps-more-info - feedback johan - fix merge errors - Merge branch 'master' into System.maps-more-info - copyrights - Merge branch 'master' into System.maps-more-info - fix merge issue - Merge branch 'master' into System.maps-more-info - ... 
and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d ------------- Changes: https://git.openjdk.org/jdk/pull/17158/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17158&range=06 Stats: 624 lines in 14 files changed: 425 ins; 111 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/17158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17158/head:pull/17158 PR: https://git.openjdk.org/jdk/pull/17158 From stuefe at openjdk.org Wed Jul 3 07:55:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 Jul 2024 07:55:49 GMT Subject: RFR: 8322475: Extend printing for System.map [v6] In-Reply-To: <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com> References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com> <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com> Message-ID: <RFTaag5x2gqKf-BoDkIovVZvUG0qw54T541JwnbMvvY=.ffb56e15-9314-4647-a199-a7392cfd39e6@github.com> On Tue, 2 Jul 2024 15:15:00 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - feedback johan >> - fix merge errors >> - Merge branch 'master' into System.maps-more-info >> - copyrights >> - Merge branch 'master' into System.maps-more-info >> - fix merge issue >> - Merge branch 'master' into System.maps-more-info >> - fix whitespace issue >> - wip >> - exhuming >> - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de > > This seems fine. Mostly nits. Many thanks, @jerboaa! I fixed your remarks, and swapped scanning for the primordial thread VMA for scanning for the vDSO library, which should be loaded on all Linux systems and architectures.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2205333212 From sgehwolf at openjdk.org Wed Jul 3 10:15:21 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 3 Jul 2024 10:15:21 GMT Subject: RFR: 8322475: Extend printing for System.map [v7] In-Reply-To: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com> References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com> Message-ID: <or-sjCNsgmoehxnd5HyNFp_4jFGzuYbYr_h9KOsmOx8=.8ed899a0-3b85-4ebf-85db-5dd654cb61d1@github.com> On Wed, 3 Jul 2024 07:55:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is an expansion on the new `System.map` command introduced with JDK-8318636. >> >> We now print valuable information per memory region, such as: >> >> - the actual resident set size >> - the actual number of huge pages >> - the actual used page size >> - the THP state of the region (was advised, is eligible, uses THP, ...) >> - whether the region is shared >> - whether the region had been committed (backed by swap) >> - whether the region has been swapped out. 
>> >> Example output: >> >> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt) >> >> >> from to size rss hugetlb pgsz prot notes vm info/file >> 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') >> 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - no stack on 32-bit, scan for vDSO lib instead > - feedback severin > - Merge branch 'master' into System.maps-more-info > - feedback johan > - fix merge errors > - Merge branch 'master' into System.maps-more-info > - copyrights > - Merge branch 'master' into System.maps-more-info > - fix merge issue > - Merge branch 'master' into System.maps-more-info > - ... and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d Looks OK to me. ------------- Marked as reviewed by sgehwolf (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17158#pullrequestreview-2156041166 From kbarrett at openjdk.org Wed Jul 3 11:14:22 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jul 2024 11:14:22 GMT Subject: RFR: 8335591: Fix -Wzero-as-null-pointer-constant warnings in ConcurrentHashTable In-Reply-To: <aYI3OWb3zWuLPK8CJHq20UWxO55mtnqhxnsAYNIBpE8=.d2df1e20-52f5-4900-9026-064e3bd8d6e0@github.com> References: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com> <aYI3OWb3zWuLPK8CJHq20UWxO55mtnqhxnsAYNIBpE8=.d2df1e20-52f5-4900-9026-064e3bd8d6e0@github.com> Message-ID: <1dXlcjPoisEhSe50WkJGe47to7wdbuAGaOsifGKmrak=.51c72d76-afd2-463e-918d-400b1cca370f@github.com> On Wed, 3 Jul 2024 06:56:12 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> Please review this trivial change to ConcurrentHashTable. Initialization of >> and assignments to the _invisible_epoch member are changed from a value of 0 >> to nullptr. This removes some -Wzero-as-null-pointer-constant warnings when >> building with that enabled. >> >> Testing: mach5 tier1 > > Looks good and trivial. Thanks for review @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/19996#issuecomment-2205826875 From kbarrett at openjdk.org Wed Jul 3 11:14:23 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jul 2024 11:14:23 GMT Subject: Integrated: 8335591: Fix -Wzero-as-null-pointer-constant warnings in ConcurrentHashTable In-Reply-To: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com> References: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com> Message-ID: <o1wSdNWF7BdFsVQXAyEO8F00DkjWkRKzpYG2cM7z7PM=.952dff23-7721-43f7-af9d-ac0cb01dae22@github.com> On Wed, 3 Jul 2024 04:54:59 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this trivial change to ConcurrentHashTable. 
Initialization of > and assignments to the _invisible_epoch member are changed from a value of 0 > to nullptr. This removes some -Wzero-as-null-pointer-constant warnings when > building with that enabled. > > Testing: mach5 tier1 This pull request has now been integrated. Changeset: c06b75ff Author: Kim Barrett <kbarrett at openjdk.org> URL: https://git.openjdk.org/jdk/commit/c06b75ff88babf57bdcd0919ea177ff363fd858b Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8335591: Fix -Wzero-as-null-pointer-constant warnings in ConcurrentHashTable Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19996 From fjiang at openjdk.org Wed Jul 3 12:14:25 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 3 Jul 2024 12:14:25 GMT Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not null In-Reply-To: <Twjo9tLARYiTFf4DlqQzJuwAvbxBDsUJGQ-XYZndDrU=.a9291118-99bb-4430-8d2b-8fcff9f4a003@github.com> References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com> <Twjo9tLARYiTFf4DlqQzJuwAvbxBDsUJGQ-XYZndDrU=.a9291118-99bb-4430-8d2b-8fcff9f4a003@github.com> Message-ID: <FX7RhxK2he69hMb8nzE2yMd5WJZLw0j9uJ2_dxDidZk=.6992ce14-b903-4e30-b94a-47a3c0c86d2a@github.com> On Wed, 3 Jul 2024 06:31:38 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods. >> >> Currently, `encode_heap_oop` will check if the oop pointer is `null` at first. We can skip the null check of the oop to reduce the unnecessary branch instruction when encoding non-null oop pointer into compressed form. >> >> >> Testing: >> - [x] Tier1~3 on linux-riscv64 with release build >> - [x] renaissance & dacapo benchmark suites for functionality > > Nice, thanks! @robehn @RealFYang -- Thanks!
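The idea behind the `encode_heap_oop_not_null` change quoted in this message can be sketched in plain C++ rather than RISC-V assembler. This is an illustration under stated assumptions, not the code from the PR: `kHeapBase`, `kShift`, and the three functions below are hypothetical, and the real base and shift are chosen by the VM at startup. The sketch assumes a non-zero heap base, which is the case where the general form needs a null check so that null still maps to a narrow value of 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameters for illustration only.
constexpr std::uint64_t kHeapBase = 0x800000000ULL;  // assumed non-zero heap base
constexpr unsigned kShift = 3;                       // assumed 8-byte object alignment

// General form: null must stay null after encoding, so a check (a branch
// or conditional move in generated code) is needed when the base is non-zero.
inline std::uint32_t encode_heap_oop(std::uint64_t oop) {
  if (oop == 0) {
    return 0;
  }
  return static_cast<std::uint32_t>((oop - kHeapBase) >> kShift);
}

// Not-null form: the caller guarantees oop != 0, so the check is dropped
// and only the subtract and shift remain.
inline std::uint32_t encode_heap_oop_not_null(std::uint64_t oop) {
  return static_cast<std::uint32_t>((oop - kHeapBase) >> kShift);
}

// Inverse of the not-null encoding, included to show the round trip.
inline std::uint64_t decode_heap_oop_not_null(std::uint32_t narrow) {
  return kHeapBase + (static_cast<std::uint64_t>(narrow) << kShift);
}
```

For any non-null, suitably aligned address inside the assumed heap range the two encoders agree; they differ only for null, which the not-null variant must never be given.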
------------- PR Comment: https://git.openjdk.org/jdk/pull/19974#issuecomment-2205930449 From fjiang at openjdk.org Wed Jul 3 12:14:27 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 3 Jul 2024 12:14:27 GMT Subject: Integrated: 8335411: RISC-V: Optimize encode_heap_oop when oop is not null In-Reply-To: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com> References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com> Message-ID: <mx_vmNTC-AXMc9qr0utCZgNQp0ybfYOl34wlp3HdL50=.73344f6e-3602-4eea-8f3e-3efa0e253aac@github.com> On Mon, 1 Jul 2024 14:32:03 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: > Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods. > > Currently, `encode_heap_oop` will check if the oop pointer is `null` at first. We can skip the null check of the oop to reduce the unnecessary branch instruction when encoding non-null oop pointer into compressed form. > > > Testing: > - [x] Tier1~3 on linux-riscv64 with release build > - [x] renaissance & dacapo benchmark suits for functionality This pull request has now been integrated. Changeset: 5866b16d Author: Feilong Jiang <fjiang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/5866b16dbca3f63770c8792d204dabdf49b59839 Stats: 62 lines in 3 files changed: 59 ins; 0 del; 3 mod 8335411: RISC-V: Optimize encode_heap_oop when oop is not null Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/19974 From duke at openjdk.org Wed Jul 3 12:33:29 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Wed, 3 Jul 2024 12:33:29 GMT Subject: RFR: 8335615: Clean up left-overs from 8317721 Message-ID: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com> The `compiler/intrinsics/zip/TestCRC32.java` is ok. 
Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980 ------------- Commit messages: - JDK-8335615: Clean up left-overs from 8317721 Changes: https://git.openjdk.org/jdk/pull/20004/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20004&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335615 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20004.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20004/head:pull/20004 PR: https://git.openjdk.org/jdk/pull/20004 From rehn at openjdk.org Wed Jul 3 12:53:54 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 3 Jul 2024 12:53:54 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v21] In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> Message-ID: <tdkglfMOAm2Mg_Qj_TvnOro8oIqlOV8LKgfqgKTYFIw=.654dd861-9959-4b17-9c5a-6628f2782e3b@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running for a very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get hot fast calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL <trampo> > Stubs: > AUIPC > LD > JALR > <DEST> > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx.
> > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > <DEST> > > An experimental option for turning trampolines back on exists. > > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... 
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Rename to reloc_call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19453/files - new: https://git.openjdk.org/jdk/pull/19453/files/6fd73a66..7337a2fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=19-20 Stats: 26 lines in 10 files changed: 0 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From rehn at openjdk.org Wed Jul 3 12:53:54 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 3 Jul 2024 12:53:54 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v19] In-Reply-To: <Om10ZojNaTukbhvpMkDBFrScxrRrbNad3UgtsIWpX1k=.fc472d60-670b-4388-b601-3d7075b2b241@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <iV5H4rQ0puc5_3boriBFg2XOqhdeuaNkGhXM6ExImYo=.107e0dfd-5096-49f0-8184-d019fd933918@github.com> <-SCwSVP6zTNao7X4VlIPRor8OU9vDg79oGVDWTf4XCM=.b72cbf6f-66e7-4fb3-b387-e00ae0537ac6@github.com> <C0jgPKg6jBtEMdY6ExLPLigaYdAw4ilbi0g4AnoCDUE=.25fccd4a-e53c-48a4-8c7d-cc8ac1e65b73@github.com> <Om10ZojNaTukbhvpMkDBFrScxrRrbNad3UgtsIWpX1k=.fc472d60-670b-4388-b601-3d7075b2b241@github.com> Message-ID: <j10U_JSyhnDVNiIFEjVeMeVstb0MAG_4t-5HhjY7hRs=.2b8e0606-eef6-4c95-bae5-f27831430ced@github.com> On Sat, 29 Jun 2024 01:55:56 GMT, Dean Long <dlong at openjdk.org> wrote: >> Yes. My thinking was, the site is still patachable, even if some sites don't need that capability. >> The reason why this patch ignores near calls is because the short reach of JAL +-1MB (so normally only a few stubs can be reach from a few nmethods). >> But it is on the enhancement list. >> >> I don't mind changing the name, feel free to suggest something! 
> > The key thing seems to be that they are typed with a relocInfo, so maybe `reloc_call`? Ok, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1664141580 From coleenp at openjdk.org Wed Jul 3 12:55:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 3 Jul 2024 12:55:19 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v3] In-Reply-To: <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com> Message-ID: <CH2MObaRHB7xOBoeF78t8nSurjHARVdmzvGK1Uc9kgQ=.61fe6ca0-1e37-4d71-9b1d-367509564300@github.com> On Wed, 3 Jul 2024 07:25:48 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. >> >> Change to instead query the JVM for exact number of inflations via the Whitebox API. This allows us to both be more exact and less dependent on interactions with NMT. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java Marked as reviewed by coleenp (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2156368077 From fyang at openjdk.org Wed Jul 3 13:46:18 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Jul 2024 13:46:18 GMT Subject: RFR: 8335615: Clean up left-overs from 8317721 In-Reply-To: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com> References: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com> Message-ID: <XdLKabSGwtSwsscso3HOcsOsPh_xTyvcgEL54KQ0l5I=.cbf4b879-90eb-4066-8b98-fbb2bbf68258@github.com> On Wed, 3 Jul 2024 12:28:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote: > The `compiler/intrinsics/zip/TestCRC32.java` is ok. Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980 Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20004#pullrequestreview-2156497002 From duke at openjdk.org Wed Jul 3 13:54:19 2024 From: duke at openjdk.org (duke) Date: Wed, 3 Jul 2024 13:54:19 GMT Subject: RFR: 8335615: Clean up left-overs from 8317721 In-Reply-To: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com> References: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com> Message-ID: <ZxM_7zxGnjwuCVWpvH91ZX8mTNji4STS5loz3o0w4OY=.d21774e1-6ea7-43cd-9fd3-3cae3ae11b34@github.com> On Wed, 3 Jul 2024 12:28:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote: > The `compiler/intrinsics/zip/TestCRC32.java` is ok. Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980 @ArsenyBochkarev Your change (at version fd2db6da165aa26923537b93128b4dfc7a62ab3a) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20004#issuecomment-2206135362 From duke at openjdk.org Wed Jul 3 14:12:23 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Wed, 3 Jul 2024 14:12:23 GMT Subject: Integrated: 8335615: Clean up left-overs from 8317721 In-Reply-To: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com> References: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com> Message-ID: <dE1Bo5ledkwRLSbRspakcga5Sj4yfmIxiOSGKQogMpI=.7853515f-1f60-4c1e-b079-5cf4c68788f2@github.com> On Wed, 3 Jul 2024 12:28:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote: > The `compiler/intrinsics/zip/TestCRC32.java` is ok. Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980 This pull request has now been integrated. Changeset: 5a8af2b8 Author: Arseny Bochkarev <arseny.bochkarev at syntacore.com> Committer: Vladimir Kempik <vkempik at openjdk.org> URL: https://git.openjdk.org/jdk/commit/5a8af2b8b93672de9b3a3e73e6984506980da932 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8335615: Clean up left-overs from 8317721 Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/20004 From stuefe at openjdk.org Wed Jul 3 16:11:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 Jul 2024 16:11:31 GMT Subject: RFR: 8322475: Extend printing for System.map [v7] In-Reply-To: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com> References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com> Message-ID: <DjtiwDJV9LMKe0g7Lfsb96MhmtcGatw-TXaCC8uhOvg=.ea5afc08-dc09-4d2e-b5df-6c040c00b82d@github.com> On Wed, 3 Jul 2024 07:55:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is an expansion on the new `System.map` command 
introduced with JDK-8318636. >> >> We now print valuable information per memory region, such as: >> >> - the actual resident set size >> - the actual number of huge pages >> - the actual used page size >> - the THP state of the region (was advised, is eligible, uses THP, ...) >> - whether the region is shared >> - whether the region had been committed (backed by swap) >> - whether the region has been swapped out. >> >> Example output: >> >> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt) >> >> >> from to size rss hugetlb pgsz prot notes vm info/file >> 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') >> 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 26 commits: > > - no stack on 32-bit, scan for vDSO lib instead > - feedback severin > - Merge branch 'master' into System.maps-more-info > - feedback johan > - fix merge errors > - Merge branch 'master' into System.maps-more-info > - copyrights > - Merge branch 'master' into System.maps-more-info > - fix merge issue > - Merge branch 'master' into System.maps-more-info > - ... and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d x64 fastdebug build error unrelated. I locally built and tested on linux x64. Thanks @jerboaa ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2206708662 From stuefe at openjdk.org Wed Jul 3 16:11:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 Jul 2024 16:11:33 GMT Subject: Integrated: 8322475: Extend printing for System.map In-Reply-To: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> Message-ID: <Jz3SMNKWUmZHMAbauun0xkOW89W80eGA2PIr-oxOkJU=.356c4e73-b209-43a6-a460-280b48caf56a@github.com> On Tue, 19 Dec 2023 15:48:58 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > This is an expansion on the new `System.map` command introduced with JDK-8318636. > > We now print valuable information per memory region, such as: > > - the actual resident set size > - the actual number of huge pages > - the actual used page size > - the THP state of the region (was advised, is eligible, uses THP, ...) > - whether the region is shared > - whether the region had been committed (backed by swap) > - whether the region has been swapped out. 
> > Example output: > > [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt) > > > from to size rss hugetlb pgsz prot notes vm info/file > 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage > 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage > 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') > 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(Code... This pull request has now been integrated. 
Changeset: 8aaec37a Author: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8aaec37ace102b55ee1387cfd1967ec3ab662083 Stats: 624 lines in 14 files changed: 425 ins; 111 del; 88 mod 8322475: Extend printing for System.map Reviewed-by: sgehwolf, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/17158 From pchilanomate at openjdk.org Wed Jul 3 16:43:26 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 3 Jul 2024 16:43:26 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 Message-ID: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). 
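The capacity arithmetic in the paragraph above can be written out as a small sketch. The function and parameter names here are hypothetical and chosen for illustration; only the figures 4 words, 64 bits per word, and 2 bits per entry come from the message itself.

```cpp
#include <cstddef>

// Number of locals + expression-stack entries that fit in an inline bit mask
// of `words` machine words, when each entry consumes `bits_per_entry` bits.
constexpr std::size_t inline_mask_entries(std::size_t words,
                                          std::size_t bits_per_word,
                                          std::size_t bits_per_entry) {
  return words * bits_per_word / bits_per_entry;
}

// True when a method's entry count no longer fits inline, i.e. the case
// where extra memory would have to be allocated for the mask.
constexpr bool needs_extra_allocation(std::size_t locals,
                                      std::size_t expression_stack,
                                      std::size_t capacity) {
  return locals + expression_stack > capacity;
}

// The figure quoted in the message: 4 * 64 / 2 = 128 entries.
static_assert(inline_mask_entries(4, 64, 2) == 128, "matches the quoted arithmetic");
```

Under these assumptions a method with, say, 100 locals and 28 stack entries still fits inline, while one more entry would tip it into the allocating path.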
The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. I tested the patch by running it through mach5 tiers 1-6. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/20012/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335409 Stats: 38 lines in 3 files changed: 11 ins; 15 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20012.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012 PR: https://git.openjdk.org/jdk/pull/20012 From iklam at openjdk.org Wed Jul 3 17:04:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 3 Jul 2024 17:04:41 GMT Subject: RFR: 8312125: Refactor CDS enum class handling Message-ID: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.
------------- Commit messages: - 8312125: Refactor CDS enum class handling Changes: https://git.openjdk.org/jdk/pull/20013/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312125 Stats: 285 lines in 5 files changed: 190 ins; 93 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20013/head:pull/20013 PR: https://git.openjdk.org/jdk/pull/20013 From sgehwolf at openjdk.org Wed Jul 3 17:37:20 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 3 Jul 2024 17:37:20 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v3] In-Reply-To: <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com> Message-ID: <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com> On Thu, 27 Jun 2024 08:05:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. 
As usual with hex dumps, ascii follows hex printout >> >> Example: >> >> >> >> 118 0x00000000000001c0: 204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820 OpenJDK 64-Bit Server VM (fastde >> 119 0x00000000000001e0: 692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f bug 24-internal-adhoc.thomas.sou >> 120 0x0000000000000200: 726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820 rce) for linux-amd64 JRE (24-int >> 121 0x0000000000000220: 64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965 ernal-adhoc.thomas.source), buil >> 122 0x0000000000000240: 323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a t on 2024-06-20T10:45:12Z with g >> 123 0x0000000000000260: 2e352e3031206363 0000000000000030 0000000000000000 0000000000000000 cc 10.5.0_______________________ >> 124 0x0000000000000280: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ >> 125 0x00000000000002a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ >> >> >> The patch does that. >> >> Small unrelated changes: >> >> - I rewrote and extended the gtests, testing now a real-life printout containing a mixture or readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved. >> >> - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t). >> >> - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. 
os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar. >> >> ---- >> >> Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-e... > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > exclude test for AIX This isn't really the area of my expertise, but the patch seems reasonable to me. src/hotspot/share/runtime/os.cpp line 945: > 943: > 944: ATTRIBUTE_NO_ASAN static bool read_safely_from(const uintptr_t* p, uintptr_t* result) { > 945: DEBUG_ONLY(*result = 0xAAAA;) It's not clear why this was added. Left-over? ------------- Marked as reviewed by sgehwolf (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19835#pullrequestreview-2156999448 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1664527314 From matsaave at openjdk.org Wed Jul 3 17:51:20 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 3 Jul 2024 17:51:20 GMT Subject: RFR: 8312125: Refactor CDS enum class handling In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> Message-ID: <ZHucB6ROGjR9owwiQyZpm-k25GRwPdkmpLoaXglBZ7M=.e04beda9-56d1-4425-9297-5cac89ee7f57@github.com> On Wed, 3 Jul 2024 17:00:30 GMT, Ioi Lam <iklam at openjdk.org> wrote: > Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change. I have two comments but overall this looks good! 
src/hotspot/share/cds/cdsEnumKlass.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. Please update the copyrights on the new files. src/hotspot/share/cds/cdsEnumKlass.cpp line 68: > 66: oop orig_obj) { > 67: assert(level > 1, "must never be called at the first (outermost) level"); > 68: assert(is_enum_obj(orig_obj), "must be"); Is this assert redundant? You check for this before you make the call to `handle_enum_obj()` below. I think you can move the if statement in here. ------------- Changes requested by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2157034791 PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664548371 PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664553655 From hgreule at openjdk.org Wed Jul 3 19:47:39 2024 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 3 Jul 2024 19:47:39 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception Message-ID: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. The exception message is less exact, but I think that's acceptable. 
------------- Commit messages: - add test (and find missing method) - make reflective calls to signature polymorphic methods in VarHandle throw UOE Changes: https://git.openjdk.org/jdk/pull/20015/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20015&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335638 Stats: 75 lines in 2 files changed: 71 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20015/head:pull/20015 PR: https://git.openjdk.org/jdk/pull/20015 From iklam at openjdk.org Wed Jul 3 19:57:51 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 3 Jul 2024 19:57:51 GMT Subject: RFR: 8312125: Refactor CDS enum class handling [v2] In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> Message-ID: <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com> > Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change. 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20013/files - new: https://git.openjdk.org/jdk/pull/20013/files/64b77ecb..49dc109e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20013/head:pull/20013 PR: https://git.openjdk.org/jdk/pull/20013 From iklam at openjdk.org Wed Jul 3 19:57:52 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 3 Jul 2024 19:57:52 GMT Subject: RFR: 8312125: Refactor CDS enum class handling [v2] In-Reply-To: <ZHucB6ROGjR9owwiQyZpm-k25GRwPdkmpLoaXglBZ7M=.e04beda9-56d1-4425-9297-5cac89ee7f57@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> <ZHucB6ROGjR9owwiQyZpm-k25GRwPdkmpLoaXglBZ7M=.e04beda9-56d1-4425-9297-5cac89ee7f57@github.com> Message-ID: <7LNBOLBhkwk0BpAWFiJv-i0PFhX6M54IwpVM6w0a5uc=.59137b5a-8810-4d7b-81af-ac1d931cc80b@github.com> On Wed, 3 Jul 2024 17:42:07 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed copyright > > src/hotspot/share/cds/cdsEnumKlass.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > > Please update the copyrights on the new files. I changed the dates to `2023, 2024` as these files were initially published in the leyden repo last year. > src/hotspot/share/cds/cdsEnumKlass.cpp line 68: > >> 66: oop orig_obj) { >> 67: assert(level > 1, "must never be called at the first (outermost) level"); >> 68: assert(is_enum_obj(orig_obj), "must be"); > > Is this assert redundant? 
You check for this before you make the call to `handle_enum_obj()` below. I think you can move the if statement in here. The assert declares the fact that the caller must check that orig_obj is an enum before calling this function. This guards against inadvertent changes that might accidentally drop the "if" check in the caller. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664690497 PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664687592 From matsaave at openjdk.org Wed Jul 3 20:08:17 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 3 Jul 2024 20:08:17 GMT Subject: RFR: 8312125: Refactor CDS enum class handling [v2] In-Reply-To: <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com> Message-ID: <uy6CKlyFbVZ-yLe6Mklejpa6AmToFoMFuV_tL6VJ-f4=.0d123829-80e2-43f3-8d2c-c7ff8973cb0a@github.com> On Wed, 3 Jul 2024 19:57:51 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed copyright Thanks for the changes and clarification! ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2157295122 From liach at openjdk.org Wed Jul 3 20:36:22 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Jul 2024 20:36:22 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception In-Reply-To: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> Message-ID: <wehVWk76N-IIsKi8LfdOchbLJGxUfxsW8Y9VqiDd-2k=.1b3d5923-9ba1-4eec-b926-164c7423cc87@github.com> On Wed, 3 Jul 2024 19:43:05 GMT, Hannes Greule <hgreule at openjdk.org> wrote: > Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. > > The exception message is less exact, but I think that's acceptable. Great work! src/hotspot/share/prims/methodHandles.cpp line 1372: > 1370: */ > 1371: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject mh, jobjectArray args)) { > 1372: THROW_MSG_NULL(vmSymbols::java_lang_UnsupportedOperationException(), "VarHandle access mode method a cannot be invoked reflectively"); Suggestion: THROW_MSG_NULL(vmSymbols::java_lang_UnsupportedOperationException(), "VarHandle access mode methods cannot be invoked reflectively"); Looks like a typo to me. src/hotspot/share/prims/methodHandles.cpp line 1419: > 1417: static JNINativeMethod VH_methods[] = { > 1418: // UnsupportedOperationException throwers > 1419: {CC "compareAndExchange", CC "([" OBJ ")" OBJ, FN_PTR(VH_UOE)}, I recommend ordering these by the order in `AccessMode`, which is also the declaration order in `VarHandle`; that way, if we add a new access mode, it's easier for us to maintain this list. 
src/hotspot/share/prims/methodHandles.cpp line 1457: > 1455: JVM_ENTRY(void, JVM_RegisterMethodHandleMethods(JNIEnv *env, jclass MHN_class)) { > 1456: assert(!MethodHandles::enabled(), "must not be enabled"); > 1457: assert(vmClasses::MethodHandle_klass() != nullptr, "should be present"); Should we duplicate this assert for `vmClasses::VarHandle_klass()` too? test/jdk/java/lang/invoke/VarHandles/VarHandleTestReflection.java line 1: > 1: /* The copyright header's year needs an update. test/jdk/java/lang/invoke/VarHandles/VarHandleTestReflection.java line 69: > 67: VarHandle v = handle(); > 68: > 69: // Try a reflective invoke using a Method, with an array of 0 arguments Suggestion: // Try a reflective invoke using a Method, with the minimal required argument test/jdk/java/lang/invoke/VarHandles/VarHandleTestReflection.java line 72: > 70: > 71: Method vhm = VarHandle.class.getMethod(accessMode.methodName(), Object[].class); > 72: Object args = new Object[0]; I recommend naming this `arg`, as this is the single arg to the reflected method. 
Had you inlined this, you would have called `vhm.invoke(v, (Object) new Object[0]);` ------------- PR Review: https://git.openjdk.org/jdk/pull/20015#pullrequestreview-2157341254 PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664744641 PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664741601 PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664737631 PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664753008 PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664751627 PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664751688 From psandoz at openjdk.org Wed Jul 3 21:34:18 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 3 Jul 2024 21:34:18 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception In-Reply-To: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> Message-ID: <xAUJsRxtQTGRtlpHChBG17bxbhbFbIE0iK_3YYz5z2Y=.e63e4147-7197-4438-9ffa-3f2deead0088@github.com> On Wed, 3 Jul 2024 19:43:05 GMT, Hannes Greule <hgreule at openjdk.org> wrote: > Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. > > The exception message is less exact, but I think that's acceptable. src/hotspot/share/prims/methodHandles.cpp line 1371: > 1369: * invoked directly. 
> 1370: */ > 1371: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject mh, jobjectArray args)) { Suggestion: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject vh, jobjectArray args)) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664836522 From dholmes at openjdk.org Wed Jul 3 22:03:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 Jul 2024 22:03:28 GMT Subject: RFR: 8322475: Extend printing for System.map [v7] In-Reply-To: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com> References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com> <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com> Message-ID: <7_EFxaouVZu-UhsgaEPKb771nYu0H9bLkyywlxgLhWM=.2145753f-7397-45c0-8a2b-8c5a1fafd74e@github.com> On Wed, 3 Jul 2024 07:55:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is an expansion on the new `System.map` command introduced with JDK-8318636. >> >> We now print valuable information per memory region, such as: >> >> - the actual resident set size >> - the actual number of huge pages >> - the actual used page size >> - the THP state of the region (was advised, is eligible, uses THP, ...) >> - whether the region is shared >> - whether the region had been committed (backed by swap) >> - whether the region has been swapped out. 
>> >> Example output: >> >> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt) >> >> >> from to size rss hugetlb pgsz prot notes vm info/file >> 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') >> 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - no stack on 32-bit, scan for vDSO lib instead > - feedback severin > - Merge branch 'master' into System.maps-more-info > - feedback johan > - fix merge errors > - Merge branch 'master' into System.maps-more-info > - copyrights > - Merge branch 'master' into System.maps-more-info > - fix merge issue > - Merge branch 'master' into System.maps-more-info > - ... 
and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d The modified tests are failing in our CI on Linux x64 and Aarch64 - see https://bugs.openjdk.org/browse/JDK-8335643 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2207383900 From Matthew.Carter at microsoft.com Wed Jul 3 23:13:43 2024 From: Matthew.Carter at microsoft.com (Mat Carter) Date: Wed, 3 Jul 2024 23:13:43 +0000 Subject: Proposal for small experimental change to compiler thread calculation. Message-ID: <SJ0PR21MB2040BDBA4389364FBC5D39B58ADD2@SJ0PR21MB2040.namprd21.prod.outlook.com> We've been looking at compiler queue load, under both dynamic and static configurations, and added jdk.CompilerQueueUtilization to the JFR logging to help with this. There's much discussion in JBS ([1], [2] and [3] amongst others) and on the mailing lists regarding 1 vs 2 queues, shared threads vs dedicated and the benefits/tradeoffs of dynamic compiler threads. Our proposal is to allow the 1:2 ratio (c1:c2) to be overridden on the command line, with the goal of allowing experimentation that might help either solidify the rationale behind the current settings or set us on a new path to make some changes. Further, it could allow developers to fine-tune the ratio for their workload-specific needs. Something like -XX:CICompilerThreadRatio="2:3" (default is "1:2" which matches the current settings) Note that the math to calculate the allocation of CICompilerCounts to C1 and C2 would remain integer, ensuring that the default ratio of 1:2 allocates the same number of threads to C1 and C2 as it does today. Other than adding a new command line option, the only change would be in the initialize method in src/hotspot/share/compiler/compilationPolicy.cpp There's also a thought on setting the compiler threads explicitly that we're happy to table until later: -XX:CICompilerThreadCounts="3:4"; in this case we'd compute CICompilerCounts as the sum of C1 and C2 threads.
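A minimal sketch of the integer arithmetic such a ratio option implies, for illustration only. The helper name `split_compiler_threads` and its parameters are hypothetical and do not exist in HotSpot. The baseline it is meant to mirror (roughly one third of the threads for C1, the remainder for C2, at least one each) is an assumption that should be checked against compilationPolicy.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper, not HotSpot code: split `total` compiler threads
// between C1 and C2 according to the ratio c1_part:c2_part, using integer
// math only. Assumes total >= 2 and both ratio parts are positive.
std::pair<int, int> split_compiler_threads(int total, int c1_part, int c2_part) {
  assert(total >= 2 && c1_part > 0 && c2_part > 0);
  // Integer division truncates, so a 1:2 ratio yields total / 3 for C1.
  int c1 = std::max(1, total * c1_part / (c1_part + c2_part));
  // Leave at least one thread for C2 even with a lopsided ratio.
  c1 = std::min(c1, total - 1);
  int c2 = total - c1;
  return {c1, c2};
}
```

Under these assumptions, ten compiler threads split 3:7 with the ratio 1:2 and 4:6 with the ratio 2:3.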
Thoughts and questions appreciated, thanks in advance Mat [1] https://bugs.openjdk.org/browse/JDK-8134507 [2] https://bugs.openjdk.org/browse/JDK-8198756 [3] https://bugs.openjdk.org/browse/JDK-8302264 From xpeng at openjdk.org Thu Jul 4 00:36:33 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Jul 2024 00:36:33 GMT Subject: RFR: 8334231: Optimize MethodData layout Message-ID: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Hi all, This PR is a part of the https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. Here is the original layout from `pahole`: class MethodData : public Metadata { public: /* class Metadata <ancestor>; */ /* 0 0 */ /* XXX 8 bytes hole, try to pack */ class Method * _method; /* 8 8 */ int _size; /* 16 4 */ int _hint_di; /* 20 4 */ class Mutex _extra_data_lock; /* 24 104 */ /* --- cacheline 2 boundary (128 bytes) --- */ class CompilerCounters _compiler_counters; /* 128 80 */ /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ intx _eflags; /* 208 8 */ intx _arg_local; /* 216 8 */ intx _arg_stack; /* 224 8 */ intx _arg_returned; /* 232 8 */ int _creation_mileage; /* 240 4 */ class InvocationCounter _invocation_counter; /* 244 4 */ class InvocationCounter _backedge_counter; /* 248 4 */ int _invocation_counter_start; /* 252 4 */ /* --- cacheline 4 boundary (256 bytes) --- */ int _backedge_counter_start; /* 256 4 */ uint _tenure_traps; /* 260 4 */ int _invoke_mask; /* 264 4 */ int _backedge_mask; /* 268 4 */ short int _num_loops; /* 272 2 */ short int _num_blocks; /* 274 2 */ enum WouldProfile _would_profile; /* 276 4 */ int _jvmci_ir_size; /* 280 4 */ /* XXX 4 bytes hole, try to pack */ class FailedSpeculation * _failed_speculations; /*
288 8 */ int _data_size; /* 296 4 */ int _parameters_type_data_di; /* 300 4 */ int _exception_handler_data_di; /* 304 4 */ /* XXX 4 bytes hole, try to pack */ intptr_t _data[1]; /* 312 8 */ /* size: 320, cachelines: 5, members: 27 */ /* sum members: 304, holes: 3, sum holes: 16 */ }; There are 3 holes in the layout, the 1st 8-byte hole seems related to the ancestor Metadata which actually has 8-byte size, we may not be able to do anything to optimize: class Metadata : public MetaspaceObj { public: /* class MetaspaceObj <ancestor>; */ /* 0 0 */ /* XXX last struct has 1 byte of padding */ int ()(void) * * _vptr.Metadata; /* 0 8 */ /* size: 8, cachelines: 1, members: 2 */ /* paddings: 1, sum paddings: 1 */ /* last cacheline: 8 bytes */ }; The two 4-byte holes should be easy to fix, we can simply swap the position of _jvmci_ir_size and _failed_speculations for better alignment. Here is the new layout after the change: class MethodData : public Metadata { public: /* class Metadata <ancestor>; */ /* 0 0 */ /* XXX 8 bytes hole, try to pack */ class Method * _method; /* 8 8 */ int _size; /* 16 4 */ int _hint_di; /* 20 4 */ class Mutex _extra_data_lock; /* 24 104 */ /* --- cacheline 2 boundary (128 bytes) --- */ class CompilerCounters _compiler_counters; /* 128 80 */ /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ intx _eflags; /* 208 8 */ intx _arg_local; /* 216 8 */ intx _arg_stack; /* 224 8 */ intx _arg_returned; /* 232 8 */ int _creation_mileage; /* 240 4 */ class InvocationCounter _invocation_counter; /* 244 4 */ class InvocationCounter _backedge_counter; /* 248 4 */ int _invocation_counter_start; /* 252 4 */ /* --- cacheline 4 boundary (256 bytes) --- */ int _backedge_counter_start; /* 256 4 */ uint _tenure_traps; /* 260 4 */ int _invoke_mask; /* 264 4 */ int _backedge_mask; /* 268 4 */ short int _num_loops; /* 272 2 */ short int _num_blocks; /* 274 2 */ enum WouldProfile _would_profile; /* 276 4 */ class FailedSpeculation * _failed_speculations; /* 
280 8 */ int _jvmci_ir_size; /* 288 4 */ int _data_size; /* 292 4 */ int _parameters_type_data_di; /* 296 4 */ int _exception_handler_data_di; /* 300 4 */ intptr_t _data[1]; /* 304 8 */ /* size: 312, cachelines: 5, members: 27 */ /* sum members: 304, holes: 1, sum holes: 8 */ /* last cacheline: 56 bytes */ }; The two 4-byte holes are removed, saving 8 bytes. Also removed unnecessary `private: ` mark. Additional test: - [x] CONF=linux-x86_64-server-fastdebug CONF_CHECK=ignore make clean test TEST=tier2 Best, Xiaolong. ------------- Commit messages: - Swap position of _jvmci_ir_size and _failed_speculations for better alignment - Optimize MethodData layout Changes: https://git.openjdk.org/jdk/pull/20019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20019&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334231 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20019/head:pull/20019 PR: https://git.openjdk.org/jdk/pull/20019 From duke at openjdk.org Thu Jul 4 01:53:29 2024 From: duke at openjdk.org (duke) Date: Thu, 4 Jul 2024 01:53:29 GMT Subject: Withdrawn: 8325316: Enable -pedantic -Wpedantic for gcc In-Reply-To: <5-fL8vy-065EhMR2SzE21o7UFrOOoDVfnhDG06G-kkE=.0c2dab10-c458-4178-89f7-004f8f921a3f@github.com> References: <5-fL8vy-065EhMR2SzE21o7UFrOOoDVfnhDG06G-kkE=.0c2dab10-c458-4178-89f7-004f8f921a3f@github.com> Message-ID: <27AIDm3yKW2W4MwsZCVPsxPdmQLc5LCIvFpT5P32ruA=.ab7c27fd-064e-43c1-8abd-77f8a052592c@github.com> On Tue, 6 Feb 2024 09:45:07 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Similarly to [JDK-8325163](https://bugs.openjdk.org/browse/JDK-8325163), this enables pedantic mode for gcc, ensuring stricter Standard conformance and allowing for buggy and broken code previously undetectable by gcc to be caught This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/17727 From dholmes at openjdk.org Thu Jul 4 04:18:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 4 Jul 2024 04:18:17 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <clMXAOykJ1YFhkWUsoEbupm32URSyEnFJ5hcYRSB6iM=.04eaf1c4-3baa-46f0-b1c8-87ccbc802298@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 bytes hole, try to pack */ > > 
class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... Seems reasonable. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20019#pullrequestreview-2157913550 From dholmes at openjdk.org Thu Jul 4 05:02:25 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 4 Jul 2024 05:02:25 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> Message-ID: <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com> On Wed, 3 Jul 2024 16:24:20 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: > The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. > > The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. 
So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). > > The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. > > I tested the patch by running it through mach5 tiers 1-6. > > Thanks, > Patricio Thanks for the detailed explanations. A couple of minor (pre-existing) nits but changes are good. Thanks src/hotspot/share/interpreter/oopMapCache.cpp line 179: > 177: #ifdef ASSERT > 178: _used = false; > 179: #endif Nit pre-existing: use of DEBUG_ONLY would be more consistent with later setting of `_used`. src/hotspot/share/interpreter/oopMapCache.cpp line 408: > 406: > 407: void InterpreterOopMap::resource_copy(OopMapCacheEntry* from) { > 408: // The expectation is that this InterpreterOopMap is a recently created s/is a recently/is recently/ src/hotspot/share/interpreter/oopMapCache.hpp line 136: > 134: // Copy the OopMapCacheEntry in parameter "from" into this > 135: // InterpreterOopMap. If the _bit_mask[0] in "from" points to > 136: // allocated space (i.e., the bit mask was to large to hold Nit pre-existing: s/to/too/ src/hotspot/share/interpreter/oopMapCache.hpp line 138: > 136: // allocated space (i.e., the bit mask was to large to hold > 137: // in-line), allocate the space from the C heap. 
> 138: void resource_copy(OopMapCacheEntry* from); The name `resource_copy` seems somewhat of a misnomer given it may be C heap. Is it worth changing? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2157946377 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665114778 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665112766 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665116975 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665117670 From hgreule at openjdk.org Thu Jul 4 06:22:31 2024 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 4 Jul 2024 06:22:31 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception [v2] In-Reply-To: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> Message-ID: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> > Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. > > The exception message is less exact, but I think that's acceptable. 
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20015/files - new: https://git.openjdk.org/jdk/pull/20015/files/fe43b749..e329ceb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20015&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20015&range=00-01 Stats: 43 lines in 2 files changed: 17 ins; 16 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20015/head:pull/20015 PR: https://git.openjdk.org/jdk/pull/20015 From stuefe at openjdk.org Thu Jul 4 06:24:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 06:24:32 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v3] In-Reply-To: <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com> <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com> Message-ID: <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com> On Wed, 3 Jul 2024 17:34:14 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> exclude test for AIX > > This isn't really the area of my expertise, but the patch seems reasonable to me. Many thanks, @jerboaa ! > src/hotspot/share/runtime/os.cpp line 945: > >> 943: >> 944: ATTRIBUTE_NO_ASAN static bool read_safely_from(const uintptr_t* p, uintptr_t* result) { >> 945: DEBUG_ONLY(*result = 0xAAAA;) > > It's not clear why this was added. Left-over? 
It's intentional, to have an indication for a failing SafeFetch that we in turn fail to recognise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2208195396 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1665183201 From stuefe at openjdk.org Thu Jul 4 06:24:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 06:24:33 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v3] In-Reply-To: <Aj6UFZiCaVQTWxFWc3SvhVTFINYza_C0NruMDH6auPU=.0ae0896a-7eaf-49a5-96e5-672ff241cfef@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <Aj6UFZiCaVQTWxFWc3SvhVTFINYza_C0NruMDH6auPU=.0ae0896a-7eaf-49a5-96e5-672ff241cfef@github.com> Message-ID: <1Tis955GRVGvI0HJ7G80tvnN_wzsPZlU7v72CtbBE2s=.f6633a9f-3fe8-4a50-9c86-bf1b673d88c7@github.com> On Tue, 25 Jun 2024 06:53:39 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> exclude test for AIX > > src/hotspot/share/runtime/os.cpp line 961: > >> 959: union { >> 960: uint64_t v; >> 961: uint8_t c[sizeof(v)]; > > Why `uint8_t` instead of `unsigned char`? @dholmes-ora I missed your question, sorry. Both are the same, obviously, but I vaguely prefer uint8_t since it's a bit shorter and has the size in its name.
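A minimal sketch of the idea behind that byte view, not the code from the patch: take the bytes of a 64-bit word in memory order and render each one either as its character or as a placeholder. The function name `ascii_of_word` and the choice of `_` for non-printable bytes are assumptions made for illustration. The sketch copies the bytes with `memcpy` rather than reading through a union, since reading an inactive union member is not guaranteed by standard C++.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstring>
#include <string>

// Illustrative only: build the ASCII column for one 64-bit word of a hex dump.
std::string ascii_of_word(uint64_t v) {
  uint8_t c[sizeof(v)];
  std::memcpy(c, &v, sizeof(v));  // bytes in memory order, as a dump shows them
  std::string out;
  for (uint8_t b : c) {
    // Printable bytes appear as themselves, everything else as '_'.
    out += std::isprint(b) ? static_cast<char>(b) : '_';
  }
  assert(out.size() == sizeof(v));
  return out;
}
```

On a little-endian machine the word 0x204b444a6e65704f holds the bytes of "OpenJDK " in memory order, so the ASCII column reads the text left to right even though the hex column shows it reversed.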
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1665186434 From stuefe at openjdk.org Thu Jul 4 06:24:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 06:24:33 GMT Subject: Integrated: 8334738: os::print_hex_dump should optionally print ASCII In-Reply-To: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> Message-ID: <E62kiGYd4JUlaxg6YHZh9dzoAl2zLN8WhMmp44E5VZU=.b6e793f5-42d5-494e-b1c1-e93e824692c7@github.com> On Fri, 21 Jun 2024 16:17:43 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. As usual with hex dumps, ascii follows hex printout > > Example: > > > > 118 0x00000000000001c0: 204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820 OpenJDK 64-Bit Server VM (fastde > 119 0x00000000000001e0: 692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f bug 24-internal-adhoc.thomas.sou > 120 0x0000000000000200: 726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820 rce) for linux-amd64 JRE (24-int > 121 0x0000000000000220: 64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965 ernal-adhoc.thomas.source), buil > 122 0x0000000000000240: 323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a t on 2024-06-20T10:45:12Z with g > 123 0x0000000000000260: 2e352e3031206363 0000000000000030 0000000000000000 0000000000000000 cc 10.5.0_______________________ > 124 0x0000000000000280: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ > 125 0x00000000000002a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ > > > The patch does that. 
> > Small unrelated changes: > > - I rewrote and extended the gtests, testing now a real-life printout containing a mixture or readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved. > > - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t). > > - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar. > > ---- > > Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-endian machines and therefore made those changes blindly. ... This pull request has now been integrated. 
Changeset: 38a578d5 Author: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/38a578d547f39c3637d97f5e0242f4a69f3bbb31 Stats: 159 lines in 7 files changed: 67 ins; 15 del; 77 mod 8334738: os::print_hex_dump should optionally print ASCII Reviewed-by: dholmes, sgehwolf ------------- PR: https://git.openjdk.org/jdk/pull/19835 From rehn at openjdk.org Thu Jul 4 06:58:23 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 06:58:23 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v21] In-Reply-To: <tdkglfMOAm2Mg_Qj_TvnOro8oIqlOV8LKgfqgKTYFIw=.654dd861-9959-4b17-9c5a-6628f2782e3b@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tdkglfMOAm2Mg_Qj_TvnOro8oIqlOV8LKgfqgKTYFIw=.654dd861-9959-4b17-9c5a-6628f2782e3b@github.com> Message-ID: <lbWK_Y3hEdKr4vm76gKheXLMBYyLGvusC2svp0BKTSo=.cd4f881b-8217-46ad-8384-b95129db851d@github.com> On Wed, 3 Jul 2024 12:53:54 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL <trampo> >> Stubs: >> AUIPC >> LD >> JALR >> <DEST> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. 
>> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> <DEST> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Rename to reloc_call If there are no major issues, I suggest we consider shipping now. As it is early in the cycle, this will get a lot of bake time, there will be plenty of time to make additional changes, and it is even possible to change the default to trampoline calls.
I'll re-start all testing once again :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2208250089 From stuefe at openjdk.org Thu Jul 4 07:19:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 07:19:19 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com> <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com> <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com> Message-ID: <1yfgRA2s391Nqf1ONq_8MNblW3S2U_ARUncWsA4T83w=.845c926d-de90-4f4b-ba6d-bcf929d5d924@github.com> On Mon, 1 Jul 2024 13:02:54 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> See line 141 > > I see; thanks. Can you add a comment referencing `prepareOptions`? Sure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1665245390 From rehn at openjdk.org Thu Jul 4 07:28:34 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 07:28:34 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> Message-ID: <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). 
> Using a very small application or running for a very short time we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
>
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
>
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
>
>
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
>
> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
>
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
>
> An experimental option for turning trampolines back on exists.
>
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice versa (the current proposal of Zjid forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues which arm has).
>
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>
> fop (msec) 2239 | 2128 = 0.950424
> h2 (msec) 18660 | 16594 = 0.889282
> jython (msec) 22022 | 21925 = 0.995595
> luindex (msec) 2866 | 2842 = 0.991626
> lusearch (msec) 4108 | 4311 = 1.04942
> lusearch-fix (msec) 4406 | 4116 = 0.934181
> pmd (msec) 5976 | 5897 = 0.98678
> jython (msec) 22022 | 21925 = 0.995595
> Avg: 0.974112
> fop(xcomp) (msec) 2721 | 2714 = 0.997427
> h2(xcomp) (msec) 37719 | 38004 = 1.00756
> jython(xcomp) ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 31 commits: - Merge branch 'master' into 8332689 - Rename to reloc_call - Merge branch 'master' into 8332689 - Rename lc - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Comments - Missed in merge-fixes, minor revert - Merge branch 'master' into 8332689 - Minor review comments - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=21 Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From stuefe at openjdk.org Thu Jul 4 07:30:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 07:30:18 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com> <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com> <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com> Message-ID: <tATLRKVSTICcHm46o6uumcpF_RWHC_hwAXnyXzKpofU=.b3a77778-6a8c-4ddd-9eda-724e3e4cd9ae@github.com> On Mon, 1 Jul 2024 13:02:09 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >>> Why the explicit -Xmx64m? As I understand this is essentially the launcher, whose heap-size is of little importance. >> >> No particular reason, just don't like launchers to use large heaps. I can remove it. 
>> >>> Also, why does the launch require WhiteBoxAPI? >> >> Because the launcher needs to access hostAvailableMemory in order to decide before starting the test whether it makes sense to start the test. > >> just don't like launchers to use large heaps. > > Could you add a comment at the start of this file explaining the test setup, launcher creating another VM + real test flags? There, the rational for small heap (64M) can be covered as well. > > Some text on these fields can also help understand this test. > > final static long expectedMaxNonHeapRSS = M * 256; > final static long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS; > final static long requiredAvailableDuring = expectedMaxNonHeapRSS; Sure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1665258517 From stuefe at openjdk.org Thu Jul 4 07:33:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 07:33:19 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com> Message-ID: <ZnAE3KOtCPDj0viMtN2g3tU_i0-XvkrRHNnNMBIbYhQ=.66cae7cb-831a-4674-8477-1ea04c3d773a@github.com> On Mon, 1 Jul 2024 13:09:51 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Update TestAlwaysPreTouchBehavior.java > > test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 164: > >> 162: if (rss < committed) { >> 163: if (avail < requiredAvailableDuring) { >> 164: throw new SkippedException("Not enough memory for this 
test (" + avail + ")"); > > This is essentially an early-return; why is this inside the `rss < committed` comparison? Does it work if it's lifted up? The structure I have in mind is like: > > > if (avail < ....) { > skip-test; > } > assert(rss >= committed, error-msg); Well, the test may have succeeded despite what we recognize as low memory conditions. The "low-memory-condition-recognition" is necessarily over-generous - we count even vaguely lowish conditions as "low." We do this to prevent false negatives, which tests like these are often plagued with. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1665262024 From stuefe at openjdk.org Thu Jul 4 07:46:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 07:46:35 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com> Message-ID: <uyUvbKGKPu8Y_SL77V20x0s8wTNu-sJI3DNOXYAFkzM=.6ae7c214-0d28-4c84-8e8e-bb3fd3f5a977@github.com> On Mon, 1 Jul 2024 13:10:39 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Update TestAlwaysPreTouchBehavior.java > > Some readability suggestions. Hi @albertnetymk, thanks a lot for your review. I added a comment explaining the test ratio and -setup. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2208322749 From stuefe at openjdk.org Thu Jul 4 07:46:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 07:46:35 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v3] In-Reply-To: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> Message-ID: <ri-bl1TeDnmiHjJX4EDIElQZl_ff5vIl_Al7RrbFCUg=.9b2952ee-9c14-42ab-aa50-eb5dd60fcce8@github.com> > See JBS issue. > > It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. > > The patch: > - exposes os::available_memory via Whitebox > - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` > > I have some misgivings about this solution, though: > 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. > 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) > 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. 
> > Despite my doubts, I think this is the best we can come up with if we want to have such a test. > > Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8334513-New-test-gc-TestAlwaysPreTouchBehavior-java-is-failing - comments for albert - Update TestAlwaysPreTouchBehavior.java - tweaks - fixes - Merge branch 'master' into JDK-8334513-New-test-gc-TestAlwaysPreTouchBehavior-java-is-failing - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19803/files - new: https://git.openjdk.org/jdk/pull/19803/files/19ed5833..a25bfca4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=01-02 Stats: 14736 lines in 498 files changed: 9695 ins; 3004 del; 2037 mod Patch: https://git.openjdk.org/jdk/pull/19803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803 PR: https://git.openjdk.org/jdk/pull/19803 From stuefe at openjdk.org Thu Jul 4 07:49:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 07:49:32 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v4] In-Reply-To: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> Message-ID: <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com> > See JBS issue. 
>
> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
>
> The patch:
> - exposes os::available_memory via Whitebox
> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
>
> I have some misgivings about this solution, though:
> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load.
> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
>
> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
>
> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.
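The skip-versus-fail rule described above can be sketched as follows. This is a minimal C++ illustration only (the actual test is written in Java and is not reproduced here). The names `Verdict`, `worth_starting`, `judge`, and the value of `kHeapSize` are hypothetical; the three thresholds follow the `expectedMaxNonHeapRSS`, `requiredAvailableBefore`, and `requiredAvailableDuring` formulas quoted earlier in this review thread.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants. kHeapSize is an assumption made for this sketch;
// the other values follow the formulas quoted from the test in the review.
constexpr std::uint64_t M = 1024 * 1024;
constexpr std::uint64_t kHeapSize = 512 * M;  // hypothetical testee heap size
constexpr std::uint64_t kExpectedMaxNonHeapRSS = 256 * M;
constexpr std::uint64_t kRequiredAvailableBefore = 2 * kHeapSize + kExpectedMaxNonHeapRSS;
constexpr std::uint64_t kRequiredAvailableDuring = kExpectedMaxNonHeapRSS;

enum class Verdict { Pass, Fail, Skip };

// Launcher side: decide before starting the testee VM whether the host has
// enough available memory for the result to be meaningful.
inline bool worth_starting(std::uint64_t avail_before) {
  return avail_before >= kRequiredAvailableBefore;
}

// Testee side: a low RSS only counts as a failure if memory was not tight.
// A good RSS is accepted even under (apparently) low-memory conditions.
inline Verdict judge(std::uint64_t rss, std::uint64_t committed, std::uint64_t avail_during) {
  assert(committed > 0);  // a running VM always has some committed heap
  if (rss >= committed) return Verdict::Pass;
  if (avail_during < kRequiredAvailableDuring) return Verdict::Skip;
  return Verdict::Fail;
}
```

The memory check is deliberately nested under the RSS comparison, which matches the reasoning given in the review: a run that succeeded despite low memory still counts as a pass.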
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: comma ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19803/files - new: https://git.openjdk.org/jdk/pull/19803/files/a25bfca4..eba72ed9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803 PR: https://git.openjdk.org/jdk/pull/19803 From stuefe at openjdk.org Thu Jul 4 07:53:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 07:53:01 GMT Subject: RFR: 8330174: Establish no-access zone at the start of Klass encoding range [v4] In-Reply-To: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com> References: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com> Message-ID: <ZTQ7TB_8UHD8r9W96IsXtV2TO2X2DLwRaf4tgVHVpuA=.d93ccc2a-ab86-47f4-8c8c-a38040f84eef@github.com> > After having reserved an address range for the Klass encoding range, we either: > a) Place CDS, then class space, into that address range > b) Place only class space in that range (if CDS is off). > > For an nKlass of 0, the decoded Klasspointer points to the beginning of the encoding range. Since nKlass=0 is a special value, both CDS (a) and Metaspace (b) ensure that no Klass is placed right at the start of the Klass range. > > However, it would also be good to establish a no-access zone at the range's start. Dereferencing an nKlass=0 would then result in an immediate, obvious crash instead of in reading invalid data. > > This would closely mimic what we do in the compressed-oops-enabled java heap (albeit there we do it for fault-based null checks, too) and what Operating Systems do with low-address ranges. 
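To illustrate why nKlass=0 lands exactly on the start of the encoding range, here is a minimal C++ sketch. It assumes the usual base-plus-shifted-value decoding scheme; the struct `KlassEncoding` and the helpers `decode` and `hits_no_access_zone` are hypothetical names for this example, not HotSpot APIs, and the zone size is simply a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a Klass encoding range for this sketch.
struct KlassEncoding {
  std::uintptr_t base;       // start of the Klass encoding range
  unsigned shift;            // encoding shift
  std::uintptr_t zone_size;  // size of the no-access zone at the range start
};

// Assumed decoding scheme: base + (nKlass << shift).
inline std::uintptr_t decode(const KlassEncoding& e, std::uint32_t nklass) {
  assert(e.shift < 32);  // keeps the shifted 32-bit value within 64 bits
  return e.base + (static_cast<std::uintptr_t>(nklass) << e.shift);
}

// True if addr falls into the protected prefix of the encoding range, i.e.
// an access there would fault immediately instead of reading invalid data.
inline bool hits_no_access_zone(const KlassEncoding& e, std::uintptr_t addr) {
  return addr >= e.base && addr - e.base < e.zone_size;
}
```

With a 4096-byte zone and a shift of 3, `decode(e, 0)` equals `e.base` and is inside the zone, while `decode(e, 512)` is the first decoded address past it.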
>
> ---
>
> The patch:
>
> We cannot move the encoding base down one page (the encoding base is carefully chosen to fit the platform's decoding). Nor can we move the CDS archive space up one page (since CDS relies on the archive being placed exactly at the encoding base address). Nor do we want to move class space up (since the class space start has a high alignment requirement of 16MB, the protection zone would need to be 16MB large, which is a waste of address space).
>
> Instead, as before, we just let Metaspace and CDS handle the protection zone internally. For Metaspace, this is very simple. We just protect the first page of class space.
>
> For CDS, it is a tiny bit more complex since we need to leave a "protection-zone-shaped hole" in the first region of the archive when we dump it. We do just that and then give that region a new property, "has protection zone". At runtime, we protect the underlying memory if a mapped region has a protection zone.
>
> With CDS, because the page size can differ between dump time and runtime, the protection zone is the size of CDS core region alignment, not page-sized (e.g. dumping on Linux aarch64 with 4KB pages shall generate an archive that can be used in Docker on MacOS with 16KB pages).
>
> ----
>
> Tests:
> - ran CDS and AppCDS jtreg tests manually on Mac m1
> - manually tested that decoding, then dereferencing an nKlass=0 gives us the new "Fault address is narrow Klass base - dereferencing a zero nKlass?" output in the hs-err file
> - GHAs (which include the new regression test)

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision:

 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Update metaspace.cpp
 - cds-metaspace-prot-prefix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19290/files
  - new: https://git.openjdk.org/jdk/pull/19290/files/2ccd527d..ee869fc6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=02-03

  Stats: 35871 lines in 820 files changed: 23818 ins; 8212 del; 3841 mod
  Patch: https://git.openjdk.org/jdk/pull/19290.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19290/head:pull/19290

PR: https://git.openjdk.org/jdk/pull/19290

From eosterlund at openjdk.org Thu Jul 4 07:54:22 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 4 Jul 2024 07:54:22 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <yjmiVgrkuSxz7OHkp_Pg5lhOlu6mGoA4L46ZcmXhgxk=.8e6ed7af-6bef-4e43-8161-79156f76afaf@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
>
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread is guaranteed to also be observed by the instruction fetcher.
>
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions without enforcing a cross modifying code fence. This is synchronous code modification.
>
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher): it checks "just one more time" if the nmethod concurrently got disarmed, and then exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
>
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.
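The contract described above (read the guard as data, then always fence before executing the possibly modified code) can be modelled with a small C++ sketch. This is a toy model under stated assumptions, not the HotSpot implementation: `FenceCounter` and `slow_path_entry` are hypothetical names, the fence is replaced by a counter so the behaviour can be observed, and the real barrier work on an armed nmethod is reduced to a plain store.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for a platform cross-modifying-code fence (on real hardware this
// would be a serializing operation). Here it only counts invocations.
struct FenceCounter {
  int calls = 0;
  void cross_modify_fence() { ++calls; }
};

// Toy model of the slow path. The guard value is loaded as *data*, so the
// fence must be issued on every exit, including the case where another
// thread disarmed the nmethod just before we looked.
inline void slow_path_entry(std::atomic<int>& guard, int disarmed_value, FenceCounter& fence) {
  if (guard.load(std::memory_order_acquire) != disarmed_value) {
    // Still armed: placeholder for the real barrier work, then disarm.
    guard.store(disarmed_value, std::memory_order_release);
  }
  // No early return above: the fence is unconditional.
  fence.cross_modify_fence();
  assert(guard.load(std::memory_order_acquire) == disarmed_value);
}
```

The "just one more time" shortcut criticised in the description would correspond to returning early when the guard already holds `disarmed_value`, which skips the fence in exactly the case where it is needed.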
Performance results are neutral, as expected. Tier1-5 passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19990#issuecomment-2208337400

From aboldtch at openjdk.org Thu Jul 4 08:01:18 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 4 Jul 2024 08:01:18 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <AJ60YmU3wkp4rz9Q8noIiOCF6BuAaSicDOEGmziLn4s=.f4532050-f64f-4725-b825-9cc36cec2813@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
>
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
>
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data.
If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions without enforcing a cross modifying code fence. This is synchronous code modification.
>
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher): it checks "just one more time" if the nmethod concurrently got disarmed, and then exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
>
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

The always-fence changes look good. The effects w.r.t. `DeoptimizeNMethodBarriersALot` and our testing are less clear to me.

src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 197:

> 195:   if (DeoptimizeNMethodBarriersALot && !nm->is_osr_method()) {
> 196:     static volatile uint32_t counter=0;
> 197:     if (Atomic::add(&counter, 1u) % 10 == 0) {

I do not have a good intuition about the frequency of this, or how it affects things, so I have a hard time commenting on this change. An alternative would be to just fence when the `nmethod` is already disarmed and not call `bs_nm->nmethod_entry_barrier(nm);`.
This is what effectively already happens in all implementations of `nmethod_entry_barrier` after [JDK-8331911](https://bugs.openjdk.org/browse/JDK-8331911) / #19285. (Maintaining the current behaviour with respect to `DeoptimizeNMethodBarriersALot`). But as long as this new magic constant seems to have similar testing coverage when running our tests that use `DeoptimizeNMethodBarriersALot` this seems like a sensible solution as well.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19990#pullrequestreview-2158224060
PR Review Comment: https://git.openjdk.org/jdk/pull/19990#discussion_r1665284317

From kbarrett at openjdk.org Thu Jul 4 08:08:24 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 4 Jul 2024 08:08:24 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <GPFw-GRnWeKbjJILxoF3yPuXHTk-PNP5Rs2tQidkjIo=.00fa8b12-2e54-4304-ab8d-d19e04a5d249@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
>
> We use asynchronous code modification in the fast path of nmethod entry barriers.
If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread is guaranteed to also be observed by the instruction fetcher.
>
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions without enforcing a cross modifying code fence. This is synchronous code modification.
>
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher): it checks "just one more time" if the nmethod concurrently got disarmed, and then exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
>
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

Looks good.
src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 195: > 193: // Diagnostic option to force deoptimization 1 in 10 times. It is otherwise > 194: // a very rare event. > 195: if (DeoptimizeNMethodBarriersALot && !nm->is_osr_method()) { This could also include may_enter in the conditions, since the effect of this bit of code is to set it false. But maybe that's a rare thing and not worth checking for here. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19990#pullrequestreview-2158259609 PR Review Comment: https://git.openjdk.org/jdk/pull/19990#discussion_r1665305892 From sgehwolf at openjdk.org Thu Jul 4 08:10:23 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 4 Jul 2024 08:10:23 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v3] In-Reply-To: <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com> <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com> <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com> Message-ID: <6CdB1rRz9eL_2DXfwAGmWbNOW8QAx66YaZyPFQ53RCo=.fad303a5-e94b-478a-ab6f-a28e2e67e7a8@github.com> On Thu, 4 Jul 2024 06:17:45 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> src/hotspot/share/runtime/os.cpp line 945: >> >>> 943: >>> 944: ATTRIBUTE_NO_ASAN static bool read_safely_from(const uintptr_t* p, uintptr_t* result) { >>> 945: DEBUG_ONLY(*result = 0xAAAA;) >> >> It's not clear why this was added. Left-over? > > It's intentional, to have an indication for a failing SafeFetch that we in turn fail to recognise. OK, thanks for the explanation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1665309552 From stuefe at openjdk.org Thu Jul 4 08:23:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 08:23:21 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: <rnPsj7tJBhXHpNZ_Cubn3LdzsXi_u8vjFjWIlTiE9-s=.acda4cb0-7fe3-40ce-aac4-7fb5d7deef1a@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com> <rnPsj7tJBhXHpNZ_Cubn3LdzsXi_u8vjFjWIlTiE9-s=.acda4cb0-7fe3-40ce-aac4-7fb5d7deef1a@github.com> Message-ID: <ZhyduzWKIbuw3PC9eJb2IXpmrEOuihapCp9MUubHiuI=.74c0787f-6a48-45d9-82d1-c109e704d19e@github.com> On Wed, 3 Jul 2024 06:08:34 GMT, Liming Liu <duke at openjdk.org> wrote: > Could you please confirm whether it is related to JDK-8335167? Both failures may be symptom of the same issue. See my comment in JBS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2208388841 From aboldtch at openjdk.org Thu Jul 4 08:23:26 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 4 Jul 2024 08:23:26 GMT Subject: Integrated: 8335397: Improve reliability of TestRecursiveMonitorChurn.java In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> Message-ID: <TP3HJj8byl2IagJ4KFHtqUs5xaISnHC0yL8jcCZbx10=.bf3da5de-6385-4353-b6fd-d574ca2aac31@github.com> On Mon, 1 Jul 2024 09:21:13 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. > > Change to instead query the JVM for exact number of inflations via the Whitebox API. 
This allows us to both be more exact and less dependent on interactions with NMT. This pull request has now been integrated. Changeset: b20e8c8e Author: Axel Boldt-Christmas <aboldtch at openjdk.org> URL: https://git.openjdk.org/jdk/commit/b20e8c8e85e0a0e96ae648f42ff803f1c83f6291 Stats: 77 lines in 5 files changed: 28 ins; 31 del; 18 mod 8335397: Improve reliability of TestRecursiveMonitorChurn.java Reviewed-by: coleenp, rkennke, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/19965 From aboldtch at openjdk.org Thu Jul 4 08:23:25 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 4 Jul 2024 08:23:25 GMT Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java [v3] In-Reply-To: <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com> References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com> <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com> Message-ID: <hCTzMAa1TsGw7xRGynwyawGuk_R4hiie-rNWkGycItQ=.56f8bd10-060c-4f7b-8e9d-20f47a44a5da@github.com> On Wed, 3 Jul 2024 07:25:48 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation. >> >> Change to instead query the JVM for exact number of inflations via the Whitebox API. This allows us to both be more exact and less dependent on interactions with NMT. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java Thanks for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19965#issuecomment-2208388161 From fyang at openjdk.org Thu Jul 4 09:28:24 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 Jul 2024 09:28:24 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> Message-ID: <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> On Thu, 4 Jul 2024 07:28:34 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL <trampo> >> Stubs: >> AUIPC >> LD >> JALR >> <DEST> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> <DEST> >> >> An experimental option for turning trampolines back on exists.
>> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: > > - Merge branch 'master' into 8332689 > - Rename to reloc_call > - Merge branch 'master' into 8332689 > - Rename lc > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Comments > - Missed in merge-fixes, minor revert > - Merge branch 'master' into 8332689 > - Minor review comments > - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa All right. I think this has been thoroughly checked. Looks good modulo one small question. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987: > 985: assert(is_simm32(distance), "Must be"); > 986: Assembler::auipc(temp, (int32_t)distance + 0x800); > 987: Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20); Question: Why would you use this low-level `Assembler::_ld` here? ------------- Marked as reviewed by fyang (Reviewer). 
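For readers following the `auipc` + `ld` pair quoted above, here is a small standalone sketch of the offset arithmetic. It is not HotSpot code. It assumes the usual RISC-V convention that `auipc` contributes a multiple of 4096 and that the load adds a sign-extended 12-bit immediate, which is why the upper part is rounded with `+ 0x800`. The names `PcRelParts` and `split_pcrel` are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two pieces of a PC-relative offset.
struct PcRelParts {
  int64_t hi;  // multiple of 4096, what the auipc immediate contributes
  int64_t lo;  // sign-extended 12-bit value in [-2048, 2047], the load immediate
};

// Split a signed distance so that hi + lo == distance.
// Uses 64-bit arithmetic so the rounding step cannot overflow in this model.
inline PcRelParts split_pcrel(int64_t distance) {
  // Sign-extend the low 12 bits without relying on shifts of negative values.
  int64_t lo = ((distance & 0xFFF) ^ 0x800) - 0x800;
  // Removing the signed low part leaves a multiple of 4096; this matches
  // rounding the upper part with "+ 0x800" before dropping the low 12 bits.
  int64_t hi = distance - lo;
  return PcRelParts{hi, lo};
}
```

With `distance = 0x1800` the low part sign-extends to -2048, so the upper part has to round up to 0x2000 for the sum to land on the target.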
PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2158438628 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665415011 From rehn at openjdk.org Thu Jul 4 10:00:23 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 10:00:23 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> Message-ID: <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com> On Thu, 4 Jul 2024 09:22:46 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: >> >> - Merge branch 'master' into 8332689 >> - Rename to reloc_call >> - Merge branch 'master' into 8332689 >> - Rename lc >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Comments >> - Missed in merge-fixes, minor revert >> - Merge branch 'master' into 8332689 >> - Minor review comments >> - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987: > >> 985: assert(is_simm32(distance), "Must be"); >> 986: Assembler::auipc(temp, (int32_t)distance + 0x800); >> 987: Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20); > > Question: Why would you use this low-level `Assembler::_ld` here? I used it to be explicit that this is the normal **ld**. But I see I did not do the same for **jalr** => **_jalr**. (As we are in MASM, we can use the 'private' method.)
I'll change to normal ld and add an assert that we are in an incompressible region? (I would like to revert that at some point, so that users of reloc_call don't need to know that it requires an incompressible region.) Suggested: @@ -982,0 +983 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { + assert(!in_compressible_region(), "Must be"); @@ -987 +988 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { - Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20); + Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665465771 From eosterlund at openjdk.org Thu Jul 4 10:08:24 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 4 Jul 2024 10:08:24 GMT Subject: RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers In-Reply-To: <GPFw-GRnWeKbjJILxoF3yPuXHTk-PNP5Rs2tQidkjIo=.00fa8b12-2e54-4304-ab8d-d19e04a5d249@github.com> References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com> <GPFw-GRnWeKbjJILxoF3yPuXHTk-PNP5Rs2tQidkjIo=.00fa8b12-2e54-4304-ab8d-d19e04a5d249@github.com> Message-ID: <wtWJVsoy68A6CH-hZDosoR8bprqOcJLXfGOEFbQGBEU=.c895fab0-1de8-4ffd-9991-1ed4bba6b637@github.com> On Thu, 4 Jul 2024 08:05:15 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code. >> >> We use asynchronous code modification in the fast path of nmethod entry barriers.
If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher. >> >> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification. >> >> There is some questionable optimization that in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because it a couple of instructions ago was armed. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully. >> >> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly. > > Looks good. Thanks for the reviews @kimbarrett and @xmas92! 
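The contract described in the quoted text (read the guard as data, then always fence before running the possibly modified code) can be modelled in a few lines. This is only a sketch under stated assumptions, not the HotSpot implementation: `ModelNMethod`, `model_cross_modify_fence` and `model_slow_path` are hypothetical names, the guard encoding (non-zero means armed) is assumed, and the "fence" merely counts calls so the unconditional behaviour can be observed.

```cpp
#include <atomic>
#include <cassert>

// Assumed encoding for this model only: non-zero guard means "armed".
struct ModelNMethod {
  std::atomic<int> guard{1};
};

static int g_fence_calls = 0;

// Stand-in for a real instruction-stream serializing fence.
// It only counts calls; it does not serialize anything.
inline void model_cross_modify_fence() { ++g_fence_calls; }

// Slow path, reached because the instruction fetcher saw the method as armed.
inline bool model_slow_path(ModelNMethod& nm) {
  if (nm.guard.load(std::memory_order_acquire) != 0) {
    // Still armed: the barrier work would run here, then the method is disarmed.
    nm.guard.store(0, std::memory_order_release);
  }
  // No early return above. Whether this thread disarmed the method or another
  // thread did so concurrently, the guard was read as data, so the fence is
  // issued before any of the method's instructions may be executed.
  model_cross_modify_fence();
  return true;  // caller may enter the method
}
```

The point of the shape is that there is no branch that skips the fence: a second call on an already disarmed method still fences.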
------------- PR Comment: https://git.openjdk.org/jdk/pull/19990#issuecomment-2208600511 From eosterlund at openjdk.org Thu Jul 4 10:08:25 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 4 Jul 2024 10:08:25 GMT Subject: Integrated: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com> References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com> Message-ID: <08nCRUZWk-JXV63xf5M1oU2_5fVS5hRsHe75eaSOrSs=.b7c41e14-211f-4bbb-bf9b-88cf454704d2@github.com> On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code. > > We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher. > > However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data.
If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification. > > There is some questionable optimization that in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because it a couple of instructions ago was armed. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully. > > This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly. This pull request has now been integrated. 
Changeset: c0604fb8 Author: Erik Österlund <eosterlund at openjdk.org> URL: https://git.openjdk.org/jdk/commit/c0604fb823d9f3b2e347a9857b11606b223ad8ec Stats: 21 lines in 1 file changed: 1 ins; 17 del; 3 mod 8334890: Missing unconditional cross modifying fence in nmethod entry barriers Reviewed-by: aboldtch, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/19990 From fyang at openjdk.org Thu Jul 4 11:07:21 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 Jul 2024 11:07:21 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com> Message-ID: <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com> On Thu, 4 Jul 2024 09:57:44 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987: >> >>> 985: assert(is_simm32(distance), "Must be"); >>> 986: Assembler::auipc(temp, (int32_t)distance + 0x800); >>> 987: Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20); >> >> Question: Why would you use this low-level `Assembler::_ld` here? > > I used it to be explicit that this is the normal **ld**. > But I see I did not do the same for **jalr** => **_jalr**. (As we are in MASM, we can use the 'private' method.) > I'll change to normal ld and add an assert that we are in an incompressible region?
> > (I would like to revert that at some point, so that users of reloc_call don't need to know that it requires an incompressible region.) > > Suggested: > > @@ -982,0 +983 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { > + assert(!in_compressible_region(), "Must be"); > @@ -987 +988 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { > - Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20); > + Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20); Seems no need to worry about that? I think it's up to the caller to decide if it wants compressed instructions. Here in this case, the only call site is within a relocate() which will disable compressed instructions for you[1]. relocate(entry.rspec(), [&] { load_link_jump(target); }); [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2130 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665546545 From xpeng at openjdk.org Thu Jul 4 11:16:19 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Jul 2024 11:16:19 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <P0oXQu4hhYp8JVZOsPJVL6erv1hvzNOnBYpRyjmg82k=.c4af4468-ba02-40ab-a1c7-727bc3690287@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData.
Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... Thanks David! I assume it is trivial, what do you think? 
@dholmes-ora ------------- PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208714610 From mli at openjdk.org Thu Jul 4 11:21:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Jul 2024 11:21:28 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> Message-ID: <d--581tMPd5ifCiz-9Y4aSkO4aTz1xsjHymjTTBmeyU=.8663438a-15f6-4069-becc-b6d3db91c27b@github.com> On Thu, 4 Jul 2024 07:28:34 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL <trampo> >> Stubs: >> AUIPC >> LD >> JALR >> <DEST> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> <DEST> >> >> An experimental option for turning trampolines back on exists.
>> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: > > - Merge branch 'master' into 8332689 > - Rename to reloc_call > - Merge branch 'master' into 8332689 > - Rename lc > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Comments > - Missed in merge-fixes, minor revert > - Merge branch 'master' into 8332689 > - Minor review comments > - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa I agree, let's move forward, and remove the old one if possible. ------------- Marked as reviewed by mli (Reviewer). 
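As a reading aid for the quoted VF2 numbers, the sketch below recomputes the per-benchmark ratios and their arithmetic mean from the rows that are visible in the thread. The interpretation of the two columns is an assumption: the ratio is taken as right column divided by left column, which matches the posted values, and a ratio below 1 is read as the right column being faster. The names `Run`, `kRuns`, `ratio` and `mean_ratio` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// One row of the quoted table: two timings in milliseconds.
struct Run {
  const char* name;
  double left_ms;   // first number in the row
  double right_ms;  // second number in the row
};

// Rows copied from the quoted message. jython is listed twice there,
// so it is kept twice here to reproduce the posted average.
static const Run kRuns[] = {
  {"fop",          2239,  2128},
  {"h2",           18660, 16594},
  {"jython",       22022, 21925},
  {"luindex",      2866,  2842},
  {"lusearch",     4108,  4311},
  {"lusearch-fix", 4406,  4116},
  {"pmd",          5976,  5897},
  {"jython",       22022, 21925},
};
static const std::size_t kRunCount = sizeof(kRuns) / sizeof(kRuns[0]);

// Ratio as it appears after the "=" in each row (assumed: right / left).
inline double ratio(const Run& r) { return r.right_ms / r.left_ms; }

// Plain arithmetic mean of the per-row ratios.
inline double mean_ratio(const Run* runs, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += ratio(runs[i]);
  }
  return sum / static_cast<double>(n);
}
```

Under these assumptions the mean over the eight listed rows agrees with the posted "Avg" to the precision shown. Because jython is counted twice, counting it once would move the mean slightly, to roughly 0.971.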
PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2158687339 From chagedorn at openjdk.org Thu Jul 4 11:30:19 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Jul 2024 11:30:19 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <tqL2MTD8hE0Dq2mpMhT80f3kscrUPkzY8NNMyJbZzFo=.c9857e46-7d65-4124-960f-5d6878b03a91@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 
bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... Looks good to me, too. Yes, I think this is trivial. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20019#pullrequestreview-2158700920 From xpeng at openjdk.org Thu Jul 4 11:30:19 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Jul 2024 11:30:19 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <tqL2MTD8hE0Dq2mpMhT80f3kscrUPkzY8NNMyJbZzFo=.c9857e46-7d65-4124-960f-5d6878b03a91@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> <tqL2MTD8hE0Dq2mpMhT80f3kscrUPkzY8NNMyJbZzFo=.c9857e46-7d65-4124-960f-5d6878b03a91@github.com> Message-ID: <Ty-mRJRi5dDG3ybiCqNj1xT3cDTOABdMZZ9aseGchvw=.c4cd3af8-1ded-4f76-9591-5e2e7663ed1b@github.com> On Thu, 4 Jul 2024 11:25:39 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> Hi all, >> This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. 
Here is the original layout from `pahole`: >> >> class MethodData : public Metadata { >> public: >> >> /* class Metadata <ancestor>; */ /* 0 0 */ >> >> /* XXX 8 bytes hole, try to pack */ >> >> class Method * _method; /* 8 8 */ >> int _size; /* 16 4 */ >> int _hint_di; /* 20 4 */ >> class Mutex _extra_data_lock; /* 24 104 */ >> /* --- cacheline 2 boundary (128 bytes) --- */ >> class CompilerCounters _compiler_counters; /* 128 80 */ >> /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ >> intx _eflags; /* 208 8 */ >> intx _arg_local; /* 216 8 */ >> intx _arg_stack; /* 224 8 */ >> intx _arg_returned; /* 232 8 */ >> int _creation_mileage; /* 240 4 */ >> class InvocationCounter _invocation_counter; /* 244 4 */ >> class InvocationCounter _backedge_counter; /* 248 4 */ >> int _invocation_counter_start; /* 252 4 */ >> /* --- cacheline 4 boundary (256 bytes) --- */ >> int _backedge_counter_start; /* 256 4 */ >> uint _tenure_traps; /* 260 4 */ >> int _invoke_mask; /* 264 4 */ >> int _backedge_mask; /* 268 4 */ >> short int _num_loops; /* 272 2 */ >> short int _num_blocks; /* 274 2 */ >> enum WouldProfile _would_profile; /* 276 4 */ >> int _jvmci_ir_size; /* 280 4 */ >> >> /* XXX 4 bytes hole, try to pack */ >> >> class FailedSpeculation * _failed_speculations; /* 288 8 */ >> int _data_size; /* 296 4 */ >> int _parameters_type_data_di; /* 300 4 */ >> int _exception_handler_data_di; /* 304 4 */ >> >> /* XXX 4 bytes hole, try to pack */ >> >> intptr_t _data[1]; /* 312 8 */ >> >> /* size: 320, cachelin... > > Looks good to me, too. Yes, I think this is trivial. Thank you for the review! 
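The "holes" that `pahole` reports in the quoted layout come from alignment padding between members. The sketch below shows the general idea on a toy struct; it is not the MethodData change itself, whose actual reordering is not shown in this thread. `Original`, `Reordered` and `padding_bytes` are hypothetical names, and exact sizes depend on the platform ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 4-byte member followed by an 8-byte member usually forces padding
// so that the 8-byte member is suitably aligned.
struct Original {
  int32_t a;
  int64_t p;
  int32_t b;
};

// Same members, widest first: the two 4-byte members share one 8-byte slot.
struct Reordered {
  int64_t p;
  int32_t a;
  int32_t b;
};

// Bytes actually needed by the members of either struct.
constexpr std::size_t kPayloadBytes = sizeof(int64_t) + 2 * sizeof(int32_t);

// Padding is whatever the compiler added beyond the member payload.
template <typename T>
constexpr std::size_t padding_bytes() {
  return sizeof(T) - kPayloadBytes;
}

// Reordering must never make the struct larger.
static_assert(sizeof(Reordered) <= sizeof(Original),
              "widest-first ordering should not add padding");
```

On a typical LP64 ABI `Original` occupies 24 bytes and `Reordered` 16, which is the same kind of saving the PR description is after when it says "try to pack".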
@chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208737190 From duke at openjdk.org Thu Jul 4 11:30:19 2024 From: duke at openjdk.org (duke) Date: Thu, 4 Jul 2024 11:30:19 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <3eechMaFiFrPJ2ak8xsLpFhwvXtT8caB3UxZZ_EWI6Y=.2fbe9e2c-5b3f-4f78-8a94-64b616e31e4d@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 
bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... @pengxiaolong Your change (at version 12e7aaf5c0a4ed529337d773e78a35a893f6cb44) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208737573 From liach at openjdk.org Thu Jul 4 11:51:19 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 4 Jul 2024 11:51:19 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception [v2] In-Reply-To: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> Message-ID: <MJxYuFBXEn0tEnHqV9bLaOTxjPRtxU7Lb1wumWvVR6g=.4eada565-e293-4b7f-b55a-270c82a3ea79@github.com> On Thu, 4 Jul 2024 06:22:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote: >> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. >> >> The exception message is less exact, but I think that's acceptable. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > address comments Marked as reviewed by liach (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20015#pullrequestreview-2158742384 From stuefe at openjdk.org Thu Jul 4 12:44:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 12:44:19 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <qBONEcrgJyYqSsBdiDRbA9NeV8sC8uXKRY2zbpDE8Fc=.1dfd2cbc-5982-4958-b7cb-313d0c52139a@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 
bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... I don't think these "Optimize XXX layouts" should be marked as trivial, and I worry that we overuse the trivial rule. It circumvents the second reviewer as well as the 24hr rule, which both are necessary safeties. Especially in the wake of the xz fiasco. "Trivial" is usually reserved for either changes that need very quick reaction (e.g. reasonably simple build errors that require immediate fixing because everyone's CI is standing still) or things that are painfully obvious in being trivial, e.g. comment changes. Memory layout changes are neither urgent nor really trivial enough IMHO. 
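As a generic illustration of the kind of padding holes that `pahole` reports in the quoted layout, here is a minimal sketch. It is not the MethodData change itself: the structs `WithHoles` and `Packed` and the helper `bytes_saved` are invented for this example, and the exact sizes depend on the ABI. On a typical LP64 target, moving the pointer first removes both the interior hole and the tail padding.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example types; not HotSpot code.
struct WithHoles {
  int32_t a;  // on LP64, typically followed by a 4-byte hole so that p is 8-byte aligned
  void*   p;
  int32_t b;  // on LP64, typically followed by 4 bytes of tail padding
};

// Same members, ordered so that the member with the largest alignment comes first.
struct Packed {
  void*   p;
  int32_t a;
  int32_t b;
};

// Number of bytes saved by the reordering on the current ABI (may be 0, e.g. on ILP32).
inline std::size_t bytes_saved() {
  return sizeof(WithHoles) - sizeof(Packed);
}
```

On common 64-bit Linux ABIs one would expect `sizeof(WithHoles)` to be 24 and `sizeof(Packed)` to be 16, but that is an assumption about the platform, not something the language guarantees.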
------------- PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208881597 From rehn at openjdk.org Thu Jul 4 13:21:23 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 13:21:23 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com> <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com> Message-ID: <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com> On Thu, 4 Jul 2024 11:04:52 GMT, Fei Yang <fyang at openjdk.org> wrote: >> I used to be excplicit about this is the normal **ld**. >> But I see I did not do the same for **jalr** => **_jalr**. (as we are in MASM we can use 'private' method). >> I'll change to normal ld and add an assert that we are in a incompressable region? >> >> (I would like to revert that at some time, so the user of reloc_call don't need to know about it needs incompressable for reloc_call) >> >> Suggested: >> >> @@ -982,0 +983 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { >> + assert(!in_compressible_region(), "Must be"); >> @@ -987 +988 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { >> - Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20); >> + Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20); > > Seems no need to worry about that? I think it's up to the caller to decide if it wants compressed instructions. 
> Here in this case, the only call site is within a relocate() which will disable compressed instructions for you[1]. > > relocate(entry.rspec(), [&] { > load_link_jump(target); > }); > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2130 When you use reloc_call, the size of the call must be exactly the size we already specified (3 * NativeInstruction::instruction_size). (If those sizes don't apply, you don't want a reloc_call.) So there is nothing to choose from the caller's perspective. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665703906 From jsjolen at openjdk.org Thu Jul 4 13:23:43 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 4 Jul 2024 13:23:43 GMT Subject: RFR: 8335701: Make GrowableArray templated by an Index Message-ID: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> Hi, Today the GrowableArray has a set index type of `int`; this PR makes it so that you can set your own index type through a template parameter. This opens up a few new design choices: - Do you know that you have a very small array? Use a `uint8_t` for len and cap, each. - Do you have a very large one? Use a `uint64_t`. The code has opted for `int` being the default, so as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care. One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting.
// Old mid = ((max + min) / 2); // New mid = min + ((max - min) / 2); Some semi-rigorous thinking: min \in [0, len) max \in [0, len) min <= max (max - min) / 2 \in [0, len/2) Maximizing min and max => len + 0 Maximizing max, minimizing min => len/2 Minimizing max, maximizing min => max = min => min // Proof that they're identical when m, h, l \in N (1) m = l + (h - l) / 2 <=> 2m = 2l + h - l = h + l (2) m = (h + l) / 2 <=> 2m = h + l (1) = (2) QED ------------- Commit messages: - Fix spelling and actually include the growableArray is it used in cpp file - Move - Handle unhandled oops - Make GrowableArray receive an Index type optionally Changes: https://git.openjdk.org/jdk/pull/20031/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335701 Stats: 262 lines in 32 files changed: 19 ins; 25 del; 218 mod Patch: https://git.openjdk.org/jdk/pull/20031.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20031/head:pull/20031 PR: https://git.openjdk.org/jdk/pull/20031 From rehn at openjdk.org Thu Jul 4 13:24:22 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 13:24:22 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com> <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com> <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com> Message-ID:
<dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com> On Thu, 4 Jul 2024 13:18:57 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Seems no need to worry about that? I think it's up to the caller to decide if it wants compressed instructions. >> Here in this case, the only call site is within a relocate() which will disable compressed instructions for you[1]. >> >> relocate(entry.rspec(), [&] { >> load_link_jump(target); >> }); >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2130 > > When you use reloc_call, the size of the call must be exactly the size we already specified (3 * NativeInstruction::instruction_size). (If those sizes don't apply, you don't want a reloc_call.) > So there is nothing to choose from the caller's perspective. Maybe in the future load_link_jump will be used outside reloc_call, was that your thinking? Anyhow, change _ld to ld? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665707305 From jsjolen at openjdk.org Thu Jul 4 13:35:36 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 4 Jul 2024 13:35:36 GMT Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2] In-Reply-To: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> Message-ID: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com> > Hi, > > Today the GrowableArray has a set index type of `int`; this PR makes it so that you can set your own index type through a template parameter. > > This opens up a few new design choices: > > - Do you know that you have a very small array? Use a `uint8_t` for len and cap, each. > - Do you have a very large one? Use a `uint64_t`.
> > The code has opted for `int` being the default, so as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care. > > One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting. > > > > // Old > mid = ((max + min) / 2); > // New > mid = min + ((max - min) / 2); > > Some semi-rigorous thinking: > min \in [0, len) > max \in [0, len) > min <= max > (max - min) / 2 \in [0, len/2) > Maximizing min and max => len + 0 > Maximizing max, minimizing min => len/2 > Minimizing max, maximizing min => max = min => min > > > // Proof that they're identical when m, h, l \in N > (1) m = l + (h - l) / 2 <=> > 2m = 2l + h - l = h + l > > (2) m = (h + l) / 2 <=> > 2m = h + l > (1) = (2) > QED Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Attempt at fixing GA VMStruct ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20031/files - new: https://git.openjdk.org/jdk/pull/20031/files/7407a151..b5a87422 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=00-01 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20031.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20031/head:pull/20031 PR: https://git.openjdk.org/jdk/pull/20031 From jsjolen at openjdk.org Thu Jul 4 13:41:17 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 4 Jul 2024 13:41:17 GMT Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2] In-Reply-To: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com> References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
<VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com> Message-ID: <OpawmhVDx5yj4qXdiiYcp08Qxc0KyAVPFv95F272n3o=.bc3aa06c-2954-4376-a0a5-54363b2b76d6@github.com> On Thu, 4 Jul 2024 13:35:36 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote: >> Hi, >> >> Today the GrowableArray has a set index type of `int`; this PR makes it so that you can set your own index type through a template parameter. >> >> This opens up a few new design choices: >> >> - Do you know that you have a very small array? Use a `uint8_t` for len and cap, each. >> - Do you have a very large one? Use a `uint64_t`. >> >> The code has opted for `int` being the default, so as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care. >> >> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting. >> >> >> >> // Old >> mid = ((max + min) / 2); >> // New >> mid = min + ((max - min) / 2); >> >> Some semi-rigorous thinking: >> min \in [0, len) >> max \in [0, len) >> min <= max >> (max - min) / 2 \in [0, len/2) >> Maximizing min and max => len + 0 >> Maximizing max, minimizing min => len/2 >> Minimizing max, maximizing min => max = min => min >> >> >> // Proof that they're identical when m, h, l \in N >> (1) m = l + (h - l) / 2 <=> >> 2m = 2l + h - l = h + l >> >> (2) m = (h + l) / 2 <=> >> 2m = h + l >> (1) = (2) >> QED > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Attempt at fixing GA VMStruct Always fun to grapple with vmStructs, moving this back to draft.
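For readers following the mid-point change quoted above, a minimal standalone sketch may help. It is not the GrowableArray code: the helpers `mid_old` and `mid_new` are invented for illustration. Because C++ promotes `uint8_t` operands to `int`, the old form below truncates the sum back to the index type explicitly, which models an index type that is as wide as the arithmetic itself (for example `int` or `uint64_t`), where `max + min` can genuinely wrap.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of HotSpot.

// Old form: (max + min) / 2, with the sum forced back into the index type
// to model arithmetic that cannot hold the full sum.
template <typename Index>
Index mid_old(Index min, Index max) {
  Index sum = static_cast<Index>(max + min);  // wraps when max + min exceeds the Index range
  return static_cast<Index>(sum / 2);
}

// New form: min + (max - min) / 2. With 0 <= min <= max, the difference and
// the result both stay within [min, max], so nothing can wrap.
template <typename Index>
Index mid_new(Index min, Index max) {
  return static_cast<Index>(min + ((max - min) / 2));
}
```

With an 8-bit index, `min = 200` and `max = 250` give a wrapped sum of 194 in the old form, so it yields 97, which lies outside `[min, max]`; the new form yields 225. For small values that do not wrap, both forms agree.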
------------- PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2209022728 From eosterlund at openjdk.org Thu Jul 4 13:58:26 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 4 Jul 2024 13:58:26 GMT Subject: [jdk23] RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers Message-ID: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com> 8334890: Missing unconditional cross modifying fence in nmethod entry barriers ------------- Commit messages: - Backport c0604fb823d9f3b2e347a9857b11606b223ad8ec Changes: https://git.openjdk.org/jdk/pull/20036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334890 Stats: 18 lines in 1 file changed: 1 ins; 14 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20036/head:pull/20036 PR: https://git.openjdk.org/jdk/pull/20036 From aboldtch at openjdk.org Thu Jul 4 13:58:26 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 4 Jul 2024 13:58:26 GMT Subject: [jdk23] RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers In-Reply-To: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com> References: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com> Message-ID: <sKjd64NGI947n-5zuIRi78OPMZfwgreI54UDcyTeTW0=.a28e7639-c100-4e00-938c-15d0dcb8ca63@github.com> On Thu, 4 Jul 2024 13:52:09 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > 8334890: Missing unconditional cross modifying fence in nmethod entry barriers Marked as reviewed by aboldtch (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/20036#pullrequestreview-2158996704 From fyang at openjdk.org Thu Jul 4 14:04:21 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 Jul 2024 14:04:21 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com> <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com> <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com> <dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com> Message-ID: <_pvvhby0M1_L7J34xtX5ZXQSjyBndjiqUAvc7ohj5ng=.f1a0adaf-5961-4cbf-8001-b4a066fdb201@github.com> On Thu, 4 Jul 2024 13:21:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> When you use reloc_call the size of the call must be an exact size we already specified (3 * NativeInstruction::instruction_size). (if those sizes don't apply you don't want a reloc_call) >> So there is nothing to choose from the caller prespective. > > Maybe in the future load_link_jump will be used out side reloc_call, was that your thinking? > > Anyhow change _ld to ld ? Yeah, I just feel it's better to decouple the two and let the users of `load_link_jump` to decide. (Or just inline this `load_link_jump` into the caller?) 
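The `((int32_t)distance << 20) >> 20` operand in the `load_link_jump` suggestion earlier in this thread keeps the low 12 bits of the offset, sign-extended, as the immediate of the load. A minimal sketch of that split follows, assuming the usual RISC-V convention that `auipc` contributes a multiple of 4096 and the following `ld` contributes a signed 12-bit immediate. The struct `PcRelParts` and the function `split_pc_relative` are invented for illustration and are not the MacroAssembler implementation.

```cpp
#include <cstdint>

// Hypothetical illustration; not HotSpot code.
struct PcRelParts {
  int64_t upper;  // part carried by auipc: a multiple of 4096
  int64_t lower;  // part carried by the load immediate: in [-2048, 2047]
};

// Splits a pc-relative distance so that upper + lower == distance.
// Assumes the distance is within the range auipc can express (roughly +/- 2 GiB).
inline PcRelParts split_pc_relative(int64_t distance) {
  // Low 12 bits, taken through an unsigned type to avoid shifting negative values.
  int64_t lower = static_cast<int64_t>(static_cast<uint64_t>(distance) & 0xfffu);
  // Sign-extend from 12 bits: values 0x800..0xfff represent -2048..-1.
  if (lower >= 0x800) {
    lower -= 0x1000;
  }
  // Whatever remains is a multiple of 4096; a negative lower part bumps it up by one page.
  int64_t upper = distance - lower;
  return {upper, lower};
}
```

For example, a distance of 2048 does not fit in the signed 12-bit immediate, so the sketch splits it into an upper part of 4096 and a lower part of -2048.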
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665753700 From rehn at openjdk.org Thu Jul 4 14:48:36 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 14:48:36 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23] In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> Message-ID: <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > With a very small application, or one that runs only for a very short time, we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get fast hot calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL <trampo> > Stubs: > AUIPC > LD > JALR > <DEST> > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > <DEST> > > An experimental option for turning trampolines back on exists.
> > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: _ld to ld ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19453/files - new: https://git.openjdk.org/jdk/pull/19453/files/9eabb5fa..b958ee0f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=21-22 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From rehn at openjdk.org Thu Jul 4 14:48:38 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 14:48:38 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2] In-Reply-To: <VUlq4fBerYjwcZKXnY-1R_WIRuk-nlFDYqe5MjeXvRs=.a06722cd-9e8d-4117-b2dc-9f20a8f0b60c@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com> 
<VUlq4fBerYjwcZKXnY-1R_WIRuk-nlFDYqe5MjeXvRs=.a06722cd-9e8d-4117-b2dc-9f20a8f0b60c@github.com> Message-ID: <cg4Wr-Co1XRUaF30v4SJ-9bjTjV_kmVh_O5nJaFc6m4=.78ed450b-1e65-4072-b39a-3f5f337538cc@github.com> On Tue, 4 Jun 2024 11:05:19 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into 8332689 >> - Remove accidental files >> - Remove accidental files >> - Baseline > > I see new classes are added in nativeInst, maybe the comments at the top of nativeInst.hpp needs updated accordingly. Thanks @Hamlin-Li @RealFYang ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2209160416 From rehn at openjdk.org Thu Jul 4 14:48:38 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 4 Jul 2024 14:48:38 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22] In-Reply-To: <_pvvhby0M1_L7J34xtX5ZXQSjyBndjiqUAvc7ohj5ng=.f1a0adaf-5961-4cbf-8001-b4a066fdb201@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com> <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com> <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com> <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com> <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com> <dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com> <_pvvhby0M1_L7J34xtX5ZXQSjyBndjiqUAvc7ohj5ng=.f1a0adaf-5961-4cbf-8001-b4a066fdb201@github.com> Message-ID: 
<g5d3WUbf-ppmfPcgJ_qEBeDG333mwbMCWRVc61Vuucs=.4cae2e43-dcd2-4ae1-957f-43615c749f2a@github.com> On Thu, 4 Jul 2024 13:58:33 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Maybe in the future load_link_jump will be used out side reloc_call, was that your thinking? >> >> Anyhow change _ld to ld ? > > Yeah, I just feel it's better to decouple the two and let the users of `load_link_jump` to decide. (Or just inline this `load_link_jump` into the caller?) I just changed to ld. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665807685 From chagedorn at openjdk.org Thu Jul 4 15:00:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Jul 2024 15:00:20 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <Cf9U1p3z4utQLP6ygOGk-1Os1CNzSzuiSElIhju-q9Y=.e30c51da-c25e-4aae-adf3-253327a9dc9f@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. 
Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... That's a fair point. It's sometimes tricky to find the boundary between trivial and non-trivial. Here I thought about it for a bit but it looked trivial (but I understand that you could think about it differently) and I was the second reviewer. Nevertheless, as you pointed out, I agree that we should probably restrict the use of this rule to the really urgent issues (most are not). 
My default is usually to just wait 24h for normal issues even for changes that are marked as trivial just to give everyone a chance to have a look. Sometimes, you still get some valuable feedback that you could have missed otherwise when integrating early. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2209179254 From aph at openjdk.org Thu Jul 4 15:29:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jul 2024 15:29:22 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v17] In-Reply-To: <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com> References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com> Message-ID: <G2IfSoXv1DKf69H_Gr5O_L-FTkQQgYGBS15UCNMoVt0=.acf2acd9-337c-4d45-8321-1c1be4e3316e@github.com> On Mon, 1 Jul 2024 14:14:50 GMT, Amit Kumar <amitkumar at openjdk.org> wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ±
0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > > Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com> Looks good. ------------- Marked as reviewed by aph (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19544#pullrequestreview-2159162829 From sgehwolf at openjdk.org Thu Jul 4 15:45:20 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 4 Jul 2024 15:45:20 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v3] In-Reply-To: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> Message-ID: <Ctdm2c5kjQzfc15BcfAWtiJKwLjNKOhAYz0Lnsy-7N0=.ded4abfe-d6df-44a4-802f-5bf17ef338bc@github.com> On Mon, 1 Jul 2024 14:43:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support Gentle ping. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2209259086 From szaldana at openjdk.org Thu Jul 4 15:46:25 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 4 Jul 2024 15:46:25 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size Message-ID: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> Hi all, This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. Testing: - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. Thanks, Sonia ------------- Commit messages: - 8300732: Whitebox functions for Metaspace test should use byte size Changes: https://git.openjdk.org/jdk/pull/20039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8300732 Stats: 140 lines in 12 files changed: 48 ins; 0 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/20039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20039/head:pull/20039 PR: https://git.openjdk.org/jdk/pull/20039 From stuefe at openjdk.org Thu Jul 4 15:58:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 15:58:18 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> Message-ID: <OUlxUaj3irq9j4fNiLvdP4gfgRju-zdfMyNibVp1p18=.ff70caf7-04ac-4c66-a7c3-c8cb090b3450@github.com> On Thu, 4 Jul 
2024 15:18:29 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > Hi all, > > This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. > > Testing: > - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. > > Thanks, > Sonia First cursory look, will look again later src/hotspot/share/prims/whitebox.cpp line 1715: > 1713: // MetaspaceTestContext and MetaspaceTestArena > 1714: WB_ENTRY(jlong, WB_CreateMetaspaceTestContext(JNIEnv* env, jobject wb, jlong commit_limit, jlong reserve_limit)) > 1715: if (commit_limit % BytesPerWord != 0) { Use is_aligned() from utilities/align.hpp ------------- PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2159200244 PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1665875107 From stuefe at openjdk.org Thu Jul 4 15:58:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Jul 2024 15:58:18 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size In-Reply-To: <OUlxUaj3irq9j4fNiLvdP4gfgRju-zdfMyNibVp1p18=.ff70caf7-04ac-4c66-a7c3-c8cb090b3450@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> <OUlxUaj3irq9j4fNiLvdP4gfgRju-zdfMyNibVp1p18=.ff70caf7-04ac-4c66-a7c3-c8cb090b3450@github.com> Message-ID: <cqRtkSTTxmfueCmWXsXIjmF0oUX5vjliE0LmwRowSHg=.6bb60863-943e-46e4-b8d2-a7ce214547ae@github.com> On Thu, 4 Jul 2024 15:53:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. >> >> Testing: >> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
>> >> Thanks, >> Sonia > > src/hotspot/share/prims/whitebox.cpp line 1715: > >> 1713: // MetaspaceTestContext and MetaspaceTestArena >> 1714: WB_ENTRY(jlong, WB_CreateMetaspaceTestContext(JNIEnv* env, jobject wb, jlong commit_limit, jlong reserve_limit)) >> 1715: if (commit_limit % BytesPerWord != 0) { > > Use is_aligned() from utilities/align.hpp And I think you can just assert() here. If this happens, a test written by us is using the whitebox function incorrectly, and since it's all internal, there is no need to propagate a Java exception. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1665875957 From duke at openjdk.org Thu Jul 4 17:56:20 2024 From: duke at openjdk.org (Camel Coder) Date: Thu, 4 Jul 2024 17:56:20 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2] In-Reply-To: <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com> Message-ID: <dGgB0M0dpmd_gFfsX8XlLaeL5uk9HynqHdYuZvp7URs=.17839235-7974-4be7-a92f-2e8d5fdb1c0b@github.com> On Mon, 1 Jul 2024 15:36:03 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> use pure scalar version when rvv is not supported > > with the pure scalar implementation, it also brings some performance improvement for all source sizes, so also enable the intrinsic when rvv is not supported. 
>
> performance data
>
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, scalar | Error | Units | Perf opt
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.75 | 0.38 | ns/op | 1
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.71 | 93.824 | 1.954 | ns/op | 0.999
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.824 | 123.487 | 0.559 | ns/op | 0.987
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 138.984 | 137.697 | 0.273 | ns/op | 1.009
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 161.243 | 157.696 | 0.875 | ns/op | 1.022
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 169.724 | 155.223 | 1.908 | ns/op | 1.093
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 185.92 | 176.339 | 5.875 | ns/op | 1.054
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 408.467 | 347.269 | 1.799 | ns/op | 1.176
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3665.34 | 2718.442 | 26.954 | ns/op | 1.348
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7022.025 | 5290.003 | 33.216 | ns/op | 1.327
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135819.7 | 101988.94 | 2209.887 | ns/op | 1.332

@Hamlin-Li Hi, we looked at RVV base64 encode/decode for another project before; however, there wasn't one implementation that was obviously best across the different hardware: https://github.com/WojciechMula/base64simd/issues/9 (see issue for benchmark, and repo for code) I think we currently can't tell how the complex load/stores will perform on future hardware. 
Segmented load/stores for example are quite fast on the current in-order RVV 1.0 boards, however it's very slow on the ooo C910, and XiangShan (current master, may change) cores (SiFive P670 LLVM-MCA indicates that it might also be slow on that core). I'm not sure if that is because they are ooo and that gives you additional constraints, but I wouldn't rely on it just yet. I think the safest bet for encode would be for now "RISC-V RVV (LMUL=1)" ([`encode`](https://github.com/WojciechMula/base64simd/blob/master/encode/encode.rvv.cpp#L60C14-L60C20) + [`lookup_pshufb_improved`](https://github.com/WojciechMula/base64simd/blob/master/encode/lookup.rvv.cpp#L7)), as this only uses instructions with predictable performance, except for LMUL=1 `vrgather.vv`, which I think will need to be fast on any application class core. (See x86 equivalent vperm*) For decode, I'm not really happy with any implementation. Yours uses multiple `vluxei8` + `vlsege4` + `vssege3`, the others from base64simd use LMUL=8 `vrgather.vv`, which will take `LMUL^2=8^2=64` times the amount of cycles a LMUL=1 `vrgather.vv` takes (on sane implementations, [see my reasoning](https://gitlab.com/riseproject/riscv-optimization-guide/-/issues/1#note_1977583125)). As I said, I'm fairly certain LMUL=1 `vrgather.vv` will have to be relatively fast, so if I had to choose, I'd prefer [my implementation](https://godbolt.org/z/7qc1xhMao) that uses LMUL=1 `vrgather.vv`s + `vlsege4` + `vssege3`, but using `vsseg*` is not ideal. 
(Note that gcc currently chokes on the register allocation, so you should use clang for now) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2209403751 From jrose at openjdk.org Thu Jul 4 21:31:21 2024 From: jrose at openjdk.org (John R Rose) Date: Thu, 4 Jul 2024 21:31:21 GMT Subject: RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com> References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com> Message-ID: <Objsw0uDMNiMohx4Xn6QedHF9Qdgoe6taf93Jfvz7Ts=.c3f0a6f7-228f-4cb5-9a05-55c68f93ebcc@github.com> On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code. > > We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher. > > However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. 
If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification. > > There is some questionable optimization that in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because it a couple of instructions ago was armed. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully. > > This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly. I see you integrated; good job, and thank you to the Reviewers. The narrative at the top of this PR is excellent for motivating and explaining the removal of the extra check. Some of the other diffs are more mysterious, as Axel noted. There are no API docs in barrierSetNMethod.[ch]pp so I would need to trace through all the code paths to properly educate myself about the effect of this change. For example, BarrierSetNMethod::nmethod_entry_barrier is a very interesting function, along with its OSR brother, but there are no comments directly visible here that give a clue as to when it is called, or why it must be called. I think that level of non-documentation is often a maintenance problem. 
I see this file relates to the larger API in barrierSet.hpp but that file has sparse comments also. Ideally, I'd hope to read The Narrative of the Barrier Set at the top of barrierSet.hpp, and maybe have a brief pointer to The Narrative from less-commented related files like barrierSetNMethod.cpp. Also, if I did this change, and was feeling chatty and cautious, I'd leave behind an informative comment to the effect that "you might want to double-check the barrier state here, but don't, because races". It's nice not to leave a seam from past history, but sometimes the absence of a warning leads people to repeat history. None of the above critiques would have stopped me from approving the change as another Reviewer, but the lack of documentation would have made me hesitate to review quickly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19990#issuecomment-2209578133 From dholmes at openjdk.org Thu Jul 4 23:02:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 4 Jul 2024 23:02:21 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <jb5ayvzoKuUpOdlf-58EBJ8BHTt9OUOByPOctf4sEgY=.23ccae9d-a92a-4e9d-b10c-4d349e6c6a3e@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. 
Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... I consider moving declarations around trivial; as functionally there is no impact. There could be a performance impact but it is very unlikely that anyone would know for certain during a code review (unless we violate a comment saying that things needs to be spaced out for caching). But I agree there is also no urgency. 
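For readers less familiar with the "holes" that the quoted `pahole` output reports, the effect of moving declarations around can be sketched with a small standalone example. This is only an illustration under stated assumptions: `Scattered`, `Packed`, `bytes_saved` and `check_layout` are hypothetical names made up here, none of this is HotSpot code, and the actual sizes depend on the platform ABI, so only the relative comparison is checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration only; not taken from HotSpot.
// A 4-byte member declared ahead of an 8-byte-aligned member usually
// forces the compiler to insert padding (a "hole" in pahole terms).
struct Scattered {
  std::int32_t a;  // on common 64-bit ABIs, followed by a 4-byte hole
  std::int64_t b;
  std::int32_t c;  // on common 64-bit ABIs, followed by 4 bytes of tail padding
};

// Same members, declared so the two 4-byte fields can share one 8-byte slot.
struct Packed {
  std::int64_t b;
  std::int32_t a;
  std::int32_t c;
};

// Bytes recovered by reordering alone (ABI dependent, possibly zero).
constexpr std::size_t bytes_saved() {
  return sizeof(Scattered) - sizeof(Packed);
}

// Reordering does not change what the struct holds, and for this
// example it never makes the struct larger.
inline void check_layout() {
  assert(sizeof(Packed) <= sizeof(Scattered));
}
```

The functional behavior of code using either struct is the same; only the object size (and therefore cache footprint) may differ, which matches the point that such a change is functionally neutral.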
------------- PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2209627272 From duke at openjdk.org Fri Jul 5 03:25:29 2024 From: duke at openjdk.org (duke) Date: Fri, 5 Jul 2024 03:25:29 GMT Subject: Withdrawn: 8329204: Diagnostic command for zeroing unused parts of the heap In-Reply-To: <b4mPQ-O25iKRZTAsKCoUJ2IrB78D_aEYs1kZRNRS4O4=.e64e5fee-52f4-4543-8c5e-3d6dd60cd8d7@github.com> References: <b4mPQ-O25iKRZTAsKCoUJ2IrB78D_aEYs1kZRNRS4O4=.e64e5fee-52f4-4543-8c5e-3d6dd60cd8d7@github.com> Message-ID: <YoIRuUtbj3T9QMht3_Ar4XDiZVZazCZzrMMFczWqk1E=.11de59a2-bcc9-4160-b782-7b5edbe3f04f@github.com> On Wed, 27 Mar 2024 17:24:34 GMT, Volker Simonis <simonis at openjdk.org> wrote: > Diagnostic command for zeroing unused parts of the heap > > I propose to add a new diagnostic command `System.zero_unused_memory` which zeros out all unused parts of the heap. The name of the command is intentionally GC/heap agnostic because in the future it might be extended to also zero unused parts of the Metaspace and/or CodeCache. > > Currently `System.zero_unused_memory` triggers a full GC and afterwards zeros unused parts of the heap. Zeroing can help snapshotting technologies like [CRIU][1] or [Firecracker][2] to shrink the snapshot size of VMs/containers with running JVM processes because pages which only contain zero bytes can be easily removed from the image by making the image *sparse* (e.g. with [`fallocate -p`][3]). > > Notice that uncommitting unused heap parts in the JVM doesn't help in the context of virtualization (e.g. KVM/Firecracker) because from the host perspective they are still dirty and can't be easily removed from the snapshot image because they usually contain some non-zero data. More details can be found in my FOSDEM talk ["Zeroing and the semantic gap between host and guest"][4]. > > Furthermore, removing pages which only contain zero bytes (i.e. 
"empty pages") from a snapshot image not only decreases the image size but also speeds up the restore process because empty pages don't have to be read from the image file but will be populated by the kernel zero page first until they are used for the first time. This also decreases the initial memory footprint of a restored process. > > An additional argument for memory zeroing is security. By zeroing unused heap parts, we can make sure that secrets contained in unreferenced Java objects are deleted. Something that's currently impossibly to achieve from Java because even if a Java program zeroes out arrays with sensitive data after usage, it can never guarantee that the corresponding object hasn't already been moved by the GC and an old, unreferenced copy of that data still exists somewhere in the heap. > > A prototype implementation for this proposal for Serial, Parallel, G1 and Shenandoah GC is available in the linked pull request. > > [1]: https://criu.org > [2]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md > [3]: https://man7.org/linux/man-pages/man1/fallocate.1.html > [4]: https://fosdem.org/2024/schedule/event/fosdem-2024-3454-zeroing-and-the-semantic-gap-between-host-and-guest/ This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/18521 From duke at openjdk.org Fri Jul 5 07:07:35 2024 From: duke at openjdk.org (duke) Date: Fri, 5 Jul 2024 07:07:35 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v30] In-Reply-To: <9d5UWcRhfhgqpUkvy2dv77bATgCKYFjxNTDreBfk4MI=.5682e46d-b448-4936-8e98-14549669d3dc@github.com> References: <ah1A3dIb6pD5Z7wYQnjoUPuuU5NvyNKEjUQvmp8MKXU=.1b615efe-deef-44d5-8bfa-908c2b2c9eb0@github.com> <9d5UWcRhfhgqpUkvy2dv77bATgCKYFjxNTDreBfk4MI=.5682e46d-b448-4936-8e98-14549669d3dc@github.com> Message-ID: <87VKmlRTP3dMFaOxiViAj2O44e7hBgbjWkDwnG5K3ug=.2ee61b21-4d7b-4db9-94d2-05ecf94d0908@github.com> On Fri, 26 Jan 2024 03:07:02 GMT, Liming Liu <lliu at openjdk.org> wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. 
Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> <table> >> <tr> >> <th>Kernel</th> >> <th colspan="2"><tt>-XX:-TransparentHugePages</tt></th> >> <th colspan="2"><tt>-XX:+TransparentHugePages</tt></th> >> </tr> >> <tr><td></td><td>Unpatched</td><td>Patched</td><td>Unpatched</td><td>Patched</td></tr> >> <tr><td>4.18</td><td>11.30</td><td>11.30</td><td>0.25</td><td>0.25</td></tr> >> <tr><td>5.13</td><td>0.22</td><td>0.22</td><td>3.42</td><td>3.42</td></tr> >> <tr><td>6.1</td><td>0.27</td><td>0.33</td><td>3.54</td><td>0.33</td></tr> >> </table> > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Make it true by default and use a lower log level when fail @limingliu-ampere Your change (at version 3ac920fd2f1f99e6889f3958e13aa8d2a749e17c) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1911765848 From tschatzl at openjdk.org Fri Jul 5 07:21:27 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 5 Jul 2024 07:21:27 GMT Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1 In-Reply-To: <BbCLtLUIqyaA9lNeheVeZJV2fb49kWP2p5t8vRAJ1Uw=.6f72af7b-c5dd-4814-95b4-04e91f32b2c7@github.com> References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> <BbCLtLUIqyaA9lNeheVeZJV2fb49kWP2p5t8vRAJ1Uw=.6f72af7b-c5dd-4814-95b4-04e91f32b2c7@github.com> Message-ID: <G5mdMZ4HOP4K43zFH2JbNEBpec8tRLpv8Sqe5H5WVEI=.cdb0f0f1-e720-4abf-8a9d-3131411f7fb5@github.com> On Tue, 2 Jul 2024 10:21:35 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Hi all, >> >> after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694) which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed. >> >> It's purely mechanical renaming without even further renaming of files etc. >> >> This change updates them. >> >> (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot review this change incrementally) >> >> Testing: tier1, tier4, tier5 >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @dholmes-ora for your reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19967#issuecomment-2210331294 From tschatzl at openjdk.org Fri Jul 5 07:21:29 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 5 Jul 2024 07:21:29 GMT Subject: Integrated: 8331385: G1: Prefix HeapRegion helper classes with G1 In-Reply-To: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com> Message-ID: <eMo7F30Tw-fsx2MNBVThXkJFfG8-0JCUnrPDTPV8PTM=.5a6a6e6f-2b05-4910-9d12-86665cafe80b@github.com> On Mon, 1 Jul 2024 09:35:00 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: > Hi all, > > after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694) which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed. > > It's purely mechanical renaming without even further renaming of files etc. > > This change updates them. > > (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot review this change incrementally) > > Testing: tier1, tier4, tier5 > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 4ec1ae10 Author: Thomas Schatzl <tschatzl at openjdk.org> URL: https://git.openjdk.org/jdk/commit/4ec1ae109710aa150e27acf5706475d335c4655c Stats: 887 lines in 68 files changed: 163 ins; 165 del; 559 mod 8331385: G1: Prefix HeapRegion helper classes with G1 Reviewed-by: ayang, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/19967 From eosterlund at openjdk.org Fri Jul 5 07:57:22 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 5 Jul 2024 07:57:22 GMT Subject: [jdk23] RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers In-Reply-To: <sKjd64NGI947n-5zuIRi78OPMZfwgreI54UDcyTeTW0=.a28e7639-c100-4e00-938c-15d0dcb8ca63@github.com> References: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com> <sKjd64NGI947n-5zuIRi78OPMZfwgreI54UDcyTeTW0=.a28e7639-c100-4e00-938c-15d0dcb8ca63@github.com> Message-ID: <3eH8AAPY5p1IQ9e7q9K_kIxbmLW_g0Y36ZGF3_MC1eE=.aeb6e1b7-2ce0-4586-a7d9-9a200c008cd6@github.com> On Thu, 4 Jul 2024 13:55:15 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> 8334890: Missing unconditional cross modifying fence in nmethod entry barriers > > Marked as reviewed by aboldtch (Reviewer). Thanks for the review @xmas92! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20036#issuecomment-2210381271 From eosterlund at openjdk.org Fri Jul 5 07:57:23 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 5 Jul 2024 07:57:23 GMT Subject: [jdk23] Integrated: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers In-Reply-To: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com> References: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com> Message-ID: <iHwQYcMFiNCQYEUCbnJ1XKsBMVWLdBlu2IJ016J9Tcg=.d90c13c2-1fbd-4a67-8223-9c2b688b495e@github.com> On Thu, 4 Jul 2024 13:52:09 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > 8334890: Missing unconditional cross modifying fence in nmethod entry barriers This pull request has now been integrated. Changeset: d383365e Author: Erik Österlund <eosterlund at openjdk.org> URL: https://git.openjdk.org/jdk/commit/d383365ea4196cd5f40de217547392b820c4ad01 Stats: 18 lines in 1 file changed: 1 ins; 14 del; 3 mod 8334890: Missing unconditional cross modifying fence in nmethod entry barriers Reviewed-by: aboldtch Backport-of: c0604fb823d9f3b2e347a9857b11606b223ad8ec ------------- PR: https://git.openjdk.org/jdk/pull/20036 From azafari at openjdk.org Fri Jul 5 11:15:25 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 5 Jul 2024 11:15:25 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v4] In-Reply-To: <VuFm0lU78YLRVyTMOvNd1rofkOkF6VyvzhAePmQMFJc=.d8e6ca76-c405-4148-9fe9-007f0a3e616d@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <VuFm0lU78YLRVyTMOvNd1rofkOkF6VyvzhAePmQMFJc=.d8e6ca76-c405-4148-9fe9-007f0a3e616d@github.com> Message-ID: <gch2E5eiUMmfTLIqawsbkP0QlwuJQy_Eg0K5ZUzX7aQ=.5f6d33dc-fd8f-4814-99ed-2d6c1f569d62@github.com> On Fri, 24 May 
2024 13:46:15 GMT, Afshin Zafari <azafari at openjdk.org> wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. >> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > more fixes. Withdrawn. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2210686409 From azafari at openjdk.org Fri Jul 5 11:15:26 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 5 Jul 2024 11:15:26 GMT Subject: Withdrawn: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API In-Reply-To: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> Message-ID: <AH1L3pa_NYr7AJG595_g3d_8iQvLkpTZMLmXz573NbM=.eb1f09bf-8a9e-4062-be19-eb326fcd945b@github.com> On Wed, 22 May 2024 08:29:05 GMT, Afshin Zafari <azafari at openjdk.org> wrote: > This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: > 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. > Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. > > 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. 
Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. > > Tests: > mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19343 From coleenp at openjdk.org Fri Jul 5 12:38:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 5 Jul 2024 12:38:21 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> Message-ID: <KSG0PgqjRhlVE2khvuSnf_CYg2sSqJ_oRaQKqqB4nT4=.aa29fd29-c476-4144-8454-78cf536ed55e@github.com> On Wed, 3 Jul 2024 16:24:20 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: > The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. > > The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. 
This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). > > The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. > > I tested the patch by running it through mach5 tiers 1-6. > > Thanks, > Patricio Also a couple of nits, but this looks good. Thanks for tracking down the history and verifying that its an unusual situation that we were optimizing for. src/hotspot/share/interpreter/oopMapCache.hpp line 45: > 43: // For InterpreterOopMap the bit_mask is allocated in the C heap > 44: // to avoid issues with allocations from the resource area that have > 45: // to live accross the oop closure (see 8335409). InterpreterOopMap We don't usually put bug numbers in the code and after this change nobody will want to move this back to resource area, so putting the bug number as a caution shouldn't be needed. If one wants to know the details, they can git blame this file. src/hotspot/share/interpreter/oopMapCache.hpp line 46: > 44: // to avoid issues with allocations from the resource area that have > 45: // to live accross the oop closure (see 8335409). InterpreterOopMap > 46: // should only be created and deleted during same garbage collection. Can you add 'the' to "during the same garbage collection." 
------------- PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2160631864 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666753678 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666754397 From coleenp at openjdk.org Fri Jul 5 12:38:22 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 5 Jul 2024 12:38:22 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 In-Reply-To: <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com> Message-ID: <ICN_RiO5Rpx7xM9JGERyUT7Dh2VB-DoW-h4jGmyKDdY=.62faa586-b9f2-4915-95fd-e74e184e0bac@github.com> On Thu, 4 Jul 2024 04:53:59 GMT, David Holmes <dholmes at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. >> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. 
This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). >> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > src/hotspot/share/interpreter/oopMapCache.hpp line 138: > >> 136: // allocated space (i.e., the bit mask was to large to hold >> 137: // in-line), allocate the space from the C heap. >> 138: void resource_copy(OopMapCacheEntry* from); > > The name `resource_copy` seems somewhat of a misnomer given it may be C heap. Is it worth changing? I agree, this should probably be copy_from, and rename the parameter src. Or something like that. 
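
A minimal sketch of the inline-versus-C-heap storage discussed above, reading the `4*64/2 = 128` figure as four 64-bit words at two bits per entry (an interpretation, not something the mail spells out). `BitMaskSketch` and its members are hypothetical names for illustration only; this is not the code in oopMapCache.hpp. It only borrows the suggested `copy_from`/`src` naming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical small-buffer bit mask: masks of up to 128 entries live
// inline, larger ones are copied into malloc'ed (C heap) storage.
class BitMaskSketch {
  static constexpr std::size_t kInlineWords  = 4;   // assumed word count
  static constexpr std::size_t kBitsPerWord  = 64;
  static constexpr std::size_t kBitsPerEntry = 2;   // assumed bits per entry

  std::uint64_t  _inline_words[kInlineWords] = {};
  std::uint64_t* _heap_words = nullptr;

  static std::size_t words_for(std::size_t entries) {
    return (entries * kBitsPerEntry + kBitsPerWord - 1) / kBitsPerWord;
  }

 public:
  // 4 * 64 / 2 = 128 entries fit without any allocation.
  static constexpr std::size_t kInlineEntries =
      kInlineWords * kBitsPerWord / kBitsPerEntry;

  BitMaskSketch() = default;
  BitMaskSketch(const BitMaskSketch&) = delete;
  BitMaskSketch& operator=(const BitMaskSketch&) = delete;
  ~BitMaskSketch() { std::free(_heap_words); }

  bool uses_heap() const { return _heap_words != nullptr; }
  const std::uint64_t* words() const {
    return uses_heap() ? _heap_words : _inline_words;
  }

  // Copy the mask words describing `entries` entries from `src`.
  void copy_from(const std::uint64_t* src, std::size_t entries) {
    assert(src != nullptr);
    std::free(_heap_words);
    _heap_words = nullptr;
    const std::size_t n = words_for(entries);
    std::uint64_t* dst = _inline_words;
    if (entries > kInlineEntries) {
      // Rare path: the mask is too large to hold inline.
      _heap_words = static_cast<std::uint64_t*>(std::malloc(n * sizeof(std::uint64_t)));
      if (_heap_words == nullptr) std::abort();
      dst = _heap_words;
    }
    std::memcpy(dst, src, n * sizeof(std::uint64_t));
  }
};
```

Because the destructor frees the heap copy, nothing in this sketch depends on a resource mark being in scope, which is the property the change is after.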
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666755529 From duke at openjdk.org Fri Jul 5 13:29:24 2024 From: duke at openjdk.org (duke) Date: Fri, 5 Jul 2024 13:29:24 GMT Subject: Withdrawn: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure In-Reply-To: <RjjYzSdzZdei0SN7GLMfHcTXoa_-HJItLxMJEM5UdYo=.6a3d2b3f-788e-4f82-98d7-68e38e62241b@github.com> References: <RjjYzSdzZdei0SN7GLMfHcTXoa_-HJItLxMJEM5UdYo=.6a3d2b3f-788e-4f82-98d7-68e38e62241b@github.com> Message-ID: <qHaXyKEgvO610PiIeRxc7rjkzlYjUIjnkxJzuUMMka4=.3f1ed153-e4d6-4cba-a59a-8ea8db98f9b5@github.com> On Fri, 3 May 2024 11:58:36 GMT, Guoxiong Li <gli at openjdk.org> wrote: > Hi all, > > After [JDK-8296875](https://bugs.openjdk.org/browse/JDK-8296875), the classes `EncodeGCModeConcurrentFrameClosure` and `TransformStackChunkClosure` almost have the same code. This patch consolidates them into one. > > The tests `make test-hotspot_loom` and `make test-hotspot_gc` passed locally (linux & x64). Thanks for taking the time to review. > > Best Regards, > -- Guoxiong This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19084 From aph at openjdk.org Fri Jul 5 13:36:42 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 5 Jul 2024 13:36:42 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter Message-ID: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> This patch expands the use of a hash table for secondary superclasses to the interpreter, C1, and runtime. It also adds a C2 implementation of hashed lookup in cases where the superclass isn't known at compile time. HotSpot shared runtime ---------------------- Building hashed secondary tables is now unconditional. It takes very little time, and now that the shared runtime always has the tables, it might as well take advantage of them. 
The shared code is easier to follow now, I think.

There might be a performance issue with x86-64 in that we build HotSpot for a default x86-64 target that does not support popcount. This means that the HotSpot C++ runtime on x86 always uses a software emulation for popcount, even though the vast majority of machines made in the past 20 years can do popcount in a single instruction. It wouldn't be terribly hard to do something about that. Having said that, the software popcount is really not bad.

x86
---

x86 is rather tricky, because we still support `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as well as 32- and 64-bit ports. There's some further complication in that only `RCX` can be used as a shift count, so there's some register shuffling to do. All of this makes the logic in macroAssembler_x86.cpp rather gnarly, with multiple levels of conditionals at compile time and runtime.

AArch64
-------

AArch64 is considerably more straightforward. We always have a popcount instruction and (thankfully) no 32-bit code to worry about.

Generally
---------

I would dearly love simply to rip out the "old" secondary supers cache support, but I've left it in just in case someone has a performance regression.

The versions of `MacroAssembler::lookup_secondary_supers_table` that work with variable superclasses don't take a fixed set of temp registers, and neither do they call out to a slow path subroutine. Instead, the slow path is expanded inline.

I don't think this is necessarily bad. Apart from the very rare cases where C2 can't determine the superclass to search for at compile time, this code is only used for generating stubs, and it seemed to me ridiculous to have stubs calling other stubs.

I've followed the guidance from @iwanowww not to obsess too much about the performance of C1-compiled secondary supers lookups, and to prefer simplicity over absolute performance.

Nonetheless, this is a complicated patch that touches many areas.
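
For readers unfamiliar with the idea, the following is a simplified sketch of a bitmap-indexed, packed hash table with a software popcount. It is an illustration under stated assumptions, not HotSpot's implementation: `PackedSupersTable`, `home_slot`, `popcount64`, the multiplicative hash, and the use of 32-bit ids in place of class pointers are all made up for this example, and the real probing scheme differs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Portable software population count (classic bit-twiddling version).
static inline unsigned popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical table: a 64-bit occupancy bitmap plus a packed array that
// stores only the occupied slots, in slot order.
class PackedSupersTable {
  std::uint64_t _bitmap = 0;
  std::vector<std::uint32_t> _packed;

  // Assumed hash: top 6 bits of a 32-bit multiplicative hash (0..63).
  static unsigned home_slot(std::uint32_t id) {
    return static_cast<std::uint32_t>(id * 2654435761u) >> 26;
  }

 public:
  explicit PackedSupersTable(const std::vector<std::uint32_t>& ids) {
    assert(ids.size() <= 64);  // one bit per slot, so at most 64 entries
    std::array<std::uint32_t, 64> slots{};
    for (std::uint32_t id : ids) {
      unsigned s = home_slot(id);
      while (_bitmap & (std::uint64_t{1} << s)) s = (s + 1) & 63;  // linear probe
      _bitmap |= std::uint64_t{1} << s;
      slots[s] = id;
    }
    for (unsigned s = 0; s < 64; s++) {
      if (_bitmap & (std::uint64_t{1} << s)) _packed.push_back(slots[s]);
    }
  }

  bool contains(std::uint32_t id) const {
    unsigned s = home_slot(id);
    for (unsigned probes = 0; probes < 64; probes++) {
      const std::uint64_t bit = std::uint64_t{1} << s;
      if ((_bitmap & bit) == 0) return false;  // empty slot ends the probe
      // Index into the packed array = number of occupied slots below s.
      const unsigned index = popcount64(_bitmap & (bit - 1));
      if (_packed[index] == id) return true;
      s = (s + 1) & 63;
    }
    return false;
  }
};
```

The popcount is what turns a sparse 64-slot table into a dense array index, which is why its cost on the default x86-64 build target matters.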
------------- Commit messages: - Cleanup tests - small - Small - Temp - Merge remote-tracking branch 'refs/remotes/origin/JDK-8331658-work' into JDK-8331658-work - Fix x86-32 - Fix x86 - Temp - Temp - Temp - ... and 16 more: https://git.openjdk.org/jdk/compare/747e1e47...7d7694cc Changes: https://git.openjdk.org/jdk/pull/19989/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331341 Stats: 886 lines in 13 files changed: 755 ins; 69 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From mli at openjdk.org Fri Jul 5 13:48:24 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 5 Jul 2024 13:48:24 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4] In-Reply-To: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com> Message-ID: <JH625cZxMHDvjzWakK6XGICFywENU6G0odkwwzpzLvU=.8e62af09-bdbb-4d77-a63d-fb77f9bb6a92@github.com> On Tue, 2 Jul 2024 14:16:35 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> >> I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. >> >> Thanks. >> >> ## Test >> benchmarks run on CanVM-K230 >> >> I've tried several implementations, respectively with vector group >> * m2+m1+scalar >> * m2+scalar >> * m1+scalar >> * pure scalar >> The best one is combination of m2+m1, it have best performance in all source size. 
>>
>> this implementation (m2+m1)
>>
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>>
>> vector with only m2
>> ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   move label

Thanks a lot for sharing the information.

> @Hamlin-Li Hi, we looked at RVV base64 encode/decode for another project before, however there wasn't one implementation that obviously was best across the different hardware: [WojciechMula/base64simd#9](https://github.com/WojciechMula/base64simd/issues/9) (see issue for benchmark, and repo for code)

Agreed, I think your observation is right.

> I think we currently can't tell how, the complex load/stores will perform on future hardware. Segmented load/stores for example are quite fast on the current in-order RVV 1.0 boards, however it's very slow on the ooo C910, and XiangShan (current master, may change) cores (SiFive P670 LLVM-MCA indicates that it might also be slow on that core). I'm not sure if that is because they are ooo and that gives you additional constraints, but I wouldn't rely on it just yet.

I don't know why that (`it's very slow on the ooo`) happens, and I currently don't have access to these types of machines. It's also a bit strange that they are very slow with those instructions; could it be that these machines are not yet fully optimized for them?

> I think the safest bet for encode would be for now "RISC-V RVV (LMUL=1)" ([`encode`](https://github.com/WojciechMula/base64simd/blob/master/encode/encode.rvv.cpp#L60C14-L60C20) + [`lookup_pshufb_improved`](https://github.com/WojciechMula/base64simd/blob/master/encode/lookup.rvv.cpp#L7)), as this only uses instructions with predictable performance, except for LMUL=1 `vrgather.vv`, which I think will need to be fast on any application class core. (See x86 equivalent vperm*)

My current tests on the K230 show that m2+m1+scalar gives the best performance for all size values. I'd like to see test data from other hardware if someone can help run the tests and collect it. Also, with the current implementation it's easy to adjust the LMUL value in the algorithm.
So I'm flexible about either LMUL value, depending on the test data.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2210902916

From mli at openjdk.org Fri Jul 5 13:48:25 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 5 Jul 2024 13:48:25 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <dGgB0M0dpmd_gFfsX8XlLaeL5uk9HynqHdYuZvp7URs=.17839235-7974-4be7-a92f-2e8d5fdb1c0b@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com> <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com> <dGgB0M0dpmd_gFfsX8XlLaeL5uk9HynqHdYuZvp7URs=.17839235-7974-4be7-a92f-2e8d5fdb1c0b@github.com>
Message-ID: <j_zMt6H-xeKbTEySzai9jsiS8jS0vTgyloYW0DHANF4=.c89a1fde-7f90-49bc-a8e2-29e9df5142f4@github.com>

On Thu, 4 Jul 2024 17:49:56 GMT, Camel Coder <duke at openjdk.org> wrote:

> For decode, I'm not really happy with any implementation. Yours uses multiple `vluxei8` + `vlsege4` + `vssege3`, the others from base64simd use LMUL=8 `vrgather.vv`, which will take `LMUL^2=8^2=64` times the amount of cycles a LMUL=1 `vrgather.vv` takes (on sane implementations, [see my reasoning](https://gitlab.com/riseproject/riscv-optimization-guide/-/issues/1#note_1977583125)). As I said, I'm fairly certain LMUL=1 `vrgather.vv` will have to be relatively fast, so if I had to choose, I'd prefer [my implementation](https://godbolt.org/z/hrs61x9aP) that uses LMUL=1 `vrgather.vv`s + `vlsege4` + `vssege3`, but using `vsseg*` is not ideal. (Note that gcc currently chokes on the register allocation, so you should use clang for now)

I imported [your implementation](https://godbolt.org/z/hrs61x9aP) into the JDK, but compared to my current decode implementation it shows a significant regression. Let's discuss decode in https://github.com/openjdk/jdk/pull/20026.
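
When comparing vector variants such as the ones discussed in this thread, a plain scalar encoder is a convenient correctness baseline. The sketch below is a generic reference following the standard Base64 alphabet and `=` padding; it is not the intrinsic from the PR and not code from base64simd, and the function name `base64_encode_scalar` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar reference encoder: every 3 input bytes become 4 output characters.
std::string base64_encode_scalar(const std::string& in) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  auto byte_at = [&in](std::size_t i) {
    return static_cast<std::uint32_t>(static_cast<std::uint8_t>(in[i]));
  };

  std::string out;
  out.reserve((in.size() + 2) / 3 * 4);

  std::size_t i = 0;
  for (; i + 3 <= in.size(); i += 3) {
    // Pack 3 bytes into 24 bits, then emit four 6-bit groups.
    const std::uint32_t v = (byte_at(i) << 16) | (byte_at(i + 1) << 8) | byte_at(i + 2);
    out += kAlphabet[(v >> 18) & 63];
    out += kAlphabet[(v >> 12) & 63];
    out += kAlphabet[(v >> 6) & 63];
    out += kAlphabet[v & 63];
  }

  const std::size_t rem = in.size() - i;
  if (rem == 1) {
    // One trailing byte: two characters plus two padding characters.
    const std::uint32_t v = byte_at(i) << 16;
    out += kAlphabet[(v >> 18) & 63];
    out += kAlphabet[(v >> 12) & 63];
    out += "==";
  } else if (rem == 2) {
    // Two trailing bytes: three characters plus one padding character.
    const std::uint32_t v = (byte_at(i) << 16) | (byte_at(i + 1) << 8);
    out += kAlphabet[(v >> 18) & 63];
    out += kAlphabet[(v >> 12) & 63];
    out += kAlphabet[(v >> 6) & 63];
    out += '=';
  }

  assert(out.size() == (in.size() + 2) / 3 * 4);
  return out;
}
```

Feeding the same inputs to a vectorized encoder and to this function, across sizes that are and are not multiples of three, exercises both the main loop and the scalar tail.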
------------- PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2210907011 From pchilanomate at openjdk.org Fri Jul 5 14:15:33 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 5 Jul 2024 14:15:33 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v2] In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> Message-ID: <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com> > The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. > > The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). > > The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. 
This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. > > I tested the patch by running it through mach5 tiers 1-6. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Coleen's comments - David's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20012/files - new: https://git.openjdk.org/jdk/pull/20012/files/ca0db02b..805358e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=00-01 Stats: 23 lines in 2 files changed: 0 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20012.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012 PR: https://git.openjdk.org/jdk/pull/20012 From pchilanomate at openjdk.org Fri Jul 5 14:15:36 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 5 Jul 2024 14:15:36 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v2] In-Reply-To: <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com> Message-ID: <BxVXPXx1uYVm3LYXBOIgQ26i8VhKpaH-r0zio3ykvAI=.00a0a0bc-92fa-40cc-ae77-6f25f6f9be0d@github.com> On Thu, 4 Jul 2024 04:49:53 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> 
>> - Coleen's comments >> - David's comments > > src/hotspot/share/interpreter/oopMapCache.cpp line 179: > >> 177: #ifdef ASSERT >> 178: _used = false; >> 179: #endif > > Nit pre-existing: use of DEBUG_ONLY would be more consistent with later setting of `_used`. Fixed. > src/hotspot/share/interpreter/oopMapCache.cpp line 408: > >> 406: >> 407: void InterpreterOopMap::resource_copy(OopMapCacheEntry* from) { >> 408: // The expectation is that this InterpreterOopMap is a recently created > > s/is a recently/is recently/ Fixed. > src/hotspot/share/interpreter/oopMapCache.hpp line 136: > >> 134: // Copy the OopMapCacheEntry in parameter "from" into this >> 135: // InterpreterOopMap. If the _bit_mask[0] in "from" points to >> 136: // allocated space (i.e., the bit mask was to large to hold > > Nit pre-existing: s/to/too/ Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666856873 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666856765 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666856975 From pchilanomate at openjdk.org Fri Jul 5 14:17:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 5 Jul 2024 14:17:09 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v2] In-Reply-To: <KSG0PgqjRhlVE2khvuSnf_CYg2sSqJ_oRaQKqqB4nT4=.aa29fd29-c476-4144-8454-78cf536ed55e@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <KSG0PgqjRhlVE2khvuSnf_CYg2sSqJ_oRaQKqqB4nT4=.aa29fd29-c476-4144-8454-78cf536ed55e@github.com> Message-ID: <qc9NFhzOYAVZNhWoHtHkob6X1_iNYUPtsogLgp1zLm8=.32c19818-6de4-463a-b213-6131e0dcca6c@github.com> On Fri, 5 Jul 2024 12:32:53 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: 
>> >> - Coleen's comments >> - David's comments > > src/hotspot/share/interpreter/oopMapCache.hpp line 45: > >> 43: // For InterpreterOopMap the bit_mask is allocated in the C heap >> 44: // to avoid issues with allocations from the resource area that have >> 45: // to live accross the oop closure (see 8335409). InterpreterOopMap > > We don't usually put bug numbers in the code and after this change nobody will want to move this back to resource area, so putting the bug number as a caution shouldn't be needed. If one wants to know the details, they can git blame this file. Removed. > src/hotspot/share/interpreter/oopMapCache.hpp line 46: > >> 44: // to avoid issues with allocations from the resource area that have >> 45: // to live accross the oop closure (see 8335409). InterpreterOopMap >> 46: // should only be created and deleted during same garbage collection. > > Can you add 'the' to "during the same garbage collection." Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666861364 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666861445 From pchilanomate at openjdk.org Fri Jul 5 14:17:10 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 5 Jul 2024 14:17:10 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v2] In-Reply-To: <ICN_RiO5Rpx7xM9JGERyUT7Dh2VB-DoW-h4jGmyKDdY=.62faa586-b9f2-4915-95fd-e74e184e0bac@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com> <ICN_RiO5Rpx7xM9JGERyUT7Dh2VB-DoW-h4jGmyKDdY=.62faa586-b9f2-4915-95fd-e74e184e0bac@github.com> Message-ID: <spVCFBfSxNDbMApL5o8zC8KJA5wJ1L2VISkpHfFT8Eg=.5f07f8d4-e8cd-470c-9c78-b49bf08f3ef0@github.com> On Fri, 5 Jul 2024 12:34:55 GMT, Coleen Phillimore <coleenp at openjdk.org> 
wrote: >> src/hotspot/share/interpreter/oopMapCache.hpp line 138: >> >>> 136: // allocated space (i.e., the bit mask was to large to hold >>> 137: // in-line), allocate the space from the C heap. >>> 138: void resource_copy(OopMapCacheEntry* from); >> >> The name `resource_copy` seems somewhat of a misnomer given it may be C heap. Is it worth changing? > > I agree, this should probably be copy_from, and rename the parameter src. Or something like that. I also thought about renaming it but ended up leaving it as is in v1. I changed it to Coleen's suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666861013 From jvernee at openjdk.org Fri Jul 5 14:27:37 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 5 Jul 2024 14:27:37 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception [v2] In-Reply-To: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com> <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> Message-ID: <s3ecyFzSeSB-ZY_HYForZPoga3JUOjMRftb91Zt_Wzs=.a6708d5f-e74d-43ea-afbb-e4136a356ca3@github.com> On Thu, 4 Jul 2024 06:22:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote: >> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. >> >> The exception message is less exact, but I think that's acceptable. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > address comments I think this needs a CSR, to document the change in behavior. (See e.g. 
https://bugs.openjdk.org/browse/JDK-8335554 which is a very similar case) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20015#issuecomment-2210971780 From coleenp at openjdk.org Fri Jul 5 14:38:32 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 5 Jul 2024 14:38:32 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v2] In-Reply-To: <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com> Message-ID: <MFRBX5ILKT3OucHWesIQe52aS13yoxRet-jHkUjpks8=.742dddb8-52b8-4a09-94e7-2374b47535ba@github.com> On Fri, 5 Jul 2024 14:15:33 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. >> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). 
>> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Coleen's comments > - David's comments One tiny nit. src/hotspot/share/interpreter/oopMapCache.hpp line 93: > 91: protected: > 92: #ifdef ASSERT > 93: bool _used; Can you make this a DEBUG_ONLY() too? 
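
To illustrate the request, here is a minimal sketch of the difference between wrapping a debug-only member in `#ifdef ASSERT` and using a `DEBUG_ONLY`-style macro. `SKETCH_ASSERT`, `SKETCH_DEBUG_ONLY`, and `OopMapSketch` are hypothetical stand-ins so the example compiles on its own; HotSpot's own macro is keyed on its `ASSERT` build define, and this is not the code under review.

```cpp
#include <cassert>

// Stand-in for a DEBUG_ONLY-style macro: the argument is compiled only
// when the (hypothetical) SKETCH_ASSERT build flag is defined.
#ifdef SKETCH_ASSERT
#define SKETCH_DEBUG_ONLY(code) code
#else
#define SKETCH_DEBUG_ONLY(code)
#endif

class OopMapSketch {
  // One line instead of an #ifdef/#endif pair around the declaration.
  SKETCH_DEBUG_ONLY(bool _used;)
  int _fills = 0;

 public:
  OopMapSketch() { SKETCH_DEBUG_ONLY(_used = false;) }

  // Debug builds check that each object is filled only once.
  void fill() {
    SKETCH_DEBUG_ONLY(assert(!_used && "filled twice"); _used = true;)
    _fills++;
  }

  int fills() const { return _fills; }
};
```

Using the same macro for the declaration, the initialization, and the check keeps the three sites visually consistent, which is the point of the nit.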
------------- PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2160851216 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666882419 From pchilanomate at openjdk.org Fri Jul 5 15:01:05 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 5 Jul 2024 15:01:05 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v3] In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> Message-ID: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> > The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. > > The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). 
> > The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. > > I tested the patch by running it through mach5 tiers 1-6. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: use DEBUG_ONLY on _used declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20012/files - new: https://git.openjdk.org/jdk/pull/20012/files/805358e7..7ce559cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20012.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012 PR: https://git.openjdk.org/jdk/pull/20012 From pchilanomate at openjdk.org Fri Jul 5 15:01:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 5 Jul 2024 15:01:06 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v2] In-Reply-To: <MFRBX5ILKT3OucHWesIQe52aS13yoxRet-jHkUjpks8=.742dddb8-52b8-4a09-94e7-2374b47535ba@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com> 
<MFRBX5ILKT3OucHWesIQe52aS13yoxRet-jHkUjpks8=.742dddb8-52b8-4a09-94e7-2374b47535ba@github.com> Message-ID: <1yESUDNXtrEnuUC0qHYA81qWZpXhrZEE1K9atEtZsI0=.53f41358-b333-4300-af2d-d441c8efe537@github.com> On Fri, 5 Jul 2024 14:34:57 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Coleen's comments >> - David's comments > > src/hotspot/share/interpreter/oopMapCache.hpp line 93: > >> 91: protected: >> 92: #ifdef ASSERT >> 93: bool _used; > > Can you make this a DEBUG_ONLY() too? Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666904522 From nprasad at openjdk.org Fri Jul 5 15:01:10 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Fri, 5 Jul 2024 15:01:10 GMT Subject: RFR: 8334230: Optimize C2 classes layout Message-ID: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> **Notes** Rearrange C2 class fields to optimize footprint. **Verification** 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
| Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
| ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
| ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 |
| CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 |
| C2Access | 56 -> 48 | 1 -> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 |
| VectorSet | 32 -> 24 | 1 -> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 |

    class ArrayPointer {
        const class Node * _pointer;                          /*     0     8 */
        const class Node * _base;                             /*     8     8 */
        const jlong _constant_offset;                         /*    16     8 */
        const class Node * _int_offset;                       /*    24     8 */
        const class GrowableArray<Node*> * _other_offsets;    /*    32     8 */
        const jint _int_offset_shift;                         /*    40     4 */
        const bool _is_valid;                                 /*    44     1 */
    public:

        /* size: 48, cachelines: 1, members: 7 */
        /* padding: 3 */
        /* last cacheline: 48 bytes */
    };

    class CallJavaNode : public CallNode {
    public:

        /* class CallNode <ancestor>; */                      /*     0   128 */
    protected:

        /* --- cacheline 2 boundary (128 bytes) --- */
        class ciMethod * _method;                             /*   128     8 */
        bool _optimized_virtual;                              /*   136     1 */
        bool _method_handle_invoke;                           /*   137     1 */
        bool _override_symbolic_info;                         /*   138     1 */
        bool _arg_escape;                                     /*   139     1 */
    public:
    protected:
    public:

        /* size: 144, cachelines: 3, members: 6 */
        /* padding: 4 */
        /* last cacheline: 16 bytes */

        /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
    };

    class C2Access : public StackObj {
    public:

        /* class StackObj <ancestor>; */                      /*     0     0 */

        /* XXX last struct has 1 byte of padding */

        int ()(void) * * _vptr.C2Access;                      /*     0     8 */
    protected:

        DecoratorSet _decorators;                             /*     8     8 */
        class Node * _base;                                   /*    16     8 */
        class C2AccessValuePtr & _addr;                       /*    24     8 */
        class Node * _raw_access;                             /*    32     8 */
        enum BasicType _type;                                 /*    40     1 */
        uint8_t _barrier_data;                                /*    41     1 */
    public:
    protected:
    public:

        /* size: 48, cachelines: 1, members: 8 */
        /* padding: 6 */
        /* paddings: 1, sum paddings: 1 */
        /* last cacheline: 48 bytes */
    };

    class VectorSet : public AnyObj {
    public:

        /* class AnyObj <ancestor>; */                        /*     0     0 */

        /* XXX last struct has 1 byte of padding */

        static const uint word_bits;                          /*     0     0 */
        static const uint bit_mask;                           /*     0     0 */
        uint _size;                                           /*     0     4 */
        uint _data_size;                                      /*     4     4 */
        uint32_t * _data;                                     /*     8     8 */
        class Arena * _set_arena;                             /*    16     8 */

        /* size: 24, cachelines: 1, members: 5, static members: 2 */
        /* paddings: 1, sum paddings: 1 */
        /* last cacheline: 24 bytes */
    };

I wrote a simple program that just assigns an integer value to a variable and observed the following:

- Number of ArrayPointer instances = 58
- Number of C2Access instances = 1390
- Number of CallJavaNode instances = 1626

58 * 8 bytes + 1390 * 8 bytes + 1626 * 8 bytes = 24592 bytes, i.e. about 24 KB.

That is a space saving of about 24 KB at the very least, with significant memory footprint savings for much more complex programs.
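
Note for readers of the archive: the saving reported above comes from ordering fields so that small members share what would otherwise be alignment padding. A minimal sketch of the effect follows, assuming a common 32-bit or 64-bit ABI. `Before`, `After`, and `estimated_saving_bytes` are hypothetical names made up for illustration; they are not HotSpot code and not part of the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout, not a HotSpot class: the two bools are separated by
// wider members, so the compiler has to insert alignment holes.
struct Before {
  bool     flag_a;  // followed by a hole so that ptr is aligned
  void*    ptr;
  int32_t  count;
  bool     flag_b;  // followed by tail padding
};

// Same members, widest first: the two bools now sit next to each other.
struct After {
  void*    ptr;
  int32_t  count;
  bool     flag_a;
  bool     flag_b;
};

// Exact sizes depend on the ABI (24 vs 16 bytes is typical on LP64),
// so only the direction of the change is asserted here.
static_assert(sizeof(After) < sizeof(Before),
              "reordering should shrink the struct");

// Footprint estimate in the style of the message:
// number of instances times bytes saved per instance.
constexpr std::size_t estimated_saving_bytes(std::size_t instances,
                                             std::size_t bytes_per_instance) {
  return instances * bytes_per_instance;
}
```

With the instance counts quoted in the message, `estimated_saving_bytes(58 + 1390 + 1626, 8)` evaluates to 24592 bytes, which matches the roughly 24 KB figure.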
------------- Commit messages: - 8334230: Keep constructor order same as before & optimize VectorSet - 8334230: Optimize C2 classes layout Changes: https://git.openjdk.org/jdk/pull/19861/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334230 Stats: 20 lines in 4 files changed: 8 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19861.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19861/head:pull/19861 PR: https://git.openjdk.org/jdk/pull/19861 From coleenp at openjdk.org Fri Jul 5 15:08:40 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 5 Jul 2024 15:08:40 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v3] In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> Message-ID: <irnzAa0yJB8hugTmGr7TSJi6kqutUJ_2-nrfQj1w0Rc=.d279c33e-7535-47c5-bcfb-6e4903677d79@github.com> On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. >> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. 
A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). >> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > use DEBUG_ONLY on _used declaration Perfect, thanks! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2160900015 From pchilanomate at openjdk.org Fri Jul 5 15:20:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 5 Jul 2024 15:20:13 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom Message-ID: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> Please review the following simple fix. 
A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/20016/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335269 Stats: 81 lines in 2 files changed: 81 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016 PR: https://git.openjdk.org/jdk/pull/20016 From ccheung at openjdk.org Fri Jul 5 16:41:32 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 5 Jul 2024 16:41:32 GMT Subject: RFR: 8312125: Refactor CDS enum class handling [v2] In-Reply-To: <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com> Message-ID: <qmGt-PH377XuHmbiyWcBVmOns3mAHT1yBjtb5zLvVds=.8cec38c7-6af9-4603-8659-10d23f7943f1@github.com> On Wed, 3 Jul 2024 19:57:51 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change. 
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed copyright Refactoring looks good. I have one suggestion. src/hotspot/share/cds/cdsEnumKlass.cpp line 136: > 134: return true; > 135: } > 136: #endif Suggestion: `#endif INCLUDE_CDS_JAVA_HEAP` ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2161016601 PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1666987825 From duke at openjdk.org Fri Jul 5 17:25:34 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Fri, 5 Jul 2024 17:25:34 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <nq7CFhZoe0rxlErqNKGguqM2rPTQPpg_9Fr6Cj5JHXE=.856ec966-04d8-41b8-b1cf-33ece5ff843a@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <Yg1PC9SInsa5q1qJvsDjEJuqUHoW5WLMYMUEo9Rx_WE=.9b40b7c4-a21e-4139-b6c3-52c2757e933d@github.com> <nq7CFhZoe0rxlErqNKGguqM2rPTQPpg_9Fr6Cj5JHXE=.856ec966-04d8-41b8-b1cf-33ece5ff843a@github.com> Message-ID: <HhMPWbuQUf5pmev0UqK8WkDpVZdmzw211OLOa7OQLp8=.f41ed4f0-e1c9-4ee1-afa0-74c052e29ce5@github.com> On Thu, 16 May 2024 12:40:30 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Hi, >> >>> I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good? >> >> Yes, that's right. > >> Hi @theRealAph ! You may find the latest version here: [mikabl-arm at b3db421](https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b) . I gave a short explanation in the commit message, feel free to ask for more details if required. >> >> Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks. 
Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here.
>
> OK. One small thing, I think it's possible to rearrange things a bit to use `mlav`, which may help performance. No need for that until the code is correct, though.

Hi @theRealAph ! This took a while, but please find a fixed version here: https://github.com/mikabl-arm/jdk/tree/285826-vmul

Here are performance numbers collected for Neoverse V2 compared to the common baseline and the latest state of this PR:

                                                         | d2ea6b1e657       | f19203015fb       | 5504227bfe3       |
                                                         | baseline          | PR                | 285826-vmul       |
---------------------------------------------------------|-------------------|-------------------|-------------------|------
Benchmark                              (size)  Mode  Cnt |    Score    Error |    Score    Error |    Score    Error | Units
---------------------------------------------------------|-------------------|-------------------|-------------------|------
ArraysHashCode.bytes                        1  avgt   15 |    0.859 ±  0.166 |    0.720 ±  0.103 |    0.732 ±  0.105 | ns/op
ArraysHashCode.bytes                       10  avgt   15 |    4.440 ±  0.013 |    2.262 ±  0.009 |    3.454 ±  0.057 | ns/op
ArraysHashCode.bytes                      100  avgt   15 |   78.642 ±  0.119 |   15.997 ±  0.023 |   12.753 ±  0.072 | ns/op
ArraysHashCode.bytes                    10000  avgt   15 | 9248.961 ± 11.332 | 1879.905 ± 11.609 | 1345.014 ±  1.947 | ns/op
ArraysHashCode.chars                        1  avgt   15 |    0.695 ±  0.036 |    0.694 ±  0.035 |    0.682 ±  0.036 | ns/op
ArraysHashCode.chars                       10  avgt   15 |    4.436 ±  0.015 |    2.428 ±  0.034 |    3.352 ±  0.031 | ns/op
ArraysHashCode.chars                      100  avgt   15 |   78.660 ±  0.113 |   14.508 ±  0.075 |   11.784 ±  0.088 | ns/op
ArraysHashCode.chars                    10000  avgt   15 | 9253.807 ± 13.660 | 2010.053 ±  3.549 | 1344.716 ±  1.936 | ns/op
ArraysHashCode.ints                         1  avgt   15 |    0.635 ±  0.022 |    0.640 ±  0.022 |    0.640 ±  0.022 | ns/op
ArraysHashCode.ints                        10  avgt   15 |    4.424 ±  0.006 |    2.752 ±  0.012 |    3.388 ±  0.004 | ns/op
ArraysHashCode.ints                       100  avgt   15 |   78.680 ±  0.120 |   14.794 ±  0.131 |   11.090 ±  0.055 | ns/op
ArraysHashCode.ints                     10000  avgt   15 | 9249.520 ± 13.305 | 1997.441 ±  3.299 | 1340.916 ±  1.843 | ns/op
ArraysHashCode.multibytes                   1  avgt   15 |    0.566 ±  0.023 |    0.563 ±  0.021 |    0.554 ±  0.012 | ns/op
ArraysHashCode.multibytes                  10  avgt   15 |    2.679 ±  0.018 |    1.798 ±  0.038 |    1.973 ±  0.021 | ns/op
ArraysHashCode.multibytes                 100  avgt   15 |   36.934 ±  0.055 |    9.118 ±  0.018 |   12.712 ±  0.026 | ns/op
ArraysHashCode.multibytes               10000  avgt   15 | 4861.700 ±  6.563 | 1005.809 ±  2.260 |  721.366 ±  1.570 | ns/op
ArraysHashCode.multichars                   1  avgt   15 |    0.557 ±  0.016 |    0.552 ±  0.001 |    0.563 ±  0.021 | ns/op
ArraysHashCode.multichars                  10  avgt   15 |    2.700 ±  0.018 |    1.840 ±  0.024 |    1.978 ±  0.008 | ns/op
ArraysHashCode.multichars                 100  avgt   15 |   36.932 ±  0.054 |    8.633 ±  0.020 |    8.678 ±  0.052 | ns/op
ArraysHashCode.multichars               10000  avgt   15 | 4859.462 ±  6.693 | 1063.788 ±  3.057 |  752.857 ±  5.262 | ns/op
ArraysHashCode.multiints                    1  avgt   15 |    0.574 ±  0.023 |    0.554 ±  0.011 |    0.559 ±  0.017 | ns/op
ArraysHashCode.multiints                   10  avgt   15 |    2.707 ±  0.028 |    1.907 ±  0.031 |    1.992 ±  0.036 | ns/op
ArraysHashCode.multiints                  100  avgt   15 |   36.942 ±  0.056 |    9.141 ±  0.013 |    8.174 ±  0.029 | ns/op
ArraysHashCode.multiints                10000  avgt   15 | 4872.540 ±  7.479 | 1187.393 ± 12.083 |  785.256 ±  9.472 | ns/op
ArraysHashCode.multishorts                  1  avgt   15 |    0.558 ±  0.016 |    0.555 ±  0.012 |    0.566 ±  0.022 | ns/op
ArraysHashCode.multishorts                 10  avgt   15 |    2.696 ±  0.015 |    1.854 ±  0.027 |    1.983 ±  0.009 | ns/op
ArraysHashCode.multishorts                100  avgt   15 |   36.930 ±  0.051 |    8.652 ±  0.011 |    8.681 ±  0.039 | ns/op
ArraysHashCode.multishorts              10000  avgt   15 | 4863.966 ±  6.736 | 1068.627 ±  1.902 |  760.280 ±  5.150 | ns/op
ArraysHashCode.shorts                       1  avgt   15 |    0.665 ±  0.058 |    0.644 ±  0.022 |    0.636 ±  0.023 | ns/op
ArraysHashCode.shorts                      10  avgt   15 |    4.431 ±  0.006 |    2.432 ±  0.024 |    3.332 ±  0.026 | ns/op
ArraysHashCode.shorts                     100  avgt   15 |   78.630 ±  0.103 |   14.521 ±  0.077 |   11.783 ±  0.093 | ns/op
ArraysHashCode.shorts                   10000  avgt   15 | 9249.908 ± 12.039 | 2010.461 ±  2.548 | 1344.441 ±  1.818 | ns/op
StringHashCode.Algorithm.defaultLatin1      1  avgt   15 |    0.770 ±  0.001 |    0.770 ±  0.001 |    0.770 ±  0.001 | ns/op
StringHashCode.Algorithm.defaultLatin1     10  avgt   15 |    4.305 ±  0.009 |    2.260 ±  0.009 |    3.433 ±  0.015 | ns/op
StringHashCode.Algorithm.defaultLatin1    100  avgt   15 |   78.355 ±  0.102 |   16.140 ±  0.038 |   12.767 ±  0.023 | ns/op
StringHashCode.Algorithm.defaultLatin1  10000  avgt   15 | 9269.665 ± 13.817 | 1893.354 ±  3.677 | 1345.571 ±  1.930 | ns/op
StringHashCode.Algorithm.defaultUTF16       1  avgt   15 |    0.736 ±  0.100 |    0.653 ±  0.083 |    0.690 ±  0.101 | ns/op
StringHashCode.Algorithm.defaultUTF16      10  avgt   15 |    4.280 ±  0.018 |    2.374 ±  0.021 |    3.394 ±  0.010 | ns/op
StringHashCode.Algorithm.defaultUTF16     100  avgt   15 |   78.312 ±  0.118 |   14.603 ±  0.103 |   11.837 ±  0.016 | ns/op
StringHashCode.Algorithm.defaultUTF16   10000  avgt   15 | 9249.562 ± 13.113 | 2011.717 ±  4.097 | 1344.715 ±  1.896 | ns/op
StringHashCode.cached                     N/A  avgt   15 |    0.539 ±  0.027 |    0.525 ±  0.018 |    0.525 ±  0.018 | ns/op
StringHashCode.empty                      N/A  avgt   15 |    0.861 ±  0.163 |    0.670 ±  0.079 |    0.694 ±  0.093 | ns/op
StringHashCode.notCached                  N/A  avgt   15 |    0.698 ±  0.108 |    0.648 ±  0.024 |    0.637 ±  0.023 | ns/op

There are several known issues:
- [ ] For arrays shorter than the number of elements processed by a single iteration of the Neon loop, performance is not optimal, though still better than the baseline's.
- [ ] The intrinsic takes 364 bytes in the worst case (for BYTE/BOOLEAN types), which may either significantly increase code size or limit inlining opportunities.
- [ ] As mentioned before, the implementation might be affected by https://bugs.openjdk.org/browse/JDK-8139457 .

To address the first two we could implement the vectorized part of the algorithm as a separate stub method. Please let me know if this sounds like the right approach or if you have other suggestions.
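
Note for readers following the thread: the hash being intrinsified is the usual polynomial `h = 31 * h + a[i]`. The sketch below is not the PR's Neon code. It is a plain C++ illustration, assuming 32-bit wrap-around arithmetic and an initial value of 1 as in `Arrays.hashCode(int[])`, of how the recurrence can be regrouped so that the four element multiplies in each step are independent of one another, which is the property a vector loop can exploit. The function names `hash_scalar` and `hash_unrolled4` are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference: h = 31 * h + a[i], computed modulo 2^32.
uint32_t hash_scalar(const int32_t* a, std::size_t n) {
  uint32_t h = 1;
  for (std::size_t i = 0; i < n; ++i) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}

// Same result, four elements per step:
//   h' = h*31^4 + a[i]*31^3 + a[i+1]*31^2 + a[i+2]*31 + a[i+3]
// The four products with a[i..i+3] do not depend on each other.
uint32_t hash_unrolled4(const int32_t* a, std::size_t n) {
  const uint32_t p1 = 31u;
  const uint32_t p2 = p1 * p1;
  const uint32_t p3 = p2 * p1;
  const uint32_t p4 = p3 * p1;

  uint32_t h = 1;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = h * p4
      + static_cast<uint32_t>(a[i])     * p3
      + static_cast<uint32_t>(a[i + 1]) * p2
      + static_cast<uint32_t>(a[i + 2]) * p1
      + static_cast<uint32_t>(a[i + 3]);
  }
  // Scalar tail for the remaining 0 to 3 elements.
  for (; i < n; ++i) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}
```

In this sketch, inputs shorter than four elements never enter the unrolled loop and are handled entirely by the scalar tail, which is one way to picture why very short arrays benefit less from a wide loop.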
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2211186951 From aph at openjdk.org Fri Jul 5 17:46:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 5 Jul 2024 17:46:35 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> Message-ID: <_8D8tMevrVR00rHIHSQRHnfBxjoApH7UcHH-1HRl2mo=.4866b276-5856-43b2-927e-86c2f4e9d60a@github.com> On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. >> >> Besides of the code changes, one important task is to handle the legal process. >> >> Thanks! 
>>
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.721 | 32.579 | 4.166
>> Float128Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 114.547 | 39.517 | 2....
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 33 commits:
>
>  - Merge branch 'master' into sleef-aarch64-integrate-source
>  - merge master
>  - sleef 3.6.1 for riscv
>  - sleef 3.6.1
>  - update header files for arm
>  - add inline header file for riscv64
>  - remove notes about sleef changes
>  - fix performance issue
>  - disable unused-function warnings; add log msg
>  - minor
>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863

I have now wasted two hours trying to duplicate your results.

I need you to write here the _exact_ command line that produced your numbers above, along with the full configure and build options you used.

I also had problems with javac running out of heap space, which was very odd. I fixed it with this:

    diff --git a/make/autoconf/boot-jdk.m4 b/make/autoconf/boot-jdk.m4
    index 8d272c28ad5..617ccfd8fff 100644
    --- a/make/autoconf/boot-jdk.m4
    +++ b/make/autoconf/boot-jdk.m4
    @@ -470,7 +470,7 @@ AC_DEFUN_ONCE([BOOTJDK_SETUP_BOOT_JDK_ARGUMENTS],
       # Maximum amount of heap memory.
       JVM_HEAP_LIMIT_32="768"
       # Running a 64 bit JVM allows for and requires a bigger heap
    -  JVM_HEAP_LIMIT_64="1600"
    +  JVM_HEAP_LIMIT_64="6400"

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2211202867
PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2211202959

From iklam at openjdk.org Sun Jul 7 01:50:17 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Sun, 7 Jul 2024 01:50:17 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v3]
In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
Message-ID: <DcNbL1qEcDW0knnYfhWkR7hhD5UFwFw1Ko-qcUmx64Y=.50b0c364-aed1-46c6-a32e-62a347195c05@github.com>

> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.
Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into 8312125-refactor-cds-enum-class-handling - @calvinccheung comments - fixed copyright - 8312125: Refactor CDS enum class handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20013/files - new: https://git.openjdk.org/jdk/pull/20013/files/49dc109e..fcd987fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=01-02 Stats: 5501 lines in 320 files changed: 2525 ins; 1662 del; 1314 mod Patch: https://git.openjdk.org/jdk/pull/20013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20013/head:pull/20013 PR: https://git.openjdk.org/jdk/pull/20013 From iklam at openjdk.org Sun Jul 7 04:23:31 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 7 Jul 2024 04:23:31 GMT Subject: RFR: 8312125: Refactor CDS enum class handling [v2] In-Reply-To: <uy6CKlyFbVZ-yLe6Mklejpa6AmToFoMFuV_tL6VJ-f4=.0d123829-80e2-43f3-8d2c-c7ff8973cb0a@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com> <uy6CKlyFbVZ-yLe6Mklejpa6AmToFoMFuV_tL6VJ-f4=.0d123829-80e2-43f3-8d2c-c7ff8973cb0a@github.com> Message-ID: <BuBqsQfsbxbDl8S0LldmtcVabdNX6WxUS9Bkf_G1lyM=.d2244af7-0f1d-4e65-bca5-eec48c93ae97@github.com> On Wed, 3 Jul 2024 20:05:17 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed copyright > > Thanks for the changes and clarification! 
Thanks @matias9927 @calvinccheung for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/20013#issuecomment-2212317191 From duke at openjdk.org Sun Jul 7 15:16:02 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 7 Jul 2024 15:16:02 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> Message-ID: <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> > Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. 
ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: - Use t2 directly instead of temp2 - Rename temp1 -> x0 - Left a note on a side effect of generate_vle32_pack4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19960/files - new: https://git.openjdk.org/jdk/pull/19960/files/02dc4e29..9f5c7831 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=00-01 Stats: 20 lines in 1 file changed: 6 ins; 4 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960 PR: https://git.openjdk.org/jdk/pull/19960 From duke at openjdk.org Sun Jul 7 15:16:03 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 7 Jul 2024 15:16:03 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <BWV1qtKhP0MV1SrYotttrc0LqUNWLVWjUqwF5ZQQPj0=.c586f3d6-d2f3-4325-a2c8-9de67f67b6ec@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <BWV1qtKhP0MV1SrYotttrc0LqUNWLVWjUqwF5ZQQPj0=.c586f3d6-d2f3-4325-a2c8-9de67f67b6ec@github.com> Message-ID: <S-lKZVVFKzT6MT8LKqxPRIbzQjOP5i2FSAfE4qwWtdo=.96cb5827-835b-492c-8669-0109b7144a67@github.com> On Mon, 1 Jul 2024 06:37:32 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Use t2 directly instead of temp2 >> - Rename temp1 -> x0 >> - Left a note on a side effect of generate_vle32_pack4 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2348: > >> 2346: __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT))); >> 2347: >> 2348: __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1); > > There is no use of `temp1` after, should 
we replace with `x0`? Replaced, thanks! > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2351: > >> 2349: __ vle32_v(res, from); >> 2350: __ vmv_v_x(vzero, zr); >> 2351: generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4); > > It would be great to add a quick comment mentioning the side effect on `key` of this function call. Same at https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2355 and https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2359 Done > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2362: > >> 2360: generate_rev8_pack2(vtmp1, vtmp2); >> 2361: >> 2362: __ mv(temp2, 44); > > You could replace `temp2` by `t0`/`t1`/`t2` Ok, done! I used `t2` > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2448: > >> 2446: __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT))); >> 2447: >> 2448: __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1); > > Same as for encrypt, there is no use of `temp1`, could you replace by `x0`? Replaced > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2459: > >> 2457: generate_aesdecrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4); >> 2458: >> 2459: generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4); > > Same as above, please add a comment on the side effect on `key`. All done! > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2466: > >> 2464: generate_rev8_pack2(vtmp1, vtmp2); >> 2465: >> 2466: __ mv(temp2, 44); > > Same as above, could you use `t0`/`t1`/`t2` instead? 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713599 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713593 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713596 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713589 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713582 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713587 From duke at openjdk.org Mon Jul 8 05:30:40 2024 From: duke at openjdk.org (duke) Date: Mon, 8 Jul 2024 05:30:40 GMT Subject: Withdrawn: 8330171: Lazy W^X switch implementation In-Reply-To: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com> References: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com> Message-ID: <Ed367SfEDzhRnhlB4mzXMj-ULsHh-0tK3oQ1orsj6aA=.da1f0f1b-37ec-477f-bf9d-27498c224052@github.com> On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin <snazarki at openjdk.org> wrote: > An alternative for preemptively switching the W^X thread mode on macOS with an AArch64 CPU. This implementation triggers the switch in response to the SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this approach, it is now feasible to eliminate all WX guards and avoid potentially costly operations. However, no significant improvement or degradation in performance has been observed. Additionally, considering the issue with AsyncGetCallTrace, the patched JVM has been successfully operated with [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and [async-profiler](https://github.com/async-profiler/async-profiler). > > Additional testing: > - [x] MacOS AArch64 server fastdebug *gtets* > - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4* > - [ ] Benchmarking > > @apangin and @parttimenerd could you please check the patch on your scenarios?? 
This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18762

From tanksherman27 at gmail.com Mon Jul 8 06:04:11 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Mon, 8 Jul 2024 14:04:11 +0800
Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer?
Message-ID: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>

Hi all,

I have a question with regard to os::get_sender_for_C_frame and VMError::print_native_stack. In the Windows-specific code, comments allude to both needing the rbp register to be saved, which is why VMError::print_native_stack doesn't work on Windows, since Microsoft Visual C doesn't save the frame pointer, as stated:

/*
 * Windows/x64 does not use stack frames the way expected by Java:
 * [1] in most cases, there is no frame pointer. All locals are addressed via RSP
 * [2] in rare cases, when alloca() is used, a frame pointer is used, but this may
 *     not be RBP.
 * See http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
 *
 * So it's not possible to print the native stack using the
 *     while (...) {...  fr = os::get_sender_for_C_frame(&fr); }
 * loop in vmError.cpp. We need to roll our own loop.
 */

// VC++ does not save frame pointer on stack in optimized build. It
// can be turned off by -Oy-. If we really want to walk C frames,
// we can use the StackWalk() API.

I can't seem to find where rbp is loaded and used on platforms and compilers that do save the frame pointer, though. Eclipse cannot find it in the vast collection of member methods inside the frame class and related code. Does anyone by any chance know where the code that loads and uses the frame pointer for os::get_sender_for_C_frame and VMError::print_native_stack is located on such platforms?
best regards, Julian From xpeng at openjdk.org Mon Jul 8 06:23:32 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 06:23:32 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <qBONEcrgJyYqSsBdiDRbA9NeV8sC8uXKRY2zbpDE8Fc=.1dfd2cbc-5982-4958-b7cb-313d0c52139a@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> <qBONEcrgJyYqSsBdiDRbA9NeV8sC8uXKRY2zbpDE8Fc=.1dfd2cbc-5982-4958-b7cb-313d0c52139a@github.com> Message-ID: <UpB5_GiY7tF1AcVV86gvr2GY3RCIoYwRpgKEPMlCmco=.efdb7ab9-e11b-49b1-b313-3ba12cc738d4@github.com> On Thu, 4 Jul 2024 12:41:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Hi all, >> This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. Here is the original layout from `pahole`: >> >> class MethodData : public Metadata { >> public: >> >> /* class Metadata <ancestor>; */ /* 0 0 */ >> >> /* XXX 8 bytes hole, try to pack */ >> >> class Method * _method; /* 8 8 */ >> int _size; /* 16 4 */ >> int _hint_di; /* 20 4 */ >> class Mutex _extra_data_lock; /* 24 104 */ >> /* --- cacheline 2 boundary (128 bytes) --- */ >> class CompilerCounters _compiler_counters; /* 128 80 */ >> /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ >> intx _eflags; /* 208 8 */ >> intx _arg_local; /* 216 8 */ >> intx _arg_stack; /* 224 8 */ >> intx _arg_returned; /* 232 8 */ >> int _creation_mileage; /* 240 4 */ >> class InvocationCounter _invocation_counter; /* 244 4 */ >> class InvocationCounter _backedge_counter; /* 248 4 */ >> int _invocation_counter_start; /* 252 4 */ >> /* --- cacheline 4 boundary (256 bytes) --- */ >> int _backedge_counter_start; /* 256 4 */ >> uint _tenure_traps; /* 260 4 */ >> int _invoke_mask; /* 264 4 */ >> int _backedge_mask; /* 268 4 */ >> short int _num_loops; /* 272 2 */ >> short int _num_blocks; /* 274 2 */ >> enum WouldProfile _would_profile; /* 
276 4 */ >> int _jvmci_ir_size; /* 280 4 */ >> >> /* XXX 4 bytes hole, try to pack */ >> >> class FailedSpeculation * _failed_speculations; /* 288 8 */ >> int _data_size; /* 296 4 */ >> int _parameters_type_data_di; /* 300 4 */ >> int _exception_handler_data_di; /* 304 4 */ >> >> /* XXX 4 bytes hole, try to pack */ >> >> intptr_t _data[1]; /* 312 8 */ >> >> /* size: 320, cachelin... > > I don't think these "Optimize XXX layouts" should be marked as trivial, and I worry that we overuse the trivial rule. It circumvents the second reviewer as well as the 24hr rule, which both are necessary safeties. Especially in the wake of the xz fiasco. > > "Trivial" is usually reserved for either changes that need very quick reaction (e.g. reasonably simple build errors that require immediate fixing because everyone's CI is standing still) or things that are painfully obvious in being trivial, e.g. comment changes. Memory layout changes are neither urgent nor really trivial enough IMHO. Thanks @tstuefe @chhagedorn @dholmes-ora for the reviews and discussion about the "trivial" topic, I agree that this may not be trivial and I'll be cautious when declare PR is trivial. Honestly I don't know if there is explicit rules for trivial/non-trivial, I felt it was very simple change swapping the locations of two fields and should be qualified as trivial(subjective judgement). I'm ok to remove the declaration about trivial. 
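As an illustration of why swapping two fields can shrink a class, here is a minimal sketch. It is not the actual MethodData change: the struct and field names are hypothetical, and the expected sizes assume a typical LP64 ABI (4-byte `int`, 8-byte pointers with 8-byte alignment).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout with holes: each 4-byte int sits next to an
// 8-byte-aligned pointer, so the compiler pads after both ints.
struct Padded {
  int   a;  // offset 0, followed by a 4-byte hole on LP64
  void* p;  // offset 8
  int   b;  // offset 16, followed by 4 bytes of tail padding on LP64
};

// Same fields, reordered so the two ints share one 8-byte slot.
struct Packed {
  void* p;  // offset 0
  int   a;  // offset 8
  int   b;  // offset 12
};

// Bytes saved per object by the reordering (0 on ABIs without the holes).
constexpr std::size_t bytes_saved() {
  return sizeof(Padded) - sizeof(Packed);
}
```

Tools such as `pahole` report these holes directly, which is how the layout shown earlier in this thread was obtained.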
For the PR itself, there are already two reviewer approvals, and we have probably got enough eyes on it, if you don't have other concerns, I'll integrate it next week @tstuefe ------------- PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2213124652 From dholmes at openjdk.org Mon Jul 8 06:41:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 Jul 2024 06:41:33 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v3] In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> Message-ID: <dUqU_Wdq2TZmik1puxHRebJyRI6fFugtaOPW9saIpfg=.1da6f230-c9a8-4ff9-8fdb-1e6d63590076@github.com> On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. >> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. 
This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). >> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > use DEBUG_ONLY on _used declaration Still good. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2162398031 From david.holmes at oracle.com Mon Jul 8 07:15:18 2024 From: david.holmes at oracle.com (David Holmes) Date: Mon, 8 Jul 2024 17:15:18 +1000 Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? In-Reply-To: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com> References: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com> Message-ID: <e8667640-8e29-468d-8a51-e6f996921885@oracle.com> Hi Julian, On 8/07/2024 4:04 pm, Julian Waters wrote: > Hi all, > > I have a question with regards to os::get_sender_for_C_frame and > VMError::print_native_stack. 
In Windows specific code comments allude > to both needing the rbp register to be saved, which is why > VMError::print_native_stack > doesn't work on Windows since Microsoft Visual C doesn't save the frame > pointer, as stated: > > /* > * Windows/x64 does not use stack frames the way expected by Java: > * [1] in most cases, there is no frame pointer. All locals are addressed via RSP > * [2] in rare cases, when alloca() is used, a frame pointer is used, > but this may > * not be RBP. > * See http://msdn.microsoft.com/en-us/library/ew5tede7.aspx > * > * So it's not possible to print the native stack using the > * while (...) {... fr = os::get_sender_for_C_frame(&fr); } > * loop in vmError.cpp. We need to roll our own loop. > */ > > // VC++ does not save frame pointer on stack in optimized build. It > // can be turned off by -Oy-. If we really want to walk C frames, > // we can use the StackWalk() API. > > I can't seem to find where rbp is loaded and used on platforms and > compilers that do save the frame pointer though. Eclipse cannot find > it through the vast collection of member methods inside the frame > class and related code. Do anyone by any chance know where the code that > loads and uses the frame pointer for os::get_sender_for_C_frame and > VMError::print_native_stack is located on such platforms? Isn't this part of the ABI for these platforms, so the C/C++ compiler maintains them. ?? 
David ----- > best regards, > Julian From mli at openjdk.org Mon Jul 8 07:55:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Jul 2024 07:55:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <_8D8tMevrVR00rHIHSQRHnfBxjoApH7UcHH-1HRl2mo=.4866b276-5856-43b2-927e-86c2f4e9d60a@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <_8D8tMevrVR00rHIHSQRHnfBxjoApH7UcHH-1HRl2mo=.4866b276-5856-43b2-927e-86c2f4e9d60a@github.com> Message-ID: <-UPo0QHxKRA0bBqkX8prOTaMedIwLVygqzwvABbA4mY=.2ebd0262-a275-4d80-94a7-77225b4d54d7@github.com> On Fri, 5 Jul 2024 17:44:14 GMT, Andrew Haley <aph at openjdk.org> wrote: > I also had problems with javac running out of heap space, which was very odd. I fixed it with this: > > ``` > diff --git a/make/autoconf/boot-jdk.m4 b/make/autoconf/boot-jdk.m4 > index 8d272c28ad5..617ccfd8fff 100644 > --- a/make/autoconf/boot-jdk.m4 > +++ b/make/autoconf/boot-jdk.m4 > @@ -470,7 +470,7 @@ AC_DEFUN_ONCE([BOOTJDK_SETUP_BOOT_JDK_ARGUMENTS], > # Maximum amount of heap memory. > JVM_HEAP_LIMIT_32="768" > # Running a 64 bit JVM allows for and requires a bigger heap > - JVM_HEAP_LIMIT_64="1600" > + JVM_HEAP_LIMIT_64="6400" > ``` For the command to run the tests, I use `make test TEST=org.openjdk.bench.jdk.incubator.vector.operation.Float" MICRO="FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs"`. I just copy and run Double/Float benchmark tests (without copying other tests under `org.openjdk.bench.jdk.incubator.vector.operation`), in which way I think it will not have this OOM issue. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2213280630 From tanksherman27 at gmail.com Mon Jul 8 07:59:20 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Mon, 8 Jul 2024 15:59:20 +0800 Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? Message-ID: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com> Hi David, Ah, I think you misunderstood me, I'm aware that the frame pointer is saved as required by the compiler (With the exception of the Microsoft compiler, which doesn't save it at all). What I meant was that the comments in Windows code imply that VMError::print_native_stack and os::get_sender_for_C_frame need to use the frame pointer, yet I can't seem to find where or how either of them obtain the frame pointer for whatever they use it for on platforms and compilers where the frame pointer is saved (For instance, on Linux), whether through handwritten assembly code or some other means. It follows that if they need to use the frame pointer, then they must grab it from somewhere, after all best regards, Julian From aboldtch at openjdk.org Mon Jul 8 08:25:02 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 8 Jul 2024 08:25:02 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping Message-ID: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. 
This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.

A diagnostic VM option called `UseObjectMonitorTable` is introduced. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default).

This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default).

Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.

# Cleanups

Cleaned up displaced header usage for:
* BasicLock
  * Contains some Zero changes
  * Renames one exported JVMCI field
* ObjectMonitor
  * Updates comments and tests consistencies

# Refactoring

`ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.

The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code.

_There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._

# LightweightSynchronizer

Working on adapting and incorporating the following section as a comment in the source code

## Fast Locking

CAS on locking bits in markWord.
0b00 (Fast Locked) <--> 0b01 (Unlocked)

When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
If 0b10 (Inflated) is observed, or there is too much contention or critical sections are too long for spinning to be feasible, inflated locking is performed.

### Fast Lock Spinning (UseObjectMonitorTable)

When a thread fails fast locking when a monitor is not yet inflated, it will spin on the markWord using an exponential backoff scheme. The thread will attempt the fast lock CAS and then SpinWait() for some time, doubling with every failed attempt, up to a maximum number of attempts. There is a diagnostic VM option LightweightFastLockingSpins which can be used to tune this value. The behavior of SpinWait() can be hardware dependent.

A future improvement may be to adapt this spinning limit to observed behavior, which would automatically adapt to the different hardware behavior of SpinWait().

## Inflated Locking

Inflated locking means that an ObjectMonitor is associated with the object and is used for locking instead of the locking bits in the markWord.

## Inflated Locking without table (!UseObjectMonitorTable)

An inflating thread will create an ObjectMonitor and CAS the ObjectMonitor* into the markWord along with the 0b10 (Inflated) lock bits. If the transition of the lock bits is from 0b00 (Fast Locked) the ObjectMonitor must be published with an anonymous owner (setting _owner to ANONYMOUS_OWNER). If the transition of the lock bits is from 0b01 (Unlocked) the ObjectMonitor is published with no owner.

When encountering an ObjectMonitor with an anonymous owner the thread checks its lock stack to see if it is the owner, in which case it removes the object from its lock stack and sets itself as the owner of the ObjectMonitor along with fixing the recursion level to correspond to the number of removed lock stack entries.
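The fast-lock spinning described in the Fast Lock Spinning section above can be sketched roughly as follows. This is a simplified illustration using `std::atomic`, not HotSpot code: `fast_lock_with_backoff`, `spin_wait` and the constants are hypothetical names, and `max_attempts` only loosely plays the role of the spin limit that `LightweightFastLockingSpins` tunes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Lock-bit encodings as described in the text.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Hypothetical stand-in for SpinWait(); the real pause is hardware dependent.
inline void spin_wait(unsigned iterations) {
  for (unsigned i = 0; i < iterations; ++i) {
    std::this_thread::yield();
  }
}

// Try to fast lock by CASing the lock bits from Unlocked to Fast Locked.
// Returns true on success, false if the caller should use inflated locking.
inline bool fast_lock_with_backoff(std::atomic<std::uintptr_t>& mark,
                                   int max_attempts) {
  unsigned pause = 1;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    std::uintptr_t observed = mark.load(std::memory_order_relaxed);
    std::uintptr_t bits = observed & kLockMask;
    if (bits == kInflated) {
      return false;  // a monitor exists, spinning on the mark word is pointless
    }
    if (bits == kUnlocked) {
      std::uintptr_t locked = (observed & ~kLockMask) | kFastLocked;
      if (mark.compare_exchange_strong(observed, locked,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
        return true;
      }
    }
    // Fast locked by another thread, or the CAS lost a race: back off.
    spin_wait(pause);
    pause *= 2;  // exponential backoff: double the pause after every failure
  }
  return false;  // too much contention, fall back to inflated locking
}
```

A caller would typically try this first and only inflate when it returns false.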
## Inflated Locking with table (UseObjectMonitorTable)

Because publishing the ObjectMonitor* and signaling that an object's monitor is inflated is not atomic, more care must be taken (in the presence of deflation) so that all threads agree on which ObjectMonitor* to use.

When encountering an ObjectMonitor with an anonymous owner the thread checks its lock stack to see if it is the owner, in which case it removes the object from its lock stack and sets itself as the owner of the ObjectMonitor along with fixing the recursion level to correspond to the number of removed lock stack entries.

All complications arise from deflation, or the process of disassociating an ObjectMonitor from its Java Object. So first the mechanism used for deflation is explained, followed by retrieval and creation of ObjectMonitors.

### Deflation

An ObjectMonitor can only be deflated if it has no owner, its queues are empty and no thread is in a scope where it has incremented and checked the contentions reference counter.

The interaction between deflation and wait is handled by having the owner and wait queue entry overlap to block out deflation; the wait queue entry is protected by a waiters reference counter which is only modified by the waiters while holding the monitor, incremented before exiting the monitor and decremented after reentering the monitor.

For enter and exit, where the deflator may observe empty queues and no owner, a two-step mechanism is used to synchronize deflation with concurrently locking threads; deflation is synchronized using the contentions reference counter.

In the text below we refer to "holding the contentions reference counter". This means that a thread has incremented the contentions reference counter and verified that it is not negative.
```c++
if (Atomic::fetch_and_add(&monitor->_contentions, 1) >= 0) {
  // holding the contentions reference counter
}
Atomic::decrement(&monitor->_contentions);
```

#### Deflation protocol

The first step for the deflator is to try to CAS the owner from no owner to a special marker (DEFLATER_MARKER). If this is successful it blocks any entering thread from successfully installing itself as the owner and causes compiled code to take a slow path and call into the runtime.

The second step for the deflator is to check the waiters reference counter and, if it is 0, try to CAS the contentions reference counter from 0 to a large negative value (INT_MIN). If this succeeds the monitor is deflated.

The deflator does not have to check the entry queues because every thread on the entry queues must either hold the contentions reference counter or have incremented the waiters reference counter, in the case where it was moved from the wait queue to the entry queues by a notify. The deflator checks the waiters reference counter, with the memory ordering of Waiter: { increment waiters reference counter; release owner }, Deflator: { acquire owner; check waiters reference counter }. All threads on the entry queues or wait queue invariantly hold the contentions reference counter or the waiters reference counter.

#### Deflation cleanup

If deflation succeeds, locking bits are then transitioned back to 0b01 (Unlocked). With UseObjectMonitorTable it is required that this is done by the deflator, or it could lead to ABA problems in the locking bits. Without the table the whole ObjectMonitor* is part of the markWord transition, with its pointer being phased out of the system with a handshake, making every value distinguishable and avoiding ABA issues.

For UseObjectMonitorTable the deflated monitor is also removed from the table. This is done after transitioning the markWord to allow concurrently entering threads to fast lock on the object while the monitor is being removed from the hash table.
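The two-step deflation protocol described above can be sketched as follows. This is a simplified model built on `std::atomic`, not the HotSpot implementation: the `Monitor` struct, `deflater_marker()`, `try_hold_contentions()` and `try_deflate()` are hypothetical names, default sequentially consistent ordering is used throughout, and the case where another thread cancels an in-progress deflation is deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical, simplified monitor; fields only loosely follow the text.
struct Monitor {
  std::atomic<void*> owner{nullptr};
  std::atomic<int>   contentions{0};
  std::atomic<int>   waiters{0};
};

// Sentinel owner value installed while a deflation attempt is in progress.
inline void* deflater_marker() {
  static char marker;
  return &marker;
}

// Returns true if the caller now holds the contentions reference counter.
// On false the monitor is deflated and the increment has been undone.
inline bool try_hold_contentions(Monitor& m) {
  if (m.contentions.fetch_add(1) >= 0) {
    return true;
  }
  m.contentions.fetch_sub(1);
  return false;
}

inline void release_contentions(Monitor& m) {
  m.contentions.fetch_sub(1);
}

// Two-step deflation attempt; returns true if the monitor was deflated.
inline bool try_deflate(Monitor& m) {
  // Step 1: claim the owner field, which blocks threads trying to enter.
  void* no_owner = nullptr;
  if (!m.owner.compare_exchange_strong(no_owner, deflater_marker())) {
    return false;  // the monitor is owned
  }
  // Step 2: require no waiters and nobody holding the contentions counter.
  int no_contentions = 0;
  if (m.waiters.load() == 0 &&
      m.contentions.compare_exchange_strong(no_contentions, INT_MIN)) {
    return true;
  }
  // Deflation failed: restore the owner field (cancellation not modelled).
  void* marker = deflater_marker();
  m.owner.compare_exchange_strong(marker, nullptr);
  return false;
}
```

In this model a locking thread calls `try_hold_contentions()` before touching the monitor and `release_contentions()` afterwards; a negative counter tells it the monitor is gone and it must look up or create a new one.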
If deflation fails after the marker (DEFLATER_MARKER) has been CASed into the owner field the owner must be restored. From the deflation thread's point of view it is as simple as CASing from the marker to no owner. However, to not have all threads depend on the deflation thread making progress here, we allow any thread to CAS from the marker if that thread has both incremented and checked the contentions counter. This thread has now effectively canceled the deflation, but it is important that the deflator observes this fact; we do this by forgetting to decrement the contentions counter. The effect is that the contentions CAS will fail, which will force the deflator to try and restore the owner, but this will also fail because it got canceled. So the deflator decrements the contentions counter instead on behalf of the canceling thread to balance the reference counting. (Currently this is implemented by doing a +1 +1 -1 reference count on the locking thread, but a simple only +1 would suffice).

### Retrieve ObjectMonitor

#### HashTable

Maintains a mapping between Java Objects and ObjectMonitors. Lookups are done via the object's identity_hash. If the hash table contains an ObjectMonitor for a specific object then that ObjectMonitor is used for locking unless it is being deflated.

Only deflation removes (not dead) entries inside the HashTable.

#### ThreadLocal Cache (UseObjectMonitorTable)

The most recently locked ObjectMonitors by a thread are cached in that thread's local storage. These are used to elide hash table lookups. These caches use raw oops to make cache lookups trivial. However this requires special handling of the cache at safepoints. The caches are cleared when a safepoint is triggered (instead of letting the gc visit them), to avoid keeping cache entries as gc roots.

These cache entries may become deflated, but locking on such a monitor still participates in the normal deflation protocol.
Because these entries are cleared during a safepoint, the handshake performed by monitor deflation to phase out ObjectMonitor* from the system will also phase these out.

#### StackLocal Cache

Each monitorenter has a corresponding BasicLock entry on the stack. Each successful inflated monitorenter saves the ObjectMonitor* inside this BasicLock entry and retrieves it when performing the corresponding monitorexit.

This means it is important that the BasicLock entry is always initialized to a known state (nullptr is used). The RAII object class CacheSetter is used to ensure that the BasicLock gets initialized before leaving the runtime code, and that both caches get updated correctly. (Only once, with the same locked ObjectMonitor).

The cache entries are set when a monitor is entered and never used again after that monitor has been exited. So there are no interactions with deflation here. Similarly these caches do not track the associated oop, but rely on the fact that the same BasicLock data created for a monitorenter is used when executing the corresponding monitorexit.

### Creating ObjectMonitor

If retrieval of the ObjectMonitor fails because there is no ObjectMonitor, either because this is the first time inflating or because the ObjectMonitor has been deflated, a new ObjectMonitor must be created and associated with the object.

The inflating thread will then attempt to insert a newly created ObjectMonitor in the hash table. The important invariant is that any ObjectMonitor inserted must have an anonymous owner (setting _owner to ANONYMOUS_OWNER).

This solves the issue of not being able to atomically insert the ObjectMonitor in the hash table and transition the markWord to 0b10 (Inflated). We instead have all inflating threads insert an identical anonymously owned ObjectMonitor in the table and then decide ownership based on how the markWord is transitioned to 0b10 (Inflated). Note: Only one ObjectMonitor can be inserted.
This also has the effect of blocking deflation on a newly inserted ObjectMonitor, until the contentions reference counter can be incremented. The contentions reference counter is held while transitioning the markWord to block out deflation.

* If a thread observes 0b10 (Inflated)
  * If the current thread is the thread that fast locked, take ownership.
    Update ObjectMonitor _recursions based on fast locked recursions.
    Call ObjectMonitor::enter(current);
  * Otherwise some other thread is the owner, and will claim ownership.
    Call ObjectMonitor::enter(current);
* If a thread succeeds with the CAS to 0b10 (Inflated)
  * From 0b00 (Fast Locked)
    * If the current thread is the thread that fast locked, take ownership.
      Update ObjectMonitor _recursions based on fast locked recursions.
      Call ObjectMonitor::enter(current);
    * Otherwise some other thread is the owner, and will claim ownership.
      Call ObjectMonitor::enter(current);
  * From 0b01 (Unlocked)
    * Claim ownership, no ObjectMonitor::enter is required.
* If a thread fails the CAS, reload markWord and retry

### Un-contended Inflated Locking

CAS on _owner field in ObjectMonitor.
JavaThread* (Locked By Thread) <--> nullptr (Unlocked)

### Contended Inflated Locking

Blocks out deflation.
Spin CAS on _owner field in ObjectMonitor.
JavaThread* (Locked By Thread) <--> nullptr (Unlocked)

Details in ObjectMonitor.hpp

### HashTable Resizing and Cleanup

Resizing is currently handled with logic similar to what the string and symbol tables use, and is delegated to the ServiceThread. The goal is to eventually move this to the deflation thread, to allow for better interactions with the deflation cycles, making it possible to also shrink the table. But this will be done incrementally as a separate enhancement. The ServiceThread is currently used to deal with the fact that we currently allow the deflation thread to be turned off via JVM options.
Cleanup is mostly handled by the deflator which actively removes deflated monitors, which includes monitors for dead objects. However we allow any thread to remove dead objects' ObjectMonitor* associations. But actual memory reclamation of the ObjectMonitor is always handled by the deflator.

The table is currently initialized before `init_globals`, as such the max size of the table which is based on `MaxHeapSize` may be incorrect because it is not yet finalized.

-------------

Commit messages:
 - 8315884: New Object to ObjectMonitor mapping

Changes: https://git.openjdk.org/jdk/pull/20067/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=00
 Issue: https://bugs.openjdk.org/browse/JDK-8315884
 Stats: 3613 lines in 70 files changed: 2700 ins; 313 del; 600 mod
 Patch: https://git.openjdk.org/jdk/pull/20067.diff
 Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From alanb at openjdk.org Mon Jul 8 08:30:33 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 8 Jul 2024 08:30:33 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com>

On Wed, 3 Jul 2024 19:54:46 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors.
Currently this scenario can be reproduced with the Graal compiler. > > I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. > > Thanks, > Patricio test/jdk/java/lang/Thread/virtual/ThreadYield.java line 29: > 27: * @summary Test that Thread.yield loop polls for safepoints > 28: * @requires vm.continuations > 29: * @modules java.base/java.lang:+open I assume the `@modules` isn't needed as this test doesn't need to open java.lang. test/jdk/java/lang/Thread/virtual/ThreadYield.java line 47: > 45: import static org.junit.jupiter.api.Assertions.*; > 46: > 47: class ThreadYield { This isn't a unit test for Thread.yield so I think it would be better to rename to something specific like ThreadYieldPollsSafepoint (or better name). test/jdk/java/lang/Thread/virtual/ThreadYield.java line 49: > 47: class ThreadYield { > 48: static void foo(AtomicBoolean done) { > 49: synchronized (done) { When this test makes it to the loom repo then we'll need to change it to pin by other means. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1668202884 PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1668202828 PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1668207758 From aph at openjdk.org Mon Jul 8 08:46:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 8 Jul 2024 08:46:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> Message-ID: <u9kWsTkbNZZ5_D9E9EENwp-S5xBjQYdAXr4LHDI9VeU=.99f598ab-ca7d-40bd-adc0-b2f082ad6c59@github.com> On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. >> >> Besides of the code changes, one important task is to handle the legal process. >> >> Thanks! 
>> >> ## Test >> tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Float >> data >> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"> >> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0) >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408 >> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876 >> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936 >> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.721 | 32.579 | 4.166 >> Float128Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 114.547 | 39.517 | 2.... > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 33 commits: > > - Merge branch 'master' into sleef-aarch64-integrate-source > - merge master > - sleef 3.6.1 for riscv > - sleef 3.6.1 > - update header files for arm > - add inline header file for riscv64 > - remove notes about sleef changes > - fix performance issue > - disable unused-function warnings; add log msg > - minor > - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863 That doesn't work. Running tests using MICRO control variable 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' Unknown test selection: 'org.openjdk.bench.jdk.incubator.vector.operation.Float' ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2213389656 From david.holmes at oracle.com Mon Jul 8 08:48:52 2024 From: david.holmes at oracle.com (David Holmes) Date: Mon, 8 Jul 2024 18:48:52 +1000 Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? In-Reply-To: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com> References: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com> Message-ID: <cb2380cc-11ad-4f89-a0f2-92281ad7d5a0@oracle.com> On 8/07/2024 5:59 pm, Julian Waters wrote: > Hi David, > > Ah, I think you misunderstood me, I'm aware that the frame pointer is > saved as required by the compiler (With the exception of the Microsoft > compiler, which doesn't save it at all). What I meant was that the > comments in Windows code imply that VMError::print_native_stack and > os::get_sender_for_C_frame need to use the frame pointer, yet I can't > seem to find where or how either of them obtain the frame pointer for > whatever they use it for on platforms and compilers where the frame > pointer is saved (For instance, on Linux), whether through handwritten > assembly code or some other means. 
It follows that if they need to use > the frame pointer, then they must grab it from somewhere, after all Ah sorry. AFAICS we just create the frame() objects and walk the stack via those. We use fetch_frame_from_context to kick things off in the case of a crash. David > best regards, > Julian From mli at openjdk.org Mon Jul 8 09:27:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Jul 2024 09:27:34 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <u9kWsTkbNZZ5_D9E9EENwp-S5xBjQYdAXr4LHDI9VeU=.99f598ab-ca7d-40bd-adc0-b2f082ad6c59@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <u9kWsTkbNZZ5_D9E9EENwp-S5xBjQYdAXr4LHDI9VeU=.99f598ab-ca7d-40bd-adc0-b2f082ad6c59@github.com> Message-ID: <GMhDa917-zWp4z0VZGYxdTruAnjh3gmrt-qhFQ9AS1s=.be41e581-a79c-4f75-bb61-f5a14db05a88@github.com> On Mon, 8 Jul 2024 08:43:34 GMT, Andrew Haley <aph at openjdk.org> wrote: > That doesn't work. > > ``` > Running tests using MICRO control variable 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > Unknown test selection: 'org.openjdk.bench.jdk.incubator.vector.operation.Float' > ``` I think copying the Float*.java and dependent files under test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in the panama-vector repo can resolve the issue.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2213501489 From duke at openjdk.org Mon Jul 8 09:42:31 2024 From: duke at openjdk.org (Thomas Wuerthinger) Date: Mon, 8 Jul 2024 09:42:31 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <LjKHmY3_Qbp5xirMG8E6xo-Dqv2XLblJaWmGKw2-MF4=.2765e99c-dc5f-42fb-baad-56ae8295dba4@github.com> On Mon, 8 Jul 2024 08:18:42 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
> > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is to much contention or to long critical sections for spinning to be feasible, inf... Is this change expected to require JVMCI and/or Graal JIT changes? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213534062 From aboldtch at openjdk.org Mon Jul 8 10:14:32 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 8 Jul 2024 10:14:32 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping In-Reply-To: <LjKHmY3_Qbp5xirMG8E6xo-Dqv2XLblJaWmGKw2-MF4=.2765e99c-dc5f-42fb-baad-56ae8295dba4@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <LjKHmY3_Qbp5xirMG8E6xo-Dqv2XLblJaWmGKw2-MF4=.2765e99c-dc5f-42fb-baad-56ae8295dba4@github.com> Message-ID: <YqFykA6XLEjqokNUW4xO7Rlu9yv7lbvuWRKtSr8uEio=.6746b15f-26ce-4993-a1c0-bb56a6ed3147@github.com> On Mon, 8 Jul 2024 09:39:32 GMT, Thomas Wuerthinger <duke at openjdk.org> wrote: > Is this change expected to require JVMCI and/or Graal JIT changes? Support for `UseObjectMonitorTable` would require changes to Graal JIT. (`UseObjectMonitorTable` is off by default). Minimal support would be to call into the VM for inflated monitors. (Similarly to what this patch does for C2 on platforms other than x86 / aarch64). For starting the VM normally without `UseObjectMonitorTable` no semantic change is required. All locking modes and VM invariants w.r.t. locking are the same. As mentioned this patch contains a refactoring which renames one exported `JVMCI` symbol which I suspect should only be used by Graal JIT for `LM_LEGACY`. As such the Graal JIT needs to be updated to use this new symbol name.
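[Editorial sketch, not part of the original message.] The PR description quoted in this thread says that locking only uses the lock bits of the `markWord` (0b00 fast locked, 0b01 unlocked, 0b10 inflated) and that inflated monitors are found through an external hash table instead of being written over the `markWord`. A minimal C++ illustration of that idea follows. `ToyObject`, `ToyMonitor`, `ToyMonitorTable` and the helper functions are hypothetical names invented here, and the mutex-protected `std::unordered_map` keyed by object address is an assumption made for brevity; it is not how HotSpot implements the table, and ownership transfer, deflation, GC interaction and the caching mentioned in the description are all ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Lock-bit encodings as quoted in the PR description.
constexpr uintptr_t kFastLocked = 0b00;
constexpr uintptr_t kUnlocked   = 0b01;
constexpr uintptr_t kInflated   = 0b10;
constexpr uintptr_t kLockMask   = 0b11;

// Hypothetical stand-ins; these are not HotSpot types.
struct ToyMonitor { int contentions = 0; };

struct ToyObject {
  std::atomic<uintptr_t> mark;
  // The payload models whatever else lives in the upper bits of the mark word.
  explicit ToyObject(uintptr_t payload) : mark((payload << 2) | kUnlocked) {}
};

// Fast path: CAS only the lock bits, leaving the upper bits untouched.
bool try_fast_lock(ToyObject& obj) {
  uintptr_t old_mark = obj.mark.load();
  if ((old_mark & kLockMask) != kUnlocked) return false;  // already locked or inflated
  uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return obj.mark.compare_exchange_strong(old_mark, new_mark);
}

bool try_fast_unlock(ToyObject& obj) {
  uintptr_t old_mark = obj.mark.load();
  if ((old_mark & kLockMask) != kFastLocked) return false;
  uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return obj.mark.compare_exchange_strong(old_mark, new_mark);
}

// Side table: the monitor is associated with the object externally, so
// inflation never displaces the rest of the mark word.
class ToyMonitorTable {
  std::mutex _mutex;
  std::unordered_map<const ToyObject*, std::unique_ptr<ToyMonitor>> _map;

 public:
  ToyMonitor* inflate(ToyObject& obj) {
    ToyMonitor* monitor;
    {
      std::lock_guard<std::mutex> guard(_mutex);
      std::unique_ptr<ToyMonitor>& slot = _map[&obj];
      if (!slot) slot.reset(new ToyMonitor());
      monitor = slot.get();
    }
    // Publish the inflated state by changing the lock bits only.
    uintptr_t old_mark = obj.mark.load();
    while (!obj.mark.compare_exchange_weak(
        old_mark, (old_mark & ~kLockMask) | kInflated)) {
      // old_mark was refreshed by the failed CAS; retry.
    }
    return monitor;
  }

  ToyMonitor* lookup(const ToyObject& obj) {
    std::lock_guard<std::mutex> guard(_mutex);
    auto it = _map.find(&obj);
    return it == _map.end() ? nullptr : it->second.get();
  }
};
```

In this toy model a compiler backend without table support would take the slow path (here, `lookup`) whenever it observes the inflated bits, which loosely mirrors the "call into the VM for inflated monitors" fallback described in the reply.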
------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213602121 From shade at openjdk.org Mon Jul 8 10:35:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jul 2024 10:35:36 GMT Subject: RFR: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <QdFHYY07hwmxrteM-UEmCUepyqyWWi4IP7uEBsqFBRs=.790c912c-009b-44f3-994f-93082ec22afc@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 
bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... This looks fine to me. I think Thomas' comment was generally about handling the trivial PRs, which does not hold this PR from integration. The patch is simple, there had been enough eyes on this, and so we can just integrate. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20019#pullrequestreview-2162912408 From xpeng at openjdk.org Mon Jul 8 10:35:36 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 10:35:36 GMT Subject: Integrated: 8334231: Optimize MethodData layout In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com> Message-ID: <r_mjQmOGKGHi0of5r2BdPN0KHldq-g4IaaCADyel1DA=.5dd714ba-8689-4a5c-af97-9c0f0320c51b@github.com> On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote: > Hi all, > This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of MethodData. 
Here is the original layout from `pahole`: > > class MethodData : public Metadata { > public: > > /* class Metadata <ancestor>; */ /* 0 0 */ > > /* XXX 8 bytes hole, try to pack */ > > class Method * _method; /* 8 8 */ > int _size; /* 16 4 */ > int _hint_di; /* 20 4 */ > class Mutex _extra_data_lock; /* 24 104 */ > /* --- cacheline 2 boundary (128 bytes) --- */ > class CompilerCounters _compiler_counters; /* 128 80 */ > /* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */ > intx _eflags; /* 208 8 */ > intx _arg_local; /* 216 8 */ > intx _arg_stack; /* 224 8 */ > intx _arg_returned; /* 232 8 */ > int _creation_mileage; /* 240 4 */ > class InvocationCounter _invocation_counter; /* 244 4 */ > class InvocationCounter _backedge_counter; /* 248 4 */ > int _invocation_counter_start; /* 252 4 */ > /* --- cacheline 4 boundary (256 bytes) --- */ > int _backedge_counter_start; /* 256 4 */ > uint _tenure_traps; /* 260 4 */ > int _invoke_mask; /* 264 4 */ > int _backedge_mask; /* 268 4 */ > short int _num_loops; /* 272 2 */ > short int _num_blocks; /* 274 2 */ > enum WouldProfile _would_profile; /* 276 4 */ > int _jvmci_ir_size; /* 280 4 */ > > /* XXX 4 bytes hole, try to pack */ > > class FailedSpeculation * _failed_speculations; /* 288 8 */ > int _data_size; /* 296 4 */ > int _parameters_type_data_di; /* 300 4 */ > int _exception_handler_data_di; /* 304 4 */ > > /* XXX 4 bytes hole, try to pack */ > > intptr_t _data[1]; /* 312 8 */ > > /* size: 320, cachelines: 5, members: 27 */ > /* sum members: 304, holes: 3, sum holes: 16 */ > }; > > > There are 3 holes ... This pull request has now been integrated. 
Changeset: c5a668bb Author: Xiaolong Peng <xpeng at openjdk.org> Committer: Aleksey Shipilev <shade at openjdk.org> URL: https://git.openjdk.org/jdk/commit/c5a668bb653feb3408a9efa3274ceabf9f01a2c7 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod 8334231: Optimize MethodData layout Reviewed-by: dholmes, chagedorn, shade ------------- PR: https://git.openjdk.org/jdk/pull/20019 From duke at openjdk.org Mon Jul 8 11:01:32 2024 From: duke at openjdk.org (Thomas Wuerthinger) Date: Mon, 8 Jul 2024 11:01:32 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <4BqqOjbfNOV6NjtPe-hIf-98N8kT6ce_FBCuQ-vqBBY=.6e02875c-908f-43f0-8d66-ba4f5b01d488@github.com> On Mon, 8 Jul 2024 08:18:42 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is to much contention or to long critical sections for spinning to be feasible, inf... OK. Will there be a CSR or JEP for this? When do you approximately expect this to land in main line? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213689308 From aboldtch at openjdk.org Mon Jul 8 11:55:33 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 8 Jul 2024 11:55:33 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping In-Reply-To: <4BqqOjbfNOV6NjtPe-hIf-98N8kT6ce_FBCuQ-vqBBY=.6e02875c-908f-43f0-8d66-ba4f5b01d488@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <4BqqOjbfNOV6NjtPe-hIf-98N8kT6ce_FBCuQ-vqBBY=.6e02875c-908f-43f0-8d66-ba4f5b01d488@github.com> Message-ID: <RdSvPsChCFViQ2aqHZFfZyF-2G1TWV0smRdto9q7jmY=.bfb714a5-5882-43bf-96a0-5260024454b7@github.com> On Mon, 8 Jul 2024 10:58:29 GMT, Thomas Wuerthinger <duke at openjdk.org> wrote: > OK. Will there be a CSR or JEP for this? There is no plan for this, nor should it be required. It's an internal implementation. > When do you approximately expect this to land in main line? ASAP. Compatibility for the field name is being worked on in Graal JIT. The plan is not to integrate this prior to this work being completed. We should probably add a more graceful transition. Such that `UseObjectMonitorTable` is turned off ergonomically even if the user specified it when running with JVMCI enabled. The main goal here is to get something in main line which does not affect the default behaviour of the VM. But allows for supporting future features such as Lilliput, along with enabling incremental improvement to `UseObjectMonitorTable` via support for more platforms and compiler backends, performance improvements, etc to be done towards the JDK project.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213806034 From shade at openjdk.org Mon Jul 8 11:58:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jul 2024 11:58:34 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v3] In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> Message-ID: <FkDjtpGwpMtPg7NxC6vDwFgkBfK_t3noWiVmR0V5Tjk=.12715c7e-4a90-4ba9-9be2-2486d5ff77de@github.com> On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. >> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). 
>> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > use DEBUG_ONLY on _used declaration Looks fine. I am tracking this for backport to 21.0.5, which already got the `ResourceMark` in `frame::oops_interpreted_do` due to JDK-8329665 backport. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2163081402 From aboldtch at openjdk.org Mon Jul 8 12:13:07 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 8 Jul 2024 12:13:07 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v2] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. 
In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. 
> 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is to much contention or to long critical sections for spinning to be feasible, inf... Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: More graceful JVMCI VM option interaction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/4d835b94..28143503 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=00-01 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From duke at openjdk.org Mon Jul 8 12:18:34 2024 From: duke at openjdk.org (Thomas Wuerthinger) Date: Mon, 8 Jul 2024 12:18:34 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v2] In-Reply-To: <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com> Message-ID: <15fwghNOC4j6Hctnqn7sVKsRFbHGs7mwcgpadcok1qo=.63b2a07a-2fe0-4557-843d-f6b131e37a09@github.com> On Mon, 8 Jul 2024 12:13:07 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. 
In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. 
There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > More graceful JVMCI VM option interaction OK, thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213887069 From stuefe at openjdk.org Mon Jul 8 12:33:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 Jul 2024 12:33:35 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v3] In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> Message-ID: <IqgSXiXCZOj9b0mibWQ2BWb4qzDuNwXqgcix6pd0QxA=.ac7cde2a-45c5-46b4-9208-6a2c25346614@github.com> On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). 
There is in fact one case today in JFR code where this kind of allocation happens. >> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). >> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > use DEBUG_ONLY on _used declaration Small nits, otherwise good. Thanks a lot for fixing. src/hotspot/share/interpreter/oopMapCache.cpp line 183: > 181: if (mask_size() > small_mask_limit) { > 182: assert(!Thread::current()->resource_area()->contains((void*)_bit_mask[0]), > 183: "The bit mask should be allocated from the C heap"); Arguably, this assert is not needed. In debug builds, we have NMT enabled, and that does a check on os::free. 
However, an assert that _bit_mask[0] != 0 *would* make sense, since the free quietly swallows null pointers. src/hotspot/share/interpreter/oopMapCache.cpp line 405: > 403: // Implementation of OopMapCache > 404: > 405: void InterpreterOopMap::copy_from(OopMapCacheEntry* src) { Possibly for another RFE: src pointer should be const src/hotspot/share/interpreter/oopMapCache.cpp line 423: > 421: } else { > 422: _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass); > 423: assert(_bit_mask[0] != 0, "bit mask was not allocated"); The assert can be removed, no? NEW_C_HEAP_ARRAY does a null check by default. src/hotspot/share/interpreter/oopMapCache.cpp line 424: > 422: _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass); > 423: assert(_bit_mask[0] != 0, "bit mask was not allocated"); > 424: memcpy((void*) _bit_mask[0], (void*) src->_bit_mask[0], mask_word_size() * BytesPerWord); Are the (void*) casts really needed? src/hotspot/share/interpreter/oopMapCache.hpp line 92: > 90: > 91: protected: > 92: DEBUG_ONLY(bool _used;) Minor nit. This changes memory layout between debug and release builds, and this is used as part of OopMapCache. Not a big concern, but I usually prefer having the same layout between debug and release to test what we ship. Can't we just assert that mask size == USHRT_MAX?
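For readers following the review comments above, here is a minimal standalone sketch of the pattern under discussion: a small bit mask stored inline, with a C-heap fallback whose pointer is kept in word 0, and an explicit non-null assert before freeing. It is not the HotSpot code. `MaskSketch` and all of its members are hypothetical names, plain `calloc`/`free` stand in for `NEW_C_HEAP_ARRAY` and `os::free`, and the "two bits per entry" reading is inferred from the 4*64/2 = 128 arithmetic in the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for InterpreterOopMap; names are illustrative only.
class MaskSketch {
 public:
  static constexpr std::size_t kInlineWords   = 4;
  static constexpr std::size_t kBitsPerWord   = sizeof(std::uintptr_t) * 8;
  static constexpr std::size_t kBitsPerEntry  = 2;
  // 4 * 64 / 2 = 128 entries on a 64-bit platform.
  static constexpr std::size_t kInlineEntries =
      kInlineWords * kBitsPerWord / kBitsPerEntry;

  explicit MaskSketch(std::size_t entries) : _entries(entries) {
    std::memset(_words, 0, sizeof(_words));
    if (on_heap()) {
      // Large masks live on the C heap; word 0 holds the pointer.
      void* p = std::calloc(word_count(), sizeof(std::uintptr_t));
      if (p == nullptr) std::abort();  // allocation failure is fatal in this sketch
      _words[0] = reinterpret_cast<std::uintptr_t>(p);
    }
  }

  ~MaskSketch() {
    if (on_heap()) {
      // free(nullptr) is a silent no-op, so a missing allocation
      // would go unnoticed without this explicit check.
      assert(_words[0] != 0 && "bit mask was not allocated");
      std::free(reinterpret_cast<void*>(_words[0]));
    }
  }

  MaskSketch(const MaskSketch&) = delete;
  MaskSketch& operator=(const MaskSketch&) = delete;

  bool on_heap() const { return _entries > kInlineEntries; }

  // Number of words needed to hold kBitsPerEntry bits per entry.
  std::size_t word_count() const {
    return (_entries * kBitsPerEntry + kBitsPerWord - 1) / kBitsPerWord;
  }

  void set_bit(std::size_t i) {
    bits()[i / kBitsPerWord] |= std::uintptr_t{1} << (i % kBitsPerWord);
  }

  bool test_bit(std::size_t i) const {
    return (bits()[i / kBitsPerWord] >> (i % kBitsPerWord)) & 1u;
  }

  // Both objects must describe the same number of entries.
  void copy_from(const MaskSketch& src) {
    assert(_entries == src._entries);
    std::memcpy(bits(), src.bits(), word_count() * sizeof(std::uintptr_t));
  }

 private:
  std::uintptr_t* bits() {
    return on_heap() ? reinterpret_cast<std::uintptr_t*>(_words[0]) : _words;
  }
  const std::uintptr_t* bits() const {
    return on_heap() ? reinterpret_cast<const std::uintptr_t*>(_words[0]) : _words;
  }

  std::size_t _entries;
  std::uintptr_t _words[kInlineWords];
};
```

In this sketch a mask with up to `kInlineEntries` entries never touches the heap, which matches the observation in the PR description that the allocating path is rarely taken.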
------------- PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2163133516 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668523910 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668536249 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668539154 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668535458 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668531734 From luhenry at openjdk.org Mon Jul 8 13:06:35 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 8 Jul 2024 13:06:35 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> Message-ID: <J2n1vjnGIfkPollglwm_WzuJcFmQrvx-QvzMxKHQdEA=.f25df6b1-4a3f-4a58-8640-0f3fe2e94001@github.com> On Sun, 7 Jul 2024 15:16:02 GMT, ArsenyBochkarev <duke at openjdk.org> wrote: >> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. 
> > ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: > > - Use t2 directly instead of temp2 > - Rename temp1 -> x0 > - Left a note on a side effect of generate_vle32_pack4 src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2360: > 2358: generate_aescrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4); > 2359: > 2360: generate_vle32_pack2(key, vtmp1, vtmp2); Could you add the comment for `key` here as well. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2370: > 2368: __ vaesem_vv(res, vzero); > 2369: > 2370: generate_vle32_pack2(key, vtmp1, vtmp2); And here as well for `key`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2380: > 2378: __ vaesem_vv(res, vzero); > 2379: > 2380: generate_vle32_pack2(key, vtmp1, vtmp2); Here as well for `key`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2465: > 2463: generate_aesdecrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4); > 2464: > 2465: generate_vle32_pack2(key, vtmp1, vtmp2); Same here for `key`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2476: > 2474: __ vaesdm_vv(res, vzero); > 2475: > 2476: generate_vle32_pack2(key, vtmp1, vtmp2); Same here for `key`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2487: > 2485: __ vaesdm_vv(res, vzero); > 2486: > 2487: generate_vle32_pack2(key, vtmp1, vtmp2); Same here for `key`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308026 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308210 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308458 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308689 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308755 PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308837 From aph at openjdk.org Mon Jul 8 13:39:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 8 Jul 2024 13:39:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> Message-ID: <-85jb7zkPiyjtG47_knDVtXF5iVTYH8hgMD5BTW1AM0=.ced6fcc8-4a11-409c-85ba-00d30cc35d47@github.com> On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. >> >> Besides of the code changes, one important task is to handle the legal process. >> >> Thanks! 
>> >> ## Test >> tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Float >> data >> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0) >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408 >> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876 >> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936 >> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.721 | 32.579 | 4.166 >> Float128Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 114.547 | 39.517 | 2.... > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 33 commits: > > - Merge branch 'master' into sleef-aarch64-integrate-source > - merge master > - sleef 3.6.1 for riscv > - sleef 3.6.1 > - update header files for arm > - add inline header file for riscv64 > - remove notes about sleef changes > - fix performance issue > - disable unused-function warnings; add log msg > - minor > - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863 There is something that makes me nervous. The big slab of preprocessed code in libvectormath/sleefinline_rvvm1.h is problematic. Firstly, in all open source software the code should be the preferred form: "The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed." https://opensource.org/osd Also, any such intermediate form is a golden example of a vector in which to hide something nasty. No one is going to read that file, and a malicious person with access to the JDK source base, either in our own github repo or in many other places downstream of OpenJDK could hide all manner of thing. In its form in this PR it's no better than checking in a binary. See https://arstechnica.com/security/2024/04/what-we-know-about-the-xz-utils-backdoor-that-almost-infected-the-world/ I'd look at including the SLEEF source code, along with a script which generates the preprocessed form we use in the JDK build, so that more paranoid JDK builders can regenerate the preprocessed code. Of course, I cannot be sure that my fellow reviewers will agree, but I think it's the right thing to do. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214099558 From erikj at openjdk.org Mon Jul 8 14:08:38 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 8 Jul 2024 14:08:38 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <-85jb7zkPiyjtG47_knDVtXF5iVTYH8hgMD5BTW1AM0=.ced6fcc8-4a11-409c-85ba-00d30cc35d47@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <-85jb7zkPiyjtG47_knDVtXF5iVTYH8hgMD5BTW1AM0=.ced6fcc8-4a11-409c-85ba-00d30cc35d47@github.com> Message-ID: <6WE1CCFfFAgdyHzI32vo1L2u3t5o6JQvl214RmPeho4=.6ebc52a3-d50f-4695-b950-9458f1d71d84@github.com> On Mon, 8 Jul 2024 13:36:36 GMT, Andrew Haley <aph at openjdk.org> wrote: > There is something that makes me nervous. The big slab of preprocessed code in libvectormath/sleefinline_rvvm1.h is problematic. Firstly, in all open source software the code should be the preferred form: > > "The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed." https://opensource.org/osd > > Also, any such intermediate form is a golden example of a vector in which to hide something nasty. No one is going to read that file, and a malicious person with access to the JDK source base, either in our own github repo or in many other places downstream of OpenJDK could hide all manner of thing. In its form in this PR it's no better than checking in a binary. 
See https://arstechnica.com/security/2024/04/what-we-know-about-the-xz-utils-backdoor-that-almost-infected-the-world/ > > I'd look at including the SLEEF source code, along with a script which generates the preprocessed form we use in the JDK build, so that more paranoid JDK builders can regenerate the preprocessed code. > > Of course, I cannot be sure that my fellow reviewers will agree, but I think it's the right thing to do. While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in https://github.com/openjdk/jdk/pull/19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using Cmake and we do not want to add a build dependency on Cmake and call out to a foreign build system at build time, for efficiency and complexity reasons. JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214172864 From yzheng at openjdk.org Mon Jul 8 14:55:36 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 8 Jul 2024 14:55:36 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v2] In-Reply-To: <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com> Message-ID: <MF3dxDlQF9N18kCQGi3Fym9NLujEcStavCSfIMckNB0=.5e9a9ea1-553f-4d2e-a94e-86c9c6178bd1@github.com> On Mon, 8 Jul 2024 12:13:07 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
>> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > More graceful JVMCI VM option interaction Could you please revert 2814350 and export the following symbols to JVMCI?
diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index faf2cb24616..7be31aa0f5f 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -241,6 +241,7 @@ nonstatic_field(JavaThread, _stack_overflow_state._reserved_stack_activation, address) \ nonstatic_field(JavaThread, _held_monitor_count, intx) \ nonstatic_field(JavaThread, _lock_stack, LockStack) \ + nonstatic_field(JavaThread, _om_cache, OMCache) \ JVMTI_ONLY(nonstatic_field(JavaThread, _is_in_VTMS_transition, bool)) \ JVMTI_ONLY(nonstatic_field(JavaThread, _is_in_tmp_VTMS_transition, bool)) \ JVMTI_ONLY(nonstatic_field(JavaThread, _is_disable_suspend, bool)) \ @@ -531,6 +532,8 @@ \ declare_constant_with_value("CardTable::dirty_card", CardTable::dirty_card_val()) \ declare_constant_with_value("LockStack::_end_offset", LockStack::end_offset()) \ + declare_constant_with_value("OMCache::oop_to_oop_difference", OMCache::oop_to_oop_difference()) \ + declare_constant_with_value("OMCache::oop_to_monitor_difference", OMCache::oop_to_monitor_difference()) \ \ declare_constant(CodeInstaller::VERIFIED_ENTRY) \ declare_constant(CodeInstaller::UNVERIFIED_ENTRY) \ ------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2214322632 From duke at openjdk.org Mon Jul 8 15:24:13 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Mon, 8 Jul 2024 15:24:13 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3] In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> Message-ID: <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com> > Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. 
On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: Left a note on a side effect of generate_vle32_pack2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19960/files - new: https://git.openjdk.org/jdk/pull/19960/files/9f5c7831..8520bc3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=01-02 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960 PR: https://git.openjdk.org/jdk/pull/19960 From duke at openjdk.org Mon Jul 8 15:24:13 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Mon, 8 Jul 2024 15:24:13 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <J2n1vjnGIfkPollglwm_WzuJcFmQrvx-QvzMxKHQdEA=.f25df6b1-4a3f-4a58-8640-0f3fe2e94001@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> <J2n1vjnGIfkPollglwm_WzuJcFmQrvx-QvzMxKHQdEA=.f25df6b1-4a3f-4a58-8640-0f3fe2e94001@github.com> Message-ID: <BghNsitWJWaTDOZjWSDL4t2zEkdsZv5UPmvoJyHH-w8=.3aa312f3-9024-4827-9761-d3ca475f4a58@github.com> On Mon, 8 Jul 2024 09:30:36 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Use t2 directly instead of temp2 >> - Rename temp1 -> x0 >> - Left a note 
on a side effect of generate_vle32_pack4 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2360: > >> 2358: generate_aescrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4); >> 2359: >> 2360: generate_vle32_pack2(key, vtmp1, vtmp2); > > Could you add the comment for `key` here as well. All done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668842170 From aboldtch at openjdk.org Mon Jul 8 16:21:16 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 8 Jul 2024 16:21:16 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf... Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Add JVMCI symbol exports - Revert "More graceful JVMCI VM option interaction" This reverts commit 2814350370cf142e130fe1d38610c646039f976d.
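The "Fast Locking" transitions quoted in the PR description above can be illustrated with a small standalone sketch. This is not the LightweightSynchronizer implementation. It only models the two low lock bits of a mark word with `std::atomic`, and it deliberately ignores the per-thread lock stack, ownership tracking and the inflation path itself. The namespace `sketch`, the `Result` enum and both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch {

// Lock-bit encodings as listed in the PR description.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

enum class Result { Locked, Contended, NeedsMonitor };

// Try 0b01 (Unlocked) -> 0b00 (Fast Locked) with a CAS that leaves the
// upper bits untouched. When the word is already fast locked, retry up
// to `spins` more times before reporting contention.
inline Result try_fast_lock(std::atomic<std::uintptr_t>& mark, int spins) {
  for (int i = 0; i <= spins; ++i) {
    std::uintptr_t old = mark.load(std::memory_order_relaxed);
    std::uintptr_t bits = old & kLockMask;
    if (bits == kInflated) {
      return Result::NeedsMonitor;  // caller would go to the ObjectMonitor path
    }
    if (bits == kUnlocked) {
      std::uintptr_t locked = (old & ~kLockMask) | kFastLocked;
      if (mark.compare_exchange_strong(old, locked,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
        return Result::Locked;
      }
    }
    // Fast locked by someone else, or the CAS lost a race: spin again.
  }
  return Result::Contended;
}

// Try 0b00 (Fast Locked) -> 0b01 (Unlocked). Returns false if the lock
// bits are in any other state or the CAS fails.
inline bool fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old = mark.load(std::memory_order_relaxed);
  if ((old & kLockMask) != kFastLocked) {
    return false;
  }
  std::uintptr_t unlocked = (old & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old, unlocked,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}

}  // namespace sketch
```

Because only the two lock bits change, the remaining bits of the word stay in place across lock and unlock, which is the property the PR description highlights for concurrent GCs and Lilliput.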
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/28143503..173b75b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=01-02 Stats: 8 lines in 2 files changed: 3 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aph at openjdk.org Mon Jul 8 16:23:37 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 8 Jul 2024 16:23:37 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> Message-ID: <TcrB6zIH-yx-6fyLfnQy4NHk5w8VqXm3anTAxbQJtXY=.8181016f-5d4d-4349-a8d7-343db9817f40@github.com> On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. 
remove some unnecessary files or changes especially in make dir of jdk. >> >> Besides the code changes, one important task is to handle the legal process. >> >> Thanks! >> >> ## Test >> tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Float >> data >> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0) >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408 >> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876 >> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936 >> Float128Vector.ATAN... > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 33 commits: > > - Merge branch 'master' into sleef-aarch64-integrate-source > - merge master > - sleef 3.6.1 for riscv > - sleef 3.6.1 > - update header files for arm > - add inline header file for riscv64 > - remove notes about sleef changes > - fix performance issue > - disable unused-function warnings; add log msg > - minor > - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863 I finally did some measurements. It would be nice if the JMH test were part of this patch. It mostly looks good, but I can see an odd regression of DoubleMaxVector.TANH (by 39%) on Apple M1. I don't really know why this is, given that tanh(x) is almost certainly based on expm1(x). This probably isn't important, but it is odd. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214587236 From ayang at openjdk.org Mon Jul 8 16:24:08 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 Jul 2024 16:24:08 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC Message-ID: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` shares much similarity. The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect`, and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.
------------- Commit messages: - pgc-vm-operation Changes: https://git.openjdk.org/jdk/pull/20077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335902 Stats: 352 lines in 14 files changed: 96 ins; 169 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/20077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077 PR: https://git.openjdk.org/jdk/pull/20077 From ayang at openjdk.org Mon Jul 8 16:31:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 Jul 2024 16:31:43 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v2] In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> Message-ID: <N4uBvRzIP52a4DgIeIx3ArjKPF0JrTI2bVsmHtD0rJg=.f7e1bb49-9bcd-420c-97fb-2617c798b5b7@github.com> > Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` shares much similarity. > > The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect`, and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. > > Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: pgc-vm-operation ------------- Changes: https://git.openjdk.org/jdk/pull/20077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=01 Stats: 352 lines in 14 files changed: 96 ins; 169 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/20077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077 PR: https://git.openjdk.org/jdk/pull/20077 From aph at openjdk.org Mon Jul 8 16:43:37 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 8 Jul 2024 16:43:37 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> Message-ID: <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. 
>>
>> Besides the code changes, one important task is to handle the legal process.
>>
>> Thanks!
>>
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 33 commits: > > - Merge branch 'master' into sleef-aarch64-integrate-source > - merge master > - sleef 3.6.1 for riscv > - sleef 3.6.1 > - update header files for arm > - add inline header file for riscv64 > - remove notes about sleef changes > - fix performance issue > - disable unused-function warnings; add log msg > - minor > - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863 > While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in #19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using Cmake and we do not want to add a build dependency on Cmake and call out to a foreign build system at build time, for efficiency and complexity reasons. Of course, there is no reason to rebuild the preprocessed headers every time we build the JDK. I'd never ask for that; the last thing I want is to make building the JDK slower. However, it should be possible to do so on a checked-out JDK source tree, at the builder's option. If there is a script, it doesn't have to be included in the OpenJDK build system itself, but it does have to be in the OpenJDK source tree. (It could be part of make/devkit, for example.) With a script to produce preprocessed files, it should be possible for anyone building the JDK to run that script, and produce the preprocessed source. SLEEF won't take up a prohibitive amount of space. We shouldn't be depending on some other web site somewhere being able to come up with the exact SLEEF sources we used, either. That fails the test of reproducibility. > JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. 
It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step. I get it, but not including everything we use in the OpenJDK tree is a dangerous precedent. It should be no big deal to do this right, given that we have the SLEEF sources and the build scripts already. I'm not asking for anything that doesn't exist already, I'm just saying that it must be checked in. Avoiding inconvenience, however great, is not sufficient to justify such a step. This is perhaps something to discuss at the next Committers' Workshop. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214663777 From amitkumar at openjdk.org Mon Jul 8 16:50:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 8 Jul 2024 16:50:40 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v3] In-Reply-To: <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com> References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com> <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com> <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com> <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com> Message-ID: <-QmwjnH5R3sEqzJJItuuVirwUQawa36T3V5iECXwZ7I=.d4bd0d93-20f0-4371-9eb4-45b550b06130@github.com> On Thu, 4 Jul 2024 06:18:13 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This isn't really the area of my expertise, but the patch seems reasonable to me. > > Many thanks, @jerboaa ! Hi @tstuefe, GTestWrapper.java is failing on s390x, after this commit, consistently. I have opened [JDK-8335906](https://bugs.openjdk.org/browse/JDK-8335906). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2214677436 From pchilanomate at openjdk.org Mon Jul 8 18:14:54 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 8 Jul 2024 18:14:54 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v4] In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> Message-ID: <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com> > The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. > > The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). > > The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. 
This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. > > I tested the patch by running it through mach5 tiers 1-6. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: address Thomas' comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20012/files - new: https://git.openjdk.org/jdk/pull/20012/files/7ce559cb..88d866ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=02-03 Stats: 14 lines in 2 files changed: 0 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20012.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012 PR: https://git.openjdk.org/jdk/pull/20012 From pchilanomate at openjdk.org Mon Jul 8 18:14:54 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 8 Jul 2024 18:14:54 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v3] In-Reply-To: <IqgSXiXCZOj9b0mibWQ2BWb4qzDuNwXqgcix6pd0QxA=.ac7cde2a-45c5-46b4-9208-6a2c25346614@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com> <IqgSXiXCZOj9b0mibWQ2BWb4qzDuNwXqgcix6pd0QxA=.ac7cde2a-45c5-46b4-9208-6a2c25346614@github.com> Message-ID: <gr0mBxJWj5dhm4l-NDkbi9tHYcB12dHKy4jw0-BvxuA=.f49eb2b3-4073-494d-a735-bde26c6c9635@github.com> On Mon, 8 Jul 2024 12:19:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the
pull request incrementally with one additional commit since the last revision: >> >> use DEBUG_ONLY on _used declaration > > src/hotspot/share/interpreter/oopMapCache.cpp line 183: > >> 181: if (mask_size() > small_mask_limit) { >> 182: assert(!Thread::current()->resource_area()->contains((void*)_bit_mask[0]), >> 183: "The bit mask should be allocated from the C heap"); > > Arguably, this assert is not needed. In debug builds, we have NMT enabled, and that does a check on os::free. > > However, an assert that _bit_mask[0] != 0 *would* make sense, since the free quietly swallows null pointers. Fixed. We could have had such a case if the mask was never filled due to an invalid bci, so I also improved the conditional. > src/hotspot/share/interpreter/oopMapCache.cpp line 405: > >> 403: // Implementation of OopMapCache >> 404: >> 405: void InterpreterOopMap::copy_from(OopMapCacheEntry* src) { > > Possibly for another RFE: src pointer should be const Fixed, should be fine to do it in this PR. > src/hotspot/share/interpreter/oopMapCache.cpp line 423: > >> 421: } else { >> 422: _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass); >> 423: assert(_bit_mask[0] != 0, "bit mask was not allocated"); > > The assert can be removed, no? NEW_C_HEAP_ARRAY does a null check by default. Right, removed. > src/hotspot/share/interpreter/oopMapCache.cpp line 424: > >> 422: _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass); >> 423: assert(_bit_mask[0] != 0, "bit mask was not allocated"); >> 424: memcpy((void*) _bit_mask[0], (void*) src->_bit_mask[0], mask_word_size() * BytesPerWord); > > Are the (void*) casts really needed? We need them here otherwise we get a compilation error on the conversion from intptr_t to void*. But we don't need them above so I removed those. > src/hotspot/share/interpreter/oopMapCache.hpp line 92: > >> 90: >> 91: protected: >> 92: DEBUG_ONLY(bool _used;) > > Minor nit.
This changes memory layout between debug and release builds, and this is used as part of OopMapCache. Not a big concern, but I usually prefer having the same layout between debug and release to test what we ship. > > Can't we not just assert that mask size == USHRT_MAX? Yes, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669071718 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669072023 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669073705 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669073099 PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669072432 From stuefe at openjdk.org Mon Jul 8 18:20:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 Jul 2024 18:20:33 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v4] In-Reply-To: <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com> Message-ID: <ycdFLutW434YRzauiklq5o_bqnLtC5Y-hw-Bzm2celI=.7744ba02-65ff-4864-ad91-92548f93f2e0@github.com> On Mon, 8 Jul 2024 18:14:54 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. 
>> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). >> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > address Thomas' comments good. thanks! ------------- Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2164051598 From jrose at openjdk.org Mon Jul 8 18:31:07 2024 From: jrose at openjdk.org (John R Rose) Date: Mon, 8 Jul 2024 18:31:07 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <kqWVtdaVUBaMBqNHBHzvXn2oCW-AcOJ3J8tu1DQWa7Y=.0fe9af2c-99dc-4c3d-bfc4-c5724d3898e2@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. 
> > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` I think we should remove `wrote_stable` and all associated logic. The full argument is in my comment on the bug. https://bugs.openjdk.org/browse/JDK-8333791 Stable fields are in some ways "better finals", in that they can be used to store lazy but effectively final states. But part of the "better" is that (correctly used) their race conditions are safe. Since racing is part of their nature, the fences are an unnecessary expense. So just removing that code would be the best outcome, unless I am missing something. We will want to run such a change through heavy testing.
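For illustration only (this sketch is not part of the original message): the kind of race-tolerant lazy initialization the comment above refers to can be approximated in plain Java. `@Stable` is a JDK-internal annotation, so the example uses an ordinary non-final, non-volatile `int` field instead, and `LazyHash` is a hypothetical class name chosen for this sketch. The point is only that a racing reader observes either the default value, and recomputes, or the fully computed value, so the cached result is correct without any fence.

```java
// Hypothetical illustration; LazyHash is not a JDK class and does not use @Stable.
final class LazyHash {
    private final String value;

    // Plain field: 0 means "not computed yet", mirroring how a default value
    // marks an uninitialized lazily-set field.
    private int hash;

    LazyHash(String value) {
        this.value = value;
    }

    int hash() {
        int h = hash;              // one racy read into a local
        if (h == 0) {
            h = value.hashCode();  // idempotent: every thread computes the same result
            hash = h;              // racy write; all racing writers store the same value
        }
        return h;                  // always return the local, never re-read the field
    }
}
```

If the computed hash happens to be 0, the value is simply recomputed on every call, which is slower but still correct.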
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2161140122 From shade at openjdk.org Mon Jul 8 18:31:07 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jul 2024 18:31:07 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields Message-ID: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> See bug for more discussion. Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. 
AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. Additional testing: - [x] New IR tests - [x] Linux x86_64 server fastdebug, `all` - [x] Linux AArch64 server fastdebug, `all` ------------- Commit messages: - Variant 2: Only final-field like semantics for stable inits - Variant 3: Handle everything, including reads by compilers Changes: https://git.openjdk.org/jdk/pull/19635/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19635&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333791 Stats: 1063 lines in 16 files changed: 1023 ins; 20 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/19635.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19635/head:pull/19635 PR: https://git.openjdk.org/jdk/pull/19635 From matsaave at openjdk.org Mon Jul 8 18:45:34 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 8 Jul 2024 18:45:34 GMT Subject: RFR: 8312125: Refactor CDS enum class handling [v3] In-Reply-To: <DcNbL1qEcDW0knnYfhWkR7hhD5UFwFw1Ko-qcUmx64Y=.50b0c364-aed1-46c6-a32e-62a347195c05@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
<DcNbL1qEcDW0knnYfhWkR7hhD5UFwFw1Ko-qcUmx64Y=.50b0c364-aed1-46c6-a32e-62a347195c05@github.com> Message-ID: <WvtC8lNLqbThywaIJxVbV6jsQhvOw2LYfUFy5Los7xY=.9b332ee2-b287-4af7-86d2-89f16fd4755e@github.com> On Sun, 7 Jul 2024 01:50:17 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change. > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into 8312125-refactor-cds-enum-class-handling > - @calvinccheung comments > - fixed copyright > - 8312125: Refactor CDS enum class handling Updates look good ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2164097860 From pchilanomate at openjdk.org Mon Jul 8 20:08:05 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 8 Jul 2024 20:08:05 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v2] In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> Message-ID: <s2b91CxdF_c01u14qHPvobUSw0Uz9IsjLJcw5o4dtF8=.5d861285-1468-4979-816d-824e8fec0f9c@github.com> > Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. 
Currently this scenario can be reproduced with the Graal compiler. > > I included a test which reproduces the issue with Graal (couldn't reproduce the issue with C2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: use VThreadPinner ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20016/files - new: https://git.openjdk.org/jdk/pull/20016/files/0490e6c8..ce777598 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=00-01 Stats: 13 lines in 1 file changed: 5 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016 PR: https://git.openjdk.org/jdk/pull/20016 From pchilanomate at openjdk.org Mon Jul 8 20:08:05 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 8 Jul 2024 20:08:05 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v2] In-Reply-To: <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com> Message-ID: <iaBqzib4909AgZHO95lUgxtZHa5dgRNU1KvAjOjit-w=.e2acabb6-a421-4b87-8df2-c3a3c75f2c6f@github.com> On Mon, 8 Jul 2024 08:25:47 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> use VThreadPinner > > test/jdk/java/lang/Thread/virtual/ThreadYield.java line 29: > >> 27: * @summary Test that Thread.yield loop polls for safepoints >> 28: * @requires
vm.continuations >> 29: * @modules java.base/java.lang:+open > > I assume the `@modules` isn't needed as this test doesn't need to open java.lang. Right, removed. > test/jdk/java/lang/Thread/virtual/ThreadYield.java line 47: > >> 45: import static org.junit.jupiter.api.Assertions.*; >> 46: >> 47: class ThreadYield { > > This isn't a unit test for Thread.yield so I think it would be better to rename to something specific like ThreadYieldPollsSafepoint (or better name). How about ThreadPollOnYield? > test/jdk/java/lang/Thread/virtual/ThreadYield.java line 49: > >> 47: class ThreadYield { >> 48: static void foo(AtomicBoolean done) { >> 49: synchronized (done) { > > When this test makes it to the loom repo then we'll need to change it to pin by other means. I changed it to use VThreadPinner. I verified the test still times out with Graal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1669230583 PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1669229847 PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1669231244 From iklam at openjdk.org Mon Jul 8 20:17:38 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 8 Jul 2024 20:17:38 GMT Subject: Integrated: 8312125: Refactor CDS enum class handling In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com> Message-ID: <PJ9_Gwsu7-gd-BGAwasKlTf8QvFiVhmur6XOabDU_3c=.0ee3de33-c94a-4a6e-8c49-6b09090a0bc5@github.com> On Wed, 3 Jul 2024 17:00:30 GMT, Ioi Lam <iklam at openjdk.org> wrote: > Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change. This pull request has now been integrated. 
Changeset: 9c7a6eab Author: Ioi Lam <iklam at openjdk.org> URL: https://git.openjdk.org/jdk/commit/9c7a6eabb93c570fdb74076edc931576ed6be3e0 Stats: 285 lines in 5 files changed: 190 ins; 93 del; 2 mod 8312125: Refactor CDS enum class handling Reviewed-by: matsaave, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/20013 From tanksherman27 at gmail.com Tue Jul 9 05:14:58 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Tue, 9 Jul 2024 13:14:58 +0800 Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? In-Reply-To: <cb2380cc-11ad-4f89-a0f2-92281ad7d5a0@oracle.com> References: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com> <cb2380cc-11ad-4f89-a0f2-92281ad7d5a0@oracle.com> Message-ID: <CAP2b4GNOabGyT15SxtD9tP2ok15-2joy=Fo1ag8EB5f_iOTB=A@mail.gmail.com> Hi David, I just looked at the code for both, and it weirdly doesn't seem that fetch_frame_from_context is used in either. Out of curiosity I tried removing HAVE_PLATFORM_PRINT_NATIVE_STACK from Windows/x64 and deliberately crashed HotSpot after compiling the JDK, and the resulting hs_err file had almost no frame information as a result: --------------- S U M M A R Y ------------ Command Line: --enable-preview Crash Host: AMD Ryzen 9 7845HX with Radeon Graphics , 24 cores, 15G, Windows 11 , 64 bit Build 22621 (10.0.22621.3672) Time: Tue Jul 9 02:41:20 2024 Malay Peninsula Standard Time elapsed time: 0.070338 seconds (0d 0h 0m 0s) --------------- T H R E A D --------------- Current thread (0x0000017e8cead250): JavaThread "main" [_thread_in_vm, id=33760, stack(0x0000005c11f00000,0x0000005c12000000) (1024K)] Stack: [0x0000005c11f00000,0x0000005c12000000], sp=0x0000005c11fff000, free space=1020k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [jvm.dll+0x195ab57] Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j jdk.internal.misc.Unsafe.putLong(Ljava/lang/Object;JJ)V+0 java.base at 
24-internal j jdk.internal.misc.Unsafe.putAddress(Ljava/lang/Object;JJ)V+24 java.base at 24-internal j jdk.internal.misc.Unsafe.putAddress(JJ)V+4 java.base at 24-internal j sun.misc.Unsafe.putAddress(JJ)V+8 jdk.unsupported at 24-internal j Crash.main()V+53 v ~StubRoutines::call_stub 0x0000017e9fc70fcd siginfo: EXCEPTION_ACCESS_VIOLATION (0xc0000005), writing address 0x0000000000000000 Native frames only has 1 frame in it, indicative of further frames not being found. I can't really tell what else is required to get it to work with the regular VMError::print_native_stack without requiring the Windows specific os::win32::platform_print_native_stack. I compiled HotSpot with gcc and verified that the frame pointer is indeed saved, so this not working is a little odd to me (Was testing in the off chance that the Microsoft compiler could be forced to preserve the frame pointer). There has to somehow be a way to walk the frames on Windows when the frame pointer is available for use best regards, Julian P.S. The attempted patch is attached below, if anyone is curious diff --git a/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp b/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp index 7e0814c014b..a4fa45ed78f 100644 --- a/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp +++ b/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp @@ -71,7 +71,7 @@ extern LONG WINAPI topLevelExceptionFilter(_EXCEPTION_POINTERS* ); // Install a win32 structured exception handler around thread. 
void os::os_exception_wrapper(java_call_t f, JavaValue* value, const methodHandle& method, JavaCallArguments* args, JavaThread* thread) { - __try { + WIN32_TRY { #ifndef AMD64 // We store the current thread in this wrapperthread location @@ -111,7 +111,7 @@ void os::os_exception_wrapper(java_call_t f, JavaValue* value, const methodHandl #endif // !AMD64 f(value, method, args, thread); - } __except(topLevelExceptionFilter((_EXCEPTION_POINTERS*)_exception_info())) { + } WIN32_EXCEPT (topLevelExceptionFilter(GetExceptionInformation())) { // Nothing to do. } } @@ -396,16 +396,32 @@ bool os::win32::get_frame_at_stack_banging_point(JavaThread* thread, // VC++ does not save frame pointer on stack in optimized build. It -// can be turned off by /Oy-. If we really want to walk C frames, +// can be turned off by -Oy-. If we really want to walk C frames, // we can use the StackWalk() API. frame os::get_sender_for_C_frame(frame* fr) { +#ifdef __GNUC__ + return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); +#elif defined(_MSC_VER) ShouldNotReachHere(); return frame(); +#endif } frame os::current_frame() { +#ifdef __GNUC__ + frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()), + reinterpret_cast<intptr_t*>(__builtin_frame_address(1)), + CAST_FROM_FN_PTR(address, &os::current_frame)); + if (os::is_first_C_frame(&f)) { + // stack is not walkable + return frame(); + } else { + return os::get_sender_for_C_frame(&f); + } +#elif defined(_MSC_VER) return frame(); // cannot walk Windows frames this way. 
See os::get_native_stack // and os::platform_print_native_stack +#endif } void os::print_context(outputStream *st, const void *context) { diff --git a/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp b/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp index f7622611da7..3461cd4c0b0 100644 --- a/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp +++ b/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp @@ -29,12 +29,14 @@ #include "os_windows.hpp" #ifdef AMD64 +#ifdef _MSC_VER #define HAVE_PLATFORM_PRINT_NATIVE_STACK 1 inline bool os::platform_print_native_stack(outputStream* st, const void* context, char *buf, int buf_size, address& lastpc) { return os::win32::platform_print_native_stack(st, context, buf, buf_size, lastpc); } #endif +#endif inline jlong os::rdtsc() { // 32 bit: 64 bit result in edx:eax diff --git a/src/hotspot/share/runtime/os.cpp b/src/hotspot/share/runtime/os.cpp index 7b766707b0d..d3613652f45 100644 --- a/src/hotspot/share/runtime/os.cpp +++ b/src/hotspot/share/runtime/os.cpp @@ -179,7 +179,7 @@ char* os::iso8601_time(jlong milliseconds_since_19700101, char* buffer, size_t b // No offset when dealing with UTC time_t UTC_to_local = 0; if (!utc) { -#if (defined(_ALLBSD_SOURCE) || defined(_GNU_SOURCE)) && !defined(AIX) +#if (defined(_ALLBSD_SOURCE) || defined(_GNU_SOURCE)) && !defined(AIX) && !defined(_WIN32) UTC_to_local = -(time_struct.tm_gmtoff); #elif defined(_WINDOWS) long zone; @@ -1349,7 +1349,9 @@ static bool is_pointer_bad(intptr_t* ptr) { bool os::is_first_C_frame(frame* fr) { #ifdef _WINDOWS +#ifdef _MSC_VER return true; // native stack isn't walkable on windows this way. +#endif #endif // Load up sp, fp, sender sp and sender fp, check for reasonable values. 
// Check usp first, because if that's bad the other accessors may fault

On Mon, Jul 8, 2024 at 4:49 PM David Holmes <david.holmes at oracle.com> wrote:
>
> On 8/07/2024 5:59 pm, Julian Waters wrote:
> > Hi David,
> >
> > Ah, I think you misunderstood me, I'm aware that the frame pointer is
> > saved as required by the compiler (With the exception of the Microsoft
> > compiler, which doesn't save it at all). What I meant was that the
> > comments in Windows code imply that VMError::print_native_stack and
> > os::get_sender_for_C_frame need to use the frame pointer, yet I can't
> > seem to find where or how either of them obtain the frame pointer for
> > whatever they use it for on platforms and compilers where the frame
> > pointer is saved (For instance, on Linux), whether through handwritten
> > assembly code or some other means. It follows that if they need to use
> > the frame pointer, then they must grab it from somewhere, after all
>
> Ah sorry. AFAICS we just create the frame() objects and walk the stack
> via those. We use fetch_frame_from_context to kick things off in the
> case of a crash.
>
> David
>
> > best regards,
> > Julian

From fyang at openjdk.org Tue Jul 9 05:30:32 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 9 Jul 2024 05:30:32 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3]
In-Reply-To: <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com>
Message-ID: <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com>

On Mon, 8 Jul 2024 15:24:13 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics.
On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. > > ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: > > Left a note on a side effect of generate_vle32_pack2 Changes requested by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19960#pullrequestreview-2163577505 From fyang at openjdk.org Tue Jul 9 05:30:34 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 9 Jul 2024 05:30:34 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> Message-ID: <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com> On Sun, 7 Jul 2024 15:16:02 GMT, ArsenyBochkarev <duke at openjdk.org> wrote: >> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. 
>
> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>
> - Use t2 directly instead of temp2
> - Rename temp1 -> x0
> - Left a note on a side effect of generate_vle32_pack4

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2282:

> 2280: __ vrev8_v(vtmp1, vtmp1);
> 2281: __ vrev8_v(vtmp2, vtmp2);
> 2282: }

Please leave a new line after each of these newly-added functions.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2332:

> 2330: const Register key = c_rarg2; // key array address
> 2331: const Register keylen = c_rarg3;
> 2332: const Register x0 = c_rarg4;

I think you can use the global `x0` (aka the zero register) instead for `vsetivli`. It is very confusing to have a register alias named `x0` here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668794931
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668790141

From dean.long at oracle.com Tue Jul 9 05:32:40 2024
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 8 Jul 2024 22:32:40 -0700
Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>
References: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>
Message-ID: <281d9f6c-1fdc-4411-9f83-dc6e82a0faaa@oracle.com>

It sounds like you are looking for frame::link(). See also frame::real_fp() and frame::fp().

dl

On 7/7/24 11:04 PM, Julian Waters wrote:
> Hi all,
>
> I have a question with regards to os::get_sender_for_C_frame and
> VMError::print_native_stack.
> In Windows specific code, comments allude
> to both needing the rbp register to be saved, which is why
> VMError::print_native_stack
> doesn't work on Windows since Microsoft Visual C doesn't save the frame
> pointer, as stated:
>
> /*
> * Windows/x64 does not use stack frames the way expected by Java:
> * [1] in most cases, there is no frame pointer. All locals are addressed via RSP
> * [2] in rare cases, when alloca() is used, a frame pointer is used, but this may
> * not be RBP.
> * See http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
> *
> * So it's not possible to print the native stack using the
> * while (...) {... fr = os::get_sender_for_C_frame(&fr); }
> * loop in vmError.cpp. We need to roll our own loop.
> */
>
> // VC++ does not save frame pointer on stack in optimized build. It
> // can be turned off by -Oy-. If we really want to walk C frames,
> // we can use the StackWalk() API.
>
> I can't seem to find where rbp is loaded and used on platforms and
> compilers that do save the frame pointer though. Eclipse cannot find
> it through the vast collection of member methods inside the frame
> class and related code. Does anyone by any chance know where the code that
> loads and uses the frame pointer for os::get_sender_for_C_frame and
> VMError::print_native_stack is located on such platforms?
>
> best regards,
> Julian

From dnsimon at openjdk.org Tue Jul 9 07:52:58 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 9 Jul 2024 07:52:58 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
Message-ID: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>

This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
1. Using a libgraal build with assertions enabled as the top tier JIT compiler.
Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.

An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.

To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.

-------------

Commit messages:
 - improved exception translation between HotSpot and libgraal heaps

Changes: https://git.openjdk.org/jdk/pull/20083/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20083&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335553
  Stats: 84 lines in 5 files changed: 50 ins; 22 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/20083.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20083/head:pull/20083

PR: https://git.openjdk.org/jdk/pull/20083

From dnsimon at openjdk.org Tue Jul 9 07:52:58 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 9 Jul 2024 07:52:58 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
In-Reply-To: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
Message-ID: <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com>

On Mon, 8 Jul 2024 19:01:05 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

> This PR addresses intermittent failures in jtreg GC stress tests.
The failures occur under these conditions: > 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). > 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. > 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. > > An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue. > > To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible. src/hotspot/share/utilities/exceptions.cpp line 114: > 112: #endif // ASSERT > 113: > 114: if (h_exception.is_null() && !thread->can_call_java()) { There is no reason to replace an existing exception object with a dummy exception object in the case where the current thread cannot call into Java. Since the exception object already exists, no Java call is necessary. This change is necessary to allow the libgraal exception translation mechanism to know that an OOME is being translated. src/hotspot/share/utilities/exceptions.cpp line 208: > 206: Handle h_loader, Handle h_protection_domain) { > 207: // Check for special boot-strapping/compiler-thread handling > 208: if (special_exception(thread, file, line, h_cause)) return; This fixes a long standing bug where `special_exception` is being queried with the *cause* of the exception being thrown instead of the *name* of the exception being thrown. 
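
The distinction the PR description asks for, telling an error whose cause chain contains an OOME apart from any other error, can be sketched in plain Java. This is only an illustration and is not the libgraal or JVMCI exception translation code: `CauseChainSketch` and `isCausedByOOME` are hypothetical names, and a plain `RuntimeException` stands in for `GraalError`.

```java
// Hypothetical illustration; none of these names come from the PR.
final class CauseChainSketch {
    // Upper bound on hops so that a cyclic cause chain cannot loop forever.
    private static final int MAX_HOPS = 100;

    // Returns true if t, or any throwable reachable via getCause(),
    // is an OutOfMemoryError.
    static boolean isCausedByOOME(Throwable t) {
        int hops = 0;
        for (Throwable cur = t; cur != null && hops < MAX_HOPS; cur = cur.getCause()) {
            if (cur instanceof OutOfMemoryError) {
                return true;
            }
            hops++;
        }
        return false;
    }

    public static void main(String[] args) {
        // RuntimeException stands in for GraalError in this sketch.
        Throwable wrapped = new RuntimeException("compilation failed",
                new OutOfMemoryError("Java heap space"));
        Throwable other = new RuntimeException("compilation failed",
                new IllegalStateException("unrelated"));
        System.out.println(isCausedByOOME(wrapped)); // true
        System.out.println(isCausedByOOME(other));   // false
    }
}
```

A caller could treat the first case as a compilation bailout and the second as a genuine compiler error; how libgraal actually makes that decision is not shown in this thread.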
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1669153819 PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1669148553 From fyang at openjdk.org Tue Jul 9 08:40:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 9 Jul 2024 08:40:35 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com> Message-ID: <T59CuchKVcFhqy7VAzIHxakveuo2bJFrORdrKQwoFLE=.1b43c0cb-d05e-45eb-b85c-026b44dea080@github.com> On Mon, 8 Jul 2024 14:53:03 GMT, Fei Yang <fyang at openjdk.org> wrote: >> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Use t2 directly instead of temp2 >> - Rename temp1 -> x0 >> - Left a note on a side effect of generate_vle32_pack4 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2282: > >> 2280: __ vrev8_v(vtmp1, vtmp1); >> 2281: __ vrev8_v(vtmp2, vtmp2); >> 2282: } > > Please leave a new line after each of these newly-added functions. BTW: Did you compare this with the openssl version which also makes use of `vaesz_vs` instruction from `Zvkned` [1]? 
[1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkb-zvkned.pl ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1670009486 From mli at openjdk.org Tue Jul 9 11:48:12 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Jul 2024 11:48:12 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v10] In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> Message-ID: <jOPhbkdyBhmThSdvmjaKeFrSa-TbIh4bLY1SPKQgmq8=.c6e21cd9-3e7c-4781-b68f-aff19d7e3552@github.com> > Hi, > Can you help to review the patch? > This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). > * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. > > Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. > Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. > > Besides of the code changes, one important task is to handle the legal process. > > Thanks! 
>
> ## Test
> tests:
> * test/jdk/jdk/incubator/vector/
> * test/hotspot/jtreg/compiler/vectorapi/
>
> options:
> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>
> ## Performance
>
> ### Options
> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>
> ### Float
> data
>
> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.72...
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18605/files - new: https://git.openjdk.org/jdk/pull/18605/files/b54fc863..da65cfa5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=08-09 Stats: 17 lines in 3 files changed: 11 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605 PR: https://git.openjdk.org/jdk/pull/18605 From mli at openjdk.org Tue Jul 9 12:08:50 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Jul 2024 12:08:50 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> Message-ID: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> > Hi, > Can you help to review the patch? > This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). > * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. > > Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. > Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. 
>
> Besides the code changes, one important task is to handle the legal process.
>
> Thanks!
>
> ## Test
> tests:
> * test/jdk/jdk/incubator/vector/
> * test/hotspot/jtreg/compiler/vectorapi/
>
> options:
> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>
> ## Performance
>
> ### Options
> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>
> ### Float
> data
>
> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.72...
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: skip TANH ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18605/files - new: https://git.openjdk.org/jdk/pull/18605/files/da65cfa5..6061c25d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=09-10 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605 PR: https://git.openjdk.org/jdk/pull/18605 From galder at openjdk.org Tue Jul 9 12:12:53 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 9 Jul 2024 12:12:53 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) Message-ID: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. Currently vectorization does not kick in for loops containing either of these calls because of the following error: VLoop::check_preconditions: failed: control flow in loop not allowed The control flow is due to the java implementation for these methods, e.g. public static long max(long a, long b) { return (a >= b) ? a : b; } This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. E.g. 

    SuperWord::transform_loop:
        Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
     518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)

Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):

    ==============================
    Test summary
    ==============================
       TEST                                                                      TOTAL  PASS  FAIL ERROR
       jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java       1     1     0     0
    ==============================
    TEST SUCCESS

    long min 1155
    long max 1173

After the patch, on darwin/aarch64 (M1):

    ==============================
    Test summary
    ==============================
       TEST                                                                      TOTAL  PASS  FAIL ERROR
       jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java       1     1     0     0
    ==============================
    TEST SUCCESS

    long min 1042
    long max 1042

This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. Therefore, it still relies on the macro expansion to transform those into CMoveL.

I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:

    ==============================
    Test summary
    ==============================
       TEST                          TOTAL  PASS  FAIL ERROR
       jtreg:test/hotspot/jtreg:tier1  2500  2500     0     0
    >> jtreg:test/jdk:tier1            2413  2412     1     0 <<
       jtreg:test/langtools:tier1      4556  4556     0     0
       jtreg:test/jaxp:tier1              0     0     0     0
       jtreg:test/lib-test:tier1         33    33     0     0
    ==============================

The failure I got is [CODETOOLS-7903745](https://bugs.openjdk.org/browse/CODETOOLS-7903745) so unrelated to these changes.
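
For readers following along, the kind of loop this patch is aimed at can be sketched as below. This is only an illustration and is not code from the PR or from `ReductionPerf`: the class `LongMinMaxSketch` and its methods are made-up names, and whether C2 actually vectorizes such loops depends on the platform, the JDK build and the flags in use.

```java
import java.util.Arrays;

// Hypothetical illustration; names are not taken from the PR.
final class LongMinMaxSketch {
    // Element-wise maximum. Each iteration calls Math.max(long, long), whose
    // Java body "(a >= b) ? a : b" is the source of the control flow that the
    // description above says blocks vectorization without the intrinsic.
    static void elementwiseMax(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Min reduction over an array, similar in spirit to a "long min" measurement.
    static long minOf(long[] values) {
        long result = Long.MAX_VALUE;
        for (long v : values) {
            result = Math.min(result, v);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] a = {1L, 7L, -3L, 4L};
        long[] b = {2L, 5L, -9L, 4L};
        long[] out = new long[a.length];
        elementwiseMax(a, b, out);
        System.out.println(Arrays.toString(out)); // [2, 7, -3, 4]
        System.out.println(minOf(a));             // -3
    }
}
```

The arrays here are far too small to trigger JIT compilation; the sketch only shows the loop shapes, not a benchmark.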
------------- Commit messages: - 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) Changes: https://git.openjdk.org/jdk/pull/20098/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307513 Stats: 32 lines in 5 files changed: 32 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From dnsimon at openjdk.org Tue Jul 9 13:46:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 9 Jul 2024 13:46:46 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> Message-ID: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> > This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions: > 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). > 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. > 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. > > An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue. > > To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. 
This PR modifies the exception translation code to make this possible. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed TestTranslatedException ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20083/files - new: https://git.openjdk.org/jdk/pull/20083/files/ff544be3..aa32491c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20083&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20083&range=00-01 Stats: 19 lines in 2 files changed: 12 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20083/head:pull/20083 PR: https://git.openjdk.org/jdk/pull/20083 From alanb at openjdk.org Tue Jul 9 14:01:34 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 9 Jul 2024 14:01:34 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v2] In-Reply-To: <iaBqzib4909AgZHO95lUgxtZHa5dgRNU1KvAjOjit-w=.e2acabb6-a421-4b87-8df2-c3a3c75f2c6f@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com> <iaBqzib4909AgZHO95lUgxtZHa5dgRNU1KvAjOjit-w=.e2acabb6-a421-4b87-8df2-c3a3c75f2c6f@github.com> Message-ID: <4Nq-9jBCNPXjTt2JoVVAR4IMCWiur0HI8zTZNZepZqM=.86b3fd9d-d4c5-40b9-be91-14df861ae4eb@github.com> On Mon, 8 Jul 2024 20:04:04 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> test/jdk/java/lang/Thread/virtual/ThreadYield.java line 47: >> >>> 45: import static org.junit.jupiter.api.Assertions.*; >>> 46: >>> 47: class ThreadYield { >> >> This isn't a unit test for Thread.yield so I think it would be better to rename to something specific like ThreadYieldPollsSafepoint (or better name). > > How about ThreadPollOnYield? 
That would be okay, main thing is to avoid any suggestion that it's a general test for Thread.yield. >> test/jdk/java/lang/Thread/virtual/ThreadYield.java line 49: >> >>> 47: class ThreadYield { >>> 48: static void foo(AtomicBoolean done) { >>> 49: synchronized (done) { >> >> When this test makes it to the loom repo then we'll need to change it to pin by other means. > > I changed it to use VThreadPinner. I verified the test still times out with Graal. Thanks, that avoids needing to update when it meets up with the monitor changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1670582899 PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1670583501 From yzheng at openjdk.org Tue Jul 9 14:40:33 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 9 Jul 2024 14:40:33 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> Message-ID: <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com> On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions: >> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). >> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. >> 3. 
The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. >> >> An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue. >> >> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed TestTranslatedException src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 782: > 780: while (true) { > 781: // Trigger an OutOfMemoryError > 782: objArrayOop next = oopFactory::new_objectArray(0x7FFFFFFF, CHECK_NULL); Shall we check for pending exception and break here? test/jdk/jdk/internal/vm/TestTranslatedException.java line 167: > 165: private static void assertThrowableEquals(Throwable originalIn, Throwable decodedIn) { > 166: Throwable original = originalIn; > 167: Throwable decoded = decodedIn; What is the purpose of this renaming? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670646934 PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670607742 From dnsimon at openjdk.org Tue Jul 9 14:45:33 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 9 Jul 2024 14:45:33 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com> Message-ID: <vgV9ewwD3yK8VwAqF6Uuy6zFeGju_9Ubd0tPHnQakv4=.d7988221-7ac9-46fc-b88c-a2edf4e85d64@github.com> On Tue, 9 Jul 2024 14:37:47 GMT, Yudi Zheng <yzheng at openjdk.org> wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed TestTranslatedException > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 782: > >> 780: while (true) { >> 781: // Trigger an OutOfMemoryError >> 782: objArrayOop next = oopFactory::new_objectArray(0x7FFFFFFF, CHECK_NULL); > > Shall we check for pending exception and break here? The `CHECK_NULL` macro effectively does that. > test/jdk/jdk/internal/vm/TestTranslatedException.java line 167: > >> 165: private static void assertThrowableEquals(Throwable originalIn, Throwable decodedIn) { >> 166: Throwable original = originalIn; >> 167: Throwable decoded = decodedIn; > > What is the purpose of this renaming? So that the printing down the bottom of this message shows the complete throwable, not just the cause on which the comparison failed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670656254 PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670654917 From yzheng at openjdk.org Tue Jul 9 15:01:34 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 9 Jul 2024 15:01:34 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <vgV9ewwD3yK8VwAqF6Uuy6zFeGju_9Ubd0tPHnQakv4=.d7988221-7ac9-46fc-b88c-a2edf4e85d64@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com> <vgV9ewwD3yK8VwAqF6Uuy6zFeGju_9Ubd0tPHnQakv4=.d7988221-7ac9-46fc-b88c-a2edf4e85d64@github.com> Message-ID: <PPjzrPv0uDmVtaDPJOM0fJeBITvDsjC7_MuE1ZAOCxg=.4195a091-f5e1-461e-ad25-5270c2119d1d@github.com> On Tue, 9 Jul 2024 14:42:42 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> test/jdk/jdk/internal/vm/TestTranslatedException.java line 167: >> >>> 165: private static void assertThrowableEquals(Throwable originalIn, Throwable decodedIn) { >>> 166: Throwable original = originalIn; >>> 167: Throwable decoded = decodedIn; >> >> What is the purpose of this renaming? > > So that the printing down the bottom of this message shows the complete throwable, not just the cause on which the comparison failed. Thanks! I missed the reassign in the folded unchanged code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670683400 From yzheng at openjdk.org Tue Jul 9 15:01:33 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 9 Jul 2024 15:01:33 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> Message-ID: <YtiLAGXigNYv4VlL2owOwZL0Xsi6aoayUYJxZqaZx3I=.0da95499-63b9-4279-9ea5-85e80888ba4c@github.com> On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions: >> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). >> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. >> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. >> >> An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue. >> >> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed TestTranslatedException LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/20083#pullrequestreview-2166581323 From duke at openjdk.org Tue Jul 9 16:11:59 2024 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 9 Jul 2024 16:11:59 GMT Subject: RFR: 8330144: Revise os::free_memory() Message-ID: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> ### Summary On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::free_memory_without_uncommit(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. **Transparent huge pages:** `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted. To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `free_memory_without_uncommit`, RSS decreases indicating the memory is no longer live. 
The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. **Explicit huge pages:** `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter. To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `free_memory_without_uncommit`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead. After calling `free_memory_without_uncommit`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an improvement upon existing behavior, but at least it's not a regression either. #### Testing - Added the gtest: free_without_uncommit. This test is excluded for AIX and Windows since on those platforms `free_memory_without_uncommit` does nothing. Interestingly, unlike Linux, [madvise(DONTNEED) on BSD](https://man.freebsd.org/cgi/man.cgi?query=madvise&sektion=2&n=1) doesn't free pages; it only lowers their priority, increasing the likelihood of future page faults. So this test has special handling for BSD. If our intention is to have the pages freed, maybe MADV_FREE is a better choice. - tier1 ------------- Commit messages: - Improve free_memory and use madvise on linux.
Changes: https://git.openjdk.org/jdk/pull/20080/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20080&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330144 Stats: 55 lines in 10 files changed: 35 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20080.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20080/head:pull/20080 PR: https://git.openjdk.org/jdk/pull/20080 From duke at openjdk.org Tue Jul 9 16:11:59 2024 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 9 Jul 2024 16:11:59 GMT Subject: RFR: 8330144: Revise os::free_memory() In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> Message-ID: <2uIEexD6zsgJMPAzXuBT2P88TvUTfrvPT2PLr38RPnE=.75a51a49-a339-48d3-846c-cb639897b740@github.com> On Mon, 8 Jul 2024 17:33:41 GMT, Robert Toyonaga <duke at openjdk.org> wrote: > ### Summary > On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. > > `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::free_memory_without_uncommit(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. > > **Transparent huge pages:** > `madvise(MADV_DONTNEED)` works with THP. 
As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted. > > To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `free_memory_without_uncommit`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range function` also reports that the memory has been freed (This function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. > > **Explicit huge pages:** > `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no affect on huge pages. This means the behavior of of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter. > > To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `free_memory_without_uncommit`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range function` instead. After calling `free_memory_without_uncommit`, the `os::committed_in_range` function reports that the memory is still live. Unfortu... Failing GHA: -[linux-x86 / test (hs/tier1 compiler part 2)](https://github.com/roberttoyonaga/jdk/actions/runs/9857950912/job/27219620860): compiler/interpreter/Test6833129.java encounters an error with exit code 1. I don't think this is related, since these changes don't relate to compilation. 
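The small-page behavior described in the summary above (memory is discarded, yet the addresses can be touched again without recommitting) can be illustrated with a minimal Linux-only sketch. This is not the PR's code: `first_byte_after_dontneed` is a hypothetical helper, and it relies on the Linux behavior that private anonymous pages are zero-filled on the next access after `MADV_DONTNEED`. The return value of `madvise` is checked here purely as defensive style.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: maps a few small pages, dirties them, discards them
// with MADV_DONTNEED, then touches the range again without remapping.
// Returns the first byte read back, or -1 if setup failed.
int first_byte_after_dontneed() {
  const std::size_t size = 4 * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) {
    return -1;
  }
  char* p = static_cast<char*>(mem);

  // Touch every page so the range is backed by real memory.
  std::memset(p, 0xAB, size);

  // Discard the pages; the mapping itself stays valid.
  if (madvise(p, size, MADV_DONTNEED) != 0) {
    munmap(p, size);
    return -1;
  }

  // Re-touching is allowed: Linux supplies a fresh zero-filled page.
  int value = static_cast<unsigned char>(p[0]);
  munmap(p, size);
  return value;
}
```

On Linux the old `0xAB` contents are gone and the helper reads back zero. As noted in the Testing section, BSD's `MADV_DONTNEED` behaves differently, so this sketch should not be read as portable.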
------------- PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2217943553 From pchilanomate at openjdk.org Tue Jul 9 16:38:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 9 Jul 2024 16:38:13 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v3] In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> Message-ID: <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> > Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. > > I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. 
> > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Rename test to ThreadPollOnYield.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20016/files - new: https://git.openjdk.org/jdk/pull/20016/files/ce777598..79be1fcc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016 PR: https://git.openjdk.org/jdk/pull/20016 From alanb at openjdk.org Tue Jul 9 16:48:34 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 9 Jul 2024 16:48:34 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v3] In-Reply-To: <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> Message-ID: <WuesF9Q5ft_qBS-SToSKAHFbJKj_LXZkUp-bEfmoUcQ=.a0952d22-9988-45dc-82e3-e4c0cb69e250@github.com> On Tue, 9 Jul 2024 16:38:13 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. >> >> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). 
The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Rename test to ThreadPollOnYield.java Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20016#pullrequestreview-2166891851 From stuefe at openjdk.org Tue Jul 9 18:29:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 9 Jul 2024 18:29:17 GMT Subject: RFR: 8330144: Revise os::free_memory() In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> Message-ID: <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com> On Mon, 8 Jul 2024 17:33:41 GMT, Robert Toyonaga <duke at openjdk.org> wrote: > ### Summary > On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. > > `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::free_memory_without_uncommit(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. > > **Transparent huge pages:** > `madvise(MADV_DONTNEED)` works with THP. 
As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted. > > To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `free_memory_without_uncommit`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range function` also reports that the memory has been freed (This function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. > > **Explicit huge pages:** > `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no affect on huge pages. This means the behavior of of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter. > > To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `free_memory_without_uncommit`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range function` instead. After calling `free_memory_without_uncommit`, the `os::committed_in_range` function reports that the memory is still live. Unfortu... Great, thanks @roberttoyonaga. The main work was the analysis work beforehand. About naming, I would name the thing "os::disclaim_memory". free_without_uncommit is a mouthful. There is a precedence in the "disclaim" API on AIX, which in a future RFE may be used to implement os::disclaim_memory. 
test/hotspot/gtest/runtime/test_os.cpp line 988: > 986: const size_t size = pages * page_sz; > 987: > 988: char *base = os::reserve_memory(size, false, mtTest); I prefer char* base (star at type) syntax, and it's much more common in hotspot. test/hotspot/gtest/runtime/test_os.cpp line 1002: > 1000: size_t committed_size; > 1001: address committed_start; > 1002: ASSERT_FALSE(os::committed_in_range((address) base, size, committed_start, committed_size)); Is there a chance of this generating false positives? Do we know if the madvise effect is immediate or delayed? ------------- Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2167064443 PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1670980361 PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1670985051 From stuefe at openjdk.org Tue Jul 9 19:22:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 9 Jul 2024 19:22:20 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v4] In-Reply-To: <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com> Message-ID: <7pLl_uA6UDHCkT7qHS4czxdPaTfYBDjcdLumY0eFR00=.0f0ea68e-9962-40bd-980e-6d86ef583067@github.com> On Thu, 4 Jul 2024 07:49:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> See JBS issue. >> >> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. >> >> The patch: >> - exposes os::available_memory via Whitebox >> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM.
Otherwise, we throw a `SkippedException` >> >> I have some misgivings about this solution, though: >> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. >> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) >> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. >> >> Despite my doubts, I think this is the best we can come up with if we want to have such a test. >> >> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > comma Friendly ping ------------- PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2218464968 From aturbanov at openjdk.org Tue Jul 9 20:06:22 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 9 Jul 2024 20:06:22 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v4] In-Reply-To: <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com> Message-ID: <179ivC-StXqp1a8UuPYS1igE8x7h36P5On75huXrLAM=.2402195a-f3e3-4a0b-a7fb-683219d7594f@github.com> On Thu, 4 Jul 2024 07:49:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> See JBS issue. >> >> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. >> >> The patch: >> - exposes os::available_memory via Whitebox >> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` >> >> I have some misgivings about this solution, though: >> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. >> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. 
Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) >> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. >> >> Despite my doubts, I think this is the best we can come up with if we want to have such a test. >> >> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > comma test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 144: > 142: final static long expectedMaxNonHeapRSS = M * 256; > 143: // How much memory we require the host to have available before even starting the test > 144: final static long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS; Suggestion: final static long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS; test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 148: > 146: // count the low RSS as a real error - an indication for a misfunctioning pretouch, not just a low-memory > 147: // condition on the system. 
> 148: final static long requiredAvailableDuring = expectedMaxNonHeapRSS; Suggestion: final static long requiredAvailableDuring = expectedMaxNonHeapRSS; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1671167332 PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1671167658 From coleenp at openjdk.org Tue Jul 9 21:20:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 9 Jul 2024 21:20:21 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3] In-Reply-To: <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com> Message-ID: <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com> On Mon, 8 Jul 2024 16:21:16 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Add JVMCI symbol exports > - Revert "More graceful JVMCI VM option interaction" > > This reverts commit 2814350370cf142e130fe1d38610c646039f976d. This is really great work, Axel! 
I've been reading this code for a while, and have done one pass looking through the PR with a few comments. src/hotspot/share/opto/library_call.cpp line 4620: > 4618: Node *unlocked_val = _gvn.MakeConX(markWord::unlocked_value); > 4619: Node *chk_unlocked = _gvn.transform(new CmpXNode(lmasked_header, unlocked_val)); > 4620: Node *test_not_unlocked = _gvn.transform(new BoolNode(chk_unlocked, BoolTest::ne)); I don't really know what this does. Someone from the c2 compiler group should look at this. src/hotspot/share/runtime/arguments.cpp line 1830: > 1828: FLAG_SET_CMDLINE(LockingMode, LM_LIGHTWEIGHT); > 1829: warning("UseObjectMonitorTable requires LM_LIGHTWEIGHT"); > 1830: } Maybe we want this to have the opposite sense - turn off UseObjectMonitorTable if not LM_LIGHTWEIGHT? src/hotspot/share/runtime/javaThread.inline.hpp line 258: > 256: } > 257: > 258: _om_cache.clear(); This could be shorter, ie: if (UseObjectMonitorTable) _om_cache.clear(); I think the not having an assert was to make the caller unconditional, which is good. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 393: > 391: > 392: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor(oop object, JavaThread* current, const ObjectSynchronizer::InflateCause cause, bool try_read) { > 393: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); This assert should be assert(UseObjectMonitorTable not LM_LIGHTWEIGHT). src/hotspot/share/runtime/lightweightSynchronizer.cpp line 732: > 730: > 731: markWord mark = object->mark(); > 732: assert(!mark.is_unlocked(), "must be unlocked"); "must be locked" makes more sense. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 763: > 761: assert(mark.has_monitor(), "must be"); > 762: // The monitor exists > 763: ObjectMonitor* monitor = ObjectSynchronizer::read_monitor(current, object, mark); This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock? 
Or we got here because BasicLock.metadata was not the ObjectMonitor? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 773: > 771: } > 772: > 773: ObjectMonitor* LightweightSynchronizer::inflate_locked_or_imse(oop obj, const ObjectSynchronizer::InflateCause cause, TRAPS) { I figured out at one point why we now check IMSE here but now cannot remember. Can you add a comment why above this function? ------------- PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2167461168 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671214948 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671216649 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671220251 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671225452 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671229697 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671231155 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671231863 From dholmes at openjdk.org Wed Jul 10 05:28:20 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 10 Jul 2024 05:28:20 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v3] In-Reply-To: <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> Message-ID: <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com> On Tue, 9 Jul 2024 16:38:13 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> Please review the following simple fix. 
A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in the native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. >> >> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Rename test to ThreadPollOnYield.java test/jdk/java/lang/Thread/virtual/ThreadPollOnYield.java line 39: > 37: * @requires vm.continuations > 38: * @library /test/lib > 39: * @run junit/othervm -Xcomp -XX:-TieredCompilation -XX:CompileCommand=inline,*::yield* -XX:CompileCommand=inline,*::*Yield ThreadPollOnYield Given this forces -Xcomp, shouldn't we skip running it when the compilation mode is set via jtreg flags?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1671637893 From dholmes at openjdk.org Wed Jul 10 05:38:20 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 10 Jul 2024 05:38:20 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v4] In-Reply-To: <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com> Message-ID: <UtACtbpQujJHrXFzh_GqeAIzPtttQEM5T48LRQhZB84=.0183e5bf-5a04-45ed-8fe1-aea9558a301c@github.com> On Mon, 8 Jul 2024 18:14:54 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. >> >> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). 
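The 128-entry figure in the paragraph above can be checked with a few lines of arithmetic. The following is a minimal standalone sketch, not HotSpot code: the helper names are hypothetical, and the only inputs taken from the message are 4 inline mask words, 64 bits per word and 2 bits per entry.

```cpp
#include <cassert>   // assert() is available to callers that check these helpers
#include <cstddef>

// Hypothetical helper (not a HotSpot function): number of locals plus
// expression stack entries that fit in an inline bit mask of `words`
// machine words when each entry consumes `bits_per_entry` bits.
constexpr std::size_t inline_mask_entries(std::size_t words,
                                          std::size_t bits_per_word,
                                          std::size_t bits_per_entry) {
  return words * bits_per_word / bits_per_entry;
}

// Hypothetical helper: true if a method with `entries` slots would not fit
// in the inline mask and would therefore need separately allocated memory.
constexpr bool needs_extra_allocation(std::size_t entries) {
  return entries > inline_mask_entries(4, 64, 2);
}

// Values from the message: 4 words * 64 bits / 2 bits per entry = 128.
static_assert(inline_mask_entries(4, 64, 2) == 128,
              "matches the figure quoted in the message");
```

Under these assumptions a method with exactly 128 slots still fits inline, while one with 129 slots does not, which is consistent with the observation that only a handful of mostly artificial methods hit the allocating path.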
>> >> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. >> >> I tested the patch by running it through mach5 tiers 1-6. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > address Thomas' comments Still good. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2168073270 From dholmes at openjdk.org Wed Jul 10 05:51:16 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 10 Jul 2024 05:51:16 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com> Message-ID: <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com> On Mon, 8 Jul 2024 19:09:47 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed TestTranslatedException > > src/hotspot/share/utilities/exceptions.cpp line 208: > >> 206: Handle h_loader, Handle 
h_protection_domain) { >> 207: // Check for special boot-strapping/compiler-thread handling >> 208: if (special_exception(thread, file, line, h_cause)) return; > > This fixes a long standing bug where `special_exception` is being queried with the *cause* of the exception being thrown instead of the *name* of the exception being thrown. I'm not so sure this is in fact a bug. If we are throwing with a cause, but we can't actually throw and so will do vm_exit, then the exception of interest is the cause not the more generic exception that would otherwise contain the cause. Though I have to wonder why there is not an original `_throw` for the "cause" exception, that would have triggered the special_exception handling anyway? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1671652583 From dholmes at openjdk.org Wed Jul 10 05:51:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 10 Jul 2024 05:51:17 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com> <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com> Message-ID: <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com> On Wed, 10 Jul 2024 05:46:31 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/exceptions.cpp line 208: >> >>> 206: Handle h_loader, Handle h_protection_domain) { >>> 207: // Check for special boot-strapping/compiler-thread handling >>> 208: if (special_exception(thread, file, line, h_cause)) return; >> >> This fixes a long standing bug where 
`special_exception` is being queried with the *cause* of the exception being thrown instead of the *name* of the exception being thrown. > > I'm not so sure this is in fact a bug. If we are throwing with a cause, but we can't actually throw and so will do vm_exit, then the exception of interest is the cause not the more generic exception that would otherwise contain the cause. > > Though I have to wonder why there is not an original `_throw` for the "cause" exception, that would have triggered the special_exception handling anyway? Though I see this is inconsistent with `Exceptions::_throw_msg_cause` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1671653968 From stuefe at openjdk.org Wed Jul 10 06:10:41 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 Jul 2024 06:10:41 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v5] In-Reply-To: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com> Message-ID: <JWAEg-gsIOnQEC1GeQ5GFO8vZDGf4UHo5O2y7RBbFF4=.26f9da9a-3886-425f-b533-657f7929aff4@github.com> > See JBS issue. > > It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. > > The patch: > - exposes os::available_memory via Whitebox > - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` > > I have some misgivings about this solution, though: > 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. > 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. 
Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) > 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. > > Despite my doubts, I think this is the best we can come up with if we want to have such a test. > > Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19803/files - new: https://git.openjdk.org/jdk/pull/19803/files/eba72ed9..109e9172 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803 PR: https://git.openjdk.org/jdk/pull/19803 From dholmes at openjdk.org Wed Jul 10 06:23:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 10 Jul 2024 06:23:15 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and 
crashes in OOM situation [v2] In-Reply-To: <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com> <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com> <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com> Message-ID: <UdqI44rJgcX2ZaV-LzZm54NUA5v3NLT724p61yB_B44=.9218d574-5402-43cf-ada7-6930c0458396@github.com> On Wed, 10 Jul 2024 05:48:23 GMT, David Holmes <dholmes at openjdk.org> wrote: >> I'm not so sure this is in fact a bug. If we are throwing with a cause, but we can't actually throw and so will do vm_exit, then the exception of interest is the cause not the more generic exception that would otherwise contain the cause. >> >> Though I have to wonder why there is not an original `_throw` for the "cause" exception, that would have triggered the special_exception handling anyway? > > Though I see this is inconsistent with `Exceptions::_throw_msg_cause` Okay I think I see how the logic works. If we were going to abort we would never reach `_throw_cause` as the initial `_throw` would have exited. But for the `!thread->can_call_Java()` case the original `_throw` would replace the intended real exception with the dummy `VM_exception()`, which is then "caught" and we try to replace with a more specific exception to be thrown via `throw_cause`, which will again replace whichever exception is requested with the dummy `VM_exception()` - so the end result is we will throw the dummy regardless of whether the cause or wrapping exception is specified. So your fix here makes sense. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1671680471 From luhenry at openjdk.org Wed Jul 10 07:44:19 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 10 Jul 2024 07:44:19 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> Message-ID: <pwP98IvP1jzN3sU1d_fa9Lkdzf4yfxHkNAmWKQsRP3w=.29bba507-a7c4-480a-88ae-28e2dbb280dd@github.com> On Mon, 8 Jul 2024 16:40:50 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits: >> >> - Merge branch 'master' into sleef-aarch64-integrate-source >> - merge master >> - sleef 3.6.1 for riscv >> - sleef 3.6.1 >> - update header files for arm >> - add inline header file for riscv64 >> - remove notes about sleef changes >> - fix performance issue >> - disable unused-function warnings; add log msg >> - minor >> - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863 > >> While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in #19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using Cmake and we do not want to add a build dependency on Cmake and call out to a foreign build system at build time, for efficiency and complexity reasons. 
> > Of course, there is no reason to rebuild the preprocessed headers every time we build the JDK. I'd never ask for that; the last thing I want is to make building the JDK slower. However, it should be possible to do so on a checked-out JDK source tree, at the builder's option. > > If there is a script, it doesn't have to be included in the OpenJDK build system itself, but it does have to be in the OpenJDK source tree. (It could be part of make/devkit, for example.) > > With a script to produce preprocessed files, it should be possible for anyone building the JDK to run that script, and produce the preprocessed source. SLEEF won't take up a prohibitive amount of space. > > We shouldn't be depending on some other web site somewhere being able to come up with the exact SLEEF sources we used, either. That fails the test of reproducibility. > >> JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step. > > I get it, but not including everything we use in the OpenJDK tree is a dangerous precedent. It should be no big deal to do this right, given that we have the SLEEF sources and the build scripts already. I'm not asking for anything that doesn't exist already, I'm just saying that it must be checked in. > > Avoiding inconvenience, however great, is not sufficient to justify such a step. This is perhaps something to discuss at the next Committers' Workshop. @theRealAph, a precedent already exists for binutils/llvm/capstone and hsdis. Would it be sufficient for the user to choose to build SLEEF from a separate source directory, assuming all the dependencies are installed already (the sources are checked out by the user; cmake and other build dependencies are installed; etc.)?
We would then invoke the [make/devkit/createSleef.sh](https://github.com/openjdk/jdk/pull/19185/files#diff-4fe89562540474e866588cd87ca7385b920a06bd428da013cd3d3e4b375fdd10) script on the user's SLEEF checkout to regenerate the header files. And by default, we use the header files already checked-in the OpenJDK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2219783035 From aboldtch at openjdk.org Wed Jul 10 09:46:13 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 10 Jul 2024 09:46:13 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v4] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <IfTY6OjNfJHzo0dq7zQEm45B8ZcYk5aFPilG6g7oB5o=.a1491217-2db3-46d4-b803-83029f96c525@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
Axel Boldt-Christmas has updated the pull request incrementally with five additional commits since the last revision: - Add comment LightweightSynchronizer::inflate_locked_or_imse - Fix BasicLock::object_monitor_cache() for other platforms - Update LightweightSynchronizer::exit assert - Update LightweightSynchronizer::get_or_insert_monitor assert - Update JavaThread::om_clear_monitor_cache ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/173b75b8..d12aa5f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=02-03 Stats: 23 lines in 3 files changed: 17 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Wed Jul 10 09:46:18 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 10 Jul 2024 09:46:18 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3] In-Reply-To: <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com> <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com> Message-ID: <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com> On Tue, 9 Jul 2024 20:44:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add JVMCI symbol exports >> - Revert "More graceful JVMCI VM option interaction" >> >> This reverts commit 2814350370cf142e130fe1d38610c646039f976d. 
> > src/hotspot/share/runtime/arguments.cpp line 1830: > >> 1828: FLAG_SET_CMDLINE(LockingMode, LM_LIGHTWEIGHT); >> 1829: warning("UseObjectMonitorTable requires LM_LIGHTWEIGHT"); >> 1830: } > > Maybe we want this to have the opposite sense - turn off UseObjectMonitorTable if not LM_LIGHTWEIGHT? Maybe. It boils down to what to do when the JVM receives `-XX:LockingMode={LM_LEGACY,LM_MONITOR} -XX:+UseObjectMonitorTable`. The options I see are 1. Select `LockingMode=LM_LIGHTWEIGHT` 2. Select `UseObjectMonitorTable=false` 3. Do not start the VM Between 1. and 2. it is impossible to know what the real intentions were. But with `-XX:+UseObjectMonitorTable` being the newer option, it somehow seems more likely to be the intended one. Option 3. is probably the sensible solution, but it is hard to determine. We tend not to shut down the VM because of incompatible options, but rather fix them. But I believe there is precedent for both. If we do this, however, we will have to figure out all the interactions with our testing framework. And probably add some safeguards. > src/hotspot/share/runtime/javaThread.inline.hpp line 258: > >> 256: } >> 257: >> 258: _om_cache.clear(); > > This could be shorter, ie: if (UseObjectMonitorTable) _om_cache.clear(); > I think the not having an assert was to make the caller unconditional, which is good. Done. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 393: > >> 391: >> 392: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor(oop object, JavaThread* current, const ObjectSynchronizer::InflateCause cause, bool try_read) { >> 393: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); > > This assert should be assert(UseObjectMonitorTable not LM_LIGHTWEIGHT). Done. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 732: > >> 730: >> 731: markWord mark = object->mark(); >> 732: assert(!mark.is_unlocked(), "must be unlocked"); > > "must be locked" makes more sense. Done.
> This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock? We could. > Or we got here because BasicLock.metadata was not the ObjectMonitor? That is one reason we got here. We also get here from C1/interpreter as well as if there are other threads on the entry queues. I think there was an assumption that it would not be that crucial in those cases. One of the reasons we do not read the `BasicLock` cache from the runtime is that we are not as careful with keeping the `BasicLock` initialised on platforms without `UseObjectMonitorTable`. The idea was that as long as they call into the VM, we do not need to keep it invariant. But this made me realise `BasicLock::print_on` will be broken on non x86/aarch64 platforms if running with `UseObjectMonitorTable`. Rather than fix all platforms, I will condition `BasicLock::object_monitor_cache` to return nullptr on unsupported platforms. Could add this then. Should probably add an overload to `ObjectSynchronizer::read_monitor` which takes the lock and push it all the way here. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 773: > >> 771: } >> 772: >> 773: ObjectMonitor* LightweightSynchronizer::inflate_locked_or_imse(oop obj, const ObjectSynchronizer::InflateCause cause, TRAPS) { > > I figured out at one point why we now check IMSE here but now cannot remember. Can you add a comment why above this function? Done.
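For readers following the flag discussion earlier in this reply, the three ways of resolving `-XX:LockingMode={LM_LEGACY,LM_MONITOR} -XX:+UseObjectMonitorTable` can be modelled in a few lines. This is only an illustrative sketch: `Flags`, `Policy`, `Resolution` and `resolve` are hypothetical names, the enum values are assumed for the model, and none of this is the code in arguments.cpp.

```cpp
#include <cassert>   // assert() is available to callers that check this model

// Model of the locking modes named in the thread (numeric values assumed).
enum LockingMode { LM_MONITOR = 0, LM_LEGACY = 1, LM_LIGHTWEIGHT = 2 };

// Hypothetical container for the two flags under discussion.
struct Flags {
  LockingMode locking_mode;
  bool use_object_monitor_table;
};

// The three options listed in the reply, in the same order.
enum class Policy { SelectLightweight, DisableTable, RefuseToStart };

// Outcome of flag processing: whether the VM starts, and with which flags.
struct Resolution {
  bool vm_starts;
  Flags flags;
};

Resolution resolve(Flags requested, Policy policy) {
  // The combination only conflicts when the table is requested together
  // with a locking mode other than LM_LIGHTWEIGHT.
  const bool conflict = requested.use_object_monitor_table &&
                        requested.locking_mode != LM_LIGHTWEIGHT;
  if (!conflict) {
    return Resolution{true, requested};
  }
  switch (policy) {
    case Policy::SelectLightweight:  // option 1: keep the table, change the mode
      return Resolution{true, Flags{LM_LIGHTWEIGHT, true}};
    case Policy::DisableTable:       // option 2: keep the mode, drop the table
      return Resolution{true, Flags{requested.locking_mode, false}};
    case Policy::RefuseToStart:      // option 3: do not start the VM
      break;
  }
  return Resolution{false, requested};
}
```

The snippet quoted from arguments.cpp corresponds to `Policy::SelectLightweight` in this model; the reviewer's suggestion corresponds to `Policy::DisableTable`.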
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959198 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959362 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959515 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959614 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959763 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959852 From jkarthikeyan at openjdk.org Wed Jul 10 20:07:04 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 10 Jul 2024 20:07:04 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com> On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarre?o <galder at openjdk.org> wrote: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. 
> > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL. > > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results.
src/hotspot/share/opto/library_call.cpp line 8244: > 8242: bool LibraryCallKit::inline_long_min_max(vmIntrinsics::ID id) { > 8243: assert(callee()->signature()->size() == 4, "minL/maxL has 2 parameters of size 2 each."); > 8244: Node *a = argument(0); Suggestion: Node* a = argument(0); And the same for `b` and `n` as well. ------------- PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2169250610 PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1672350809 From never at openjdk.org Wed Jul 10 20:07:46 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 10 Jul 2024 20:07:46 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> Message-ID: <rhD6RNQA26jgX4TALJRlCPGiuF4GYBzMosX-mgBnAQs=.eaaa928e-dd4f-40f0-8696-bf3012c480ed@github.com> On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions: >> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). >> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. >> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. >> >> An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. 
Instead, the failure should be treated as a bailout and the libgraal compiler should continue. >> >> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed TestTranslatedException looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20083#pullrequestreview-2169495478 From dnsimon at openjdk.org Wed Jul 10 20:07:48 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 10 Jul 2024 20:07:48 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <UdqI44rJgcX2ZaV-LzZm54NUA5v3NLT724p61yB_B44=.9218d574-5402-43cf-ada7-6930c0458396@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com> <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com> <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com> <UdqI44rJgcX2ZaV-LzZm54NUA5v3NLT724p61yB_B44=.9218d574-5402-43cf-ada7-6930c0458396@github.com> Message-ID: <heTvTZOQzc0I9H3RoQujcXIubmr5S9h6dSMZQYgHSCo=.c8da13bf-f724-43d3-bf2a-a91b7460e4dc@github.com> On Wed, 10 Jul 2024 06:19:52 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Though I see this is inconsistent with `Exceptions::_throw_msg_cause` > > Okay I think I see how the logic works. If we were going to abort we would never reach `_throw_cause` as the initial `_throw` would have exited. 
But for the `!thread->can_call_Java()` case the original `_throw` would replace the intended real exception with the dummy `VM_exception()`, which is then "caught" and we try to replace with a more specific exception to be thrown via `throw_cause`, which will again replace whichever exception is requested with the dummy `VM_exception()` - so the end result is we will throw the dummy regardless of whether the cause or wrapping exception is specified. So your fix here makes sense. Great. Would you mind approving this PR as this is the only non-JVMCI file changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1672520461 From duke at openjdk.org Wed Jul 10 20:09:45 2024 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 10 Jul 2024 20:09:45 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> Message-ID: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> > ### Summary > On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. > > `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. 
>
> **Transparent huge pages:**
> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>
> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>
> **Explicit huge pages:**
> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>
> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an improvement upon existing behav...

Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:

- Minor cleanup and comments.
- rename to disclaim_memory and update test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20080/files - new: https://git.openjdk.org/jdk/pull/20080/files/dcf6c80f..6c9e6d5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20080&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20080&range=00-01 Stats: 26 lines in 10 files changed: 2 ins; 11 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20080.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20080/head:pull/20080 PR: https://git.openjdk.org/jdk/pull/20080 From stuefe at openjdk.org Wed Jul 10 20:09:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 Jul 2024 20:09:49 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> Message-ID: <6K4CYd2I1hDSi8nlwA8CEWyVkoqCsJtZ_FwE1Z6ufMQ=.d0e9c909-d1d6-4767-81a6-57f7bbda170f@github.com> On Wed, 10 Jul 2024 17:58:25 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> ### Summary >> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
>>
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.
>>
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>>
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>>
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>>
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Minor cleanup and comments.
>  - rename to disclaim_memory and update test

minor nits, fine otherwise

src/hotspot/os/windows/os_windows.cpp line 3896:

> 3894:
> 3895: void os::pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint) { }
> 3896: void os::pd_disclaim_memory(char *addr, size_t bytes) { }

Give us a little comment about what this API does?

"Hints to the OS that the memory is not needed anymore and can be reclaimed by the OS; will destroy memory content; it will be re-acquired on touch, no explicit committing needed"

Something like that

test/hotspot/gtest/runtime/test_os.cpp line 44:

> 42: #include <sys/mman.h>
> 43: #endif
> 44:

Not needed anymore

-------------

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2169683106
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672614807
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672616393

From duke at openjdk.org Wed Jul 10 20:09:53 2024
From: duke at openjdk.org (Robert Toyonaga)
Date: Wed, 10 Jul 2024 20:09:53 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com>
Message-ID: <heNwdf-AALEgj3UMZHGlj2JRbpD2ziefcLrDPpgAYUo=.ab7733d2-c323-4d28-9ddf-1568dbf6c5cb@github.com>

On Tue, 9 Jul 2024 18:26:52 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Minor cleanup and comments.
>>  - rename to disclaim_memory and update test
>
> Great, thanks @roberttoyonaga. The main work was the analysis work beforehand.
>
> About naming, I would name the thing "os::disclaim_memory". free_without_uncommit is a mouthful. There is a precedent in the "disclaim" API on AIX, which in a future RFE may be used to implement os::disclaim_memory.

Thank you @tstuefe for the review feedback! I've renamed `free_memory_without_uncommit` to `disclaim_memory` and removed the `committed_in_range` check so the unit test can be more reliable.

> src/hotspot/os/windows/os_windows.cpp line 3896:
>
>> 3894:
>> 3895: void os::pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint) { }
>> 3896: void os::pd_disclaim_memory(char *addr, size_t bytes) { }
>
> Give us a little comment about what this API does?
>
> "Hints to the OS that the memory is not needed anymore and can be reclaimed by the OS; will destroy memory content; it will be re-acquired on touch, no explicit committing needed"
>
> Something like that

Ok, I've added a comment with a description. Is it good practice to add these types of descriptions in the shared code header files (os.hpp), in the platform-dependent code (os_linux.hpp), or both? I see some examples of all 3 cases, but I'm wondering if there's a best practice.

> test/hotspot/gtest/runtime/test_os.cpp line 1002:
>
>> 1000: size_t committed_size;
>> 1001: address committed_start;
>> 1002: ASSERT_FALSE(os::committed_in_range((address) base, size, committed_start, committed_size));
>
> Is there a chance of this generating false positives? Do we know if the madvise effect is immediate or delayed?

That's a good point. Based on the Linux [docs](https://man7.org/linux/man-pages/man2/madvise.2.html) it might not happen immediately, causing the test to be flaky. I'll remove the `committed_in_range` check. I suppose we could poll with a timeout, but there's still no guarantee the pages actually get freed in a timely manner.
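
For readers following the `madvise(MADV_DONTNEED)` discussion, here is a minimal sketch of the behavior a test could check without depending on when the kernel actually frees pages. It is not HotSpot code: `discard_and_retouch` is a hypothetical helper, and it assumes Linux and a private anonymous mapping with small pages, where the madvise man page describes re-accessed pages as zero-fill-on-demand.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only; not part of HotSpot.
// Maps private anonymous memory, dirties it, discards it with
// MADV_DONTNEED, and checks that the range can be touched again
// without any explicit recommit step.
bool discard_and_retouch(size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  char* base = static_cast<char*>(p);

  std::memset(base, 0x5A, bytes);  // make the pages live

  // Discard the contents; the mapping itself stays valid.
  bool ok = (madvise(base, bytes, MADV_DONTNEED) == 0);

  // Assumption (Linux, private anonymous mapping): re-reading yields
  // zero-filled pages, so the old 0x5A pattern must be gone.
  for (size_t i = 0; ok && i < bytes; i++) {
    ok = (base[i] == 0);
  }

  // Writing again works without re-committing.
  if (ok) {
    base[0] = 1;
    ok = (base[0] == 1);
  }

  munmap(base, bytes);
  return ok;
}
```

This checks content and accessibility only. It says nothing about RSS or about explicit huge pages, which the analysis earlier in the thread found to be unaffected.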
------------- PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2220609342 PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672735056 PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672249380 From aboldtch at openjdk.org Wed Jul 10 20:10:07 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 10 Jul 2024 20:10:07 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v5] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
>
> # Cleanups
>
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
>
> # Refactoring
>
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code.
>
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>
> # LightweightSynchronizer
>
> Working on adapting and incorporating the following section as a comment in the source code
>
> ## Fast Locking
>
> CAS on locking bits in markWord.
> 0b00 (Fast Locked) <--> 0b01 (Unlocked)
>
> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>
> If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
Axel Boldt-Christmas has updated the pull request incrementally with four additional commits since the last revision: - Add extra comments in LightweightSynchronizer::inflate_fast_locked_object - Fix typos - Remove unused variable - Add missing inline qualifiers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/d12aa5f6..a207544b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=03-04 Stats: 16 lines in 3 files changed: 8 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From liach at openjdk.org Wed Jul 10 20:11:11 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 10 Jul 2024 20:11:11 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <-jOMegvM_uFyEogeqPY8GwECPw70jvpiPZsabUMXB30=.976616cc-023e-4559-ad31-bebad0f92982@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. 
Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
>
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

Can the barrier issue be bypassed with this pattern:

    private @Stable Value field;

    Value getter() {
        var local = field; // avoid double read
        if (local == null)
            local = computeAndSet(); // avoid double read, no fence here in getter
        return local;
    }

    private Value computeAndSet() {
        var result = ... // compute value
        field = result; // write must be here, or barrier will be in getter
        return result; // to avoid double read
    }

And since you are still inserting barriers the same way constructor barriers are inserted, can I say that such a more usual pattern:

    private @Stable Value field;

    Value getter() {
        var local = field; // avoid double read
        if (local == null)
            local = field = ... // inserts StoreStore after
        return local;
    }

will still suffer from the regression observed in https://github.com/openjdk/jdk/pull/19433#discussion_r1619053915, or is that completely fixed?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221087037

From rehn at openjdk.org Wed Jul 10 20:12:07 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 10 Jul 2024 20:12:07 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
Message-ID: <oMF9O0P5E3HDqMFBYXEjS6Vg1AErulnyVarmMrTBGSk=.38ecb986-596b-4f28-8e3b-b5dd9c18998e@github.com>

On Thu, 4 Jul 2024 14:48:36 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>>
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running for a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get fast hot calls relies on luck.
>>
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>>
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>>
>>
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>>
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>>
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>>
>> An experimental option for turning trampolines back on exists.
>>
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which Arm has)) when in reach and vice-versa.
>>
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>>
>> fop (msec)             2239 |  2128 = 0.950424
>> h2 (msec)             18660 | 16594 = 0.889282
>> jython (msec)         22022 | 21925 = 0.995595
>> luindex (msec)         2866 |  2842 = 0.991626
>> lusearch (msec)        4108 |  4311 = 1.04942
>> lusearch-fix (msec)    4406 |  4116 = 0.934181
>> pmd (msec)             5976 |  5897 = 0.98678
>> jython (msec)         22022 | 21925 = 0.995595
>> Avg: 0.974112
>> fop(xcomp) (msec)      2721 |  2714 = 0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   _ld to ld

I have not seen (new) issues in testing. I would have preferred one or two more reviewers, but since RV is not the biggest platform I'll settle for just passing the bar. I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-review (as the new rules are in effect which require the latest rev to be reviewed).
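
As a side note on the AUIPC + LD + JALR sequence discussed in this thread: a 32-bit PC-relative offset has to be split into an upper 20-bit part (for AUIPC) and a signed 12-bit part (for the load's immediate). The sketch below shows only that arithmetic. `split_pc_relative` and `PcRelSplit` are hypothetical names, not HotSpot functions, and the reading that the `+ 0x800` rounding serves this purpose is an assumption based on the general RISC-V convention rather than on the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration types; not part of HotSpot.
struct PcRelSplit {
  int64_t hi20;  // value placed in the AUIPC immediate (units of 4 KiB)
  int64_t lo12;  // signed 12-bit immediate for the following LD
};

// Splits a PC-relative distance so that (hi20 * 4096) + lo12 == distance,
// with lo12 in [-2048, 2047]. Adding 0x800 first rounds hi20 to the
// nearest 4 KiB multiple, which compensates for lo12 being sign-extended.
// Relies on arithmetic right shift of negative values (guaranteed since
// C++20, and what mainstream compilers do in practice).
PcRelSplit split_pc_relative(int32_t distance) {
  int64_t d = distance;
  int64_t hi20 = (d + 0x800) >> 12;
  int64_t lo12 = d - hi20 * 4096;
  return PcRelSplit{hi20, lo12};
}

// Recombines the two parts the way AUIPC followed by a 12-bit offset would.
int64_t recombine(const PcRelSplit& s) {
  return s.hi20 * 4096 + s.lo12;
}
```

For example, a distance of `0x800` cannot use `lo12 = 0x800` because that does not fit in a signed 12-bit field; the rounding instead gives `hi20 = 1` and `lo12 = -2048`.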
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2220282126

From fyang at openjdk.org Wed Jul 10 20:12:07 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 10 Jul 2024 20:12:07 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
Message-ID: <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>

On Thu, 4 Jul 2024 14:48:36 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>>
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running for a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get fast hot calls relies on luck.
>>
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>>
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>>
>>
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>>
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>>
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>>
>> An experimental option for turning trampolines back on exists.
>>
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which Arm has)) when in reach and vice-versa.
>>
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>>
>> fop (msec)             2239 |  2128 = 0.950424
>> h2 (msec)             18660 | 16594 = 0.889282
>> jython (msec)         22022 | 21925 = 0.995595
>> luindex (msec)         2866 |  2842 = 0.991626
>> lusearch (msec)        4108 |  4311 = 1.04942
>> lusearch-fix (msec)    4406 |  4116 = 0.934181
>> pmd (msec)             5976 |  5897 = 0.98678
>> jython (msec)         22022 | 21925 = 0.995595
>> Avg: 0.974112
>> fop(xcomp) (msec)      2721 |  2714 = 0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   _ld to ld

Also performed tier1-3 and hotspot:tier4 on my Unmatched boards. Result looks fine.
Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise.
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index b39ac79be6b..e349eab3177 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
   assert_cond(source != nullptr);
   int64_t distance = source - pc();
   assert(is_simm32(distance), "Must be");
-  Assembler::auipc(temp, (int32_t)distance + 0x800);
-  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);
-  Assembler::jalr(x1, temp, 0);
+  auipc(temp, (int32_t)distance + 0x800);
+  ld(temp, Address(temp, ((int32_t)distance << 20) >> 20));
+  jalr(temp);
 }

 void MacroAssembler::jump_link(const address dest, Register temp) {
@@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) {
   int64_t distance = dest - pc();
   assert(is_simm21(distance), "Must be");
   assert((distance % 2) == 0, "Must be");
-  Assembler::jal(x1, distance);
+  jal(x1, distance);
 }

 void MacroAssembler::j(const address dest, Register temp) {

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2220677274

From aph at openjdk.org Wed Jul 10 20:14:06 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 10 Jul 2024 20:14:06 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9]
In-Reply-To: <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
Message-ID: <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>

On Mon, 8 Jul 2024 16:40:50 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>
Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits: >> >> - Merge branch 'master' into sleef-aarch64-integrate-source >> - merge master >> - sleef 3.6.1 for riscv >> - sleef 3.6.1 >> - update header files for arm >> - add inline header file for riscv64 >> - remove notes about sleef changes >> - fix performance issue >> - disable unused-function warnings; add log msg >> - minor >> - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863 > >> While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in #19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using Cmake and we do not want to add a build dependency on Cmake and call out to a foreign build system at build time, for efficiency and complexity reasons. > > Of course, there is no reason to rebuild the preprocessed headers every time we build the JDK. I'd never ask for that; the last thing I want is to make building the JDK slower. However, it should be possible to do so on a checked-out JDK source tree, at the builder's option. > > If there is a script, it doesn't have to be included in the OpenJDK build system itself, but it does have to be in the OpenJDK source tree. (It could be part of make/devkit, for example.) > > With a script to produce preprocessed files, it should be possible for anyone building the JDK to run that script, and produce the preprocessed source. SLEEF won't take up a prohibitive amount of space. > > We shouldn't be depending on some other web site somewhere being able to come up with the exact SLEEF sources we used, either. That fails the test of reproducibility. 
>
>> JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step.
>
> I get it, but not including everything we use in the OpenJDK tree is a dangerous precedent. It should be no big deal to do this right, given that we have the SLEEF sources and the build scripts already. I'm not asking for anything that doesn't exist already, I'm just saying that it must be checked in.
>
> Avoiding inconvenience, however great, is not sufficient to justify such a step. This is perhaps something to discuss at the next Committers' Workshop.

> @theRealAph a precedent that exists is for binutils/llvm/capstone and hsdis. Would it be sufficient for the user to choose to build SLEEF from a separate source directory assuming all the dependencies are installed already (the sources are checked out by the user; cmake and other build dependencies are installed; etc.)?

I believe that it's those who want to deviate from the standard best practice of providing source code in its preferred form who must come up with a compelling argument why it is necessary. I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.

We've been here before, and the response from @PaulSandoz to a similar case (checking in compiler-generated asm) was:

> I don't think this should be considered a generally acceptable approach for Vector API operations (most code for operations does not and should not follow this approach), nor is it generally acceptable for other kinds of intrinsic in HotSpot (I believe there are a few special cases under os_cpu).
https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021-May/047094.html Having said that, the problem in that case was much worse, in that the corresponding source code was not available at all. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2220180978 From mli at openjdk.org Wed Jul 10 20:14:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Jul 2024 20:14:06 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <TcrB6zIH-yx-6fyLfnQy4NHk5w8VqXm3anTAxbQJtXY=.8181016f-5d4d-4349-a8d7-343db9817f40@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <TcrB6zIH-yx-6fyLfnQy4NHk5w8VqXm3anTAxbQJtXY=.8181016f-5d4d-4349-a8d7-343db9817f40@github.com> Message-ID: <QyiEmIS1_Pev-mjPz602JVskO6NUBdr1qwKolAmBpFo=.a672265f-3029-4b8e-bd8f-adcd42899a31@github.com> On Mon, 8 Jul 2024 16:20:40 GMT, Andrew Haley <aph at openjdk.org> wrote: > I finally did some measurements. Thanks for testing it! > It would be nice if the JMH test were part of this patch. OK, I can do that later. > > It mostly looks good, but I can see an odd regression of DoubleMaxVector.TANH (by 39%) on Apple M1. I don't really know why this is, given that tanh(x) is almost certainly based on expm1(x). This probably isn't important, but it is odd. 
Yes, it has some regression in TANH, I have modified the code to skip TANH (https://github.com/openjdk/jdk/pull/18605/commits/6061c25de00423f2c92c08ce40af4815c0fa3933) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2220231384 From pchilanomate at openjdk.org Wed Jul 10 20:17:31 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 10 Jul 2024 20:17:31 GMT Subject: RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 [v4] In-Reply-To: <UtACtbpQujJHrXFzh_GqeAIzPtttQEM5T48LRQhZB84=.0183e5bf-5a04-45ed-8fe1-aea9558a301c@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com> <UtACtbpQujJHrXFzh_GqeAIzPtttQEM5T48LRQhZB84=.0183e5bf-5a04-45ed-8fe1-aea9558a301c@github.com> Message-ID: <Y5wE-fbEaWcKgFRtjfG-GxwlmkUIxYhEoCjg3RdKCkw=.b3a8a340-a9aa-454e-b1e3-25506dd3b618@github.com> On Wed, 10 Jul 2024 05:35:40 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> address Thomas' comments > > Still good. Thanks for the reviews @dholmes-ora, @coleenp, @shipilev and @tstuefe! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20012#issuecomment-2220963716 From pchilanomate at openjdk.org Wed Jul 10 20:17:32 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 10 Jul 2024 20:17:32 GMT Subject: Integrated: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com> Message-ID: <nmx8GK5dgR3wmGoPDk-HxquxG_yOySOvwi2lhfYJz5g=.b271ab2d-fa96-4796-8dea-16be764bc42a@github.com> On Wed, 3 Jul 2024 16:24:20 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: > The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens. > > The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details). 
> > The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. > > I tested the patch by running it through mach5 tiers 1-6. > > Thanks, > Patricio This pull request has now been integrated. Changeset: 7ab96c74 Author: Patricio Chilano Mateo <pchilanomate at openjdk.org> URL: https://git.openjdk.org/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b Stats: 55 lines in 3 files changed: 6 ins; 20 del; 29 mod 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 Reviewed-by: dholmes, stuefe, coleenp, shade ------------- PR: https://git.openjdk.org/jdk/pull/20012 From ayang at openjdk.org Wed Jul 10 20:29:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 10 Jul 2024 20:29:37 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v3] In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> Message-ID: <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> > Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity. 
> > The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. > > Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into pgc-vm-operation - pgc-vm-operation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20077/files - new: https://git.openjdk.org/jdk/pull/20077/files/a7c69102..1d10dd5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=01-02 Stats: 1388 lines in 122 files changed: 508 ins; 309 del; 571 mod Patch: https://git.openjdk.org/jdk/pull/20077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077 PR: https://git.openjdk.org/jdk/pull/20077 From zgu at openjdk.org Wed Jul 10 20:29:40 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 10 Jul 2024 20:29:40 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v2] In-Reply-To: <N4uBvRzIP52a4DgIeIx3ArjKPF0JrTI2bVsmHtD0rJg=.f7e1bb49-9bcd-420c-97fb-2617c798b5b7@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <N4uBvRzIP52a4DgIeIx3ArjKPF0JrTI2bVsmHtD0rJg=.f7e1bb49-9bcd-420c-97fb-2617c798b5b7@github.com> Message-ID: <BpWx1BDGYxzS7d-mUGi1KcIUsD9sScds0q-Gu_nV1R4=.8dd4d293-6ffd-472f-9d78-39ad1d97f446@github.com> On Mon, 8 Jul 2024 16:31:43 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 
but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity. >> >> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. >> >> Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > pgc-vm-operation I really like this refactor, which brings Parallel closer to the other GCs. Just a few nits, otherwise, LGTM src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 273: > 271: > 272: bool is_tlab = false; > 273: return mem_allocate_work(size, is_tlab, gc_overhead_limit_was_exceeded); Suggest: `return mem_allocate_work(size, false /* is_tlab */, gc_overhead_limit_was_exceeded);` src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 478: > 476: > 477: const bool clear_all_soft_refs = true; > 478: do_full_collection_no_gc_locker(clear_all_soft_refs); Suggest: not define `const bool clear_all_soft_refs = true;` and do `do_full_collection_no_gc_locker(true /* clear_all_soft_refs */);` instead src/hotspot/share/gc/parallel/psVMOperations.cpp line 68: > 66: > 67: GCCauseSetter gccs(heap, _gc_cause); > 68: heap->try_collect_at_safepoint(is_cause_full(_gc_cause)); can be simplified to `heap->try_collect_at_safepoint(_full);` ------------- Marked as reviewed by zgu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20077#pullrequestreview-2166570482 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1670678592 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1671439533 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1672170373 From rehn at openjdk.org Wed Jul 10 20:31:27 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 10 Jul 2024 20:31:27 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23] In-Reply-To: <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com> <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com> Message-ID: <5bnFsKi9By23yhgIvs-Kzx4KghK4lOaqE7a9igZyZnM=.4fc8a62b-1373-48ef-8c24-484933fea402@github.com> On Wed, 10 Jul 2024 14:31:23 GMT, Fei Yang <fyang at openjdk.org> wrote: > Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine. Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise. Thanks, fixed! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2221370260 From pchilanomate at openjdk.org Wed Jul 10 20:31:43 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 10 Jul 2024 20:31:43 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v3] In-Reply-To: <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com> Message-ID: <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com> On Wed, 10 Jul 2024 05:25:48 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename test to ThreadPollOnYield.java > > test/jdk/java/lang/Thread/virtual/ThreadPollOnYield.java line 39: > >> 37: * @requires vm.continuations >> 38: * @library /test/lib >> 39: * @run junit/othervm -Xcomp -XX:-TieredCompilation -XX:CompileCommand=inline,*::yield* -XX:CompileCommand=inline,*::*Yield ThreadPollOnYield > > Given this forces -Xcomp shouldn't we skip running it when compilation mode is set via jtreg flags? The test should never fail even with external flags, so if anything it's just extra testing. But I can add vm.flagless if you prefer. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1672931556 From rehn at openjdk.org Wed Jul 10 20:31:27 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 10 Jul 2024 20:31:27 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24] In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> Message-ID: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running for a very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get hot fast calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL <trampo> > Stubs: > AUIPC > LD > JALR > <DEST> > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > <DEST> > > An experimental option for turning trampolines back on exists. 
> > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - Skip qualify ins - Merge branch 'master' into 8332689 - _ld to ld - Merge branch 'master' into 8332689 - Rename to reloc_call - Merge branch 'master' into 8332689 - Rename lc - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Comments - ... 
and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=23 Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From mli at openjdk.org Wed Jul 10 20:42:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Jul 2024 20:42:06 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23] In-Reply-To: <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com> <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com> Message-ID: <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com> On Wed, 10 Jul 2024 14:31:23 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> _ld to ld > > Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine. > Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise. 
> > > diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > index b39ac79be6b..e349eab3177 100644 > --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > @@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { > assert_cond(source != nullptr); > int64_t distance = source - pc(); > assert(is_simm32(distance), "Must be"); > - Assembler::auipc(temp, (int32_t)distance + 0x800); > - Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20); > - Assembler::jalr(x1, temp, 0); > + auipc(temp, (int32_t)distance + 0x800); > + ld(temp, Address(temp, ((int32_t)distance << 20) >> 20)); > + jalr(temp); > } > > void MacroAssembler::jump_link(const address dest, Register temp) { > @@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) { > int64_t distance = dest - pc(); > assert(is_simm21(distance), "Must be"); > assert((distance % 2) == 0, "Must be"); > - Assembler::jal(x1, distance); > + jal(x1, distance); > } > > void MacroAssembler::j(const address dest, Register temp) { > I have not seen (new) issues in testing. I would have preferred one or two more reviewers, but since RV is not the biggest platform I'll settle with just passing the bar. I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-review (as the new rules are in effect which require latest rev to be reviewed). Still good to me. Thanks! 
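For readers following the `auipc`/`ld` immediates in the quoted diff: the `+ 0x800` and `<< 20 >> 20` expressions split a 32-bit PC-relative distance into an upper part and a sign-extended 12-bit offset that add back up to the original distance. The following is only a minimal sketch of that arithmetic in Java, not HotSpot code. The class and method names are hypothetical, and it assumes AUIPC consumes only bits 31..12 of the value it is given.

```java
// Illustrative only: mirrors the immediate arithmetic from the quoted diff,
// not the actual HotSpot MacroAssembler code. All names are hypothetical.
final class AuipcSplit {

    // Part contributed by AUIPC. Assumption: the instruction uses only
    // bits 31..12 of its immediate, so the low 12 bits are masked off here.
    // Adding 0x800 first compensates for the sign extension of the low part.
    static int upper(int distance) {
        return (distance + 0x800) & ~0xfff;
    }

    // Low 12 bits of the distance, sign-extended to 32 bits (the LD offset).
    static int lower(int distance) {
        return (distance << 20) >> 20;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 0x7ff, 0x800, 0xfff, 0x1000, -1, -0x800, -0x801, 0x12345678};
        for (int d : samples) {
            int hi = upper(d);
            int lo = lower(d);
            // The two parts must recombine to the original distance.
            if (hi + lo != d) {
                throw new AssertionError("split does not recombine for " + d);
            }
            // The low part must fit a signed 12-bit field.
            if (lo < -2048 || lo > 2047) {
                throw new AssertionError("low part out of range for " + d);
            }
            System.out.printf("distance=%d hi=%d lo=%d%n", d, hi, lo);
        }
    }
}
```

Without the `0x800` rounding, any distance whose bit 11 is set would come out 4096 too low, because the 12-bit offset is sign-extended before it is added.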
------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2221406698 From pchilanomate at openjdk.org Wed Jul 10 21:12:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 10 Jul 2024 21:12:13 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v4] In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> Message-ID: <3LQOJzJSDdWhZk3Xkg7WuWavMrXEbYnWZ1mnYfkGllQ=.0dbfb507-7531-4154-9429-98783fffbf38@github.com> > Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. > > I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3. 
> > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: add new line at end ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20016/files - new: https://git.openjdk.org/jdk/pull/20016/files/79be1fcc..1cf425dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016 PR: https://git.openjdk.org/jdk/pull/20016 From dlong at openjdk.org Wed Jul 10 23:37:54 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Jul 2024 23:37:54 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <tyXcVLFV6t00-jDYMMHuaRPfScHuvsuGbPsJaCO1ALc=.c002e92d-92aa-4030-a320-a0dcfe7dc5d1@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. 
Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Do we still need separate _wrote_stable and _wrote_final flags, or could we combine them into _wrote_stable_or_final? Then we are almost back to pre-8031818, when _wrote_final was overloaded to mean write to final or stable field. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221706861 From liach at openjdk.org Wed Jul 10 23:41:55 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 10 Jul 2024 23:41:55 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <Dx1u2Oy8ouhWC67EP_R94bPruoOh9bKgYkJx1C4_Yjw=.790a9fe9-0480-471f-b815-7f01ad226e37@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. 
Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` If we merge the stable and final flags, won't this: Value getter() { var local = field; if (local == null) local = field = ... // makes the getter final-writing return local; } be regarded the same as any final-writing constructor? Then every call to `getter()` is fenced, and the issue in #19433 isn't solved at all. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221711629 From fyang at openjdk.org Thu Jul 11 01:56:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 Jul 2024 01:56:03 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24] In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> Message-ID: <2z3xsgnAt9LvOSV173_Q9N2xsH889wguiWNcdeuw1Ow=.80214702-2567-4e09-bf84-932a25361e5d@github.com> On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. 
>> >> Code stream: >> JAL <trampo> >> Stubs: >> AUIPC >> LD >> JALR >> <DEST> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> <DEST> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Skip qualify ins > - Merge branch 'master' into 8332689 > - _ld to ld > - Merge branch 'master' into 8332689 > - Rename to reloc_call > - Merge branch 'master' into 8332689 > - Rename lc > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Comments > - ... 
and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790 Looks fine now. Let's ship it :-) ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2170722290 From dlong at openjdk.org Thu Jul 11 02:32:56 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 Jul 2024 02:32:56 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <Dx1u2Oy8ouhWC67EP_R94bPruoOh9bKgYkJx1C4_Yjw=.790a9fe9-0480-471f-b815-7f01ad226e37@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <Dx1u2Oy8ouhWC67EP_R94bPruoOh9bKgYkJx1C4_Yjw=.790a9fe9-0480-471f-b815-7f01ad226e37@github.com> Message-ID: <xPleoGsDuUPV2s-kxPiWWcqL-_xIpBdoqBy22fibr-c=.4fef483e-0603-4143-bf2a-43bbd321f69f@github.com> On Wed, 10 Jul 2024 23:39:02 GMT, Chen Liang <liach at openjdk.org> wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. 
The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > If we merge the stable and final flags, won't this: > > Value getter() { > var local = field; > if (local == null) > local = field = ... // makes the getter final-writing > return local; > } > > be regarded the same as any final-writing constructor? > > Then every call to `getter()` is fenced, and the issue in #19433 isn't solved at all. @liach, if I understand this PR correctly, it only adds barriers for final/stable fields in constructors. 
Previous code to emit barriers for stable fields outside of constructors is removed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221869982 From dholmes at openjdk.org Thu Jul 11 02:42:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 Jul 2024 02:42:56 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v3] In-Reply-To: <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com> <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com> Message-ID: <KKXg1PeYIYOr45p4L6lBqNrjMIdMoQI-aydEGygCJZM=.785a668d-9f8a-4211-877b-8fd93f52a835@github.com> On Wed, 10 Jul 2024 20:29:19 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> test/jdk/java/lang/Thread/virtual/ThreadPollOnYield.java line 39: >> >>> 37: * @requires vm.continuations >>> 38: * @library /test/lib >>> 39: * @run junit/othervm -Xcomp -XX:-TieredCompilation -XX:CompileCommand=inline,*::yield* -XX:CompileCommand=inline,*::*Yield ThreadPollOnYield >> >> Given this forces -Xcomp shouldn't we skip running it when compilation mode is set via jtreg flags? > > The test should never fail even with external flags, so if anything it's just extra testing. But I can add vm.flagless if you prefer. flagless might be going too far as we won't test with other GC's etc. Can we just use `@requires vm.compMode != "Xcomp"` to exclude it from the Xcomp specific testing which is redundant. 
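For illustration, here is a minimal sketch of how the suggested `@requires` line could sit next to the tags quoted above. The header is hypothetical (it is not the actual `ThreadPollOnYield` source), and `CompModeSketch` with its `compMode` and `wouldRun` helpers is an invented name that only approximates how a harness might derive a mode from the `java.vm.info` system property; the real test framework may compute `vm.compMode` differently.

```java
/*
 * Hypothetical jtreg header: the tags quoted above plus the suggested
 * @requires line. A sketch only, not the actual test source.
 *
 * @test
 * @requires vm.continuations
 * @requires vm.compMode != "Xcomp"
 * @library /test/lib
 * @run junit/othervm -Xcomp -XX:-TieredCompilation -XX:CompileCommand=inline,*::yield* -XX:CompileCommand=inline,*::*Yield ThreadPollOnYield
 */
import java.util.Locale;

class CompModeSketch {
    // Assumption: maps a "java.vm.info"-style string to a mode name.
    // This only approximates what a test harness might do.
    static String compMode(String vmInfo) {
        String s = vmInfo.toLowerCase(Locale.ROOT);
        if (s.contains("compiled mode")) return "Xcomp";
        if (s.contains("interpreted mode")) return "Xint";
        if (s.contains("mixed mode")) return "Xmixed";
        return "unknown";
    }

    // Mirrors the proposed condition: run unless the harness VM is
    // already in -Xcomp mode, where the test would be redundant.
    static boolean wouldRun(String vmInfo) {
        return !compMode(vmInfo).equals("Xcomp");
    }

    public static void main(String[] args) {
        String vmInfo = System.getProperty("java.vm.info", "");
        System.out.println(vmInfo + " -> " + compMode(vmInfo)
                + ", would run: " + wouldRun(vmInfo));
    }
}
```

Unlike `vm.flagless`, a condition of this shape would still let the test run under other externally supplied flags, such as a different GC.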
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1673317959 From dholmes at openjdk.org Thu Jul 11 02:44:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 Jul 2024 02:44:05 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> Message-ID: <HkENYscsjtr10ThhWQrD3KLdRLO_V6W3lwy4n6t4-RU=.7a724248-60dc-4a93-937d-c6b0d8efcd31@github.com> On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions: >> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). >> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. >> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. >> >> An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue. >> >> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed TestTranslatedException Non JVMCI changes look good. 
Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20083#pullrequestreview-2170778148 From liach at openjdk.org Thu Jul 11 03:11:58 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 11 Jul 2024 03:11:58 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <Axo66lnMuZLjSYfs7beOiIUJCbEQhkhknedqpD4DHE4=.0b6c900a-b6b3-4756-bc7a-7b9587835fe9@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. 
> > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Ah, so it's like a weaker version of always safe construction. Makes sense. 
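To make the distinction concrete, here is a minimal sketch (not code from the PR) of the two kinds of writes discussed in this thread. `LazyLabel` and its members are hypothetical names. `@Stable` is JDK-internal (`jdk.internal.vm.annotation.Stable`) and not available to application code, so it appears only in a comment. On the reading given in this thread, the constructor write is the one covered by final-field-like barriers, while the lazy write in the getter would not by itself add a barrier at method exit.

```java
class LazyLabel {
    // Written once in the constructor; gets final-field semantics.
    private final String name;

    // In JDK-internal code a field like this could carry @Stable.
    // Zero is the default value and doubles as "not yet computed".
    private int hash;

    LazyLabel(String name) {
        this.name = name;
    }

    // Racy single-check lazy initialization: the computation is
    // idempotent, so concurrent callers may compute the value more
    // than once but always agree on the result.
    int hash() {
        int local = hash;
        if (local == 0) {
            // OR with 1 so the computed value is never zero and is
            // therefore distinguishable from the default.
            local = name.hashCode() | 1;
            hash = local;
        }
        return local;
    }

    public static void main(String[] args) {
        LazyLabel label = new LazyLabel("example");
        System.out.println(label.hash() == label.hash()); // same value on every call
    }
}
```

The getter has the same shape as the `getter()` example quoted earlier: it writes the field outside a constructor, which is the case the thread says is no longer fenced.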
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221914037 From jkratochvil at openjdk.org Thu Jul 11 03:41:59 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 11 Jul 2024 03:41:59 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v3] In-Reply-To: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> Message-ID: <uWlQ046HkkvQZ6nmMUCFMYxlgoeqG296pNj6vBTS2uA=.2199f887-e0da-46da-831b-53fb8c5868aa@github.com> On Mon, 1 Jul 2024 14:43:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support test/hotspot/jtreg/ProblemList.txt line 119: > 117: containers/docker/TestMemoryAwareness.java 8303470 linux-all > 118: containers/docker/TestJFREvents.java 8327723 linux-x64 > 119: containers/systemd/SystemdMemoryAwarenessTest.java 8322420 linux-all This line should be removed as long as it gets applied after [17198](https://github.com/openjdk/jdk/pull/17198). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1673356918 From qxing at openjdk.org Thu Jul 11 03:43:26 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 11 Jul 2024 03:43:26 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build Message-ID: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. ------------- Commit messages: - Do not declare some of the methods in release mode. 
Changes: https://git.openjdk.org/jdk/pull/20131/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336163 Stats: 16 lines in 5 files changed: 16 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20131.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20131/head:pull/20131 PR: https://git.openjdk.org/jdk/pull/20131 From jkratochvil at openjdk.org Thu Jul 11 03:44:56 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 11 Jul 2024 03:44:56 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v3] In-Reply-To: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> Message-ID: <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com> On Mon, 1 Jul 2024 14:43:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support [test.patch.txt](https://github.com/user-attachments/files/16171122/test.patch.txt) * `CPUQuota` (changed it to `AllowedCPUs`) does not work for me - it properly distributes the load but JDK still sees all available CPU cores (4 of my VM). * the change 2 -> 1 cores: // We could check 2 cores ("0-1") but then it would fail on single-core nodes / virtual machines. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2221959393 From stuefe at openjdk.org Thu Jul 11 05:09:58 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 Jul 2024 05:09:58 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <heNwdf-AALEgj3UMZHGlj2JRbpD2ziefcLrDPpgAYUo=.ab7733d2-c323-4d28-9ddf-1568dbf6c5cb@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com> <heNwdf-AALEgj3UMZHGlj2JRbpD2ziefcLrDPpgAYUo=.ab7733d2-c323-4d28-9ddf-1568dbf6c5cb@github.com> Message-ID: <OeVK7uUIkZK572A_hPeHEQzs6YILjp5B2sv6hKv64kk=.89d76253-2ab3-40b5-9508-678a996f9d28@github.com> On Wed, 10 Jul 2024 17:58:25 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> src/hotspot/os/windows/os_windows.cpp line 3896: >> >>> 3894: >>> 3895: void os::pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint) { } >>> 3896: void os::pd_disclaim_memory(char *addr, size_t bytes) { } >> >> Give us a little comment about what this API does? 
>>
>> "Hints to the OS that the memory is not needed anymore and can be reclaimed by the OS; will destroy memory content; it will be re-acquired on touch, no explicit committing needed"
>>
>> Something like that
>
> Ok I've added a comment with a description.
>
> Is it good practice to add these types of descriptions in the shared code header files (os.hpp), in the platform dependent code (os_linux.hpp), or both? I see some examples of all 3 cases, but I'm wondering if there's a best practice.

There is no common format. Sun did not comment any APIs in the beginning. I usually do it above the prototype in the hpp file, because then IDEs can pick it up and show you the description in tooltips.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1673405671

From tanksherman27 at gmail.com Thu Jul 11 05:52:43 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Thu, 11 Jul 2024 05:52:43 +0000
Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer?
Message-ID: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>

Hi Dean,

I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set.

The code that I tried on Windows x64 is attached below.

best regards,
Julian

// VC++ does not save frame pointer on stack in optimized build. It
// can be turned off by -Oy-. If we really want to walk C frames,
// we can use the StackWalk() API.
frame os::get_sender_for_C_frame(frame* fr) {
#ifdef __GNUC__
  return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
#elif defined(_MSC_VER)
  ShouldNotReachHere();
  return frame();
#endif
}

frame os::current_frame() {
#ifdef __GNUC__
  frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
          reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
          CAST_FROM_FN_PTR(address, &os::current_frame));
  if (os::is_first_C_frame(&f)) {
    // stack is not walkable
    return frame();
  } else {
    return os::get_sender_for_C_frame(&f);
  }
#elif defined(_MSC_VER)
  return frame(); // cannot walk Windows frames this way. See os::get_native_stack
                  // and os::platform_print_native_stack
#endif
}

From dholmes at openjdk.org Thu Jul 11 06:57:56 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 11 Jul 2024 06:57:56 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build
In-Reply-To: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
Message-ID: <IFhCsMwPTpiYCPHMcIiwqdX3Gx-MJkzTZUBHlDrVgAQ=.2b7f045d-3dc3-42fc-8fb3-e85a37b05cf9@github.com>

On Thu, 11 Jul 2024 03:37:06 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
>
> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.

Looks good. Thanks for cleaning this up.

Please update the copyright year in src/hotspot/share/runtime/registerMap.hpp

-------------

Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20131#pullrequestreview-2171067941 From dnsimon at openjdk.org Thu Jul 11 07:06:06 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 11 Jul 2024 07:06:06 GMT Subject: Integrated: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation In-Reply-To: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> Message-ID: <1hewrnDIw6jXlNydTQvT_A8ODaargcrVJyJkufOH-74=.baa6e8b9-e84e-43a1-bc85-ea4dc3c9b28b@github.com> On Mon, 8 Jul 2024 19:01:05 GMT, Doug Simon <dnsimon at openjdk.org> wrote: > This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions: > 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). > 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. > 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. > > An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue. > > To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible. This pull request has now been integrated. 
Changeset: cf940e13 Author: Doug Simon <dnsimon at openjdk.org> URL: https://git.openjdk.org/jdk/commit/cf940e139a76e5aabd52379b8a87065d82b2284c Stats: 103 lines in 6 files changed: 62 ins; 22 del; 19 mod 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation Reviewed-by: yzheng, never, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/20083 From dnsimon at openjdk.org Thu Jul 11 07:06:05 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 11 Jul 2024 07:06:05 GMT Subject: RFR: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation [v2] In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com> <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com> Message-ID: <Ku915mcoIObN_yFr-rdMvxLTcIoewfS0x1DufVG9WPU=.4d357f51-92b0-4f1d-939f-361ea8c73b7a@github.com> On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions: >> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing). >> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails. >> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1. >> >> An OOME thrown in these specific conditions should not exit the VM as it not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue. 
>> >> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed TestTranslatedException Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20083#issuecomment-2222187035 From qxing at openjdk.org Thu Jul 11 07:12:35 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 11 Jul 2024 07:12:35 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2] In-Reply-To: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> Message-ID: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> > Some of the methods are defined only in debug mode, but their declarations still exist in release mode. > > This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Update copyright. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20131/files - new: https://git.openjdk.org/jdk/pull/20131/files/5b044462..37a14107 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20131.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20131/head:pull/20131 PR: https://git.openjdk.org/jdk/pull/20131 From qxing at openjdk.org Thu Jul 11 07:12:35 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 11 Jul 2024 07:12:35 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2] In-Reply-To: <IFhCsMwPTpiYCPHMcIiwqdX3Gx-MJkzTZUBHlDrVgAQ=.2b7f045d-3dc3-42fc-8fb3-e85a37b05cf9@github.com> References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> <IFhCsMwPTpiYCPHMcIiwqdX3Gx-MJkzTZUBHlDrVgAQ=.2b7f045d-3dc3-42fc-8fb3-e85a37b05cf9@github.com> Message-ID: <XY6W7Z_T27A7__GE4smIBj2bDIIxt-E25N3jIUBT61s=.25a24a8d-c187-4358-a7f2-d072db5a44ad@github.com> On Thu, 11 Jul 2024 06:55:29 GMT, David Holmes <dholmes at openjdk.org> wrote: > Please update the copyright year in src/hotspot/share/runtime/registerMap.hpp Updated. Thanks for your suggestion! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2222195566 From dean.long at oracle.com Thu Jul 11 07:17:18 2024 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 11 Jul 2024 00:17:18 -0700 Subject: [External] : Re: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? 
In-Reply-To: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com> References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com> Message-ID: <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com> Using fr->link() in get_sender_for_C_frame() gives the wrong answer because it refers to the current frame, not the sender frame. There is no frame::sender_fp() because the information we need could be anywhere in the frame or even nowhere in the frame. This is what the comment about StackWalk() API is hinting at. Even debuggers can have trouble giving an accurate stack trace if external debug information is missing and frames do not contain the needed information themselves. dl On 7/10/24 10:52 PM, Julian Waters wrote: > Hi Dean, > > I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set > > The code that I tried on Windows x64 is attached below > > best regards, > Julian > > // VC++ does not save frame pointer on stack in optimized build. It > // can be turned off by -Oy-. If we really want to walk C frames, > // we can use the StackWalk() API. 
> frame os::get_sender_for_C_frame(frame* fr) { > #ifdef __GNUC__ > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > #elif defined(_MSC_VER) > ShouldNotReachHere(); > return frame(); > #endif > } > > frame os::current_frame() { > #ifdef __GNUC__ > frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()), > reinterpret_cast<intptr_t*>(__builtin_frame_address(1)), > CAST_FROM_FN_PTR(address, &os::current_frame)); > if (os::is_first_C_frame(&f)) { > // stack is not walkable > return frame(); > } else { > return os::get_sender_for_C_frame(&f); > } > #elif defined(_MSC_VER) > return frame(); // cannot walk Windows frames this way. See os::get_native_stack > // and os::platform_print_native_stack > #endif > } -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240711/dcd9d651/attachment.htm> From dholmes at openjdk.org Thu Jul 11 07:18:55 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 Jul 2024 07:18:55 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2] In-Reply-To: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> Message-ID: <xCvFFDDqHcxVJONUa0VenAonG3oY6hP6Cjie3W54aYs=.9a3f6132-1eef-4c0a-b9cb-24ba590a9aa3@github.com> On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing <qxing at openjdk.org> wrote: >> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. >> >> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright. Thanks. 
------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20131#pullrequestreview-2171105373 From qxing at openjdk.org Thu Jul 11 07:30:57 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 11 Jul 2024 07:30:57 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2] In-Reply-To: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> Message-ID: <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com> On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing <qxing at openjdk.org> wrote: >> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. >> >> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright. Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2222227563 From duke at openjdk.org Thu Jul 11 07:30:58 2024 From: duke at openjdk.org (duke) Date: Thu, 11 Jul 2024 07:30:58 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2] In-Reply-To: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> Message-ID: <5pXzgjHqBDblF_ax7idXa5W_0QCqJHL-YFx9lhmH7Ks=.dd466800-57b9-454a-8e21-dc2f9fd84c20@github.com> On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing <qxing at openjdk.org> wrote: >> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. >> >> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright. @MaxXSoft Your change (at version 37a14107e9e5542024512bced9bbd03c0c606461) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2222230127 From stuefe at openjdk.org Thu Jul 11 07:38:57 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 Jul 2024 07:38:57 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> Message-ID: <7kTS7aOEGu5r0uCYvKrIb7nvf1-MBkuCngFWHxNzj2E=.1d2e2913-d442-429f-afc1-0732171cb514@github.com> On Thu, 4 Jul 2024 15:18:29 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > Hi all, > > This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. > > Testing: > - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. > > Thanks, > Sonia It looks cautiously okay. Small nits remain. Please make sure the tests pass for both 64-bit and 32-bit (to test a 32-bit build, the simplest way is to build on an x64 Linux as normal, but specify --with-target-bits=32 when configuring).
src/hotspot/share/prims/whitebox.cpp line 1769: > 1767: > 1768: WB_ENTRY(jlong, WB_AllocateFromMetaspaceTestArena(JNIEnv* env, jobject wb, jlong arena, jlong size)) > 1769: if (size % BytesPerWord != 0) { Just assert `is_aligned(size, BytesPerWord) ` test/hotspot/jtreg/runtime/Metaspace/elastic/MetaspaceTestContext.java line 232: > 230: // > 231: long expectedMaxCommitted = usageMeasured; > 232: expectedMaxCommitted += Settings.ROOT_CHUNK_WORD_SIZE; Needs scaling up by BytesPerWord now test/hotspot/jtreg/runtime/Metaspace/elastic/TestMetaspaceAllocationMT1.java line 98: > 96: > 97: final long wordSize = Settings.WORD_SIZE; > 98: final long testAllocationCeiling = 1024 * 1024 * 8 * wordSize; // 8m words = 64M on 64bit Here, and in other places where I hardcode expected memory values: don't scale by size, just hardcode now the real byte values. E.g. here, use 1024 * 1024 * 64. If possible, put a "KB" and "MB" define somewhere. Or even better, copy the "Unit" enum I added to TestTrimNative. Would be cool to have that somewhere central. ------------- Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2171122397 PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1673536814 PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1673538960 PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1673547131 From tanksherman27 at gmail.com Thu Jul 11 08:12:40 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Thu, 11 Jul 2024 16:12:40 +0800 Subject: [External] : Re: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? 
In-Reply-To: <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com> References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com> <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com> Message-ID: <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com> Hi Dean, Thanks for the quick reply. At the risk of testing your patience, I don't really follow, since that is how os::get_sender_for_C_frame is implemented on other platforms (I copied it from Linux x86 in this case). All I got from the comment is that the only reason we usually have to use the StackWalk API on Windows is because the frame pointer is not saved when using the Microsoft compiler, however in my case I'm not using the Microsoft compiler and have verified that the frame pointer is saved in my custom JVMs. I'm not sure how VMError::print_native_stack on other platforms manages to work when they also do return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); in os::get_sender_for_C_frame like I did here Thanks for your time and patience! best regards, Julian On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote: > > Using fr->link() in get_sender_for_C_frame() gives the wrong answer because it refers to the current frame, not the sender frame. There is no frame::sender_fp() because the information we need could be anywhere in the frame or even nowhere in the frame. This is what the comment about StackWalk() API is hinting at. Even debuggers can have trouble giving an accurate stack trace if external debug information is missing and frames do not contain the needed information themselves. > > dl > > On 7/10/24 10:52 PM, Julian Waters wrote: > > Hi Dean, > > I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp.
I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set > > The code that I tried on Windows x64 is attached below > > best regards, > Julian > > // VC++ does not save frame pointer on stack in optimized build. It > // can be turned off by -Oy-. If we really want to walk C frames, > // we can use the StackWalk() API. > frame os::get_sender_for_C_frame(frame* fr) { > #ifdef __GNUC__ > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > #elif defined(_MSC_VER) > ShouldNotReachHere(); > return frame(); > #endif > } > > frame os::current_frame() { > #ifdef __GNUC__ > frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()), > reinterpret_cast<intptr_t*>(__builtin_frame_address(1)), > CAST_FROM_FN_PTR(address, &os::current_frame)); > if (os::is_first_C_frame(&f)) { > // stack is not walkable > return frame(); > } else { > return os::get_sender_for_C_frame(&f); > } > #elif defined(_MSC_VER) > return frame(); // cannot walk Windows frames this way. See os::get_native_stack > // and os::platform_print_native_stack > #endif > } From luhenry at openjdk.org Thu Jul 11 08:40:02 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 11 Jul 2024 08:40:02 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24] In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> Message-ID: <7fhRknozHSB9GrctVa-AReMYCo1Wgh8cMUoMDAP9J2E=.0bf684bc-5b06-4b8e-ba57-5274c58b6ec5@github.com> On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). 
>> Using a very small application or running for a very short time, we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL <trampo> >> Stubs: >> AUIPC >> LD >> JALR >> <DEST> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> <DEST> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ...
> > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Skip qualify ins > - Merge branch 'master' into 8332689 > - _ld to ld > - Merge branch 'master' into 8332689 > - Rename to reloc_call > - Merge branch 'master' into 8332689 > - Rename lc > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Comments > - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790 Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2171272750 From shade at openjdk.org Thu Jul 11 08:49:12 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 08:49:12 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers Message-ID: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for a constructor (instance initializer), not a static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths. I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
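The ambiguity described above can be sketched with a toy model. This is an illustration only, assuming a hypothetical `MethodInfo` struct rather than HotSpot's `Method` class; the predicate names `is_object_initializer` and `is_static_initializer` are used here as labels for the two sharper checks and are not a claim about the exact names or conditions in the patch.

```cpp
#include <cassert>
#include <string>

// Toy model of a method: just a name and a static flag (hypothetical type).
struct MethodInfo {
  std::string name;
  bool is_static;

  // Sharper check 1: instance initializer, i.e. a constructor ("<init>").
  bool is_object_initializer() const { return name == "<init>"; }

  // Sharper check 2: static initializer ("<clinit>", and static).
  bool is_static_initializer() const { return is_static && name == "<clinit>"; }

  // The broad predicate: says "yes" to both kinds, which is the
  // source of the confusion when a caller really means "constructor".
  bool is_initializer() const {
    return is_object_initializer() || is_static_initializer();
  }
};

// Small demonstration: the broad check cannot tell the two kinds apart.
inline void demo_initializer_checks() {
  MethodInfo ctor{"<init>", false};
  MethodInfo clinit{"<clinit>", true};
  assert(ctor.is_initializer() && clinit.is_initializer());
  assert(ctor.is_object_initializer() && !clinit.is_object_initializer());
  assert(clinit.is_static_initializer() && !ctor.is_static_initializer());
}
```

A caller that only wants constructors has to remember an extra `!is_static` filter when using the broad predicate, whereas with the two sharper predicates the intent is visible at each call site.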
Additional testing: - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests) ------------- Commit messages: - Minor touchups - Relax MemberName, accept clinit as Method - Fix Changes: https://git.openjdk.org/jdk/pull/20120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336103 Stats: 46 lines in 15 files changed: 10 ins; 17 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/20120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120 PR: https://git.openjdk.org/jdk/pull/20120 From shade at openjdk.org Thu Jul 11 08:49:12 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 08:49:12 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> Message-ID: <M7cb-w9MyVB-dVZnJSHzKtn2IPVZYFqlZhjwXfFik9c=.d2d75ea3-67cf-42f9-9513-e1c29a758d8a@github.com> On Wed, 10 Jul 2024 17:15:49 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > All around Hotspot, we have calls to `method->is_initializer()`. That method test for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths. > > I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests) Caught some test failures, back to draft. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20120#issuecomment-2221062542 From aturbanov at openjdk.org Thu Jul 11 08:49:13 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 11 Jul 2024 08:49:13 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> Message-ID: <4b2tcmkJFGFQ3X9Uu_mg2UE_3vV_dqlP3tx24R5m7JY=.55d37e64-d4f7-40e1-913f-e757042b3629@github.com> On Wed, 10 Jul 2024 17:15:49 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > All around Hotspot, we have calls to `method->is_initializer()`. That method test for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths. > > I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately. 
> > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests) src/hotspot/share/oops/klassVtable.cpp line 1233: > 1231: > 1232: inline bool interface_method_needs_itable_index(Method* m) { > 1233: if (m->is_static()) return false; // e.g., Stream.empty code alignment is now inconsistent ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1673644568 From jrose at openjdk.org Thu Jul 11 08:50:57 2024 From: jrose at openjdk.org (John R Rose) Date: Thu, 11 Jul 2024 08:50:57 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=.762c100f-7650-47fd-bfe3-ac620913384f@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. 
The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` I like this compromise. Let me see if I got it right: A stable write in a constructor is treated like a final write: it triggers a barrier at the end of the constructor. That's a cheap move. No other barriers are added automatically, for reads or other writes, saving us from doing less cheap moves. The burden would be on users of stable vars (in fancy access patterns) to add more fences if needed, but we don't see any important cases of that, at the moment.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222374934 From tanksherman27 at gmail.com Thu Jul 11 08:57:23 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Thu, 11 Jul 2024 16:57:23 +0800 Subject: [External] : Re: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? In-Reply-To: <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com> References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com> <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com> <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com> Message-ID: <CAP2b4GPnh9C5VR84KM3c9wgkYiLDf80BMJ5fK26qWsvobGXvkA@mail.gmail.com> Seems like I found an old gem where the issue with the frame pointer was first discovered https://bugs.openjdk.org/browse/JDK-8022335 https://github.com/openjdk/jdk/commit/1c2a7eea85ea261102687190d6b2e92c560770b8 best regards, Julian On Thu, Jul 11, 2024 at 4:12 PM Julian Waters <tanksherman27 at gmail.com> wrote: > Hi Dean, > > Thanks for the quick reply. At the risk of testing your patience, I > don't really follow, since that is how os::get_sender_for_C_frame is > implemented on other platforms (I copied it from Linux x86 in this > case). All I got from the comment is that the only reason we usually > have to use the StackWalk API on Windows is because the frame pointer > is not saved when using the Microsoft compiler, however in my case I'm > not using the Microsoft compiler and have verified that the frame > pointer is saved in my custom JVMs. I'm not sure how > VMError::print_native_stack on other platforms manages to work when > they also do > > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > > in os::get_sender_for_C_frame like I did here > > Thanks for your time and patience!
> > best regards, > Julian > > On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote: > > > > Using fr->link() in get_sender_for_C_frame() gives the wrong answer > because it refers to the current frame, not the sender frame. There is no > frame::sender_fp() because the information we need could be anywhere in the > frame or even nowhere in the frame. This is what the comment about > StackWalk() API is hinting at. Even debuggers can have trouble giving an > accurate stack trace if external debug information is missing and frames do > not contain the needed information themselves. > > > > dl > > > > On 7/10/24 10:52 PM, Julian Waters wrote: > > > > Hi Dean, > > > > I eventually did find frame::link(), but ultimately it didn't seem to > help as VMError::print_native_stack still doesn't work properly on Windows. > It seems as though frame::link() calls addr_at on x86, which in turn calls > frame::fp(), which returns _fp. I think whatever sets _fp for > VMError::print_native_stack is the missing link here, but unfortunately I > don't know where it's set > > > > The code that I tried on Windows x64 is attached below > > > > best regards, > > Julian > > > > // VC++ does not save frame pointer on stack in optimized build. It > > // can be turned off by -Oy-. If we really want to walk C frames, > > // we can use the StackWalk() API.
> > frame os::get_sender_for_C_frame(frame* fr) { > > #ifdef __GNUC__ > > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > > #elif defined(_MSC_VER) > > ShouldNotReachHere(); > > return frame(); > > #endif > > } > > > > frame os::current_frame() { > > #ifdef __GNUC__ > > frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()), > > reinterpret_cast<intptr_t*>(__builtin_frame_address(1)), > > CAST_FROM_FN_PTR(address, &os::current_frame)); > > if (os::is_first_C_frame(&f)) { > > // stack is not walkable > > return frame(); > > } else { > > return os::get_sender_for_C_frame(&f); > > } > > #elif defined(_MSC_VER) > > return frame(); // cannot walk Windows frames this way. See > os::get_native_stack > > // and os::platform_print_native_stack > > #endif > > } From shade at openjdk.org Thu Jul 11 08:58:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 08:58:11 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> Message-ID: <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> > All around Hotspot, we have calls to `method->is_initializer()`. That method test for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths. > > I would like to sharpen this.
I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Indenting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20120/files - new: https://git.openjdk.org/jdk/pull/20120/files/f586e4db..c5da5ebd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120 PR: https://git.openjdk.org/jdk/pull/20120 From shade at openjdk.org Thu Jul 11 08:58:12 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 08:58:12 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <4b2tcmkJFGFQ3X9Uu_mg2UE_3vV_dqlP3tx24R5m7JY=.55d37e64-d4f7-40e1-913f-e757042b3629@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <4b2tcmkJFGFQ3X9Uu_mg2UE_3vV_dqlP3tx24R5m7JY=.55d37e64-d4f7-40e1-913f-e757042b3629@github.com> Message-ID: <zH5gC1wCdxLMrL4dCiFEB3YfvoAjnaMcSEYOudZGZmw=.846d8bbc-a9ea-46fb-81cb-b32073d253e5@github.com> On Thu, 11 Jul 2024 08:46:38 GMT, Andrey Turbanov <aturbanov at openjdk.org> wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Indenting > > src/hotspot/share/oops/klassVtable.cpp line 1233: > >> 1231: >> 1232: inline bool interface_method_needs_itable_index(Method* m) { >> 1233: if (m->is_static()) return false; // e.g., Stream.empty > > code alignment is now inconsistent Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1673657253 From shade at openjdk.org Thu Jul 11 08:59:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 08:59:55 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <pdP9zTaLI_DskFxG11PMF1qTDRC-1mbmdfhrmHnkJ-o=.179539f1-ffc4-4faf-a1b8-12940a498650@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. 
> > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > If we merge the stable and final flags, won't this be regarded the same as any final-writing constructor? Nope. With this patch, we only care about stable field barriers in constructors. Methods are not affected. There are new IR tests that verify this directly. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222394296 From shade at openjdk.org Thu Jul 11 09:02:56 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 09:02:56 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <tyXcVLFV6t00-jDYMMHuaRPfScHuvsuGbPsJaCO1ALc=.c002e92d-92aa-4030-a320-a0dcfe7dc5d1@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <tyXcVLFV6t00-jDYMMHuaRPfScHuvsuGbPsJaCO1ALc=.c002e92d-92aa-4030-a320-a0dcfe7dc5d1@github.com> Message-ID: <seQquUMTfKdTEbU7TeM5IRuZe12PeoUxnznHyA_VXTA=.e7d8c514-39f9-4980-8b9d-a9563bd54df9@github.com> On Wed, 10 Jul 2024 23:35:01 GMT, Dean Long <dlong at openjdk.org> wrote: > Do we still need separate _wrote_stable and _wrote_final flags, or could we combine them into _wrote_stable_or_final? Then we are almost back to pre-8031818, when _wrote_final was overloaded to mean write to final or stable field. One of my previous iterations did this combination, but I thought it was: a) uglier; b) not future-proof, in case someone (probably me, later) would like to check the parser state for final fields writes specifically. So I thought to track final and stable field writes separately. 
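The separate tracking described above can be sketched as a toy model. This is a sketch under stated assumptions, not C2's parser: `ParseState` and `needs_exit_barrier` are hypothetical names, and any other conditions the real compiler may check at method exit are deliberately left out.

```cpp
#include <cassert>

// Toy per-method parse state (hypothetical type). The two write flags
// are kept separate so later code can still ask about final-field
// writes specifically.
struct ParseState {
  bool is_constructor = false;  // parsing an instance initializer
  bool wrote_final = false;     // saw a store to a final field
  bool wrote_stable = false;    // saw a store to a @Stable field
};

// In this model, only constructors that wrote a final or stable field
// get a barrier at exit; ordinary methods are not affected.
inline bool needs_exit_barrier(const ParseState& s) {
  return s.is_constructor && (s.wrote_final || s.wrote_stable);
}

// Small demonstration of the three interesting cases.
inline void demo_exit_barrier() {
  ParseState ctor_stable;
  ctor_stable.is_constructor = true;
  ctor_stable.wrote_stable = true;
  assert(needs_exit_barrier(ctor_stable));

  ParseState method_stable;  // e.g. a lazy cache written in a plain method
  method_stable.wrote_stable = true;
  assert(!needs_exit_barrier(method_stable));

  ParseState ctor_plain;     // constructor with no final/stable stores
  ctor_plain.is_constructor = true;
  assert(!needs_exit_barrier(ctor_plain));
}
```

Merging the two flags into one would give the same answer from `needs_exit_barrier` in this model; keeping them apart only preserves the ability to query each kind of write on its own.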
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222400103 From shade at openjdk.org Thu Jul 11 09:09:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 09:09:55 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=.762c100f-7650-47fd-bfe3-ac620913384f@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=.762c100f-7650-47fd-bfe3-ac620913384f@github.com> Message-ID: <o_chvBrK3sgS76_FnNSCGkujpNJzMde80_Yl3PD8bX8=.ba3c80f0-ac63-4f4a-8684-90a350ad453b@github.com> On Thu, 11 Jul 2024 08:47:59 GMT, John R Rose <jrose at openjdk.org> wrote: > I like this compromise. Let me see if I got it right: A stable write in a constructor is treated like a final write: it triggers a barrier at the end of the constructor. That's a cheap move. No other barriers are added automatically, for reads or other writes, saving us from doing less cheap moves. The burden would be on users of stable vars (in fancy access patterns) to add more fences if needed, but we don't see any important cases of that, at the moment. Yes, pretty much. Looking at this another way, after this patch, there is no performance or safety cost for simple changes in user code like: 1. Changing the previously `final` field into `@Stable` field with value overwrite outside of constructor. This looks like a useful pattern in `java.lang.invoke`. 2. Changing the previously `@Stable` field into `final` field, if the only stores are in constructor. Basically, the reversal of (1). 3. Putting a `@Stable` over `final` field. This is where current `String` constructor gets a bad deal today. 4. Putting a `@Stable` over any field that is written outside of constructor. This is where lazy caches like `Enum.hashCode` get a bad deal today.
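The lazy-cache pattern in item 4 can be illustrated with a loose C++ analogy. This is only a sketch: `LazyHashed` is a hypothetical class, the hash function is an arbitrary choice, and C++ relaxed atomics stand in for the Java behaviour discussed here rather than reproducing it. What it shows is the "either see the default value and recompute, or see the initialized value and use it" property of an idempotent cache that is written outside the constructor and needs no barrier at method exit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical class with a lazily computed, idempotent cache.
// The value 0 is reserved to mean "not computed yet".
class LazyHashed {
 public:
  explicit LazyHashed(std::string text) : text_(std::move(text)) {}

  std::uint32_t hash() const {
    std::uint32_t h = cached_.load(std::memory_order_relaxed);
    if (h == 0) {
      // Racing threads may both compute; they store the same value,
      // so the race is benign and no stronger ordering is requested.
      h = compute();
      cached_.store(h, std::memory_order_relaxed);
    }
    return h;
  }

  bool is_cached() const {
    return cached_.load(std::memory_order_relaxed) != 0;
  }

 private:
  // 32-bit FNV-1a over the text; never returns the reserved value 0.
  std::uint32_t compute() const {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : text_) {
      h ^= c;
      h *= 16777619u;
    }
    return h == 0 ? 1u : h;
  }

  std::string text_;
  mutable std::atomic<std::uint32_t> cached_{0};
};

// Small demonstration: the cache is empty until first use, then stable.
inline void demo_lazy_hash() {
  LazyHashed a("hotspot");
  assert(!a.is_cached());
  std::uint32_t first = a.hash();
  assert(a.is_cached());
  assert(first == a.hash());
}
```

The design choice mirrored here is that the cached field is never published in a half-built state: a reader either observes the default and does the work itself, or observes the final value.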
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222413725 From sgehwolf at openjdk.org Thu Jul 11 09:15:58 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 11 Jul 2024 09:15:58 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v3] In-Reply-To: <uWlQ046HkkvQZ6nmMUCFMYxlgoeqG296pNj6vBTS2uA=.2199f887-e0da-46da-831b-53fb8c5868aa@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> <uWlQ046HkkvQZ6nmMUCFMYxlgoeqG296pNj6vBTS2uA=.2199f887-e0da-46da-831b-53fb8c5868aa@github.com> Message-ID: <PDQay8tXsDTCW1HMDCNk8WcfB_ZNgLctDGe5J-GwHhY=.3d9841b5-5250-4d68-a775-0e7d45e612bd@github.com> On Thu, 11 Jul 2024 03:39:37 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comments >> - 8333446: Add tests for hierarchical container support > > test/hotspot/jtreg/ProblemList.txt line 119: > >> 117: containers/docker/TestMemoryAwareness.java 8303470 linux-all >> 118: containers/docker/TestJFREvents.java 8327723 linux-x64 >> 119: containers/systemd/SystemdMemoryAwarenessTest.java 8322420 linux-all > > This line should be removed as long as it gets applied after [17198](https://github.com/openjdk/jdk/pull/17198). Sure. We need to see which one goes in first and I'll adjust accordingly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1673683205 From sgehwolf at openjdk.org Thu Jul 11 09:26:55 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 11 Jul 2024 09:26:55 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v3] In-Reply-To: <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com> Message-ID: <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com> On Thu, 11 Jul 2024 03:42:27 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote: > [test.patch.txt](https://github.com/user-attachments/files/16171122/test.patch.txt) > > * `CPUQuota` (changed it to `AllowedCPUs`) does not work for me - it properly distributes the load but JDK still sees all available CPU cores (4 of my VM). Could you elaborate on that? What does not work? It's relying on the JVM properly detecting the set limit. `CPUQuota` sets the values in `cpu.max` on unified hierarchy for the `cpu` controller. See the [systemd doc](https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html). It's available since systemd 213. RHEL 7 has 219 which should be good enough. `AllowedCPUs` on the other hand uses the `cpuset` controller, which is a different thing. For the purpose of this test, we should use `CPUQuota`. > * the change 2 -> 1 cores: // We could check 2 cores ("0-1") but then it would fail on single-core nodes / virtual machines. Yeah, we have a chicken/egg problem there. It seemed assuming 2 cores is reasonable. 
We could query the number of not restricted CPUs (of the physical system) using the WB API and take the minimum of the two. Let me work on that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2222448285 From yzheng at openjdk.org Thu Jul 11 09:29:00 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 11 Jul 2024 09:29:00 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v5] In-Reply-To: <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com> Message-ID: <GxCetSdxU-CzhK7QGtwUC2lxoREeieXA10J8zbizClw=.60084b5d-1f22-4592-9d6e-a99e52df478e@github.com> On Wed, 10 Jul 2024 20:10:07 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... 
> > Axel Boldt-Christmas has updated the pull request incrementally with four additional commits since the last revision: > > - Add extra comments in LightweightSynchronizer::inflate_fast_locked_object > - Fix typos > - Remove unused variable > - Add missing inline qualifiers src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 843: > 841: movptr(monitor, Address(box, BasicLock::object_monitor_cache_offset_in_bytes())); > 842: // null check with ZF == 0, no valid pointer below alignof(ObjectMonitor*) > 843: cmpptr(monitor, alignof(ObjectMonitor*)); Is this only for keeping `ZF == 0` and can be replaced by `test; je` if we are not using `jne` to jump to the slow path? Or is there any performance concern? Btw, I think `ZF` is always rewritten before entering into the slow path https://github.com/openjdk/jdk/blob/b32e4a68bca588d908bd81a398eb3171a6876dc5/src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp#L98-L102 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673704482 From rehn at openjdk.org Thu Jul 11 10:00:59 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 11 Jul 2024 10:00:59 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23] In-Reply-To: <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com> <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com> <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com> Message-ID: <dj_6CgB5MNJuRLxsbuko3KB6P5EBnsNecuPyUzBzzZs=.ff095ccc-8f0b-4de9-b776-178f9df524f1@github.com> On Wed, 10 Jul 2024 20:39:24 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine. 
>> Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise. >> >> >> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> index b39ac79be6b..e349eab3177 100644 >> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> @@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { >> assert_cond(source != nullptr); >> int64_t distance = source - pc(); >> assert(is_simm32(distance), "Must be"); >> - Assembler::auipc(temp, (int32_t)distance + 0x800); >> - Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20); >> - Assembler::jalr(x1, temp, 0); >> + auipc(temp, (int32_t)distance + 0x800); >> + ld(temp, Address(temp, ((int32_t)distance << 20) >> 20)); >> + jalr(temp); >> } >> >> void MacroAssembler::jump_link(const address dest, Register temp) { >> @@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) { >> int64_t distance = dest - pc(); >> assert(is_simm21(distance), "Must be"); >> assert((distance % 2) == 0, "Must be"); >> - Assembler::jal(x1, distance); >> + jal(x1, distance); >> } >> >> void MacroAssembler::j(const address dest, Register temp) { > >> I have not seen (new) issues in testing. I would have prefered one or two more reviewers, but since RV is not the biggest platform I'll settle with just passing the bar. I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-reviews (as the new rules are in-effect which require latest rev to be reviewed). > > Still good to me. Thanks! Thanks! 
@Hamlin-Li please re-approve, background: https://mail.openjdk.org/pipermail/jdk-dev/2024-July/009199.html ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2222520657 From mli at openjdk.org Thu Jul 11 10:05:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 11 Jul 2024 10:05:01 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24] In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> Message-ID: <7iSqtZQW-vx36D_y9R5a-lWWRwOl0p0aGdTF6GhV6P0=.5065f3b7-d33c-4b1c-8a8d-b6666d251b9f@github.com> On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL <trampo> >> Stubs: >> AUIPC >> LD >> JALR >> <DEST> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. 
>> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> <DEST> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Skip qualify ins > - Merge branch 'master' into 8332689 > - _ld to ld > - Merge branch 'master' into 8332689 > - Rename to reloc_call > - Merge branch 'master' into 8332689 > - Rename lc > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Comments > - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790 Marked as reviewed by mli (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2171471540 From mli at openjdk.org Thu Jul 11 10:05:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 11 Jul 2024 10:05:02 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23] In-Reply-To: <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com> <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com> <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com> Message-ID: <RXAvVwAc0X_sVyyL89RAMsmVe1UDdUFIZjLnjB8ybng=.69f8bbb4-7de7-4ad9-9c2b-5d3e251866fc@github.com> On Wed, 10 Jul 2024 20:39:24 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine. >> Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise. 
>> >> >> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> index b39ac79be6b..e349eab3177 100644 >> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> @@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) { >> assert_cond(source != nullptr); >> int64_t distance = source - pc(); >> assert(is_simm32(distance), "Must be"); >> - Assembler::auipc(temp, (int32_t)distance + 0x800); >> - Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20); >> - Assembler::jalr(x1, temp, 0); >> + auipc(temp, (int32_t)distance + 0x800); >> + ld(temp, Address(temp, ((int32_t)distance << 20) >> 20)); >> + jalr(temp); >> } >> >> void MacroAssembler::jump_link(const address dest, Register temp) { >> @@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) { >> int64_t distance = dest - pc(); >> assert(is_simm21(distance), "Must be"); >> assert((distance % 2) == 0, "Must be"); >> - Assembler::jal(x1, distance); >> + jal(x1, distance); >> } >> >> void MacroAssembler::j(const address dest, Register temp) { > >> I have not seen (new) issues in testing. I would have prefered one or two more reviewers, but since RV is not the biggest platform I'll settle with just passing the bar. I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-reviews (as the new rules are in-effect which require latest rev to be reviewed). > > Still good to me. Thanks! > Thanks! @Hamlin-Li please re-approve, background: https://mail.openjdk.org/pipermail/jdk-dev/2024-July/009199.html Done. 
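
The offset arithmetic in the quoted `load_link_jump` diff can be illustrated in isolation. What follows is a minimal sketch, not HotSpot code: `hi_part` and `lo_part` are hypothetical helper names, and it assumes the usual RISC-V convention that `auipc` adds a multiple of 4096 to `pc` while `ld` adds a sign-extended 12-bit immediate. Adding `0x800` before dropping the low 12 bits compensates for the cases where the low part is negative (bit 11 set).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the part of `distance` contributed by auipc.
// It is a multiple of 4096, rounded so that the remainder fits in a
// signed 12-bit immediate. Assumes distance + 0x800 does not overflow
// the positive int32_t range.
int32_t hi_part(int32_t distance) {
  uint32_t rounded = (static_cast<uint32_t>(distance) + 0x800u) & ~0xfffu;
  return static_cast<int32_t>(rounded);
}

// Hypothetical helper: the low 12 bits of `distance`, sign-extended.
// This is the value that ((int32_t)distance << 20) >> 20 produces.
int32_t lo_part(int32_t distance) {
  int32_t low = distance & 0xfff;            // 0 .. 4095
  return low >= 0x800 ? low - 0x1000 : low;  // -2048 .. 2047
}
```

For a distance of `0x1800`, the high part is `0x2000` and the low part is `-0x800`, which sum back to `0x1800`; without the `+ 0x800` rounding the pair would land 4096 bytes short.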
------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2222529821 From rehn at openjdk.org Thu Jul 11 10:27:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 11 Jul 2024 10:27:04 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24] In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com> Message-ID: <AsNSDOsQCpCshYp0nhTVmVESMGJ9lkF4svg_aO30cfo=.84a6c348-0a52-4d39-9cb5-b1c5bc7cea4e@github.com> On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL <trampo> >> Stubs: >> AUIPC >> LD >> JALR >> <DEST> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. >> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> <DEST> >> >> An experimental option for turning trampolines back on exists. 
>> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Skip qualify ins > - Merge branch 'master' into 8332689 > - _ld to ld > - Merge branch 'master' into 8332689 > - Rename to reloc_call > - Merge branch 'master' into 8332689 > - Rename lc > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Comments > - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790 Thank you all for sticking with it! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2222568556 From rehn at openjdk.org Thu Jul 11 10:27:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 11 Jul 2024 10:27:05 GMT Subject: Integrated: 8332689: RISC-V: Use load instead of trampolines In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com> Message-ID: <jOoD3_TrxTqic6fQICCoyljdD4q7zdabxS3QKltg1Ok=.f03f9f1b-7b42-47f2-a61e-67e4e817b709@github.com> On Wed, 29 May 2024 12:40:05 GMT, Robbin Ehn <rehn at openjdk.org> wrote: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL <trampo> > Stubs: > AUIPC > LD > JALR > <DEST> > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. > > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > <DEST> > > An experimental option for turning trampolines back on exists. 
> > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... This pull request has now been integrated. Changeset: 5c612c23 Author: Robbin Ehn <rehn at openjdk.org> URL: https://git.openjdk.org/jdk/commit/5c612c230b0a852aed5fd36e58b82ebf2e1838af Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod 8332689: RISC-V: Use load instead of trampolines Reviewed-by: fyang, mli, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/19453 From aboldtch at openjdk.org Thu Jul 11 10:41:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 11 Jul 2024 10:41:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v5] In-Reply-To: <GxCetSdxU-CzhK7QGtwUC2lxoREeieXA10J8zbizClw=.60084b5d-1f22-4592-9d6e-a99e52df478e@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com> <GxCetSdxU-CzhK7QGtwUC2lxoREeieXA10J8zbizClw=.60084b5d-1f22-4592-9d6e-a99e52df478e@github.com> Message-ID: <50LctfChrqd3_HlWrKZKsq4gADeTHWqY1SuFMSwzpL4=.6c6df72a-9bdd-4390-a38d-5d1ee95b8543@github.com> On Thu, 11 Jul 2024 
09:25:52 GMT, Yudi Zheng <yzheng at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with four additional commits since the last revision: >> >> - Add extra comments in LightweightSynchronizer::inflate_fast_locked_object >> - Fix typos >> - Remove unused variable >> - Add missing inline qualifiers > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 843: > >> 841: movptr(monitor, Address(box, BasicLock::object_monitor_cache_offset_in_bytes())); >> 842: // null check with ZF == 0, no valid pointer below alignof(ObjectMonitor*) >> 843: cmpptr(monitor, alignof(ObjectMonitor*)); > > Is this only for keeping `ZF == 0` and can be replaced by `test; je` if we are not using `jne` to jump to the slow path? Or is there any performance concern? Btw, I think `ZF` is always rewritten before entering into the slow path https://github.com/openjdk/jdk/blob/b32e4a68bca588d908bd81a398eb3171a6876dc5/src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp#L98-L102 You are correct the condition flag is not important here. At some point we had more than just `nullptr` and and `ObjectMonitor*` values, but also some small signal values which allowed us to move some slow path code into the runtime. When this was removed I just made the checks do the same on both aarch64 and x86. (Where aarch64 does not have a stub and jumps directly to the continuation requiring the correct condition flags after the branch.) _Side note: This might be something that will be explored further in the future. 
And allow to move a lot of the LM_LIGHTWEIGHT slow path code away from the lock node and code stub into the runtime._ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673800250 From coleenp at openjdk.org Thu Jul 11 13:11:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 11 Jul 2024 13:11:04 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3] In-Reply-To: <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com> <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com> <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com> Message-ID: <aEhln1AWzy1her4u3ffOamJl3Tz9eZaBb4ujhh4catg=.e5101f5f-f007-49d8-aae8-eb01b8a7fd25@github.com> On Wed, 10 Jul 2024 09:41:08 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> src/hotspot/share/runtime/arguments.cpp line 1830: >> >>> 1828: FLAG_SET_CMDLINE(LockingMode, LM_LIGHTWEIGHT); >>> 1829: warning("UseObjectMonitorTable requires LM_LIGHTWEIGHT"); >>> 1830: } >> >> Maybe we want this to have the opposite sense - turn off UseObjectMonitorTable if not LM_LIGHTWEIGHT? > > Maybe. It boils down to what to do when the JVM receives `-XX:LockingMode={LM_LEGACY,LM_MONITOR} -XX:+UseObjectMonitorTable` > The options I see are > 1. Select `LockingMode=LM_LIGHTWEIGHT` > 2. Select `UseObjectMonitorTable=false` > 3. Do not start the VM > > Between 1. and 2. it is impossible to know what the real intentions were. But with being a newer `-XX:+UseObjectMonitorTable` it somehow seems more likely. > > Option 3. is probably the sensible solution, but it is hard to determine. We tend to not close the VM because of incompatible options, rather fix them. 
But I believe there are precedence for both. If we do this however we will have to figure out all the interactions with our testing framework. And probably add some safeguards. UseObjectMonitorTable is a Diagnostic option and LockingMode is (Deprecated) but a full-fledged product option, so I think the product option should override. So I pick 2. They might have changed to Legacy to compare performance or something like that, and missed that the table is only for lightweight locking when switching the command line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673989707 From coleenp at openjdk.org Thu Jul 11 13:13:57 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 11 Jul 2024 13:13:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3] In-Reply-To: <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com> <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com> <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com> Message-ID: <7_99_ouQ3MAtPFgIQzG01AlOOgFLGPrK1h1LMzzhK60=.0578b32c-8f8b-460c-a5c1-e5686369aba5@github.com> On Wed, 10 Jul 2024 09:41:37 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 763: >> >>> 761: assert(mark.has_monitor(), "must be"); >>> 762: // The monitor exists >>> 763: ObjectMonitor* monitor = ObjectSynchronizer::read_monitor(current, object, mark); >> >> This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock? Or we got here because BasicLock.metadata was not the ObjectMonitor? 
>
>> This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock?
>
> We could.
>
>> Or we got here because BasicLock.metadata was not the ObjectMonitor?
>
> That is one reason we got here. We also get here from C1/interpreter as well as if there are other threads on the entry queues.
>
> I think there was an assumption that it would not be that crucial in those cases.
>
> One of the reasons we do not read the `BasicLock` cache from the runtime is that we are not as careful with keeping the `BasicLock` initialised on platforms without `UseObjectMonitorTable`. The idea was that as long as they call into the VM, we do not need to keep it invariant.
>
> But this made me realise `BasicLock::print_on` will be broken on non x86/aarch64 platforms if running with `UseObjectMonitorTable`.
>
> Rather than fix all platforms I will condition BasicLock::object_monitor_cache to return nullptr on unsupported platforms.
>
> Could add this then. Should probably add an overload to `ObjectSynchronizer::read_monitor` which takes the lock and push it all the way here.

I think I'd prefer not to add another overload of read_monitor. It's kind of confusing as is. This is okay and we'll see if there's any performance benefit to checking BasicLock instead later.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673993770 From jkratochvil at openjdk.org Thu Jul 11 14:28:55 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 11 Jul 2024 14:28:55 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v3] In-Reply-To: <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com> <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com> Message-ID: <VHCtyVakvKnmu2bUd9tMRb0TbOIN0PvxllFqpJJX28g=.5e946e9c-8e69-4785-bdd2-f65485d22cd8@github.com> On Thu, 11 Jul 2024 09:23:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: > > ``` > > * `CPUQuota` (changed it to `AllowedCPUs`) does not work for me - it properly distributes the load but JDK still sees all available CPU cores (4 of my VM). > > ``` > > Could you elaborate on that? What does not work? 
In the log there is (`/proc/cpuinfo` has 4 entries on this system):

    [0.139s][trace][os,container] OSContainer::active_processor_count: 4

and therefore it fails with:

```
java.lang.RuntimeException: 'OSContainer::active_processor_count: 2' missing from stdout/stderr
	at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252)
	at SystemdMemoryAwarenessTest.testHelloSystemd(SystemdMemoryAwarenessTest.java:58)
	at SystemdMemoryAwarenessTest.main(SystemdMemoryAwarenessTest.java:43)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
	at java.base/java.lang.Thread.run(Thread.java:1575)
```

It is on Fedora 40 x86_64 (`systemd-255.8-1.fc40.x86_64`).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2223083324

From sgehwolf at openjdk.org Thu Jul 11 14:39:55 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 11 Jul 2024 14:39:55 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <VHCtyVakvKnmu2bUd9tMRb0TbOIN0PvxllFqpJJX28g=.5e946e9c-8e69-4785-bdd2-f65485d22cd8@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com> <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com> <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com> <VHCtyVakvKnmu2bUd9tMRb0TbOIN0PvxllFqpJJX28g=.5e946e9c-8e69-4785-bdd2-f65485d22cd8@github.com>
Message-ID: <PM9EbHpQiv_K9fkWrZNP5OlqdpOoXuoSWMQbSBEAbHM=.33629d4c-e0d2-4f54-a588-2d4b3599bffb@github.com>

On Thu, 11 Jul 2024 14:26:23 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

> > > ```
> > > * `CPUQuota` (changed
it to `AllowedCPUs`) does not work for me - it properly distributes the load but JDK still sees all available CPU cores (4 of my VM). > > > ``` > > > > > > Could you elaborate on that? What does not work? > > In the log there is (`/proc/cpuinfo` has 4 entries on this system): > > ``` > [0.139s][trace][os,container] OSContainer::active_processor_count: 4 > ``` > > and therefore it fails with: > > ``` > java.lang.RuntimeException: 'OSContainer::active_processor_count: 2' missing from stdout/stderr > at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252) > at SystemdMemoryAwarenessTest.testHelloSystemd(SystemdMemoryAwarenessTest.java:58) > at SystemdMemoryAwarenessTest.main(SystemdMemoryAwarenessTest.java:43) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1575) > ``` > > It is on Fedora 40 x86_64 (`systemd-255.8-1.fc40.x86_64`). Well yes, because the limit isn't properly detected (needs a JVM change that does that; imo https://github.com/openjdk/jdk/pull/17198). 
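
To make the expected value in the failing assertion concrete, here is a minimal sketch of how a processor count could be derived from a `cpu.max` value and capped by the host CPU count, along the lines of the "take the minimum of the two" idea mentioned earlier in the thread. It is an illustration only, not the JVM's implementation: `effective_cpus` is a hypothetical helper, and the `"<quota> <period>"` / `"max <period>"` layout of `cpu.max` is assumed from the cgroup v2 interface.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: derive a CPU count from the contents of cpu.max.
// Falls back to host_cpus when no limit is set or the input is malformed.
int effective_cpus(const std::string& cpu_max, int host_cpus) {
  std::istringstream in(cpu_max);
  std::string quota_str;
  long period = 0;
  if (!(in >> quota_str >> period) || period <= 0) {
    return host_cpus;  // malformed input: assume no limit
  }
  if (quota_str == "max") {
    return host_cpus;  // "max" means the quota is unlimited
  }
  long quota = 0;
  try {
    quota = std::stol(quota_str);
  } catch (...) {
    return host_cpus;  // non-numeric quota: assume no limit
  }
  if (quota <= 0) {
    return host_cpus;
  }
  // Round up: a quota of 1.5 periods still needs two CPUs to be usable.
  long limit = (quota + period - 1) / period;
  // Never report more CPUs than the host actually has.
  return static_cast<int>(std::min<long>(limit, host_cpus));
}
```

Under these assumptions, a `cpu.max` of `200000 100000` on the 4-core VM described above yields 2, whereas the same setting on a single-core machine yields 1, which is why the test has to account for the host CPU count.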
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2223109654 From sgehwolf at openjdk.org Thu Jul 11 16:46:13 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 11 Jul 2024 16:46:13 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v4] In-Reply-To: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> Message-ID: <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com> > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Add Whitebox check for host cpu - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comments - 8333446: Add tests for hierarchical container support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/22141a48..139a9069 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=02-03 Stats: 13132 lines in 454 files changed: 8669 ins; 2561 del; 1902 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From sgehwolf at openjdk.org Thu Jul 11 16:59:56 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 11 Jul 2024 16:59:56 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v4] In-Reply-To: <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com> Message-ID: <h4NcwefKxH-wDTz-VekY135tQneTS5ti8HcnzXqOP2M=.096ee7bc-25ee-43ba-85ae-af8aace12e1d@github.com> On Thu, 11 Jul 2024 16:46:13 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. 
Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support Example test run on cgv1 with a fixed JVM: https://cr.openjdk.org/~sgehwolf/webrevs/jdk-8333446-systemd-slice-tests/cgv1/SystemdMemoryAwarenessTest.jtr Example test run on cgv2 with a fixed JVM: https://cr.openjdk.org/~sgehwolf/webrevs/jdk-8333446-systemd-slice-tests/cgv2/SystemdMemoryAwarenessTest.jtr ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2223432957 From gli at openjdk.org Thu Jul 11 17:38:15 2024 From: gli at openjdk.org (Guoxiong Li) Date: Thu, 11 Jul 2024 17:38:15 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v3] In-Reply-To: <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> Message-ID: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> On Wed, 10 
Jul 2024 20:29:37 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity. >> >> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. >> >> Test: tier1-6; perf-neural for dacapo, specjvm2008, specjbb2015 and cachestresser. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into pgc-vm-operation > - pgc-vm-operation Nice refactor. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 434: > 432: void ParallelScavengeHeap::do_full_collection_no_gc_locker(bool clear_all_soft_refs) { > 433: bool maximum_compaction = clear_all_soft_refs; > 434: PSParallelCompact::invoke(maximum_compaction); The parameter `maximum_heap_compaction` of the method `PSParallelCompact::invoke` was changed to `clear_all_soft_refs` in [JDK-8334445](https://git.openjdk.org/jdk/pull/19763), so the variable `maximum_compaction` seems not necessary here. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 443: > 441: if (result == nullptr && !is_tlab) { > 442: // auto expand inside > 443: result = old_gen()->allocate(size); If we expand the generation in the method `PSOldGen::allocate`. I think it is good to rename the method to `expand_and_allocate` (just like `TenuredGeneration::expand_and_allocate` in SerialGC). It is better to be polished at a followup issue. 
src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 446: > 444: } > 445: return result; // Could be null if we are out of space. > 446: } I notice the method `PSOldGen::allocate` can expand the size of the old gen, but the method `PSYoungGen::allocate` can't expand the size of the young gen. It is similar to a bug [1] in Serial. Fortunately, the size of the young generation can be resized during Parallel GC if the option `UseAdaptiveSizePolicy` is `true`. When the `UseAdaptiveSizePolicy` is set to `false` manually by the user, I suspect it is a bug in Parallel because of the unexpanded young generation size. [1] https://bugs.openjdk.org/browse/JDK-8333386 src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 478: > 476: > 477: const bool clear_all_soft_refs = true; > 478: do_full_collection_no_gc_locker(clear_all_soft_refs); If the young collection succeeded in method `collect_at_safepoint`. The normal full collection won't run in `collect_at_safepoint`. If the successful young collection didn't release any memory (or only released little memory but not enough for allocation), the allocation in line 462 will fail too. Then a full collection with maximum compaction will be run. It is strange. In my opinion, I think the steps look like below: 1. allocation 2. young collection 3. allocation 4. normal full collection 5. allocation 6. maximum full collection 7. allocation 8. OOM But in current patch, the step 4-5 may be skipped. src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 114: > 112: > 113: // Perform a full collection > 114: void do_full_collection(bool clear_all_soft_refs) override; The comment seems redundant. src/hotspot/share/gc/parallel/psScavenge.cpp line 232: > 230: // Note that this method should only be called from the vm_thread while > 231: // at a safepoint! > 232: bool PSScavenge::invoke() { Nice removal. It is strange to run a full collection in `PSScavenge` before. ------------- Changes requested by gli (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20077#pullrequestreview-2172099153 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674132961 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674390370 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674247874 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674379967 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674344233 PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674384804 From gli at openjdk.org Thu Jul 11 17:38:15 2024 From: gli at openjdk.org (Guoxiong Li) Date: Thu, 11 Jul 2024 17:38:15 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v3] In-Reply-To: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> Message-ID: <LYtIo9zp0PwC7-1PMJtaovc3MOTjNr9neZWgWgiA1IQ=.e4b4293d-f729-47c4-9df4-0be908726682@github.com> On Thu, 11 Jul 2024 14:39:47 GMT, Guoxiong Li <gli at openjdk.org> wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into pgc-vm-operation >> - pgc-vm-operation > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 434: > >> 432: void ParallelScavengeHeap::do_full_collection_no_gc_locker(bool clear_all_soft_refs) { >> 433: bool maximum_compaction = clear_all_soft_refs; >> 434: PSParallelCompact::invoke(maximum_compaction); > > The parameter `maximum_heap_compaction` of the method `PSParallelCompact::invoke` was changed to `clear_all_soft_refs` in [JDK-8334445](https://git.openjdk.org/jdk/pull/19763), so the variable `maximum_compaction` seems not necessary here. If the variable `maximum_compaction` is removed, it may be better to use `PSParallelCompact::invoke` directly and remove the method `do_full_collection_no_gc_locker` (just like using `PSScavenge::invoke` directly). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674260428 From ayang at openjdk.org Thu Jul 11 18:06:34 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 11 Jul 2024 18:06:34 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v4] In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> Message-ID: <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com> > Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity. > > The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. 
> > Test: tier1-6; perf-neural for dacapo, specjvm2008, specjbb2015 and cachestresser. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into pgc-vm-operation - review - review - Merge branch 'master' into pgc-vm-operation - pgc-vm-operation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20077/files - new: https://git.openjdk.org/jdk/pull/20077/files/1d10dd5b..974b6b08 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=02-03 Stats: 1640 lines in 65 files changed: 1034 ins; 342 del; 264 mod Patch: https://git.openjdk.org/jdk/pull/20077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077 PR: https://git.openjdk.org/jdk/pull/20077 From ayang at openjdk.org Thu Jul 11 18:14:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 11 Jul 2024 18:14:37 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v3] In-Reply-To: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> Message-ID: <JZIUoUDNkyU49Kkaso_UnStuexRVA-yCmrT2Dt-rfsY=.873c408d-6478-4f7f-87d5-7cbeccb20714@github.com> On Thu, 11 Jul 2024 17:10:58 GMT, Guoxiong Li <gli at openjdk.org> wrote: > If the successful young collection didn't release any memory (or only released little memory but not enough for allocation), A successful young-gc often leave young-gen 
completely empty. Otherwise, max-compaction full-gc should be run -- there is little benefit of running non-max-compaction full-gc if old-gen is too packed to hold all young-gen objs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674452022 From shade at openjdk.org Thu Jul 11 19:19:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 19:19:20 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear Message-ID: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. Additional testing: - [x] Linux x86_64 server fastdebug, `all` - [ ] Linux AArch64 server fastdebug, `all` ------------- Commit messages: - Move the membar at the end - Revert C1 parts - Work Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329597 Stats: 132 lines in 7 files changed: 132 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Thu Jul 11 19:19:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jul 2024 19:19:20 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> Message-ID: 
<PkDvYvRdAryzDFqEAUgGNspKH6gsl3Kjp4a_4_6lJos=.fe9d87a5-b15b-47a9-8bcc-9e287ee70944@github.com> On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all`

On Mac AArch64, which suffers from both native call and WX transition:

Benchmark                   Mode  Cnt   Score   Error  Units

# Intrinsic OFF
ReferenceClear.phantom      avgt    9  52,297 ± 0,294  ns/op
ReferenceClear.phantom_new  avgt    9  57,075 ± 0,296  ns/op
ReferenceClear.soft         avgt    9  52,567 ± 0,393  ns/op
ReferenceClear.soft_new     avgt    9  57,640 ± 0,264  ns/op
ReferenceClear.weak         avgt    9  53,018 ± 1,285  ns/op
ReferenceClear.weak_new     avgt    9  57,227 ± 0,483  ns/op

# Intrinsic ON (default)
ReferenceClear.phantom      avgt    9   0,780 ± 0,017  ns/op
ReferenceClear.soft         avgt    9   0,784 ± 0,022  ns/op
ReferenceClear.weak         avgt    9   0,793 ± 0,033  ns/op
ReferenceClear.phantom_new  avgt    9   3,018 ± 0,015  ns/op
ReferenceClear.soft_new     avgt    9   3,268 ± 0,014  ns/op
ReferenceClear.weak_new     avgt    9   3,004 ± 0,057  ns/op

On x86_64 m7a.16xlarge, which only suffers from the native call:

Benchmark                   Mode  Cnt   Score   Error  Units

# Intrinsic OFF
ReferenceClear.phantom      avgt    9  14.643 ± 0.049  ns/op
ReferenceClear.soft         avgt    9  14.939 ± 0.438  ns/op
ReferenceClear.weak         avgt    9  14.648 ± 0.081  ns/op
ReferenceClear.phantom_new  avgt    9  19.859 ± 2.405  ns/op
ReferenceClear.soft_new     avgt    9  20.208 ± 1.805  ns/op
ReferenceClear.weak_new     avgt    9  20.385 ± 2.570  ns/op

# Intrinsic ON (default)
ReferenceClear.phantom      avgt    9   0.821 ± 0.010  ns/op
ReferenceClear.soft         avgt    9   0.817 ± 0.007  ns/op
ReferenceClear.weak         avgt    9   0.819 ± 0.010  ns/op
ReferenceClear.phantom_new  avgt    9   4.195 ± 0.729  ns/op
ReferenceClear.soft_new     avgt    9   4.315 ± 0.599  ns/op
ReferenceClear.weak_new     avgt    9   3.986 ± 0.596  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2223248114 From pchilanomate at openjdk.org Thu Jul 11 20:11:50 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 11 Jul 2024 20:11:50 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v5] In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> Message-ID: <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com> > Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. > > I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3.
> > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: add vm.compMode != Xcomp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20016/files - new: https://git.openjdk.org/jdk/pull/20016/files/1cf425dd..2a8b7076 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016 PR: https://git.openjdk.org/jdk/pull/20016 From pchilanomate at openjdk.org Thu Jul 11 20:11:50 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 11 Jul 2024 20:11:50 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v3] In-Reply-To: <KKXg1PeYIYOr45p4L6lBqNrjMIdMoQI-aydEGygCJZM=.785a668d-9f8a-4211-877b-8fd93f52a835@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com> <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com> <KKXg1PeYIYOr45p4L6lBqNrjMIdMoQI-aydEGygCJZM=.785a668d-9f8a-4211-877b-8fd93f52a835@github.com> Message-ID: <PgG16h9CBBYOBbokn0AY0NsW2xmHKYKczvaAzmqlzk8=.5ebde8ff-0cf9-4128-8429-26cdf6b97aa3@github.com> On Thu, 11 Jul 2024 02:40:08 GMT, David Holmes <dholmes at openjdk.org> wrote: >> The test should never fail even with external flags, so if anything it's just extra testing. But I can add vm.flagless if you prefer. > > flagless might be going too far as we won't test with other GC's etc. 
Can we just use `@requires vm.compMode != "Xcomp"` to exclude it from the Xcomp specific testing which is redundant. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1674629390 From dean.long at oracle.com Thu Jul 11 20:51:46 2024 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 11 Jul 2024 13:51:46 -0700 Subject: [External] : Re: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? In-Reply-To: <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com> References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com> <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com> <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com> Message-ID: <5663ffff-8924-40a9-b5be-e3dacb86381c@oracle.com> Sorry, I responded too quickly. For some reason I was thinking link() was the same as fp(). If link() works with your compiler, then that is indeed the correct choice. dl On 7/11/24 1:12 AM, Julian Waters wrote: > Hi Dean, > > Thanks for the quick reply. At the risk of testing your patience, I > don't really follow, since that is how os::get_sender_for_C_frame is > implemented on other platforms (I copied it from Linux x86 in this > case). All I got from the comment is that the only reason we usually > have to use the StackWalk API on Windows is because the frame pointer > is not saved when using the Microsoft compiler, however in my case I'm > not using the Microsoft compiler and have verified that the frame > pointer is saved in my custom JVMs. I'm not sure how > VMError::print_native_stack on other platforms manages to work when > they also do > > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > > in os::get_sender_for_C_frame like I did here > > Thanks for your time and patience!
> > best regards, > Julian > > On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote: >> Using fr->link() in get_sender_for_C_frame() gives the wrong answer because it refers to the current frame, not the sender frame. There is no frame::sender_fp() because the information we need could be anywhere in the frame or even nowhere in the frame. This is what the comment about StackWalk() API is hinting at. Even debuggers can have trouble giving an accurate stack trace if external debug information is missing and frames do not contain the needed information themselves. >> >> dl >> >> On 7/10/24 10:52 PM, Julian Waters wrote: >> >> Hi Dean, >> >> I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set >> >> The code that I tried on Windows x64 is attached below >> >> best regards, >> Julian >> >> // VC++ does not save frame pointer on stack in optimized build. It >> // can be turned off by -Oy-. If we really want to walk C frames, >> // we can use the StackWalk() API. >> frame os::get_sender_for_C_frame(frame* fr) { >> #ifdef __GNUC__ >> return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); >> #elif defined(_MSC_VER) >> ShouldNotReachHere(); >> return frame(); >> #endif >> } >> >> frame os::current_frame() { >> #ifdef __GNUC__ >> frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()), >> reinterpret_cast<intptr_t*>(__builtin_frame_address(1)), >> CAST_FROM_FN_PTR(address, &os::current_frame)); >> if (os::is_first_C_frame(&f)) { >> // stack is not walkable >> return frame(); >> } else { >> return os::get_sender_for_C_frame(&f); >> } >> #elif defined(_MSC_VER) >> return frame(); // cannot walk Windows frames this way.
See os::get_native_stack >> // and os::platform_print_native_stack >> #endif >> } From dean.long at oracle.com Thu Jul 11 21:02:27 2024 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 11 Jul 2024 14:02:27 -0700 Subject: [External] : Re: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? In-Reply-To: <CAP2b4GPnh9C5VR84KM3c9wgkYiLDf80BMJ5fK26qWsvobGXvkA@mail.gmail.com> References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com> <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com> <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com> <CAP2b4GPnh9C5VR84KM3c9wgkYiLDf80BMJ5fK26qWsvobGXvkA@mail.gmail.com> Message-ID: <0d5a728f-1dac-4a48-85f1-5d3cba200917@oracle.com> Right. It shows an alternative to print_native_stack to use with the Microsoft compiler. If you are using a different compiler that stores the caller FP at frame::link_offset then following the example of other platforms, like Linux, should work. dl On 7/11/24 1:57 AM, Julian Waters wrote: > Seems like I found an old gem where the issue with the frame pointer > was first discovered > > https://bugs.openjdk.org/browse/JDK-8022335 > https://github.com/openjdk/jdk/commit/1c2a7eea85ea261102687190d6b2e92c560770b8 > > best regards, > Julian > > > On Thu, Jul 11, 2024 at 4:12 PM Julian Waters > <tanksherman27 at gmail.com> wrote: > > Hi Dean, > > Thanks for the quick reply. At the risk of testing your patience, I > don't really follow, since that is how os::get_sender_for_C_frame is > implemented on other platforms (I copied it from Linux x86 in this > case). All I got from the comment is that the only reason we usually > have to use the StackWalk API on Windows is because the frame pointer > is not saved when using the Microsoft compiler, however in my case I'm > not using the Microsoft compiler and have verified that the frame > pointer is saved in my custom JVMs.
I'm not sure how > VMError::print_native_stack on other platforms manages to work when > they also do > > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > > in os::get_sender_for_C_frame like I did here > > Thanks for your time and patience! > > best regards, > Julian > > On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote: > > > > Using fr->link() in get_sender_for_C_frame() gives the wrong > answer because it refers to the current frame, not the sender > frame. There is no frame::sender_fp() because the information we > need could be anywhere in the frame or even nowhere in the frame. > This is what the comment about StackWalk() API is hinting at. Even > debuggers can have trouble giving an accurate stack trace if > external debug information is missing and frames do not contain > the needed information themselves. > > > > dl > > > > On 7/10/24 10:52 PM, Julian Waters wrote: > > > > Hi Dean, > > > > I eventually did find frame::link(), but ultimately it didn't > seem to help as VMError::print_native_stack still doesn't work > properly on Windows. It seems as though frame::link() calls > addr_at on x86, which in turn calls frame::fp(), which returns > _fp. I think whatever sets _fp for VMError::print_native_stack is > the missing link here, but unfortunately I don't know where it's set > > > > The code that I tried on Windows x64 is attached below > > > > best regards, > > Julian > > > > // VC++ does not save frame pointer on stack in optimized build. It > > // can be turned off by -Oy-. If we really want to walk C frames, > > // we can use the StackWalk() API. > > frame os::get_sender_for_C_frame(frame* fr) { > > #ifdef __GNUC__ > > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > > #elif defined(_MSC_VER) > > ShouldNotReachHere(); > > return frame(); > > #endif > > } > > > > frame os::current_frame() { > > #ifdef __GNUC__ > >
frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()), > > reinterpret_cast<intptr_t*>(__builtin_frame_address(1)), > > CAST_FROM_FN_PTR(address, &os::current_frame)); > > if (os::is_first_C_frame(&f)) { > > // stack is not walkable > > return frame(); > > } else { > > return os::get_sender_for_C_frame(&f); > > } > > #elif defined(_MSC_VER) > > return frame(); // cannot walk Windows frames this way. See > os::get_native_stack > > // and os::platform_print_native_stack > > #endif > > } > From dnsimon at openjdk.org Thu Jul 11 21:27:51 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 11 Jul 2024 21:27:51 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v5] In-Reply-To: <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com> Message-ID: <FCUs5PXLhm2lw2QudWmlM0_ilbPSxNJc9UojiA3wnYg=.bb2d4030-34b7-4011-b773-a4b91c51f8bc@github.com> On Thu, 11 Jul 2024 20:11:50 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
>> >> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > add vm.compMode != Xcomp src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1582: > 1580: freeze_result res = entry->is_pinned() ? freeze_pinned_cs : freeze_pinned_monitor; > 1581: log_develop_trace(continuations)("=== end of freeze (fail %d)", res); > 1582: // Avoid Thread.yield() loops without safepoint polls (see 8335269). Is an explicit reference to a JBS issue id like this still recommended practice? After all, it will be a prefix in the merged commit message. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1674711298 From pchilanomate at openjdk.org Thu Jul 11 21:35:27 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 11 Jul 2024 21:35:27 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v6] In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> Message-ID: <lrcx4n_WnfGbmtYORBWqjQzBuDscQdyr5OFTmMLexko=.babb7658-ebba-49c2-ae5d-fc3d158ea7db@github.com> > Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. 
> > I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: remove JBS id reference ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20016/files - new: https://git.openjdk.org/jdk/pull/20016/files/2a8b7076..1ea1a06a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016 PR: https://git.openjdk.org/jdk/pull/20016 From pchilanomate at openjdk.org Thu Jul 11 21:35:28 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 11 Jul 2024 21:35:28 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v5] In-Reply-To: <FCUs5PXLhm2lw2QudWmlM0_ilbPSxNJc9UojiA3wnYg=.bb2d4030-34b7-4011-b773-a4b91c51f8bc@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com> <FCUs5PXLhm2lw2QudWmlM0_ilbPSxNJc9UojiA3wnYg=.bb2d4030-34b7-4011-b773-a4b91c51f8bc@github.com> Message-ID: <TbLM-2xeEDUaFNbsBASWoStGUudUVVDn_I9TJ1ZXwic=.90a502c9-e58f-4d08-bd79-cf5dc1471da4@github.com> On Thu, 11 Jul 2024 21:25:12 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> add vm.compMode != Xcomp > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1582: > >> 1580: freeze_result res = 
entry->is_pinned() ? freeze_pinned_cs : freeze_pinned_monitor;
>> 1581: log_develop_trace(continuations)("=== end of freeze (fail %d)", res);
>> 1582: // Avoid Thread.yield() loops without safepoint polls (see 8335269).
>
> Is an explicit reference to a JBS issue id like this still recommended practice? After all, it will be a prefix in the merged commit message.

Right, I removed it from the comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1674716191

From vlivanov at openjdk.org Thu Jul 11 22:33:51 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 11 Jul 2024 22:33:51 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <yVFbVwZTQCEgaPhy1gw8MPd3RaqlMxxwPFNpxm2SCfs=.aef2f2d3-527d-4218-b307-d49bd217f59e@github.com>

On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>
>
> VLoop::check_preconditions: failed: control flow in loop not allowed
>
>
> The control flow is due to the Java implementation for these methods, e.g.
>
>
> public static long max(long a, long b) {
> return (a >= b) ? a : b;
> }
>
>
> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
>
>
> SuperWord::transform_loop:
> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>
>
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1155
> long max 1173
>
>
> After the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1042
> long max 1042
>
>
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0
>>> jtreg:test/jdk:tier1 ...

Overall, looks fine.

So, there will be `inline_min_max`, `inline_fp_min_max`, and `inline_long_min_max`, which vary slightly. I'd prefer to see them unified. (Or, at least, enhance `inline_min_max` to cover the `minL`/`maxL` cases.)

Also, it's a bit confusing to see the int variant names without a basic type (`_min`/`_minL` vs `_minI`/`_minL`). Please clean it up along the way. (FTR I'm also fine handling the renaming as a separate change.)
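For readers following the thread, here is a minimal sketch of the kind of reduction loop the quoted description is talking about. The class and method names are made up for illustration and are not taken from the PR or from `ReductionPerf`; whether C2 actually vectorizes such a loop depends on the JDK build, platform and flags, which this sketch does not try to show.

```java
// Hypothetical illustration only; not code from the PR.
final class LongMinMaxLoop {
    // Reduction over a long[] using Math.max(long, long).
    // The call site is what the patch proposes to intrinsify.
    static long maxOf(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // Same shape of loop using Math.min(long, long).
    static long minOf(long[] values) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.min(result, values[i]);
        }
        return result;
    }
}
```

The loops are written against the public `Math` API only, so they behave the same with or without the intrinsic; any difference would be in the generated code, not in the result.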
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2224062122 From vlivanov at openjdk.org Fri Jul 12 00:01:56 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Jul 2024 00:01:56 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> On Tue, 2 Jul 2024 14:52:09 GMT, Andrew Haley <aph at openjdk.org> wrote: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. 
All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
>
> AArch64
> -------
>
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
>
> Generally
> ---------
>
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
>
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
>
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
>
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Looks very good, Andrew! Some comments on minor things follow.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433:

> 1431:
> 1432: // Don't check secondary_super_cache
> 1433: if (super_check_offset.is_register()

Do you see any effects from this particular change?

It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks a primary or a secondary super). Moreover, the very same check is performed after the primary super slot is checked.

Unless the `_secondary_super_cache` field is removed, unconditionally checking the slot at `super_check_offset` is benign.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1040:

> 1038:
> 1039: // Secondary subtype checking
> 1040: void lookup_secondary_supers_table(Register sub_klass,

While browsing the code, I noticed that it's far from evident at call sites which overload is used (especially with so many arguments). Does it make sense to avoid method overloads here and use distinct method names instead?

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1981:

> 1979: __ load_klass(r19_klass, copied_oop);// query the object klass
> 1980:
> 1981: BLOCK_COMMENT("type_check:");

Why don't you move it inside `generate_type_check`?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4781:

> 4779: Label* L_success,
> 4780: Label* L_failure) {
> 4781: if (! UseSecondarySupersTable) {

Any particular reason to keep the condition negated? (Here and in general. There are multiple places where `!UseSecondarySupersTable` is used.)

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4810:

> 4808: Label* L_success,
> 4809: Label* L_failure) {
> 4810: // NB! Callers may assume that, when temp2_reg is a valid register,

Oh, that's a subtle point... Can we make it more evident at call sites?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5062:

> 5060:
> 5061: #ifdef DEBUG
> 5062: call_VM_leaf_base((address)&poo, /*number_of_arguments*/0);

A leftover from debugging?

src/hotspot/share/memory/universe.cpp line 443:

> 441:
> 442: {
> 443: Universe::_the_array_interfaces_bitmap = Klass::compute_secondary_supers_bitmap(_the_array_interfaces_array);

Cleanup idea: remove `Universe::` prefixes. The rest of the method doesn't use qualified names for class members.

src/hotspot/share/oops/instanceKlass.cpp line 1410:

> 1408: return nullptr;
> 1409: } else if (num_extra_slots == 0) {
> 1410: if (num_extra_slots == 0 && interfaces->length() <= 1) {

Since `secondary_supers` are hashed unconditionally now, is the `interfaces->length() <= 1` check still needed?
src/hotspot/share/oops/instanceKlass.cpp line 3524:

> 3522:
> 3523: st->print(BULLET"secondary supers: "); secondary_supers()->print_value_on(st); st->cr();
> 3524: {

Any particular reason to keep brackets around `hash_slot` and `bitmap`?

src/hotspot/share/oops/klass.cpp line 175:

> 173: if (secondary_supers()->at(i) == k) {
> 174: if (UseSecondarySupersCache) {
> 175: ((Klass*)this)->set_secondary_super_cache(k);

Does it make sense to assert `UseSecondarySupersCache` in `Klass::set_secondary_super_cache()`?

src/hotspot/share/oops/klass.cpp line 284:

> 282: // which doesn't zero out the memory before calling the constructor.
> 283: Klass::Klass(KlassKind kind) : _kind(kind),
> 284: _bitmap(SECONDARY_SUPERS_BITMAP_FULL),

I like the idea, but what are the benefits of initializing `_bitmap` separately from `_secondary_supers`?

src/hotspot/share/oops/klass.cpp line 469:

> 467: #endif
> 468:
> 469: bitmap = hash_secondary_supers(secondary_supers, /*rewrite=*/true); // rewrites freshly allocated array

I like that hashing is performed unconditionally now. Looks like you can remove the `UseSecondarySupersTable`-specific CDS support (in `filemap.cpp`). The CDS archive should unconditionally contain hashed tables.

src/hotspot/share/oops/klass.inline.hpp line 117:

> 115: }
> 116:
> 117: inline bool Klass::search_secondary_supers(Klass *k) const {

I see you moved `Klass::search_secondary_supers` into `klass.inline.hpp`, but I'm not sure how it interacts with `Klass::is_subtype_of` (the sole caller) being declared in `klass.hpp`. Will the inlining still happen if `Klass::is_subtype_of()` callers include `klass.hpp`?

src/hotspot/share/oops/klass.inline.hpp line 122:

> 120: return true;
> 121:
> 122: bool result = lookup_secondary_supers_table(k);

Should `UseSecondarySupersTable` affect `Klass::search_secondary_supers` as well?
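As background for the comments above about the hashed secondary supers table and the software popcount mentioned in the patch description, the following is a deliberately simplified sketch of a bitmap-indexed packed table. It is an assumption-laden illustration, not HotSpot's implementation: it assumes at most one entry per slot (no collision handling), uses strings instead of `Klass*`, and every name in it (`PackedSlotTable`, `contains`, `popcount`) is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified illustration of a bitmap-indexed packed table.
// Assumes at most one entry per slot; this is NOT HotSpot's algorithm.
final class PackedSlotTable {
    private final long bitmap;      // bit i set => slot i is occupied
    private final String[] entries; // occupied slots only, in ascending slot order

    // bySlot maps a slot number in 0..63 to the entry stored in that slot.
    PackedSlotTable(TreeMap<Integer, String> bySlot) {
        long bits = 0L;
        String[] packed = new String[bySlot.size()];
        int next = 0;
        for (Map.Entry<Integer, String> e : bySlot.entrySet()) {
            bits |= 1L << e.getKey();
            packed[next++] = e.getValue();
        }
        this.bitmap = bits;
        this.entries = packed;
    }

    // A clear bit answers "absent" without touching the array. Otherwise the
    // number of set bits below the slot is the index into the packed array.
    boolean contains(int slot, String name) {
        if (((bitmap >>> slot) & 1L) == 0L) {
            return false;
        }
        int index = popcount(bitmap & ((1L << slot) - 1L));
        return entries[index].equals(name);
    }

    // Software population count, for targets without a popcount instruction.
    static int popcount(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
        x = (x + (x >>> 4)) & 0x0f0f0f0f0f0f0f0fL;
        return (int) ((x * 0x0101010101010101L) >>> 56);
    }
}
```

The point of the sketch is only the shape of the lookup: one bit test, one popcount, one compare. The real code additionally has to deal with several entries hashing to the same slot, which is where most of the complexity discussed in this review comes from.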
------------- PR Review: https://git.openjdk.org/jdk/pull/19989#pullrequestreview-2161098896 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674812600 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674823519 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674790160 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667194729 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674832710 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667037608 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667194339 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674792021 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667193916 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667190974 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667192207 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667192783 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674806107 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667193323 From vlivanov at openjdk.org Fri Jul 12 00:01:56 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Jul 2024 00:01:56 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> Message-ID: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> On Thu, 11 Jul 2024 23:17:10 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> This patch expands the use of a hash table for 
secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. 
>> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433: > >> 1431: >> 1432: // Don't check secondary_super_cache >> 1433: if (super_check_offset.is_register() > > Do you see any effects from this particular change? > > It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks primary or secondary super). Moreover, the very same check is performed after primary super slot is checked. > > Unless `_secondary_super_cache` field is removed, unconditionally checking the slot at `super_check_offset` is benign. BTW `MacroAssembler::check_klass_subtype_fast_path` deserves a cleanup: `super_check_offset` can be safely turned into `Register` thus eliminating the code guarded by `super_check_offset.is_register() == false`. > src/hotspot/share/oops/instanceKlass.cpp line 1410: > >> 1408: return nullptr; >> 1409: } else if (num_extra_slots == 0) { >> 1410: if (num_extra_slots == 0 && interfaces->length() <= 1) { > > Since `secondary_supers` are hashed unconditionally now, is `interfaces->length() <= 1` check still needed? Also, `num_extra_slots == 0` check is redundant. > src/hotspot/share/oops/klass.cpp line 284: > >> 282: // which doesn't zero out the memory before calling the constructor. >> 283: Klass::Klass(KlassKind kind) : _kind(kind), >> 284: _bitmap(SECONDARY_SUPERS_BITMAP_FULL), > > I like the idea, but what are the benefits of initializing `_bitmap` separately from `_secondary_supers`? 
Another observation while browsing the code: `_secondary_supers_bitmap` would be a better name. (Same considerations apply to `_hash_slot`.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674815196 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674798719 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674828164 From tanksherman27 at gmail.com Fri Jul 12 00:57:39 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Fri, 12 Jul 2024 08:57:39 +0800 Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame load/use the frame pointer? Message-ID: <CAP2b4GM0Ka6BD6nVts7yhs7RRpFNXbZx22egmvw4q0BRRN3r9A@mail.gmail.com> Yep, gcc does indeed save the frame pointer as expected by Java, but it still isn't working (Only prints 1 frame then quits). I would start debugging, but the debugger on my Windows device is down at the moment. Sigh... best regards, Julian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240712/cce5e92a/attachment-0001.htm> From gli at openjdk.org Fri Jul 12 01:18:00 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 12 Jul 2024 01:18:00 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v3] In-Reply-To: <JZIUoUDNkyU49Kkaso_UnStuexRVA-yCmrT2Dt-rfsY=.873c408d-6478-4f7f-87d5-7cbeccb20714@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> <JZIUoUDNkyU49Kkaso_UnStuexRVA-yCmrT2Dt-rfsY=.873c408d-6478-4f7f-87d5-7cbeccb20714@github.com> Message-ID: <2u2YExr_N6N4lae-i_FV8JVbEOT6cYzOHAftkb2BOmY=.f87ac878-9d9a-45c0-a20d-a5ffb1cabcad@github.com> On Thu, 11 Jul 2024 18:09:24 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 478: >> >>> 476: >>> 477: const bool clear_all_soft_refs = true; >>> 478: do_full_collection_no_gc_locker(clear_all_soft_refs); >> >> If the young collection succeeded in method `collect_at_safepoint`. The normal full collection won't run in `collect_at_safepoint`. If the successful young collection didn't release any memory (or only released little memory but not enough for allocation), the allocation in line 462 will fail too. Then a full collection with maximum compaction will be run. It is strange. In my opinion, I think the steps look like below: >> >> 1. allocation >> 2. young collection >> 3. allocation >> 4. normal full collection >> 5. allocation >> 6. maximum full collection >> 7. allocation >> 8. OOM >> >> But in current patch, the step 4-5 may be skipped. 
> >> If the successful young collection didn't release any memory (or only released little memory but not enough for allocation), > > A successful young-gc often leave young-gen completely empty. Otherwise, max-compaction full-gc should be run -- there is little benefit of running non-max-compaction full-gc if old-gen is too packed to hold all young-gen objs. Thanks for your explanation. I am OK with the current solution now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674923459 From gli at openjdk.org Fri Jul 12 01:24:58 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 12 Jul 2024 01:24:58 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v3] In-Reply-To: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> Message-ID: <0bOEaMQ75JB0T22pbSsMEP1UV7lh1pUHoGjgTkold-w=.bfdbb2f7-010d-4d24-ab79-7fc40aadc929@github.com> On Thu, 11 Jul 2024 15:40:01 GMT, Guoxiong Li <gli at openjdk.org> wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into pgc-vm-operation >> - pgc-vm-operation > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 446: > >> 444: } >> 445: return result; // Could be null if we are out of space. 
>> 446: }
>
> I notice the method `PSOldGen::allocate` can expand the old gen, but the method `PSYoungGen::allocate` can't expand the young gen. It is similar to a bug [1] in Serial. Fortunately, the young generation can be resized during Parallel GC if the option `UseAdaptiveSizePolicy` is `true`. When `UseAdaptiveSizePolicy` is set to `false` manually by the user, I suspect this is a bug in Parallel, because the young generation is never expanded.
>
> [1] https://bugs.openjdk.org/browse/JDK-8333386

@albertnetymk Do you think we need to expand the young generation during allocation (in both Serial and Parallel)? In Serial, `UseAdaptiveSizePolicy` is not used, so it is indeed a bug in Serial (the young generation can't be resized and always stays at its initial size).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674933630

From duke at openjdk.org Fri Jul 12 01:43:14 2024
From: duke at openjdk.org (Shaojin Wen)
Date: Fri, 12 Jul 2024 01:43:14 GMT
Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to System.lineSeparator
Message-ID: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>

There are three places in the JDK code where `String.format("%n")` is used. This is equivalent to `System.lineSeparator()` and does not need to go through the `String.format` implementation.
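The equivalence claimed above can be checked with a few lines of Java. This is only a small sketch for readers; the class and method names are made up and it is not part of the PR.

```java
// Hypothetical check; not code from the PR.
final class LineSeparatorCheck {
    // "%n" in a format string expands to the platform line separator,
    // which is the value System.lineSeparator() returns.
    static boolean formatMatchesLineSeparator() {
        return String.format("%n").equals(System.lineSeparator());
    }
}
```

Calling `System.lineSeparator()` directly avoids parsing a format string just to obtain a constant.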
-------------

Commit messages:
 - replace String.format("%n") to System.lineSeparator()

Changes: https://git.openjdk.org/jdk/pull/20149/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20149&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8336278
Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/20149.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20149/head:pull/20149

PR: https://git.openjdk.org/jdk/pull/20149

From dholmes at openjdk.org Fri Jul 12 02:23:06 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 12 Jul 2024 02:23:06 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String characters printed
Message-ID: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>

Please review this enhancement, which is intended to improve the readability of error logs when very long `java.lang.String`s, printed in full, obscure other content in the log.

The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 0 to O_BUFLEN.

The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.

If a string's length exceeds `max_length` then we print it as follows:

"<first max_length/2 characters> ... <last max_length/2 characters>" (abridged)

For example, if we print "ABCDE" with a max_length of 4 then the output is literally:

"AB ... DE" (abridged)

The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
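The abridging rule described above can be sketched in a few lines. This is written in Java purely for illustration (the actual change is in HotSpot's C++ `java_lang_String::print`), the names are hypothetical, and how a string that fits within the limit is rendered is an assumption here (printed in full, in quotes).

```java
// Hypothetical illustration of the described output format; not the HotSpot code.
final class AbridgeSketch {
    // Keeps the first and last maxLength/2 characters of an over-long string.
    static String abridge(String s, int maxLength) {
        if (s.length() <= maxLength) {
            return "\"" + s + "\""; // assumption: short strings are printed in full
        }
        int half = maxLength / 2;
        String head = s.substring(0, half);
        String tail = s.substring(s.length() - half);
        return "\"" + head + " ... " + tail + "\" (abridged)";
    }
}
```

With these assumptions, `AbridgeSketch.abridge("ABCDE", 4)` yields `"AB ... DE" (abridged)`, matching the example in the message.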
For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. Testing: - new test added for validation purposes - tiers 1 - 3 as sanity testing Thanks ------------- Commit messages: - Improve flag description - Fix indent - Merge branch 'master' into 8325945-print-string - 8325945: Error reporting should limit the number of String characters printed Changes: https://git.openjdk.org/jdk/pull/20150/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325945 Stats: 154 lines in 6 files changed: 151 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150 PR: https://git.openjdk.org/jdk/pull/20150 From dholmes at openjdk.org Fri Jul 12 02:59:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 12 Jul 2024 02:59:50 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v6] In-Reply-To: <lrcx4n_WnfGbmtYORBWqjQzBuDscQdyr5OFTmMLexko=.babb7658-ebba-49c2-ae5d-fc3d158ea7db@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <lrcx4n_WnfGbmtYORBWqjQzBuDscQdyr5OFTmMLexko=.babb7658-ebba-49c2-ae5d-fc3d158ea7db@github.com> Message-ID: <7vypV2vgUGYMeKbyw5--Vhe7p0bxby0eH5j1sthpZso=.3fcf07a0-7680-4b22-ab73-0e77147d7a08@github.com> On Thu, 11 Jul 2024 21:35:27 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. 
Currently this scenario can be reproduced with the Graal compiler. >> >> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > remove JBS id reference Nothing further from me. Seems reasonable. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20016#pullrequestreview-2173576817 From jwaters at openjdk.org Fri Jul 12 03:24:58 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 12 Jul 2024 03:24:58 GMT Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v5] In-Reply-To: <9k00GYxtEiNBgrtIsIYJUIdwwPjynEm6aONdchZreP4=.0ad54916-180e-4317-8385-e339595a340a@github.com> References: <kc_cq_sBCqn-iAwHCEaTqgMVYrnT6tKsk3SZnD_qP-s=.1b5d24dd-a925-4f6d-aefb-67b4df6bddac@github.com> <9k00GYxtEiNBgrtIsIYJUIdwwPjynEm6aONdchZreP4=.0ad54916-180e-4317-8385-e339595a340a@github.com> Message-ID: <EHOth-ipPAwv50rX0JRcBl3rP_z8mpPbY3TOvEnMHyU=.02579b23-fab1-4b49-9f84-35f95d603efe@github.com> On Tue, 6 Feb 2024 07:04:00 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> throw() has been deprecated since C++11 alongside dynamic exception specifications, we should replace all instances of it with noexcept to prepare HotSpot for later versions of C++ > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 14 commits:
>
> - Merge branch 'openjdk:master' into noexcept
> - Merge branch 'openjdk:master' into noexcept
> - Typo in GensrcAdlc.gmk
> - Merge branch 'openjdk:master' into noexcept
> - Merge branch 'master' into noexcept
> - ic in compiledIC.hpp
> - Revert compiledIC.cpp
> - Revert compiledIC.hpp
> - Partially Revert parse.hpp
> - Merge branch 'master' into noexcept
> - ... and 4 more: https://git.openjdk.org/jdk/compare/9ee9f288...b73a6882

I would like to address this soon, but will probably need help writing up noexcept for the Style Guide.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15910#issuecomment-2224351303

From darcy at openjdk.org Fri Jul 12 04:07:50 2024
From: darcy at openjdk.org (Joe Darcy)
Date: Fri, 12 Jul 2024 04:07:50 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <cDo0IswnQEKfYYkgGcx1DlmNCb25Kg7EUeqPQEosyS8=.e92172e2-a4f3-481f-9c90-c04dbf3558fb@github.com>

On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>
>
> VLoop::check_preconditions: failed: control flow in loop not allowed
>
>
> The control flow is due to the Java implementation for these methods, e.g.
>
>
> public static long max(long a, long b) {
> return (a >= b) ? a : b;
> }
>
>
> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
>
>
> SuperWord::transform_loop:
> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>
>
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1155
> long max 1173
>
>
> After the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1042
> long max 1042
>
>
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0
>>> jtreg:test/jdk:tier1 ...

Marked as reviewed by darcy (Reviewer).

Core libs changes look fine; bumping review count for the remainder of the PR.
------------- PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2173771454 PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2224456985 From dholmes at openjdk.org Fri Jul 12 04:15:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 12 Jul 2024 04:15:56 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> Message-ID: <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> On Thu, 11 Jul 2024 08:58:11 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> All around Hotspot, we have calls to `method->is_initializer()`. That methods test for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths. >> >> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Indenting It is evident that people have been unfamiliar/sloppy with this API. This change should help prevent that in future. I have a concern about one change. 
Thanks src/hotspot/share/classfile/javaClasses.cpp line 3018: > 3016: int flags = (jushort)( m->access_flags().as_short() & JVM_RECOGNIZED_METHOD_MODIFIERS ); > 3017: if (m->is_object_initializer()) { > 3018: flags |= java_lang_invoke_MemberName::MN_IS_CONSTRUCTOR; I'm going to assume that `clinit` would already get filtered out at some point otherwise this would be a change in behaviour. src/hotspot/share/runtime/reflection.cpp line 772: > 770: assert(!method()->is_object_initializer() && > 771: (for_constant_pool_access || !method()->is_static_initializer()), > 772: "should call new_constructor instead"); Nit: existing -The assert message isn't really correct ------------- PR Review: https://git.openjdk.org/jdk/pull/20120#pullrequestreview-2173741407 PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675207989 PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675249560 From dholmes at openjdk.org Fri Jul 12 04:18:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 12 Jul 2024 04:18:50 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2] In-Reply-To: <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com> References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com> Message-ID: <seGC926uzjtTonUUes139jASOT8QaEGo38y9UkDLgRI=.52e038d9-3564-4deb-b272-7d978a744fff@github.com> On Thu, 11 Jul 2024 07:26:52 GMT, Qizheng Xing <qxing at openjdk.org> wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright. > > Thanks for the review. @MaxXSoft hotspot changes require two reviewers. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2224509238 From aboldtch at openjdk.org Fri Jul 12 05:57:30 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 12 Jul 2024 05:57:30 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
> > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
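The "CAS on locking bits" transition quoted above can be sketched as a toy model. This is an illustration in Java under stated assumptions, not HotSpot's C++ implementation: the class `LockBitsSketch`, its methods, and the use of an `AtomicLong` as a stand-in for the `markWord` are all hypothetical, and the model ignores ownership tracking, spinning, inflation and the monitor table entirely. It only shows that the two low bits flip between 0b01 (Unlocked) and 0b00 (Fast Locked) while the remaining bits are left untouched.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: hypothetical names, not HotSpot code.
final class LockBitsSketch {
    static final long LOCK_MASK   = 0b11L;
    static final long FAST_LOCKED = 0b00L;
    static final long UNLOCKED    = 0b01L;
    static final long INFLATED    = 0b10L; // listed for completeness, unused here

    // Stand-in for an object's mark word; the upper bits carry unrelated data.
    private final AtomicLong markWord;

    LockBitsSketch(long upperBits) {
        this.markWord = new AtomicLong((upperBits & ~LOCK_MASK) | UNLOCKED);
    }

    // Unlocked (0b01) -> Fast Locked (0b00), preserving all other bits.
    boolean tryFastLock() {
        long mark = markWord.get();
        if ((mark & LOCK_MASK) != UNLOCKED) {
            return false; // already fast locked or inflated; caller would spin or inflate
        }
        return markWord.compareAndSet(mark, (mark & ~LOCK_MASK) | FAST_LOCKED);
    }

    // Fast Locked (0b00) -> Unlocked (0b01), preserving all other bits.
    boolean tryFastUnlock() {
        long mark = markWord.get();
        if ((mark & LOCK_MASK) != FAST_LOCKED) {
            return false;
        }
        return markWord.compareAndSet(mark, (mark & ~LOCK_MASK) | UNLOCKED);
    }

    long lockBits()  { return markWord.get() & LOCK_MASK; }
    long upperBits() { return markWord.get() & ~LOCK_MASK; }
}
```

The point of the model is the one made in the description: because only the lock bits change, the rest of the word never has to be displaced.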
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update arguments.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/a207544b..15997bc3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From qxing at openjdk.org Fri Jul 12 07:02:52 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 12 Jul 2024 07:02:52 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2] In-Reply-To: <seGC926uzjtTonUUes139jASOT8QaEGo38y9UkDLgRI=.52e038d9-3564-4deb-b272-7d978a744fff@github.com> References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com> <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com> <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com> <seGC926uzjtTonUUes139jASOT8QaEGo38y9UkDLgRI=.52e038d9-3564-4deb-b272-7d978a744fff@github.com> Message-ID: <o73U9AZ5aYUFRuptiTid2ygpbEDMXZRhY2V87_0lAO0=.8eba6994-91c3-4b36-9aec-0fcc79cb11a7@github.com> On Fri, 12 Jul 2024 04:15:48 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Thanks for the review. > > @MaxXSoft hotspot changes require two reviewers. @dholmes-ora Sorry. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2224950909 From dnsimon at openjdk.org Fri Jul 12 07:37:50 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 12 Jul 2024 07:37:50 GMT Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to System.lineSeparator In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> Message-ID: <pgFdiXYSe8Y3DE9EQl1dl0z-xd9YODEu9we1VcqJULM=.f853b262-2554-42aa-8261-f02a27cb2ab3@github.com> On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote: > There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format. Looks good and trivial to me. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20149#pullrequestreview-2174075257 From shade at openjdk.org Fri Jul 12 07:57:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jul 2024 07:57:50 GMT Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to System.lineSeparator In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> Message-ID: <iewRfB_a0Sy1rtMNowp945-8lkGaiuPms-KeQBHLlEo=.d79da08b-7614-4f51-b41d-139ef024ad53@github.com> On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote: > There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format. Hah! Yes, it makes no sense to call into `String.format` to just get the line separator. 
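A small sketch of the equivalence the PR relies on. The class and method names (`LineSeparatorSketch`, `viaFormat`, `viaSystem`) are made up for illustration; the two library calls are the ones named in the description. The comparison is expected to hold on any platform, since `%n` is documented as the platform-specific line separator.

```java
// Illustrative only: hypothetical class and method names.
final class LineSeparatorSketch {

    // Goes through the formatter machinery just to obtain the separator.
    static String viaFormat() {
        return String.format("%n");
    }

    // Direct lookup, as proposed in the PR.
    static String viaSystem() {
        return System.lineSeparator();
    }

    public static void main(String[] args) {
        System.out.println("same separator: " + viaFormat().equals(viaSystem()));
    }
}
```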
------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20149#pullrequestreview-2174110063 From shade at openjdk.org Fri Jul 12 09:17:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jul 2024 09:17:22 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v3] In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> Message-ID: <swBWpqAm_k6hHjGcwdNBowWfdBpksxtD63PiGp0KI1c=.ad02279c-ed66-40a0-9b01-379d4410a16c@github.com> > All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for a constructor (instance initializer), not a static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths. > > I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
> > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Touch up assert messages ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20120/files - new: https://git.openjdk.org/jdk/pull/20120/files/c5da5ebd..a18f7a46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120 PR: https://git.openjdk.org/jdk/pull/20120 From shade at openjdk.org Fri Jul 12 09:17:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jul 2024 09:17:23 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> Message-ID: <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com> On Fri, 12 Jul 2024 03:59:06 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Indenting > > src/hotspot/share/classfile/javaClasses.cpp line 3018: > >> 3016: int flags = (jushort)( m->access_flags().as_short() & JVM_RECOGNIZED_METHOD_MODIFIERS ); >> 3017: if (m->is_object_initializer()) { >> 3018: flags |= java_lang_invoke_MemberName::MN_IS_CONSTRUCTOR; > > I'm going to assume that `clinit` would 
already get filtered out at some point otherwise this would be a change in behaviour. No, it is not filtered, we still have `clinit`-s on this path. In the initial version https://github.com/openjdk/jdk/pull/20120/commits/6769cfe609849aa9ed0985dcbecb2b0aa24bca03 I caught the assert in many tests, mostly in stack traces generation. Yes, this changes the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. > src/hotspot/share/runtime/reflection.cpp line 772: > >> 770: assert(!method()->is_object_initializer() && >> 771: (for_constant_pool_access || !method()->is_static_initializer()), >> 772: "should call new_constructor instead"); > > Nit: existing -The assert message isn't really correct Yeah, it is a bit odd. I thought to leave the messages alone, but we can massage them as well. Should be done in new commit. 
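To make the distinction in this thread concrete, here is a toy model of the three predicates being discussed. It is written in Java purely for illustration and is an assumption-laden simplification: the class `MethodModel` is hypothetical, classification is done by name only (`<init>` versus `<clinit>`), and HotSpot's real C++ checks may involve more than the name.

```java
// Toy model only: hypothetical class, name-based classification.
final class MethodModel {
    private final String name;

    MethodModel(String name) {
        this.name = name;
    }

    // Constructor, i.e. instance initializer.
    boolean isObjectInitializer() {
        return "<init>".equals(name);
    }

    // Class (static) initializer.
    boolean isStaticInitializer() {
        return "<clinit>".equals(name);
    }

    // The broader predicate the PR removes: true for both kinds.
    boolean isInitializer() {
        return isObjectInitializer() || isStaticInitializer();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"<init>", "<clinit>", "toString"}) {
            MethodModel m = new MethodModel(n);
            System.out.println(n + ": object=" + m.isObjectInitializer()
                    + " static=" + m.isStaticInitializer()
                    + " either=" + m.isInitializer());
        }
    }
}
```

In this model, a caller that sets a "constructor" flag based on `isInitializer()` would also flag `<clinit>`, which is the old behaviour described above; switching to `isObjectInitializer()` flags only `<init>`.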
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675564908 PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675565003 From rkennke at openjdk.org Fri Jul 12 09:55:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 Jul 2024 09:55:52 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> Message-ID: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> On Fri, 12 Jul 2024 05:57:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update arguments.cpp I've reviewed and tested some earlier versions of this change in the context of Lilliput, and haven't encountered any showstopping problems. Very nice work! When you say 'This patch has been evaluated to be performance neutral when UseObjectMonitorTable is turned off (the default).' - what does the performance look like with +UOMT? How does it compare to -UOMT? 
I've only reviewed the platform-specific changes so far. Mostly looks good, I only have some relatively minor remarks. Will review the shared code changes separately. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 318: > 316: > 317: // Loop after unrolling, advance iterator. > 318: increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); Maybe I am misreading this but... in the unroll loop you avoid emitting the increment on the last iteration, but then you emit it explicitly here? Wouldn't it be cleaner to do it in the unroll loop always and elide the explicit increment after the loop? src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343: > 341: const Register t3_owner = t3; > 342: const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value)); > 343: const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag}; That may be just me, but I found that syntax weird. I first needed to look up what the {}-initializer actually means. Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674: > 672: > 673: // Loop after unrolling, advance iterator. > 674: increment(t, in_bytes(OMCache::oop_to_oop_difference())); Same issue as in aarch64 code. ------------- Changes requested by rkennke (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2174300266 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675587650 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675597362 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675605009 From ayang at openjdk.org Fri Jul 12 09:56:51 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 12 Jul 2024 09:56:51 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v3] In-Reply-To: <0bOEaMQ75JB0T22pbSsMEP1UV7lh1pUHoGjgTkold-w=.bfdbb2f7-010d-4d24-ab79-7fc40aadc929@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com> <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com> <0bOEaMQ75JB0T22pbSsMEP1UV7lh1pUHoGjgTkold-w=.bfdbb2f7-010d-4d24-ab79-7fc40aadc929@github.com> Message-ID: <IjVuddD0IemO58P8xHSnFquVaDehOl3OA-0r9kDZjh8=.30b5dcef-ac21-4a3d-a782-d7659b159229@github.com> On Fri, 12 Jul 2024 01:22:28 GMT, Guoxiong Li <gli at openjdk.org> wrote: >> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 446: >> >>> 444: } >>> 445: return result; // Could be null if we are out of space. >>> 446: } >> >> I notice the method `PSOldGen::allocate` can expand the size of the old gen, but the method `PSYoungGen::allocate` can't expand the size of the young gen. It is similar to a bug [1] in Serial. Fortunately, the size of the young generation can be resized during Parallel GC if the option `UseAdaptiveSizePolicy` is `true`. When the `UseAdaptiveSizePolicy` is set to `false` manually by the user, I suspect it is a bug in Parallel because of the unexpanded young generation size. 
>> >> [1] https://bugs.openjdk.org/browse/JDK-8333386 > > @albertnetymk Do you think whether we need to expand young generation during allocation (both Serial and Parallel)? In Serial, `UseAdaptiveSizePolicy` is not used, so it is indeed a bug in Serial (the young generation can't be resized and is always the initial size). Due to the internal structure (eden/survivor) of young-gen, it's not super easy to expand young-gen during allocation like old-gen. Need a dedicated ticket to properly evaluate its cost/benefit. > Serial (the young generation can't be resized and is always the initial size). That sounds like a definite bug; at least young-gen should be resizable during young-gc/full-gc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1675613933 From rkennke at openjdk.org Fri Jul 12 10:14:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 Jul 2024 10:14:52 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> Message-ID: <ZrdkZBka-bE3JWmIIozLSyuncmU2cAi8MB-sdZE0ue0=.f8f5c66a-da77-4bad-b8f4-842158312cb4@github.com> On Fri, 12 Jul 2024 05:57:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. 
>> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. 
>> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update arguments.cpp Is there a plan to get rid of the UseObjectMonitorTable flag in a future release? Ideally we would have one fast-locking implementation (LW locking) with one OM mapping (+UOMT), right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2225260710 From eosterlund at openjdk.org Fri Jul 12 10:18:51 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 12 Jul 2024 10:18:51 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> Message-ID: <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` The reason we did not do this before is that this is not a strong reference store.
Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225266939 From shade at openjdk.org Fri Jul 12 10:24:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jul 2024 10:24:57 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> Message-ID: <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com> On Fri, 12 Jul 2024 10:16:13 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference. You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In the intrinsic, we are storing with the same decorators that `JVM_ReferenceClear` is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment.
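For context, a minimal sketch of what `Reference.clear` does from the Java side, which is the call the PR proposes to intrinsify. The class and method names (`ReferenceClearSketch`, `clearsReferent`) are hypothetical. The sketch only shows the observable API behaviour; it says nothing about GC barriers, decorators, or SATB marking, which are the actual subject of the exchange above.

```java
import java.lang.ref.WeakReference;

// Illustrative only: hypothetical class and method names.
final class ReferenceClearSketch {

    // Returns true if get() sees the referent before clear() and null after it.
    static boolean clearsReferent() {
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent);

        boolean visibleBefore = (ref.get() == referent);
        ref.clear(); // drops the Reference's link to the referent
        boolean goneAfter = (ref.get() == null);

        // The object itself is still strongly reachable through the local variable.
        return visibleBefore && goneAfter && referent != null;
    }

    public static void main(String[] args) {
        System.out.println("cleared as expected: " + clearsReferent());
    }
}
```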
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225277261 From aboldtch at openjdk.org Fri Jul 12 10:54:23 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 12 Jul 2024 10:54:23 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v7] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <PgX4o-qTT_4gDiZ94towRtB7xs7zkYMcoTpp51iz5vM=.4085c8c0-679a-4e84-8cb0-20bfb9ec80bf@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
> > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is to much contention or to long critical sections for spinning to be feasible, inf... 
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Cleanup c2 cache lookup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/15997bc3..e1eb8c95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=05-06 Stats: 12 lines in 2 files changed: 0 ins; 10 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Fri Jul 12 10:54:24 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 12 Jul 2024 10:54:24 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> Message-ID: <qk3m4DFlOPmoz1Ke2dUv74782IsXXrzJAlz-Axlvy4o=.29df14df-e3bf-4649-bc36-7d9082c9fdb8@github.com> On Fri, 12 Jul 2024 09:32:44 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update arguments.cpp > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 318: > >> 316: >> 317: // Loop after unrolling, advance iterator. >> 318: increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); > > Maybe I am misreading this but... in the unroll loop you avoid emitting the increment on the last iteration, but then you emit it explicitly here?
Wouldn't it be cleaner to do it in the unroll loop always and elide the explicit increment after the loop? You are correct. It is a leftover from when it was possible to tweak the number of unrolled lookups as well as whether it should loop the tail. Fixed. > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343: > >> 341: const Register t3_owner = t3; >> 342: const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value)); >> 343: const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag}; > > That may be just me, but I found that syntax weird. I first needed to look up what the {}-initializer actually means. Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else. I see. I tend to prefer uniform initialization as it makes narrowing conversions illegal. I remember `uniform initialization` coming up in some previous PR as well. It is really only necessary for some types of templated code, but it also makes it easier to avoid mistakes in the general case (as long as you avoid `std::initializer_list`, which I think we explicitly forbid in our coding guidelines). I do not recall what the conclusion of that discussion was. But maybe it was that this feature is too exotic and foreign for hotspot. I prefer it though, even if I fail to use it consistently. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674: > >> 672: >> 673: // Loop after unrolling, advance iterator. >> 674: increment(t, in_bytes(OMCache::oop_to_oop_difference())); > > Same issue as in aarch64 code. Fixed.
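The narrowing argument for `{}`-initialization made above can be illustrated with a small standalone C++ sketch. It assumes only the standard library; the traits `brace_initializable` and `paren_initializable` are hypothetical helpers written for this example and have nothing to do with HotSpot's `Address` class. They report whether `To{from}` and `To(from)` are well-formed for a given pair of types.

```cpp
#include <type_traits>
#include <utility>

// True when `To{from}` (list-initialization) is well-formed for a `From` value.
template <class To, class From, class = void>
struct brace_initializable : std::false_type {};

template <class To, class From>
struct brace_initializable<To, From,
    decltype(void(To{std::declval<From>()}))> : std::true_type {};

// True when `To(from)` (parenthesized initialization) is well-formed.
template <class To, class From, class = void>
struct paren_initializable : std::false_type {};

template <class To, class From>
struct paren_initializable<To, From,
    decltype(void(To(std::declval<From>())))> : std::true_type {};

// long long -> int may lose information: ()-init accepts it silently,
// while {}-init rejects it as a narrowing conversion.
static_assert(paren_initializable<int, long long>::value,
              "()-init allows the narrowing conversion");
static_assert(!brace_initializable<int, long long>::value,
              "{}-init rejects the narrowing conversion");

// int -> long long is a widening conversion and is fine either way.
static_assert(brace_initializable<long long, int>::value,
              "{}-init allows widening");
static_assert(paren_initializable<long long, int>::value,
              "()-init allows widening");
```

The sketch only covers the narrowing side of the argument. The readability concern raised in the review, and the `std::initializer_list` caveat mentioned in the reply, are separate trade-offs that it does not try to settle.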
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675676768 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675676879 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675677068 From aboldtch at openjdk.org Fri Jul 12 11:08:53 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 12 Jul 2024 11:08:53 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> Message-ID: <n7QM8yrj5JF-VZjcfLG9OnSXGb9Kbtt4uFMrXNMDkJw=.2a91529b-7bcb-4d89-a21f-0917fc0b129d@github.com> On Fri, 12 Jul 2024 09:53:11 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > When you say 'This patch has been evaluated to be performance neutral when UseObjectMonitorTable is turned off (the default).' - what does the performance look like with +UOMT? How does it compare to -UOMT? Most benchmarks are unaffected as they do not use any contended locking or wait/notify. Some see improvements and some show regressions. The most significant regressions are in `DaCapo-xalan` which is very sensitive to the timing of enter. It seems to rely quite heavily on how fast you can get to `ObjectMonitor::TrySpin` as well as the exact behaviour of this spinning. Then there are all the workloads which have not been tested in all these benchmark suites. The hope is to be able to incrementally iterate on the performance of the worst outliers. > Is there a plan to get rid of the UseObjectMonitorTable flag in a future release? 
Ideally we would have one fast-locking implementation (LW locking) with one OM mapping (+UOMT), right? My understanding (and shared hope) is that this is the ambition. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2225341285 From eosterlund at openjdk.org Fri Jul 12 12:00:56 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 12 Jul 2024 12:00:56 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com> Message-ID: <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com> On Fri, 12 Jul 2024 10:22:42 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > > The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference. > > You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In the intrinsic, we are storing with the same decorators that `JVM_ReferenceClear` is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment. The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all, because there hasn't really been a use case for it, other than clearing a Reference.
That's the precise reason why we do not have a clear intrinsic; it would have to add that infrastructure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225430174 From aph at openjdk.org Fri Jul 12 12:08:55 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 12 Jul 2024 12:08:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> Message-ID: <PIXggQ3dHB2fzB-FFKqZv88c0HFbbgj8U3uMEH9LJWM=.7521eb79-276a-4afc-9081-bd99dadac5c6@github.com> On Fri, 12 Jul 2024 09:40:45 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update arguments.cpp > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343: > >> 341: const Register t3_owner = t3; >> 342: const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value)); >> 343: const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag}; > > That may be just me, but I found that syntax weird. I first needed to look-up what the {}-initializer actually means. Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else. I agree with @rkennke . When we wrote the AArch64 MacroAssembler we were concentrating on readability and familiarity, and this separate declaration and use, with unusual syntax, IMO makes life harder for the reader. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675751592 From gli at openjdk.org Fri Jul 12 12:17:52 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 12 Jul 2024 12:17:52 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v4] In-Reply-To: <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com> Message-ID: <s6pj4WO2tSlHfBq_95tbuSP3u3d5IBP9N7WWEk_fvNQ=.b8ef026f-3110-4c23-8726-f83d47fc5722@github.com> On Thu, 11 Jul 2024 18:06:34 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity. >> >> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. >> >> Test: tier1-6; perf-neural for dacapo, specjvm2008, specjbb2015 and cachestresser. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into pgc-vm-operation > - review > - review > - Merge branch 'master' into pgc-vm-operation > - pgc-vm-operation Marked as reviewed by gli (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20077#pullrequestreview-2174586077 From jkratochvil at openjdk.org Fri Jul 12 12:30:52 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 12 Jul 2024 12:30:52 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v4] In-Reply-To: <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com> Message-ID: <VJ5KHzRz789LNMySGGlflRLtFi1wXHuLyh56ifMK19c=.9f49db56-674c-4a70-8e86-294b6d041f59@github.com> On Thu, 11 Jul 2024 16:46:13 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support With #17198 and this updated patch I still get a FAIL due to: [0.333s][trace][os,container] OSContainer::active_processor_count: 4 But let's resolve it after #17198 gets final/approved. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2225475953 From sgehwolf at openjdk.org Fri Jul 12 12:46:50 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 12 Jul 2024 12:46:50 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v4] In-Reply-To: <VJ5KHzRz789LNMySGGlflRLtFi1wXHuLyh56ifMK19c=.9f49db56-674c-4a70-8e86-294b6d041f59@github.com> References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com> <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com> <VJ5KHzRz789LNMySGGlflRLtFi1wXHuLyh56ifMK19c=.9f49db56-674c-4a70-8e86-294b6d041f59@github.com> Message-ID: <i6J1-zEhPuCqHs8lXwpjlNxfvX7lnCIkSuATD5U3S9M=.b43cf310-f870-40be-acfc-2889861183e4@github.com> On Fri, 12 Jul 2024 12:28:16 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote: > With #17198 and this updated patch I still get a FAIL due to: > > ``` > [0.333s][trace][os,container] OSContainer::active_processor_count: 4 > ``` > > But let's resolve it after #17198 gets final/approved. Because #17198 is incomplete. As mentioned in the review: > We ought to also trim the path for the CPU controller. This patch only fixes the memory controller. That's exactly why the test is failing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2225501718 From coleenp at openjdk.org Fri Jul 12 12:50:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 Jul 2024 12:50:52 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v7] In-Reply-To: <PgX4o-qTT_4gDiZ94towRtB7xs7zkYMcoTpp51iz5vM=.4085c8c0-679a-4e84-8cb0-20bfb9ec80bf@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <PgX4o-qTT_4gDiZ94towRtB7xs7zkYMcoTpp51iz5vM=.4085c8c0-679a-4e84-8cb0-20bfb9ec80bf@github.com> Message-ID: <6Aa4oWKwpgo9Br75tCLj3AGQLxP9Rw2dgjzOXJQ6CTo=.e92e83f5-e4b3-43d8-8e89-3349de99524d@github.com> On Fri, 12 Jul 2024 10:54:23 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
>> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup c2 cache lookup Thank you for making the argument change. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2174647655 From ayang at openjdk.org Fri Jul 12 13:01:59 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 12 Jul 2024 13:01:59 GMT Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC [v4] In-Reply-To: <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com> Message-ID: <nmAzpCfQlgjRjRZhg89_QRFj0PebhJ85w9OrQfHZ9RU=.27e474f6-a59d-4b1a-a548-8cc2d5f2dcac@github.com> On Thu, 11 Jul 2024 18:06:34 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity. >> >> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. >> >> Test: tier1-6; perf-neural for dacapo, specjvm2008, specjbb2015 and cachestresser. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into pgc-vm-operation > - review > - review > - Merge branch 'master' into pgc-vm-operation > - pgc-vm-operation Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20077#issuecomment-2225527435 From ayang at openjdk.org Fri Jul 12 13:02:00 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 12 Jul 2024 13:02:00 GMT Subject: Integrated: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com> Message-ID: <2rPj9VK03GeaMLDlBBlNBKwNQCLPWNk-cbsLN_G3ymA=.a932a4b7-185d-46b3-acac-4a27ff4a1ee8@github.com> On Mon, 8 Jul 2024 16:18:22 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity. > > The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts. > > Test: tier1-6; perf-neural for dacapo, specjvm2008, specjbb2015 and cachestresser. This pull request has now been integrated. 
Changeset: 34d8562a Author: Albert Mingkun Yang <ayang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/34d8562a913b8382601e4c0c31ad34a663b9ec0a Stats: 342 lines in 14 files changed: 88 ins; 169 del; 85 mod 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC Reviewed-by: gli, zgu ------------- PR: https://git.openjdk.org/jdk/pull/20077 From aboldtch at openjdk.org Fri Jul 12 13:06:40 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 12 Jul 2024 13:06:40 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v8] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <VM1cqaoV2Fx9tbdONunfk4TMc8NUHbjdVFCDy5ySDuE=.9eafd768-5e31-48c7-b0fb-e676e801ddc4@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is to much contention or to long critical sections for spinning to be feasible, inf... 
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Avoid uniform initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/e1eb8c95..cccffeda Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=06-07 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Fri Jul 12 13:06:40 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 12 Jul 2024 13:06:40 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <PIXggQ3dHB2fzB-FFKqZv88c0HFbbgj8U3uMEH9LJWM=.7521eb79-276a-4afc-9081-bd99dadac5c6@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com> <PIXggQ3dHB2fzB-FFKqZv88c0HFbbgj8U3uMEH9LJWM=.7521eb79-276a-4afc-9081-bd99dadac5c6@github.com> Message-ID: <FxdbAVZ0cLhyLrpgp-3Nwj7pSm_PVli71my96vMU_b8=.f131763f-576c-476f-8146-d805f3f074cf@github.com> On Fri, 12 Jul 2024 12:06:05 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343: >> >>> 341: const Register t3_owner = t3; >>> 342: const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value)); >>> 343: const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag}; >> >> That may be just me, but I found that syntax weird. I first needed to look-up what the {}-initializer actually means. 
Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else. > > I agree with @rkennke. When we wrote the AArch64 MacroAssembler we were concentrating on readability and familiarity, and this separate declaration and use, with unusual syntax, IMO makes life harder for the reader. Fair enough. _To me uniform initialization is just a safer, less problematic way of expressing the same thing._ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675828324 From shade at openjdk.org Fri Jul 12 13:21:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jul 2024 13:21:50 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com> <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com> Message-ID: <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com> On Fri, 12 Jul 2024 11:57:56 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, I see it just does pre-barriers when it is not sure what strongness the store is. Hrmpf. OK, let me see what can be done here.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225577027 From fgao at openjdk.org Fri Jul 12 13:52:04 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 12 Jul 2024 13:52:04 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer Message-ID: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> In the cases like: UNSAFE.putLong(address + off1 + 1030, lseed); UNSAFE.putLong(address + 1023, lseed); UNSAFE.putLong(address + off2 + 1001, lseed); Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: ldr R10, [R15, #120] # int ! Field: address ldr R11, [R16, #136] # int ! Field: off1 ldr R12, [R16, #144] # int ! Field: off2 add R11, R11, R10 mov R11, R11 # long -> ptr add R12, R12, R10 mov R10, R10 # long -> ptr add R11, R11, #1030 # ptr str R17, [R11] # int add R10, R10, #1023 # ptr str R17, [R10] # int mov R10, R12 # long -> ptr add R10, R10, #1001 # ptr str R17, [R10] # int In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: ldr x10, [x15,#120] ldp x11, x12, [x16,#136] add x11, x11, x10 add x12, x12, x10 add x11, x11, #0x406 str x17, [x11] add x10, x10, #0x3ff str x17, [x10] mov x10, x12 <--- extra register copy add x10, x10, #0x3e9 str x17, [x10] There is still one extra register copy, which we're trying to remove in this patch. This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. Tier 1~3 passed on aarch64. 
No obvious change in size of libjvm.so [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 ------------- Commit messages: - 8336245: AArch64: remove extra register copy when converting from long to pointer Changes: https://git.openjdk.org/jdk/pull/20157/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20157&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336245 Stats: 320 lines in 5 files changed: 297 ins; 3 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/20157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20157/head:pull/20157 PR: https://git.openjdk.org/jdk/pull/20157 From fgao at openjdk.org Fri Jul 12 13:52:05 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 12 Jul 2024 13:52:05 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! 
Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. > > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 src/hotspot/share/opto/machnode.cpp line 400: > 398: > 399: if (t->isa_intptr_t() && > 400: #if !defined(AARCH64) After applying the operand "IndirectX2P", we may have some patterns like: str val, [CastX2P base] The code path here will resolve the `base`, which is actually a `intptr`, not a `ptr`, and the offset is `0`. I guess the code here was intended to support `[base, offset]`, where base can be a `intptr` but offset can not be `0`. I'm not sure why there is such a limitation that offset can not be `0`, maybe for some old machines? I don't think the limitation is applied to aarch64 machines now. So I unblock it for aarch64. 
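As an illustration only, and not code from the patch: the sketch below shows why a long-to-pointer conversion needs no instruction of its own on mainstream 64-bit targets. `put_long` and `get_long` are hypothetical stand-ins for what `UNSAFE.putLong(address + off, value)` does with a raw base address; they are not HotSpot functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Unsafe.putLong on a raw (off-heap) address.
// The integer address is reinterpreted as a pointer; on mainstream 64-bit
// targets this changes only the static type, not the bits in the register.
inline void put_long(std::uintptr_t address, std::int64_t value) {
  void* p = reinterpret_cast<void*>(address);  // "long -> ptr": no data movement
  std::memcpy(p, &value, sizeof value);        // memcpy tolerates unaligned addresses
}

// Hypothetical counterpart used to read the value back.
inline std::int64_t get_long(std::uintptr_t address) {
  std::int64_t value;
  std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof value);
  return value;
}
```

A call such as `put_long(base + off1 + 1030, seed)` mirrors the first `UNSAFE.putLong` line in the PR description: all of the address arithmetic happens on integers, and the cast itself emits nothing.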
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1675959482 From fgao at openjdk.org Fri Jul 12 14:17:50 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 12 Jul 2024 14:17:50 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <TDTev6CsRM2rnR1nNML50zEUQoPZ79l0y9Zg0CpAwgU=.7b792eac-cec5-4ced-b84c-704802ca9f57@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. 
> > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 https://github.com/openjdk/jdk/pull/20159 is also to fix the same issue. Please feel free to review the draft PR. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2225679586 From aph at openjdk.org Fri Jul 12 14:36:51 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 12 Jul 2024 14:36:51 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com> Message-ID: <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com> On Fri, 12 Jul 2024 13:49:32 GMT, Fei Gao <fgao at openjdk.org> wrote: >> In the cases like: >> >> UNSAFE.putLong(address + off1 + 1030, lseed); >> UNSAFE.putLong(address + 1023, lseed); >> UNSAFE.putLong(address + off2 + 1001, lseed); >> >> >> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: >> >> ldr R10, [R15, #120] # int ! Field: address >> ldr R11, [R16, #136] # int ! Field: off1 >> ldr R12, [R16, #144] # int ! 
Field: off2 >> add R11, R11, R10 >> mov R11, R11 # long -> ptr >> add R12, R12, R10 >> mov R10, R10 # long -> ptr >> add R11, R11, #1030 # ptr >> str R17, [R11] # int >> add R10, R10, #1023 # ptr >> str R17, [R10] # int >> mov R10, R12 # long -> ptr >> add R10, R10, #1001 # ptr >> str R17, [R10] # int >> >> >> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: >> >> ldr x10, [x15,#120] >> ldp x11, x12, [x16,#136] >> add x11, x11, x10 >> add x12, x12, x10 >> add x11, x11, #0x406 >> str x17, [x11] >> add x10, x10, #0x3ff >> str x17, [x10] >> mov x10, x12 <--- extra register copy >> add x10, x10, #0x3e9 >> str x17, [x10] >> >> >> There is still one extra register copy, which we're trying to remove in this patch. >> >> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. >> >> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so >> >> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 > > src/hotspot/share/opto/machnode.cpp line 400: > >> 398: >> 399: if (t->isa_intptr_t() && >> 400: #if !defined(AARCH64) > > After applying the operand "IndirectX2P", we may have some patterns like: > > str val, [CastX2P base] > > The code path here will resolve the `base`, which is actually a `intptr`, not a `ptr`, and the offset is `0`. > > I guess the code here was intended to support `[base, offset]`, where base can be a `intptr` but offset can not be `0`. I'm not sure why there is such a limitation that offset can not be `0`, maybe for some old machines? > > I don't think the limitation is applied to aarch64 machines now. So I unblock it for aarch64. I think it's the other way around. 
Isn't this code saying that if the address is an intptr + a nonzero offset, then the returned type is bottom, i.e. nothing? What effect does this change have?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1676024922

From jvernee at openjdk.org Fri Jul 12 14:43:25 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 12 Jul 2024 14:43:25 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena
Message-ID: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>

This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.

Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.

In this PR:

- I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
- Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
- Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.

I've done several benchmark runs with different numbers of threads.
The confined case stays much more consistent, while the time spent in the shared case quickly balloons once there are more than 4 threads:

Benchmark                       Threads  Mode  Cnt     Score     Error  Units
ConcurrentClose.sharedAccess         32  avgt   10  9017.397 ± 202.870  us/op
ConcurrentClose.sharedAccess         24  avgt   10  5178.214 ± 164.922  us/op
ConcurrentClose.sharedAccess         16  avgt   10  2224.420 ± 165.754  us/op
ConcurrentClose.sharedAccess          8  avgt   10   593.828 ±   8.321  us/op
ConcurrentClose.sharedAccess          7  avgt   10   470.700 ±  22.511  us/op
ConcurrentClose.sharedAccess          6  avgt   10   386.697 ±  59.170  us/op
ConcurrentClose.sharedAccess          5  avgt   10   291.157 ±   7.023  us/op
ConcurrentClose.sharedAccess          4  avgt   10   209.178 ±   5.802  us/op
ConcurrentClose.sharedAccess          1  avgt   10    52.042 ±   0.630  us/op
ConcurrentClose.confinedAccess       32  avgt   10    25.517 ±   1.069  us/op
ConcurrentClose.confinedAccess        1  avgt   10    12.398 ±   0.098  us/op

(I manually added the `Threads` column btw)

Testing: tier 1-4

-------------

Commit messages:
 - polish
 - slightly improve comment
 - tweak comment
 - improve benchmark parameters
 - cleanup
 - add benchmark
 - add note about lacking session oop at safepoint
 - Only deopt if necessary
 - refactor close handshake
 - Return before deoptimizing of target thread already has async exception

Changes: https://git.openjdk.org/jdk/pull/20158/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8335480
Stats: 428 lines in 6 files changed: 339 ins; 19 del; 70 mod
Patch: https://git.openjdk.org/jdk/pull/20158.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From shade at openjdk.org Fri Jul 12 14:45:53 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 14:45:53 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References:
<evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <SGPbCyVziPj9rzNvesb3ME37e9-Ld4wSCuTTQYbGNWo=.a29bc9f6-974f-4dc2-960f-a4fbba474710@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. 
> > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstre... Still waiting for formal reviews, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2225744502 From liach at openjdk.org Fri Jul 12 14:57:53 2024 From: liach at openjdk.org (Chen Liang) Date: Fri, 12 Jul 2024 14:57:53 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <_1clZ5gStZtmtewSn4IBK_hNMGirBETJC9Szgrw6xzE=.1ba387d8-317d-44f3-8fa9-2860f9d53242@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. 
Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstre... Marked as reviewed by liach (Reviewer). src/hotspot/share/opto/parse1.cpp line 1040: > 1038: if (PrintOpto && (Verbose || WizardMode)) { > 1039: method()->print_name(); > 1040: tty->print_cr(" writes @Stable and needs a memory barrier"); This is the generic, non-constructor stable write release barrier removed, right? ------------- PR Review: https://git.openjdk.org/jdk/pull/19635#pullrequestreview-2175116293 PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1676061372 From shade at openjdk.org Fri Jul 12 15:07:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jul 2024 15:07:51 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <_1clZ5gStZtmtewSn4IBK_hNMGirBETJC9Szgrw6xzE=.1ba387d8-317d-44f3-8fa9-2860f9d53242@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <_1clZ5gStZtmtewSn4IBK_hNMGirBETJC9Szgrw6xzE=.1ba387d8-317d-44f3-8fa9-2860f9d53242@github.com> Message-ID: <SPYs_DoEu46y0-9C7D45r5Oxfl8TCb3SoW_pi864DAQ=.5d8ff9d9-c2e3-4727-82a8-eb979cf71c0d@github.com> On Fri, 12 Jul 2024 14:54:58 GMT, Chen Liang <liach at openjdk.org> wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. 
>> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > src/hotspot/share/opto/parse1.cpp line 1040: > >> 1038: if (PrintOpto && (Verbose || WizardMode)) { >> 1039: method()->print_name(); >> 1040: tty->print_cr(" writes @Stable and needs a memory barrier"); > > This is the generic, non-constructor stable write release barrier removed, right? Yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1676075164 From rkennke at openjdk.org Fri Jul 12 16:18:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 Jul 2024 16:18:02 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> Message-ID: <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> On Fri, 12 Jul 2024 05:57:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. 
>> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. 
>> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update arguments.cpp Here comes my first-pass review of the shared code. (Man, I hope we can get rid of UOMT soon, again...) src/hotspot/share/oops/instanceKlass.cpp line 1090: > 1088: > 1089: // Step 2 > 1090: // If we were to use wait() instead of waitUninterruptibly() then This is a nice correction (even though, the actual call below is wait_uninterruptibly() ;-) ), but seems totally unrelated. src/hotspot/share/oops/markWord.cpp line 27: > 25: #include "precompiled.hpp" > 26: #include "oops/markWord.hpp" > 27: #include "runtime/basicLock.inline.hpp" I don't think this include is needed (at least not by the changed code parts, I haven't checked existing code). src/hotspot/share/runtime/arguments.cpp line 1820: > 1818: warning("New lightweight locking not supported on this platform"); > 1819: } > 1820: if (UseObjectMonitorTable) { Uhm, wait a second. That list of platforms covers all existing platforms anyway, so the whole block could be removed? Or is there a deeper meaning here that I don't understand? src/hotspot/share/runtime/basicLock.cpp line 37: > 35: if (mon != nullptr) { > 36: mon->print_on(st); > 37: } I am not sure if we wanted to do this, but we know the owner, therefore we could also look-up the OM from the table, and print it. It wouldn't have all that much to do with the BasicLock, though. src/hotspot/share/runtime/basicLock.inline.hpp line 45: > 43: return reinterpret_cast<ObjectMonitor*>(get_metadata()); > 44: #else > 45: // Other platforms does not make use of the cache yet, If it's not used, why does it matter to special case the code here? 
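As a reference point for the fast-locking transition quoted from the PR description above (`0b00` Fast Locked <--> `0b01` Unlocked), here is a minimal sketch using `std::atomic`. It is an assumption-laden model, not the HotSpot implementation: the names `try_fast_lock` and `try_fast_unlock` are hypothetical, and the lock stack, recursion, spinning and inflation are all left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Low two bits of the mark word, with the values listed in the PR description.
constexpr std::uintptr_t kLockMask   = 0x3;
constexpr std::uintptr_t kFastLocked = 0x0;  // 0b00
constexpr std::uintptr_t kUnlocked   = 0x1;  // 0b01

// Try to move the lock bits from Unlocked to Fast Locked with a single CAS.
// Returns false if the bits were not Unlocked or another thread won the race.
inline bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) return false;
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Reverse transition: Fast Locked back to Unlocked, leaving the other bits alone.
inline bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) return false;
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Only the two lock bits change in either direction, which is the property the PR description relies on: the rest of the mark word is never displaced.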
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 28: > 26: > 27: #include "classfile/vmSymbols.hpp" > 28: #include "javaThread.inline.hpp" This include is incorrect (and my IDE says it's not needed). src/hotspot/share/runtime/lightweightSynchronizer.cpp line 31: > 29: #include "jfrfiles/jfrEventClasses.hpp" > 30: #include "logging/log.hpp" > 31: #include "logging/logStream.hpp" Include of logStream.hpp not needed? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 58: > 56: > 57: // > 58: // Lightweight synchronization. This comment doesn't really say anything. Either remove it, or add a nice summary of how LW locking and OM table stuff works. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 80: > 78: > 79: ConcurrentTable* _table; > 80: volatile size_t _table_count; Looks like a misnomer to me. We only have one table, but we do have N entries/nodes. This is counted when new nodes are allocated or old nodes are freed. Consider renaming this to '_entry_count' or '_node_count'? I'm actually a bit surprised if ConcurrentHashTable doesn't already track this... src/hotspot/share/runtime/lightweightSynchronizer.cpp line 88: > 86: > 87: public: > 88: Lookup(oop obj) : _obj(obj) {} Make explicit? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 97: > 95: > 96: bool equals(ObjectMonitor** value) { > 97: // The entry is going to be removed soon. What does this comment mean? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 112: > 110: > 111: public: > 112: LookupMonitor(ObjectMonitor* monitor) : _monitor(monitor) {} Make explicit? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 159: > 157: static size_t min_log_size() { > 158: // ~= log(AvgMonitorsPerThreadEstimate default) > 159: return 10; Uh wait - are we assuming that threads hold 1024 monitors *on average* ? Isn't this a bit excessive? I would have thought maybe 8 monitors/thread. Yes there are workloads that are bonkers. 
Or maybe the comment/flag name does not say what I think it says. Or why not use AvgMonitorsPerThreadEstimate directly? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 349: > 347: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); > 348: > 349: if (try_read) { All the callers seem to pass try_read = true. Why do we have the branch at all? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 401: > 399: > 400: if (inserted) { > 401: // Hopefully the performance counters are allocated on distinct It doesn't look like the counters are on distinct cache lines (see objectMonitor.hpp, lines 212ff). If this is a concern, file a bug to investigate it later? The comment here is a bit misplaced, IMO. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 473: > 471: int _length; > 472: > 473: void do_oop(oop* o) final { C++ always provides something to learn - C++ has got a final keyword! :-) Looks like a reasonable use of it here, though. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 477: > 475: if (obj->mark_acquire().has_monitor()) { > 476: if (_length > 0 && _contended_oops[_length-1] == obj) { > 477: // assert(VM_Version::supports_recursive_lightweight_locking(), "must be"); Uncomment or remove assert? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 554: > 552: bool _no_safepoint; > 553: union { > 554: struct {} _dummy; Uhh ... Why does this need to be wrapped in a union and struct? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 563: > 561: assert(locking_thread == current || locking_thread->is_obj_deopt_suspend(), "locking_thread may not run concurrently"); > 562: if (_no_safepoint) { > 563: ::new (&_nsv) NoSafepointVerifier(); I'm thinking that it might be easier and cleaner to just re-do what the NoSafepointVerifier does? It just calls thread->inc/dec _no_safepoint_count(). src/hotspot/share/runtime/lightweightSynchronizer.cpp line 748: > 746: } > 747: > 748: // Fast-locking does not use the 'lock' argument. 
I believe the comment is outdated. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 969: > 967: > 968: for (;;) { > 969: // Fetch the monitor from the table Wrong indentation. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1157: > 1155: // enter can block for safepoints; clear the unhandled object oop > 1156: PauseNoSafepointVerifier pnsv(&nsv); > 1157: object = nullptr; What is the point of that statement? object is not an out-arg (afaict), and not used subsequently. src/hotspot/share/runtime/lightweightSynchronizer.hpp line 68: > 66: static void exit(oop object, JavaThread* current); > 67: > 68: static ObjectMonitor* inflate_into_object_header(Thread* current, JavaThread* inflating_thread, oop object, const ObjectSynchronizer::InflateCause cause); My IDE flags this with a warning 'Parameter 'cause' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions' *shrugs* src/hotspot/share/runtime/lockStack.inline.hpp line 232: > 230: oop obj = monitor->object_peek(); > 231: assert(obj != nullptr, "must be alive"); > 232: assert(monitor == LightweightSynchronizer::get_monitor_from_table(JavaThread::current(), obj), "must be exist in table"); "must be exist in table" -> "must exist in table" src/hotspot/share/runtime/objectMonitor.cpp line 56: > 54: #include "runtime/safepointMechanism.inline.hpp" > 55: #include "runtime/sharedRuntime.hpp" > 56: #include "runtime/synchronizer.hpp" This include is not used. src/hotspot/share/runtime/objectMonitor.hpp line 193: > 191: ObjectWaiter* volatile _WaitSet; // LL of threads wait()ing on the monitor > 192: volatile int _waiters; // number of waiting threads > 193: private: You can now also remove the 'private:' here src/hotspot/share/runtime/synchronizer.cpp line 390: > 388: > 389: static bool useHeavyMonitors() { > 390: #if defined(X86) || defined(AARCH64) || defined(PPC64) || defined(RISCV64) || defined(S390) Why are those if-defs here?
Why is ARM excluded? ------------- Changes requested by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2174478048 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675695457 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675696406 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675704824 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675707735 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675711809 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675744474 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675745048 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676111067 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675773683 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675747483 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675765460 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675766088 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675781420 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675791687 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675799897 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675803217 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675805690 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675824394 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675832868 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675854207 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675876915 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675932005 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20067#discussion_r1675936943 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676107048 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676112375 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676125325 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676140201 From rkennke at openjdk.org Fri Jul 12 16:18:03 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 Jul 2024 16:18:03 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3] In-Reply-To: <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com> <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com> Message-ID: <hK7cMXwnR14MPvDtZ08migcBjRmXqlXpFEI5BLyAA2M=.cec68237-c10b-4cdd-976f-495c6d25560b@github.com> On Tue, 9 Jul 2024 20:43:06 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add JVMCI symbol exports >> - Revert "More graceful JVMCI VM option interaction" >> >> This reverts commit 2814350370cf142e130fe1d38610c646039f976d. > > src/hotspot/share/opto/library_call.cpp line 4620: > >> 4618: Node *unlocked_val = _gvn.MakeConX(markWord::unlocked_value); >> 4619: Node *chk_unlocked = _gvn.transform(new CmpXNode(lmasked_header, unlocked_val)); >> 4620: Node *test_not_unlocked = _gvn.transform(new BoolNode(chk_unlocked, BoolTest::ne)); > > I don't really know what this does. Someone from the c2 compiler group should look at this. Yes, that looks correct. 
I'm familiar with this code because I messed with it in my attempts to implement compact identity hashcode in Lilliput2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675699672 From rkennke at openjdk.org Fri Jul 12 16:18:03 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 Jul 2024 16:18:03 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> Message-ID: <CkZ-Sr3ITmhrMyAhsjGUsf2LgyiU2QhaNdvbkoMWL1Y=.48abe896-7a09-4bf1-a236-d86ffd35fcdf@github.com> On Fri, 12 Jul 2024 15:56:59 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update arguments.cpp > > src/hotspot/share/runtime/synchronizer.cpp line 390: > >> 388: >> 389: static bool useHeavyMonitors() { >> 390: #if defined(X86) || defined(AARCH64) || defined(PPC64) || defined(RISCV64) || defined(S390) > > Why are those if-defs here? Why is ARM excluded? Oh I see, you only moved this up. Still a bit puzzling. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676142470 From coleenp at openjdk.org Fri Jul 12 17:44:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 Jul 2024 17:44:53 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <CkZ-Sr3ITmhrMyAhsjGUsf2LgyiU2QhaNdvbkoMWL1Y=.48abe896-7a09-4bf1-a236-d86ffd35fcdf@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> <CkZ-Sr3ITmhrMyAhsjGUsf2LgyiU2QhaNdvbkoMWL1Y=.48abe896-7a09-4bf1-a236-d86ffd35fcdf@github.com> Message-ID: <mYOetX5LzfVBYpl9xDGQlJJxxntXdKRfAYCs_g0L5_g=.4863065e-35c1-474e-abcc-cb19789ed6aa@github.com> On Fri, 12 Jul 2024 15:58:56 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 390: >> >>> 388: >>> 389: static bool useHeavyMonitors() { >>> 390: #if defined(X86) || defined(AARCH64) || defined(PPC64) || defined(RISCV64) || defined(S390) >> >> Why are those if-defs here? Why is ARM excluded? > > Oh I see, you only moved this up. Still a bit puzzling. This code was just moved. No idea why ARM is excluded. I filed this to deal with this. https://bugs.openjdk.org/browse/JDK-8336325 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676253183 From jbhateja at openjdk.org Fri Jul 12 18:31:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 12 Jul 2024 18:31:59 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings Message-ID: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> Enabling test with explicit feature checks for x86 target. 
Removing from test/hotspot/jtreg/ProblemList.txt

Best Regards,
Jatin

-------------

Commit messages:
 - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings

Changes: https://git.openjdk.org/jdk/pull/20160/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8335860
Stats: 5 lines in 3 files changed: 1 ins; 2 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/20160.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20160/head:pull/20160

PR: https://git.openjdk.org/jdk/pull/20160

From jvernee at openjdk.org Fri Jul 12 20:59:26 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 12 Jul 2024 20:59:26 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always.
That is, in cases where we are not inside an `@Scoped` method, but are inside compiled code.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>
>
> Benchmark                       Threads  Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess         32  avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess         24  avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess         16  avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess          8  avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess          7  avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess          6  avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess          5  avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess          4  avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess          1  avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.confinedAccess       32  avgt   10    25.517 ±   1.069 ...

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

  track has_scoped_access for compiled methods

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20158/files
  - new: https://git.openjdk.org/jdk/pull/20158/files/34ff5fd8..d1266b53

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=00-01

Stats: 42 lines in 15 files changed: 38 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/20158.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From jvernee at openjdk.org Fri Jul 12 20:59:26 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 12 Jul 2024 20:59:26 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <-ywJLT6LHavlxhuYXJQTh6xvvhz00oFECkpiCvz_Y4w=.a67a4c7d-b503-475f-aee0-0e042acbccc6@github.com>

On Fri, 12 Jul 2024 13:57:23 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always. That is, in cases where we are not inside an `@Scoped` method, but are inside compiled code.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>
>
> Benchmark                       Threads  Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess         32  avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess         24  avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess         16  avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess          8  avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess          7  avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess          6  avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess          5  avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess          4  avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess          1  avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.confinedAccess       32  avgt   10    25.517 ±   1.069 ...

> This could be narrowed down further by tracking for each compiled method if it has an (inlined) call to an `@Scoped` method, but I've left that out for now.

I decided to add this to the PR for completeness, so that we don't go and deoptimize frames that are not using scoped accesses at all.
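
For readers less familiar with the API under discussion, the following is a minimal sketch of what closing a shared arena looks like from the Java side. It is not taken from the PR: the class and method names are made up for illustration, and it assumes JDK 22 or later, where `java.lang.foreign` is a final API. It only shows the user-visible contract (a shared arena's segments can be accessed from other threads, and access after `close()` is rejected); the handshake and deoptimization work described above happens inside the VM and is not visible here.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration class; not part of the JDK or of the PR.
final class SharedArenaCloseSketch {

    /** Writes to a shared arena's segment from a second thread, then reads the value back. */
    static int writeFromOtherThread() {
        try (Arena arena = Arena.ofShared()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
            // A shared arena allows access from any thread, unlike a confined one.
            Thread writer = new Thread(() -> segment.set(ValueLayout.JAVA_INT, 0, 7));
            writer.start();
            try {
                writer.join(); // join() makes the write visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return segment.get(ValueLayout.JAVA_INT, 0);
        }
    }

    /** Returns true if accessing a segment after its arena has been closed is rejected. */
    static boolean accessFailsAfterClose(Arena arena) {
        MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
        arena.close(); // once close() returns, the segment is no longer accessible
        try {
            segment.get(ValueLayout.JAVA_INT, 0);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeFromOtherThread());
        System.out.println(accessFailsAfterClose(Arena.ofShared()));
        System.out.println(accessFailsAfterClose(Arena.ofConfined()));
    }
}
```

The confined and shared arenas behave the same in this single-closer sketch; the cost difference measured in the benchmark only appears when many threads access and close shared arenas concurrently.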
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226341535 From duke at openjdk.org Fri Jul 12 21:18:53 2024 From: duke at openjdk.org (duke) Date: Fri, 12 Jul 2024 21:18:53 GMT Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to System.lineSeparator In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> Message-ID: <xNX4TovrPZgAe2DdqIuUNO1RVgo3rVlGuGV47_r6tto=.934fa260-a0de-4579-9a3a-22dde9420a9d@github.com> On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote: > There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format. @wenshao Your change (at version 829da3e149eadedd22d81d22a2d025516c59c210) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20149#issuecomment-2226364868 From duke at openjdk.org Fri Jul 12 21:52:04 2024 From: duke at openjdk.org (Shaojin Wen) Date: Fri, 12 Jul 2024 21:52:04 GMT Subject: Integrated: 8336278: Micro-optimize Replace String.format("%n") to System.lineSeparator In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com> Message-ID: <MAqa6C_mHMuQtCN7ka4-e_wGH1aPNziQaZt8LsuZALc=.e8b9d092-e89d-4c1c-bda6-62249403ca32@github.com> On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote: > There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format. This pull request has now been integrated. 
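
The equivalence claimed in the quoted description can be checked with a small sketch like the one below. The class and method names are hypothetical and not from the changeset; the only JDK calls used are `String.format` and `System.lineSeparator`.

```java
// Hypothetical illustration class; not part of the JDK or of the changeset.
final class LineSeparatorSketch {

    /** Line separator obtained by running the format machinery on "%n". */
    static String viaFormat() {
        return String.format("%n");
    }

    /** Line separator read directly, without parsing a format string. */
    static String viaSystem() {
        return System.lineSeparator();
    }

    public static void main(String[] args) {
        // Both calls yield the platform line separator, so this compares equal.
        System.out.println(viaFormat().equals(viaSystem()));
    }
}
```

The direct call avoids parsing a format string just to obtain a constant, which is the micro-optimization the change is after.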
Changeset: 4957145e Author: Shaojin Wen <shaojin.wensj at alibaba-inc.com> Committer: Chen Liang <liach at openjdk.org> URL: https://git.openjdk.org/jdk/commit/4957145e6c823bfaa638a77457da5c031af978b9 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod 8336278: Micro-optimize Replace String.format("%n") to System.lineSeparator Reviewed-by: dnsimon, shade ------------- PR: https://git.openjdk.org/jdk/pull/20149 From hboehm at google.com Sat Jul 13 00:36:33 2024 From: hboehm at google.com (Hans Boehm) Date: Fri, 12 Jul 2024 17:36:33 -0700 Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <mailman.15905.1720688298.324.hotspot-dev@openjdk.org> References: <mailman.15905.1720688298.324.hotspot-dev@openjdk.org> Message-ID: <CAMOCf+i5Eb8xFMPw_+eeSpyXcFEiXeEacLk0ZqYQBxCpHkxDxg@mail.gmail.com> > Message: 1 > Date: Thu, 11 Jul 2024 08:50:57 GMT > From: John R Rose <jrose at openjdk.org> > To: <hotspot-dev at openjdk.org> > Subject: Re: RFR: 8333791: Fix memory barriers for @Stable fields > Message-ID: > <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=. 762c100f-7650-47fd-bfe3-ac620913384f at github.com> > > Content-Type: text/plain; charset=utf-8 > > On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > > > See bug for more discussion. > > > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > > > I [performed an audit]( https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > > > Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. 
> >
> > C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> >
> > Additional testing:
> > - [x] New IR tests
> > - [x] Linux x86_64 server fastdebug, `all`
> > - [x] Linux AArch64 server fastdebug, `all`
>
> I like this compromise. Let me see if I got it right: A stable write in a constructor is treated like a final write: it triggers a barrier at the end of the constructor. That's a cheap move. No other barriers are added automatically, for reads or other writes, saving us from doing less cheap moves. The burden would be on users of stable vars (in fancy access patterns) to add more fences if needed, but we don't see any important cases of that, at the moment.
>

No opinion on the merits here. But IIUC, "as memory safe as finals" is a slightly squishy notion here. The downside of not having the release fence is that even with safe publication, a write to an @Stable field outside the constructor can be seen by a read in the constructor, before the object is published. That's arguably weirder than final field behavior, and not something that can arise with final fields. But it still only happens in the presence of data races, and thus probably not in code you should be writing anyway.

From duke at openjdk.org Sat Jul 13 05:19:03 2024
From: duke at openjdk.org (duke)
Date: Sat, 13 Jul 2024 05:19:03 GMT
Subject: Withdrawn: 8301464: Code in GenFullCP is still disabled after JDK-8079697 was fixed
In-Reply-To: <BAqvKpFjumZRFacqHsYUioKLlfPISiGcfCUbJtFyLA0=.4f580a3b-4b65-41cb-885e-1d945c380b1c@github.com>
References: <BAqvKpFjumZRFacqHsYUioKLlfPISiGcfCUbJtFyLA0=.4f580a3b-4b65-41cb-885e-1d945c380b1c@github.com>
Message-ID: <K1IyX2kviBwKDaCsIGi6Q8XjsCc8wRddGcMlb0AjTV4=.62814bf8-7dd8-4418-aa96-0ce2607371df@github.com>

On Tue, 14 May 2024 03:05:27 GMT, xiaotaonan <duke at openjdk.org> wrote:

> Code in GenFullCP is still disabled after JDK-8079697 was fixed
> note: I have not found any relevant information on why ClassWriter.COMPUTE_FRAMES is disabled in JDK-8079697.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19228

From forax at openjdk.org Sat Jul 13 11:12:51 2024
From: forax at openjdk.org (Rémi Forax)
Date: Sat, 13 Jul 2024 11:12:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <s1XasFdk32maXz3tFyJk5buq1tlHS5xV2GoETU3-Tys=.962cbff0-0271-4deb-9357-c7c4e26599f6@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena.
The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. That is, in cases where we are not inside an `@Scoped` method, but are inside compiled code.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>>
>>
>> Benchmark                       Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess         32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess         24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess         16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess          8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess          7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess          6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess          5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess          4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess          1  avgt   10    52.042 ±   0.630  us/op
>> ConcurrentClose.confine...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   track has_scoped_access for compiled methods

Nice work!

Thinking a bit about how to improve the benchmark, and given the semantics of Arena.close(), there is a trick that you can use. There are two kinds of memory segments: those that are only visible from Java and those that are also visible from outside Java. For example, a memory segment created from an mmap, or a memory segment whose address has been passed to C code, is visible from outside Java; for those, you have no choice but to wait in Arena.close() until all threads have answered the handshakes. For all the other memory segments, because they are only visible from Java, their memory can be reclaimed asynchronously, i.e. the last thread of the handshakes can free the corresponding memory segments, so the thread that calls Arena.close() is free to run even if the memory is not yet reclaimed.

From my armchair, that seems an awful lot of engineering, so it may not be worth it, but now you know :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226858328

From jvernee at openjdk.org Sat Jul 13 14:29:55 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Sat, 13 Jul 2024 14:29:55 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <t2RnOoTnKhtDfrsmF42_BRTwV-eWFcUobQ89P-VJjbM=.5081d3f4-2e2c-4a2a-9f03-08e25af0275d@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. That is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>>
>>
>> Benchmark                       Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess         32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess         24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess         16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess          8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess          7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess          6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess          5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess          4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess          1  avgt   10 ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   track has_scoped_access for compiled methods

That is something we considered in the past as well (I think Maurizio even had a prototype at some point). The issue is that close should be deterministic, i.e. after the call to `close()` returns, all memory should be freed. That is an essential property for applications that have most of their virtual address space tied up, and then want to release and immediately re-use a big chunk of it.
If it's not important that memory is freed deterministically, but memory should still be accessible from multiple threads, an automatic arena might be a better choice.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226929736

From mgronlun at openjdk.org Sat Jul 13 14:53:19 2024
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sat, 13 Jul 2024 14:53:19 GMT
Subject: RFR: 8334781: JFR crash: assert(((((JfrTraceIdBits::load(klass)) & ((JfrTraceIdEpoch::this_epoch_method_and_class_bits()))) != 0))) failed: invariant
Message-ID: <aHCZUov46bOLAQiJBG-h65BUAKeOvc0Lz-Jkr39PQ98=.743a7e19-075b-40ee-b886-82a6717641a2@github.com>

Greetings,

Please help review this adjustment, which fixes rare situations where methods that have been retransformed or redefined can be perceived as being tagged by JFR when they, in fact, are not.

The fix unconditionally sets the metatag clear bits on artefact initialization and adds assertions about the JFR bit tag state machine.

Testing: jdk_jfr, stress testing

Thanks
Markus

-------------

Commit messages:
 - 8334781

Changes: https://git.openjdk.org/jdk/pull/20171/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20171&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8334781
Stats: 34 lines in 7 files changed: 17 ins; 3 del; 14 mod
Patch: https://git.openjdk.org/jdk/pull/20171.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20171/head:pull/20171

PR: https://git.openjdk.org/jdk/pull/20171

From eosterlund at openjdk.org Sat Jul 13 15:15:51 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Sat, 13 Jul 2024 15:15:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <JlANbo3VlMnnTFmbfBDKxQUjYy3PBX4JlzzQmFEjtjg=.34c00e97-7dcf-494e-8c07-2dabe6deb978@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `ouputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared cases balloons up in time spent quickly when there are more than 4 threads:
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > track has_scoped_access for compiled methods Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2176318150 From eosterlund at openjdk.org Sat Jul 13 15:31:55 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Sat, 13 Jul 2024 15:31:55 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> Message-ID: <OnqjJptKgPgbiZbFAHOraOyF5BgiP3dz_6o5Wz8OYxs=.d37f7763-7efe-4bdf-9523-52c4f733bb59@github.com> On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. 
That also required changing vframe code to accept an `ouputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared cases balloons up in time spent quickly when there are more than 4 threads:
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > track has_scoped_access for compiled methods @dougxc might want to have a look at Graal support for this one.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226957995 From jvernee at openjdk.org Sat Jul 13 16:08:50 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Sat, 13 Jul 2024 16:08:50 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <OnqjJptKgPgbiZbFAHOraOyF5BgiP3dz_6o5Wz8OYxs=.d37f7763-7efe-4bdf-9523-52c4f733bb59@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <OnqjJptKgPgbiZbFAHOraOyF5BgiP3dz_6o5Wz8OYxs=.d37f7763-7efe-4bdf-9523-52c4f733bb59@github.com> Message-ID: <y2sUl8FsxgrwFuAcQg_w9CffblaZWgyn6RAopMSk7Z8=.1fc54eb8-aa1b-4fe8-9aae-12d86e3942b8@github.com> On Sat, 13 Jul 2024 15:28:57 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > @dougxc might want to have a look at Graal support for this one. Yes, I conservatively implemented `has_scoped_access()` for Graal (see `jvmciRuntime.cpp` changes). It won't regress anything, but there's still an opportunity for improvement.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226974681 From forax at openjdk.org Sat Jul 13 16:45:50 2024 From: forax at openjdk.org (=?UTF-8?B?UsOpbWk=?= Forax) Date: Sat, 13 Jul 2024 16:45:50 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> Message-ID: <0zSpYFkv6lAR8G0FpPDyFP-uLqh92ZQ5uW5xVCRXmyg=.c14d0ee0-e0bf-4367-9dfa-c613489684c9@github.com> On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. 
That also required changing vframe code to accept an `ouputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared cases balloons up in time spent quickly when there are more than 4 threads:
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > track has_scoped_access for compiled methods Knowing that all the segments are freed during close() is something you may want. But having the execution time of close() be linear with the number of threads is also problematic. Maybe it means that we need another kind of Arena that works like shared() but allows the freeing to be done asynchronously (ofSharedAsyncFree?). Note that the semantics of ofSharedAsyncFree() are different from those of ofAuto(): ofAuto() relies on the GC to free a segment, so the delay before a segment is freed is not time bounded; if the application has enough memory, the memory of the segment may never be reclaimed.
With ofSharedAsyncFree(), the segments are freed by the last thread, so while this mechanism is not deterministic, it is time bounded. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226992713 From uschindler at openjdk.org Sun Jul 14 11:04:54 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Sun, 14 Jul 2024 11:04:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> Message-ID: <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. 
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `ouputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared cases balloons up in time spent quickly when there are more than 4 threads:
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > track has_scoped_access for compiled methods Hi Jorn, Many thanks for working on this! I have one problem with the benchmark: I think it is not measuring the whole setup in a way that matches our workload. The basic problem is that we don't want to deoptimize threads which are not related to MemorySegments. So basically, the throughput of those threads should not be affected.
For threads currently in a memory-segment read it should have a bit of an effect, but they should recover fast. The given benchmark somehow only measures the following: it starts many threads; in each it opens a shared memory segment, does some work and closes it. So it measures the throughput of the whole "create shared/work on it/close shared" workload. Actually, the problems we see in Lucene are more that we have many threads working on shared memory segments or on other tasks not related to memory segments at all, while a few threads are concurrently closing and opening new arenas. With more threads concurrently closing the arenas, the throughput on the other threads also degrades. So IMHO, the benchmark should be improved to have a few threads (configurable) that open/close memory segments, a list of other threads that do other work, and finally a list of threads reading from the memory segments opened by the first threads. The testcase you wrote fits the above workload better. Maybe the benchmark should be set up more like the test. If you have a benchmark with that workload it should show an improvement more clearly. The current benchmark has the problem that it measures the whole open/work/close cycle on shared segments. And closing a shared segment is always heavy, because it has to trigger and wait for the thread-local handshake. Why is the test preventing inlining of the inner read method? I may be able to benchmark a Lucene workload with a custom JDK build next week. It might be an idea to use the old, incorrect DaCapo benchmark (downgrade to an older version from before it fixed https://github.com/dacapobench/dacapobench/issues/264 , specifically https://github.com/dacapobench/dacapobench/commit/76588b28d516ae19f51a80e7287d404385a2c146).
Uwe ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2227303884 From uschindler at openjdk.org Sun Jul 14 11:10:00 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Sun, 14 Jul 2024 11:10:00 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <0zSpYFkv6lAR8G0FpPDyFP-uLqh92ZQ5uW5xVCRXmyg=.c14d0ee0-e0bf-4367-9dfa-c613489684c9@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <0zSpYFkv6lAR8G0FpPDyFP-uLqh92ZQ5uW5xVCRXmyg=.c14d0ee0-e0bf-4367-9dfa-c613489684c9@github.com> Message-ID: <Ii5WTv26SUHutPAvrKXxS-0pWJvE7roJ7DXQBStH3XI=.42b58d78-60be-4533-a60e-7693a3dfeed0@github.com> On Sat, 13 Jul 2024 16:43:16 GMT, Rémi Forax <forax at openjdk.org> wrote: > Knowing that all the segments are freed during close() is something you may want. But having the execution time of close() be linear with the number of threads is also problematic. Maybe, it means that we need another kind of Arena that works like shared() but allow the freed to be done asynchronously (ofSharedAsyncFree ?). > > Note that the semantics of ofSharedAsyncFree() is different from ofAuto(), ofAuto() relies on the GC to free a segment so the delay before a segment is freed is not time bounded if the application has enough memory, the memory of the segment may never be reclaimed. With ofSharedAsyncFree(), the segments are freed by the last thread, so while this mechanism is not deterministic, it is time bounded. That's a great suggestion! In our case we just want the index files open as soon as possible, but not on next GC (which will be horrible and brings us back into the times of DirectByteBuffer).
The problem with GC is that the Arena/MemorySegments and so on are tiny objects which will live for a very long time, especially when they were used for quite some time (like an index segment of a Lucene index). Of course, for testing purposes in Lucene we could use `ofShared()` (to make sure all mmapped files are freed, especially on Windows, as soon as the index is closed), but in production environments we could offer the option to use delayed close to improve throughput. Uwe ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2227305407 From duke at openjdk.org Sun Jul 14 13:17:20 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 14 Jul 2024 13:17:20 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v4] In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> Message-ID: <OoIrgTl5AYcdVT0LI29XBYklYtlfeyu8BmEwkw2dnss=.ce405fe8-7445-4408-933b-c89c5767bc53@github.com> > Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.
ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision: - Add newlines after each of new functions - Use global x0 instead of alias for it ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19960/files - new: https://git.openjdk.org/jdk/pull/19960/files/8520bc3a..ede19103 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=02-03 Stats: 5 lines in 1 file changed: 3 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960 PR: https://git.openjdk.org/jdk/pull/19960 From duke at openjdk.org Sun Jul 14 13:17:20 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 14 Jul 2024 13:17:20 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <T59CuchKVcFhqy7VAzIHxakveuo2bJFrORdrKQwoFLE=.1b43c0cb-d05e-45eb-b85c-026b44dea080@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com> <T59CuchKVcFhqy7VAzIHxakveuo2bJFrORdrKQwoFLE=.1b43c0cb-d05e-45eb-b85c-026b44dea080@github.com> Message-ID: <x-FAWINJvvNt_Qg4EAmdtmfAsJZBDU7pbuP1QcvABrU=.7db48ca9-9dee-4b9c-829a-b1ef07e4271a@github.com> On Tue, 9 Jul 2024 08:36:16 GMT, Fei Yang <fyang at openjdk.org> wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2282: >> >>> 2280: __ vrev8_v(vtmp1, vtmp1); >>> 2281: __ vrev8_v(vtmp2, vtmp2); >>> 2282: } >> >> Please leave a new line after each of these newly-added functions. > > BTW: Did you compare this with the openssl version which also makes use of `vaesz_vs` instruction from `Zvkned` [1]? 
> > [1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkb-zvkned.pl Done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1677133443 From duke at openjdk.org Sun Jul 14 13:17:21 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 14 Jul 2024 13:17:21 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v2] In-Reply-To: <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com> <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com> Message-ID: <sjq-jSCSS8ZVs4OHKtAbFNQ4UTMWlvl6T0npQSjAbNs=.f2992394-755e-4acd-8d87-1c33f8635d82@github.com> On Mon, 8 Jul 2024 14:50:00 GMT, Fei Yang <fyang at openjdk.org> wrote: >> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Use t2 directly instead of temp2 >> - Rename temp1 -> x0 >> - Left a note on a side effect of generate_vle32_pack4 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2332: > >> 2330: const Register key = c_rarg2; // key array address >> 2331: const Register keylen = c_rarg3; >> 2332: const Register x0 = c_rarg4; > > I think you can use the global `x0` (aka the zero register) instead for `vsetivli`. It very confusing to have register alias names like `x0` like here. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1677133451 From duke at openjdk.org Sun Jul 14 15:00:04 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 14 Jul 2024 15:00:04 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v5] In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> Message-ID: <Yt-B895uR8jzQ6h90NG1ObK9-Dq1xk0dLAaz30Pi6gY=.6a78060e-f59f-47e8-9819-df255c8cee83@github.com> > Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. 
ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: Multiversion encryption depending on keylen ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19960/files - new: https://git.openjdk.org/jdk/pull/19960/files/ede19103..8f1f98b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=03-04 Stats: 76 lines in 1 file changed: 22 ins; 25 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/19960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960 PR: https://git.openjdk.org/jdk/pull/19960 From duke at openjdk.org Sun Jul 14 15:00:04 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 14 Jul 2024 15:00:04 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3] In-Reply-To: <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com> <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com> Message-ID: <gUcwCyjQ9PLL3JiB2PFaCzec5qfwHz2BUotiqLqGfJA=.ad375f3a-17a3-4058-9ee5-f07586d64e42@github.com> On Tue, 9 Jul 2024 05:28:13 GMT, Fei Yang <fyang at openjdk.org> wrote: >> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: >> >> Left a note on a side effect of generate_vle32_pack2 > > Changes requested by fyang (Reviewer). As for comparison with the openssl version: first of all, thanks for the sources, @RealFYang! 
The main difference that I see is that they introduced three different versions of encryption depending on the key size, which allows them to skip a couple of instructions, like when I did `vaesem_vv(res, vzero)` followed by `vxor_vv(res, res, vtemp1)`. So I thought it would be more efficient to replace the current version with something similar to the openssl one. The only problem I see is a slight increase in code size. Please let me know if we are not interested in this change for some reason. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2227377554 From duke at openjdk.org Sun Jul 14 15:09:04 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 14 Jul 2024 15:09:04 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v6] In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> Message-ID: <GPGRoOpsnwB4pgPTPjLAB_urcM6X8fhrQgRQwT6tMQY=.0b8f0d8d-9d63-4086-9378-cf4d36359a3d@github.com> > Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.
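As a reading aid for the key-size discussion in this thread: a minimal sketch of the dispatch that "multiversion encryption depending on keylen" implies. This is not code from the PR. It assumes, as other HotSpot ports do, that `keylen` is the length of the expanded key array in 32-bit words (4 words per round key, one more round key than rounds); the function names are made up for illustration.

```cpp
// Hypothetical helpers, not taken from the PR.
// Assumption: keylen_words == 4 * (rounds + 1).
inline int aes_rounds_for_keylen(int keylen_words) {
  switch (keylen_words) {
    case 44: return 10;  // AES-128
    case 52: return 12;  // AES-192
    case 60: return 14;  // AES-256
    default: return -1;  // not a valid expanded-key length
  }
}

// Number of 128-bit round keys a stub would have to load for this key size.
inline int aes_round_keys_for_keylen(int keylen_words) {
  int rounds = aes_rounds_for_keylen(keylen_words);
  return rounds < 0 ? -1 : rounds + 1;
}
```

With three fixed round counts, each variant can be emitted as straight-line code for its own key size, which is roughly the trade-off against code size mentioned above.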
ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: Use one L_end for all AES key sizes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19960/files - new: https://git.openjdk.org/jdk/pull/19960/files/8f1f98b5..407b9af0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=04-05 Stats: 12 lines in 1 file changed: 1 ins; 8 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960 PR: https://git.openjdk.org/jdk/pull/19960 From aboldtch at openjdk.org Mon Jul 15 00:50:30 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 Jul 2024 00:50:30 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> Message-ID: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. 
It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is to much contention or to long critical sections for spinning to be feasible, inf... 
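To make the quoted "Fast Locking" transition easier to follow, here is a toy model of a CAS on the two lock bits (0b01 Unlocked <--> 0b00 Fast Locked). It is only a sketch under assumptions: `ToyMarkWord`, `try_fast_lock` and `try_fast_unlock` are hypothetical names, not HotSpot's, and owner tracking, spinning and inflation (0b10) are left out entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model only; lock-bit values follow the quoted description:
// 0b00 fast locked, 0b01 unlocked (0b10 inflated is not modelled here).
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;

// Replace only the two lock bits, keeping every other bit of the word.
inline std::uintptr_t with_lock_bits(std::uintptr_t word, std::uintptr_t lock) {
  assert((lock & ~kLockMask) == 0);
  return (word & ~kLockMask) | lock;
}

struct ToyMarkWord {
  std::atomic<std::uintptr_t> bits;
  explicit ToyMarkWord(std::uintptr_t upper)
      : bits(with_lock_bits(upper, kUnlocked)) {}
};

// Unlocked -> Fast Locked with a single CAS; fails if the word is not unlocked
// or if another thread changed it in between.
inline bool try_fast_lock(ToyMarkWord& m) {
  std::uintptr_t seen = m.bits.load(std::memory_order_relaxed);
  if ((seen & kLockMask) != kUnlocked) return false;
  return m.bits.compare_exchange_strong(seen, with_lock_bits(seen, kFastLocked),
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
}

// Fast Locked -> Unlocked, again touching only the lock bits.
inline bool try_fast_unlock(ToyMarkWord& m) {
  std::uintptr_t seen = m.bits.load(std::memory_order_relaxed);
  if ((seen & kLockMask) != kFastLocked) return false;
  return m.bits.compare_exchange_strong(seen, with_lock_bits(seen, kUnlocked),
                                        std::memory_order_release,
                                        std::memory_order_relaxed);
}
```

The point of the sketch is that the upper bits of the word are never displaced, which is the property the description contrasts with writing an `ObjectMonitor*` over the `markWord`.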
Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: - Remove try_read - Add explicit to single parameter constructors - Remove superfluous access specifier - Remove unused include - Update assert message OMCache::set_monitor - Fix indentation - Remove outdated comment LightweightSynchronizer::exit - Remove logStream include - Remove strange comment - Fix javaThread include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/cccffeda..ebf11542 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=07-08 Stats: 25 lines in 5 files changed: 0 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Mon Jul 15 00:50:33 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 Jul 2024 00:50:33 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> Message-ID: <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com> On Fri, 12 Jul 2024 11:09:35 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update arguments.cpp > > src/hotspot/share/oops/instanceKlass.cpp line 1090: > >> 1088: >> 1089: // Step 2 >> 1090: // If 
we were to use wait() instead of waitUninterruptibly() then > > This is a nice correction (even though, the actual call below is wait_uninterruptibly() ;-) ), but seems totally unrelated. I was thinking it was referring to `ObjectSynchronizer::waitUninterruptibly`, added in the same commit as the comment b3bf31a0a08da679ec2fd21613243fb17b1135a9 > src/hotspot/share/oops/markWord.cpp line 27: > >> 25: #include "precompiled.hpp" >> 26: #include "oops/markWord.hpp" >> 27: #include "runtime/basicLock.inline.hpp" > > I don't think this include is needed (at least not by the changed code parts, I haven't checked existing code). It is probably included through some other transitive include. However all the metadata functions are now inlined. These are used here. `inline markWord BasicLock::displaced_header() const` and `inline void BasicLock::set_displaced_header(markWord header)` > src/hotspot/share/runtime/arguments.cpp line 1820: > >> 1818: warning("New lightweight locking not supported on this platform"); >> 1819: } >> 1820: if (UseObjectMonitorTable) { > > Uhm, wait a second. That list of platforms covers all existing platforms anyway, so the whole block could be removed? Or is there a deeper meaning here that I don't understand? Zero. Used as a starting point for porting to new platforms. > src/hotspot/share/runtime/basicLock.cpp line 37: > >> 35: if (mon != nullptr) { >> 36: mon->print_on(st); >> 37: } > > I am not sure if we wanted to do this, but we know the owner, therefore we could also look-up the OM from the table, and print it. It wouldn't have all that much to do with the BasicLock, though. Yeah maybe it is unwanted. Not sure how we should treat these prints of the frames. My thinking was that there is something in the cache, print it. But maybe just treating it as some internal data, maybe print "monitor { <Cached ObjectMonitor* address> }" or similar is better.
> src/hotspot/share/runtime/basicLock.inline.hpp line 45: > >> 43: return reinterpret_cast<ObjectMonitor*>(get_metadata()); >> 44: #else >> 45: // Other platforms does not make use of the cache yet, > > If it's not used, why does it matter to special case the code here? Because it is not used, there may be uninitialised values there. See https://github.com/openjdk/jdk/pull/20067#discussion_r1671959763 > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 28: > >> 26: >> 27: #include "classfile/vmSymbols.hpp" >> 28: #include "javaThread.inline.hpp" > > This include is incorrect (and my IDE says it's not needed). Correct, it should be `runtime/javaThread.inline.hpp`. Fixed. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 31: > >> 29: #include "jfrfiles/jfrEventClasses.hpp" >> 30: #include "logging/log.hpp" >> 31: #include "logging/logStream.hpp" > > Include of logStream.hpp not needed? Yeah we removed all log streams. Removed. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 80: > >> 78: >> 79: ConcurrentTable* _table; >> 80: volatile size_t _table_count; > > Looks like a misnomer to me. We only have one table, but we do have N entries/nodes. This is counted when new nodes are allocated or old nodes are freed. Consider renaming this to '_entry_count' or '_node_count'? I'm actually a bit surprised if ConcurrentHashTable doesn't already track this... I think I was thinking of the names as a prefix to refer to the `Count of the table` and `Size of the table`. And not the `Number of tables`. But I can see the confusion. `ConcurrentHashTable` tracks no statistics except for JFR which added some counters directly into the implementation. All statistics are for the users to manage, even if there are helpers for gathering these statistics.
The current implementation is based on what we do for the StringTable and SymbolTable. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 88: > >> 86: >> 87: public: >> 88: Lookup(oop obj) : _obj(obj) {} > > Make explicit? Done. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 97: > >> 95: >> 96: bool equals(ObjectMonitor** value) { >> 97: // The entry is going to be removed soon. > > What does this comment mean? Not sure where it came from. Removed. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 112: > >> 110: >> 111: public: >> 112: LookupMonitor(ObjectMonitor* monitor) : _monitor(monitor) {} > > Make explicit? Done. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 159: > >> 157: static size_t min_log_size() { >> 158: // ~= log(AvgMonitorsPerThreadEstimate default) >> 159: return 10; > > Uh wait - are we assuming that threads hold 1024 monitors *on average* ? Isn't this a bit excessive? I would have thought maybe 8 monitors/thread. Yes there are workloads that are bonkers. Or maybe the comment/flag name does not say what I think it says. > > Or why not use AvgMonitorsPerThreadEstimate directly? Maybe that is reasonable. I believe I had that at some point but it had to deal with how to handle extreme values of `AvgMonitorsPerThreadEstimate` as well as what to do when `AvgMonitorsPerThreadEstimate` was disabled `=0`. One 4 / 8 KB allocation seems harmless. But this was very arbitrary. This will probably be changed when/if the resizing of the table becomes more synchronised with deflation, allowing for shrinking the table. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 349: > >> 347: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); >> 348: >> 349: if (try_read) { > > All the callers seem to pass try_read = true. Why do we have the branch at all? I'll clean this up.
From experiments it was never better to use `insert_get` over a `get; insert_get`, even if we tried to be clever about when we skipped the initial get. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 401: > >> 399: >> 400: if (inserted) { >> 401: // Hopefully the performance counters are allocated on distinct > > It doesn't look like the counters are on distinct cache lines (see objectMonitor.hpp, lines 212ff). If this is a concern, file a bug to investigate it later? The comment here is a bit misplaced, IMO. It originates from https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.cpp#L1543 I think we just kept it and did not think more about it. Not sure what it is referring to. Maybe @dcubed-ojdk knows more, they originated from him (9 years old comment). > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 477: > >> 475: if (obj->mark_acquire().has_monitor()) { >> 476: if (_length > 0 && _contended_oops[_length-1] == obj) { >> 477: // assert(VM_Version::supports_recursive_lightweight_locking(), "must be"); > > Uncomment or remove assert? Yeah not sure why it was ever commented out. To me it seems that the assert should be invariant. But will investigate. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 554: > >> 552: bool _no_safepoint; >> 553: union { >> 554: struct {} _dummy; > > Uhh ... Why does this need to be wrapped in a union and struct? A poor man's optional. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 563: > >> 561: assert(locking_thread == current || locking_thread->is_obj_deopt_suspend(), "locking_thread may not run concurrently"); >> 562: if (_no_safepoint) { >> 563: ::new (&_nsv) NoSafepointVerifier(); > > I'm thinking that it might be easier and cleaner to just re-do what the NoSafepointVerifier does? It just calls thread->inc/dec > _no_safepoint_count().
I wanted to avoid having to add `NoSafepointVerifier` implementation details in the synchroniser code. I guess `ContinuationWrapper` already does this. Simply creating a `NoSafepointVerifier` when you expect no safepoint is more obvious to me, shows the intent better. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 748: > >> 746: } >> 747: >> 748: // Fast-locking does not use the 'lock' argument. > > I believe the comment is outdated. Removed. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 969: > >> 967: >> 968: for (;;) { >> 969: // Fetch the monitor from the table > > Wrong indentation. Fixed. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1157: > >> 1155: // enter can block for safepoints; clear the unhandled object oop >> 1156: PauseNoSafepointVerifier pnsv(&nsv); >> 1157: object = nullptr; > > What is the point of that statement? object is not an out-arg (afaict), and not used subsequently. `CHECK_UNHANDLED_OOPS` + `-XX:+CheckUnhandledOops` https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/oops/oopsHierarchy.hpp#L53-L55 > src/hotspot/share/runtime/lightweightSynchronizer.hpp line 68: > >> 66: static void exit(oop object, JavaThread* current); >> 67: >> 68: static ObjectMonitor* inflate_into_object_header(Thread* current, JavaThread* inflating_thread, oop object, const ObjectSynchronizer::InflateCause cause); > > My IDE flags this with a warning 'Parameter 'cause' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions' *shrugs* Yeah. The only effect it has is that you cannot reassign the variable. It was the style taken from [synchronizer.hpp](https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.hpp) where all `InflateCause` parameters are const.
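For readers unfamiliar with the "poor man's optional" mentioned earlier in this reply (the union around `_nsv` that is filled with placement new only when `_no_safepoint` is set), here is a minimal standalone sketch of the pattern. It is an illustration under stated assumptions, not the PR's code: `ToyNoSafepointVerifier`, `ToyVerifyLocker` and `g_no_safepoint_count` are hypothetical stand-ins, and a plain `char` replaces the `struct {} _dummy` placeholder so the sketch stays within standard C++.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the per-thread no-safepoint counter.
static int g_no_safepoint_count = 0;

// Hypothetical stand-in for NoSafepointVerifier: an RAII object that
// bumps the counter for as long as it is alive.
struct ToyNoSafepointVerifier {
  ToyNoSafepointVerifier()  { ++g_no_safepoint_count; }
  ~ToyNoSafepointVerifier() { --g_no_safepoint_count; }
};

// Holds a verifier only when asked to. The union reserves storage for the
// verifier without constructing it; construction and destruction are done
// by hand, guarded by the same flag.
class ToyVerifyLocker {
 public:
  explicit ToyVerifyLocker(bool no_safepoint)
      : _no_safepoint(no_safepoint), _dummy(0) {
    if (_no_safepoint) {
      ::new (&_nsv) ToyNoSafepointVerifier();  // placement new into the union
    }
  }

  ~ToyVerifyLocker() {
    if (_no_safepoint) {
      _nsv.~ToyNoSafepointVerifier();  // union members are never destroyed implicitly
    }
  }

  ToyVerifyLocker(const ToyVerifyLocker&) = delete;
  ToyVerifyLocker& operator=(const ToyVerifyLocker&) = delete;

 private:
  bool _no_safepoint;
  union {
    char _dummy;                  // trivially constructible placeholder
    ToyNoSafepointVerifier _nsv;  // only alive when _no_safepoint is true
  };
};
```

The point made in the reply carries over to the sketch: the holder never touches the counter itself, so the verifier's internals stay in one place, at the cost of the manual construct/destroy pair.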
> src/hotspot/share/runtime/lockStack.inline.hpp line 232: > >> 230: oop obj = monitor->object_peek(); >> 231: assert(obj != nullptr, "must be alive"); >> 232: assert(monitor == LightweightSynchronizer::get_monitor_from_table(JavaThread::current(), obj), "must be exist in table"); > > "must be exist in table" -> "must exist in table" Done. > src/hotspot/share/runtime/objectMonitor.cpp line 56: > >> 54: #include "runtime/safepointMechanism.inline.hpp" >> 55: #include "runtime/sharedRuntime.hpp" >> 56: #include "runtime/synchronizer.hpp" > > This include is not used. Removed. > src/hotspot/share/runtime/objectMonitor.hpp line 193: > >> 191: ObjectWaiter* volatile _WaitSet; // LL of threads wait()ing on the monitor >> 192: volatile int _waiters; // number of waiting threads >> 193: private: > > You can now also remove the 'private:' here Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240569 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240591 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240598 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240629 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240633 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240644 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240655 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240709 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240664 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240684 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240695 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240712 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240735 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20067#discussion_r1677240747 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240787 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240807 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240936 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241002 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241011 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241037 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241082 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241093 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241121 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241145 From dholmes at openjdk.org Mon Jul 15 05:28:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 15 Jul 2024 05:28:51 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com> Message-ID: <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com> On Fri, 12 Jul 2024 09:14:00 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 3018: >> >>> 3016: int flags = (jushort)( m->access_flags().as_short() & JVM_RECOGNIZED_METHOD_MODIFIERS ); >>> 3017: if (m->is_object_initializer()) { >>> 3018: flags |= 
java_lang_invoke_MemberName::MN_IS_CONSTRUCTOR; >> >> I'm going to assume that `clinit` would already get filtered out at some point otherwise this would be a change in behaviour. > > No, it is not filtered, we still have `clinit`-s on this path. In the initial version https://github.com/openjdk/jdk/pull/20120/commits/1a0d18f1333866ab2eceb02b30c0fe363473d4e6#diff-a8ed79cab8961103a78187704b7a14fd00b322da06e75518bcfd888d9b940040R3020 I caught the assert in many tests, mostly in stack traces generation. > > Yes, this changes the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. Okay, such a change in behaviour was unexpected for a "cleanup" PR. I'm looking into it now. Perhaps @mlchung can comment? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1677324623 From dnsimon at openjdk.org Mon Jul 15 08:30:51 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 15 Jul 2024 08:30:51 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> Message-ID: <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com> On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. 
The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`. >> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case. >> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas. >> >> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads: >> >> >> Benchmark Threads Mode Cnt Score Error Units >> ConcurrentClose.sharedAccess 32 avgt 10 9017.397 ± 202.870 us/op >> ConcurrentClose.sharedAccess 24 avgt 10 5178.214 ± 164.922 us/op >> ConcurrentClose.sharedAccess 16 avgt 10 2224.420 ± 165.754 us/op >> ConcurrentClose.sharedAccess 8 avgt 10 593.828 ±
8.321 us/op >> ConcurrentClose.sharedAccess 7 avgt 10 470.700 ± 22.511 us/op >> ConcurrentClose.sharedAccess 6 avgt 10 386.697 ± 59.170 us/op >> ConcurrentClose.sharedAccess 5 avgt 10 291.157 ± 7.023 us/op >> ConcurrentClose.sharedAccess 4 avgt 10 209.178 ± 5.802 us/op >> ConcurrentClose.sharedAccess 1 avgt 10 ... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > track has_scoped_access for compiled methods src/hotspot/share/prims/scopedMemoryAccess.cpp line 179: > 177: // > 178: // The safepoint at which we're stopped may be in between the liveness check > 179: // and actual memory access, but is itself 'outside' of @Scoped code What is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677474756 From dnsimon at openjdk.org Mon Jul 15 08:43:53 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 15 Jul 2024 08:43:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> Message-ID: <ptpaZ2nDeFW4XCq-qpHEWSCXxYQRhvvpO8Ol2Zo0fyE=.83707754-6088-465c-85bd-ea1ac96af034@github.com> On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena.
The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`. >> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case. >> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas. >> >> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads: >> >> >> Benchmark Threads Mode Cnt Score Error Units >> ConcurrentClose.sharedAccess 32 avgt 10 9017.397 ± 202.870 us/op >> ConcurrentClose.sharedAccess 24 avgt 10 5178.214 ± 164.922 us/op >> ConcurrentClose.sharedAccess 16 avgt 10 2224.420 ± 165.754 us/op >> ConcurrentClose.sharedAccess 8 avgt 10 593.828 ±
8.321 us/op >> ConcurrentClose.sharedAccess 7 avgt 10 470.700 ± 22.511 us/op >> ConcurrentClose.sharedAccess 6 avgt 10 386.697 ± 59.170 us/op >> ConcurrentClose.sharedAccess 5 avgt 10 291.157 ± 7.023 us/op >> ConcurrentClose.sharedAccess 4 avgt 10 209.178 ± 5.802 us/op >> ConcurrentClose.sharedAccess 1 avgt 10 ... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > track has_scoped_access for compiled methods src/hotspot/share/jvmci/jvmciRuntime.cpp line 2186: > 2184: nm->set_has_wide_vectors(has_wide_vector); > 2185: nm->set_has_monitors(has_monitors); > 2186: nm->set_has_scoped_access(true); // conservative What does "conservative" imply here? That is, what performance penalty will be incurred for Graal compiled code until it completely supports this "scoped access" bit? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677490130 From uschindler at openjdk.org Mon Jul 15 08:43:54 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 08:43:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com> Message-ID: <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com> On Mon, 15 Jul 2024 08:28:16 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> track has_scoped_access for compiled methods > > src/hotspot/share/prims/scopedMemoryAccess.cpp line 179: > >> 177: // >>
178: // The safepoint at which we're stopped may be in between the liveness check >> 179: // and actual memory access, but is itself 'outside' of @Scoped code > > what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html This is the whole magic around the shared arena. It is not public API and internal to Hotspot/VM: - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template#L117-L119 - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/prims/scopedMemoryAccess.cpp#L143-L149 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677486942 From alanb at openjdk.org Mon Jul 15 08:43:54 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 15 Jul 2024 08:43:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com> <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com> Message-ID: <GzjVKiahxVjaGdUKwOkTOFDkGSexn-alBu30Igy4SDA=.027c7042-4777-4943-84dd-f018dd038a45@github.com> On Mon, 15 Jul 2024 08:38:59 GMT, Uwe Schindler <uschindler at openjdk.org> wrote: >> src/hotspot/share/prims/scopedMemoryAccess.cpp line 179: >> >>> 177: // >>> 178: // The safepoint at which we're stopped may be in between the liveness check >>> 179: // and actual memory access, but is itself 'outside' of @Scoped code >> >> what is `@Scoped code`? 
I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html > > This is the whole magic around the shared arena. It is not public API and internal to Hotspot/VM: > - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template#L117-L119 > - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/prims/scopedMemoryAccess.cpp#L143-L149 > what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html This is nothing to do with scoped values, instead this is an annotation declared in jdk.internal.misc.ScopedMemoryAccess that is known to the VM. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677489415 From uschindler at openjdk.org Mon Jul 15 08:53:52 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 08:53:52 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <GzjVKiahxVjaGdUKwOkTOFDkGSexn-alBu30Igy4SDA=.027c7042-4777-4943-84dd-f018dd038a45@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com> <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com> <GzjVKiahxVjaGdUKwOkTOFDkGSexn-alBu30Igy4SDA=.027c7042-4777-4943-84dd-f018dd038a45@github.com> Message-ID: <L1lKorlvjPi33N65zUk7wYlXZtSqxIoVUNY-o66Q6dw=.6a05ef06-56fb-42d7-8fa1-f2d4ecf769b9@github.com> On Mon, 15 Jul 2024 08:41:01 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> This is the whole magic around the shared 
arena. It is not public API and internal to Hotspot/VM: >> - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template#L117-L119 >> - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/prims/scopedMemoryAccess.cpp#L143-L149 > >> what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html > > This is nothing to do with scoped values, instead this is an annotation declared in jdk.internal.misc.ScopedMemoryAccess that is known to the VM. Basically if the VM is inside a `@Scoped` method and it starts a thread-local handshake, it will deoptimize top-most frame of all those threads so they can do the "isAlive" check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677501161 From mcimadamore at openjdk.org Mon Jul 15 08:56:54 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 08:56:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> Message-ID: <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> On Sun, 14 Jul 2024 11:01:58 GMT, Uwe Schindler <uschindler at openjdk.org> wrote: > I have one problem with the benchmark: I think it is not measuring the whole setup in a way that is our workload: The basic problem is that we don't want to deoptimize threads which are not related to 
MemorySegments. So basically, the throughput of those threads should not be affected. For threads currently in a memory-segment read it should have a bit of effect, but it should recover fast. IMHO there is a bit of confusion in this discussion. When we say that a shared arena close operation is slow, we might mean one of two things: 1. calling the `close()` method itself is slow (this is what the benchmark effectively measures) 2. throughput of unrelated threads is affected (I think this is what Lucene is seeing) Addressing (2) than (1) (in the sense that, if you sign up for a shared arena close, you know it's going to be deterministic, but expensive, as the javadoc itself admits). For this reason, I'm unsure about some of the "delaying tactics" I see mentioned here: if we delay the underlying "free"/"unmap" operation, this is only going to affect (1). You still need some global operation (e.g. handshake) to make sure all threads agree on the segment state. Moving the cost of the free/unmap from one place to another is not really going to do much for (2). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228002760 From luhenry at openjdk.org Mon Jul 15 08:58:53 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 15 Jul 2024 08:58:53 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4] In-Reply-To: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com> Message-ID: <yRaSxAW_ivL6zaoRe4-MDVu1yockrIPgGh8EhghgeHM=.832b8741-972c-4725-9b8e-f0e2597228f6@github.com> On Tue, 2 Jul 2024 14:16:35 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? 
>> >> I'm also working on a Base64 decode intrinsic, but there is some performance regression in some cases, and decode and encode are totally independent of each other, so I will send out the review of decode in another PR when I fix the performance regression in it. >> >> Thanks. >> >> ## Test >> benchmarks run on CanMV-K230 (vlenb == 16), and banana-pi (vlenb == 32) >> >> I've tried several implementations, respectively with vector group >> * m2+m1+scalar >> * m2+scalar >> * m1+scalar >> * pure scalar >> The best one is the combination of m2+m1; it has the best performance for all source sizes. >> >> ### K230 >> >> this implementation (m2+m1) >> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 >> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 >> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 >> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 >> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 >> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 >> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 >> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 >> Base64Encode.testBase64Encode |
512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 >> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 >> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 >> >> vector with only m2 >> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 4... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > move label Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19973#pullrequestreview-2177185304 From mcimadamore at openjdk.org Mon Jul 15 09:00:52 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 09:00:52 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> Message-ID: <1x3PmjfjQXR2h3k8UlLT0N9_yvLbNw_cn3O7NRLDt_U=.68c48202-af31-40ed-836b-ecafd051113f@github.com> On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. That is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > track has_scoped_access for compiled methods test/micro/org/openjdk/bench/java/lang/foreign/ConcurrentClose.java line 34: > 32: import static java.lang.foreign.ValueLayout.*; > 33: > 34: @BenchmarkMode(Mode.AverageTime) Doesn't the existing bench `MemorySessionClose` already covers this? That benchmark has three stress modes, and one of them spawns many unrelated threads (but there is only one thread doing a close). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677508532 From uschindler at openjdk.org Mon Jul 15 09:00:53 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 09:00:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <1x3PmjfjQXR2h3k8UlLT0N9_yvLbNw_cn3O7NRLDt_U=.68c48202-af31-40ed-836b-ecafd051113f@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <1x3PmjfjQXR2h3k8UlLT0N9_yvLbNw_cn3O7NRLDt_U=.68c48202-af31-40ed-836b-ecafd051113f@github.com> Message-ID: <yLIZx6_mKpBkcqNL2CnMkDAhHwN0XbP85IHCyPZl__w=.79858167-e0c3-427f-a5d8-435b5745e6c9@github.com> On Mon, 15 Jul 2024 08:57:08 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> track has_scoped_access for compiled methods > > test/micro/org/openjdk/bench/java/lang/foreign/ConcurrentClose.java line 34: > >> 32: import static java.lang.foreign.ValueLayout.*; >> 33: >> 34: @BenchmarkMode(Mode.AverageTime) > > Doesn't the existing bench `MemorySessionClose` already covers this? 
That benchmark has three stress modes, and one of them spawns many unrelated threads (but there is only one thread doing a close). It should also run threads not doing any scoped accesses to verify that other threads are not affected. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677509975 From uschindler at openjdk.org Mon Jul 15 09:04:54 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 09:04:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> Message-ID: <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> On Mon, 15 Jul 2024 08:54:11 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > > I have one problem with the benchmark: I think it is not measuring the whole setup in a way that is our workload: The basic problem is that we don't want to deoptimize threads which are not related to MemorySegments. So basically, the throughput of those threads should not be affected. For threads currently in a memory-segment read it should have a bit of effect, but it should recover fast. > > IMHO there is a bit of confusion in this discussion. When we say that a shared arena close operation is slow, we might mean one of two things: > > 1. calling the `close()` method itself is slow (this is what the benchmark effectively measures) > 2. 
throughput of unrelated threads is affected (I think this is what Lucene is seeing) > > Addressing (2) is more important than (1) (in the sense that, if you sign up for a shared arena close, you know it's going to be deterministic, but expensive, as the javadoc itself admits). I fully agree, we mixed two different approaches. The problem is that the benchmark measures both 1 and 2 per thread. To see an effect of this change, the benchmark should have 3 types of threads: One only closing arenas, another set that consumes scoped memory and a third group doing totally unrelated stuff. > For this reason, I'm unsure about some of the "delaying tactics" I see mentioned here: if we delay the underlying "free"/"unmap" operation, this is only going to affect (1). You still need some global operation (e.g. handshake) to make sure all threads agree on the segment state. Moving the cost of the free/unmap from one place to another is not really going to do much for (2). This is indeed unrelated. It is just an idea I also thought of. In Apache Lucene we are mostly interested in closing the shared arena as soon as possible. We don't need to make sure it is closed after the "close" call finished (we don't care), but we can't wait until GC closes the arena possibly after hours or even days. The reason for the latter is that the Arena is a small, long-lived instance and GC does not want to free it, as there is no pressure. So basically for us it would be best to trigger the close and then do other stuff. Of course we can do that in a separate thread (this is my idea how to improve the closes in lucene). The only problem is that Lucene does not have its own thread pools, so it would be the responsibility of the caller to possibly close our indexes in a separate thread (and a single one only). 
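
The three thread groups described above could be sketched, outside of JMH, roughly as follows. This is only an illustration under stated assumptions: the class `ThreeGroupHarness` and its `run` method are hypothetical names, not part of the PR or of any benchmark in the JDK, and the harness does no warmup or statistical analysis. It simply runs one group per task and counts how many operations each group completes in a fixed time window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical harness (not from the PR): counts completed operations per thread group. */
final class ThreeGroupHarness {

    /**
     * Runs three groups of threads for roughly durationMillis and returns the
     * number of operations completed by {closer, accessor, unrelated}, in that order.
     */
    static long[] run(Runnable closer, Runnable accessor, Runnable unrelated,
                      int threadsPerGroup, long durationMillis) {
        Runnable[] tasks = { closer, accessor, unrelated };
        LongAdder[] counters = { new LongAdder(), new LongAdder(), new LongAdder() };
        AtomicBoolean stop = new AtomicBoolean(false);
        List<Thread> threads = new ArrayList<>();

        for (int g = 0; g < tasks.length; g++) {
            final Runnable task = tasks[g];
            final LongAdder counter = counters[g];
            for (int i = 0; i < threadsPerGroup; i++) {
                threads.add(new Thread(() -> {
                    // do/while: every thread completes at least one operation
                    do {
                        task.run();
                        counter.increment();
                    } while (!stop.get());
                }));
            }
        }

        threads.forEach(Thread::start);
        try {
            Thread.sleep(durationMillis);
            stop.set(true);
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            stop.set(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
        return new long[] { counters[0].sum(), counters[1].sum(), counters[2].sum() };
    }
}
```

On a JDK where `java.lang.foreign` is final, the closer task might be something like `() -> Arena.ofShared().close()`, the accessor task a read from a segment of a long-lived arena, and the unrelated task plain arithmetic. Comparing the third count with and without the closer group running would give a rough view of effect (2); a proper measurement would still need JMH.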
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228018619 From mcimadamore at openjdk.org Mon Jul 15 09:14:53 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 09:14:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> Message-ID: <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com> On Mon, 15 Jul 2024 09:02:29 GMT, Uwe Schindler <uschindler at openjdk.org> wrote: > One only closing arenas, another set that consumes scoped memory and a third group doing totally unrelated stuff. Exactly. My general feeling is that the cost of handshaking a thread dominates everything else, so doing improvements around e.g. avoiding unnecessary deoptimization (as in this PR) is not going to help much, even for threads doing unrelated stuff, but I'd be happy to be proven wrong. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228035799 From shade at openjdk.org Mon Jul 15 09:17:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 15 Jul 2024 09:17:55 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com> <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com> Message-ID: <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com> On Mon, 15 Jul 2024 05:26:30 GMT, David Holmes <dholmes at openjdk.org> wrote: >> No, it is not filtered, we still have `clinit`-s on this path. In the initial version https://github.com/openjdk/jdk/pull/20120/commits/1a0d18f1333866ab2eceb02b30c0fe363473d4e6#diff-a8ed79cab8961103a78187704b7a14fd00b322da06e75518bcfd888d9b940040R3020 I caught the assert in many tests, mostly in stack traces generation. >> >> Yes, this changes the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. > > Okay, such a change in behaviour was unexpected for a "cleanup" PR. I'm looking into it now. Perhaps @mlchung can comment? 
Yeah, this is not really a cleanup change (one where behaviors stay the same). For this particular hunk, keeping the old behavior seems to be unnecessary work. Note that we are also changing the behavior in C2: in `do_exits` we no longer emit the barriers for `static final` stores in `clinits`, and EA no longer cares about `clinits` either. Those are also behavioral changes. If you prefer, I can turn this PR into a behaviorally similar cleanup, and do the behavior changes separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1677530301 From mcimadamore at openjdk.org Mon Jul 15 09:21:53 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 09:21:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com> Message-ID: <7_MKD2O70VqPmWUn5_TcL3AZ-yT8iB6uv7zk8s9xIDQ=.57d7a025-d31c-4f32-a55c-c919490218e3@github.com> On Mon, 15 Jul 2024 09:11:53 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > avoiding unnecessary deoptimization (as in this PR) is not going to help much, What would definitely help is to somehow reduce the number of threads to handshake when calling close - e.g. have an arena that is shared but only to a *group* of threads. 
We can do that easily using structured concurrency. But for unstructured code there's not a lot that can be done, as there's no way for the runtime to guess which threads can access segments created by a given arena. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228046170 From uschindler at openjdk.org Mon Jul 15 09:21:53 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 09:21:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <7_MKD2O70VqPmWUn5_TcL3AZ-yT8iB6uv7zk8s9xIDQ=.57d7a025-d31c-4f32-a55c-c919490218e3@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com> <7_MKD2O70VqPmWUn5_TcL3AZ-yT8iB6uv7zk8s9xIDQ=.57d7a025-d31c-4f32-a55c-c919490218e3@github.com> Message-ID: <A_-nlvRr4xrHguQhAL5cMzT7IDkySpqbAKHvHWDjfm8=.9727b308-9350-4315-8d35-8836650884e0@github.com> On Mon, 15 Jul 2024 09:17:31 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >>> One only closing arenas, another set that consumes scoped memory and a third group doing totally unrelated stuff. >> >> Exactly. My general feeling is that the cost of handshaking a thread dominates everything else, so doing improvements around e.g. avoiding unnecessary deoptimization (as in this PR) is not going to help much, even for threads doing unrelated stuff, but I'd be happy to be proven wrong. 
> >> avoiding unnecessary deoptimization (as in this PR) is not going to help much, > > What would definitely help is to somehow reduce the number of threads to handshake when calling close - e.g. have an arena that is shared but only to a *group* of threads. We can do that easily using structured concurrency. But for unstructured code there's not a lot that can be done, as there's no way for the runtime to guess which threads can access segments created by a given arena. @mcimadamore: FYI, at the moment we are working on grouping mmapped files together (by their index segment file pattern) and using the same arena for multiple index files. Because those are closed together we can use a refcounted approach. All files of a group (the index segment name) share the same arena, and this one is closed after the last file in the group is closed: https://github.com/apache/lucene/pull/13570 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228049242 From shipilev at amazon.de Mon Jul 15 10:00:48 2024 From: shipilev at amazon.de (Aleksey Shipilev) Date: Mon, 15 Jul 2024 12:00:48 +0200 Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <CAMOCf+i5Eb8xFMPw_+eeSpyXcFEiXeEacLk0ZqYQBxCpHkxDxg@mail.gmail.com> References: <mailman.15905.1720688298.324.hotspot-dev@openjdk.org> <CAMOCf+i5Eb8xFMPw_+eeSpyXcFEiXeEacLk0ZqYQBxCpHkxDxg@mail.gmail.com> Message-ID: <fa562ac5-4d0d-4d15-ad4e-b03e97eef5b0@amazon.de> Hi Hans, On 13.07.24 02:36, Hans Boehm wrote: > No opinion on the merits here. But IIUC, "as memory safe as finals" is a slightly squishy notion > here. The downside of not having the release fence is that even with safe publication, a write to > an @Stable field outside the constructor can be seen by a read in the constructor, before the object > is published. That's arguably weirder than final field behavior, and not something that can arise > with final fields. 
But it still only happens in the presence of data races, and thus probably not in > code you should be writing anyway. Agreed. "Hans Boehm's argument for doing release in initializers" [1] lives in my head rent-free :) In the presence of @Stable writes outside of constructors, users of @Stable are basically on their own, and are responsible for proper fencing if data races are not benign. Including putting the release/seqcst fences in constructors if constructors read the @Stable fields back. I think not giving in to handling these corner cases by default gives us a reasonable performance/safety model for @Stable: https://github.com/openjdk/jdk/pull/19635#issuecomment-2222413725 -Aleksey [1] https://www.hboehm.info/c++mm/no_write_fences.html From jvernee at openjdk.org Mon Jul 15 10:28:52 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 15 Jul 2024 10:28:52 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <ptpaZ2nDeFW4XCq-qpHEWSCXxYQRhvvpO8Ol2Zo0fyE=.83707754-6088-465c-85bd-ea1ac96af034@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <ptpaZ2nDeFW4XCq-qpHEWSCXxYQRhvvpO8Ol2Zo0fyE=.83707754-6088-465c-85bd-ea1ac96af034@github.com> Message-ID: <1MbFi_08NlZRB0wF-sBB_JnNzHr4DjDdOa6hkGmXjjY=.ba6c7bcb-88d9-4fb6-b817-2b2527934931@github.com> On Mon, 15 Jul 2024 08:41:38 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> track has_scoped_access for compiled methods > > src/hotspot/share/jvmci/jvmciRuntime.cpp 
line 2186: > >> 2184: nm->set_has_wide_vectors(has_wide_vector); >> 2185: nm->set_has_monitors(has_monitors); >> 2186: nm->set_has_scoped_access(true); // conservative > > What does "conservative" imply here? That is, what performance penalty will be incurred for Graal compiled code until it completely supports this "scoped access" bit? It means we will always deoptimize a top-most frame of any thread, when closing a shared arena, and it is compiled by Graal. (This is a one-off deoptimization though. The compiled code is not thrown away). It essentially matches the current behavior before this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677613801 From fgao at openjdk.org Mon Jul 15 10:52:53 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 15 Jul 2024 10:52:53 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com> <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com> Message-ID: <h9Apat3UK9nQkjKkieLkCq2ZhNNk73JI8sELbHsEHZk=.086a01f3-b0e5-4d8f-9e34-ca88084a07d8@github.com> On Fri, 12 Jul 2024 14:34:20 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/opto/machnode.cpp line 400: >> >>> 398: >>> 399: if (t->isa_intptr_t() && >>> 400: #if !defined(AARCH64) >> >> After applying the operand "IndirectX2P", we may have some patterns like: >> >> str val, [CastX2P base] >> >> The code path here will resolve the `base`, which is actually a `intptr`, not a `ptr`, and the offset is `0`. >> >> I guess the code here was intended to support `[base, offset]`, where base can be a `intptr` but offset can not be `0`. 
I'm not sure why there is such a limitation that offset can not be `0`, maybe for some old machines? >> >> I don't think the limitation is applied to aarch64 machines now. So I unblock it for aarch64. > > I think it's the other way around. Isn't this code saying that if the address is an intptr + a nonzero offset, then the returned type is bottom, ie nothing? What effect does this change have? Thanks for review! Yeah, this code says if the address is an `intptr` + a nonzero offset, then return `TypeRawPtr::BOTTOM`. Then it continues [the verification](https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/matcher.cpp#L1834). Without the change here, an `intptr` + a `zero` offset would fail to assert on next lines, https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/machnode.cpp#L409-L413 AFAIK, the return value `TypeRawPtr::BOTTOM` represents `raw access to memory` here. And an `intptr` + a `zero` offset is also a valid `raw access`, so I unblock it here. WDYT? Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1677638951 From jvernee at openjdk.org Mon Jul 15 10:52:53 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 15 Jul 2024 10:52:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> Message-ID: <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com> On Mon, 15 Jul 2024 09:02:29 GMT, Uwe Schindler <uschindler at openjdk.org> wrote: > Of course we can do that in a separate thread (this is my idea how to improve the closes in lucene). This is what I was thinking of as well. `close()` on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory. 
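
A minimal sketch of the "dedicated closer thread" idea, assuming a single daemon thread is enough. The class `AsyncCloser` and its `closeAsync` method are hypothetical names made up for this illustration; they are not part of the JDK or of Lucene. The helper accepts any `AutoCloseable`, which `Arena` implements, so a shared arena can be handed to it directly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a JDK API): runs close() calls on one dedicated daemon thread. */
final class AsyncCloser implements AutoCloseable {

    // Single worker thread, so closes are serialized and never run on caller threads.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "arena-closer");
        t.setDaemon(true);
        return t;
    });

    /**
     * Schedules resource.close() on the closer thread. The returned future
     * completes once the close has run, or completes exceptionally if it failed.
     */
    CompletableFuture<Void> closeAsync(AutoCloseable resource) {
        return CompletableFuture.runAsync(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("close failed", e);
            }
        }, executor);
    }

    /** Stops accepting new work and waits up to 10 seconds for pending closes. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a shared arena this would be used as `closer.closeAsync(arena)`, after which the caller can continue with other work. Note the trade-off of this design: the arena stays alive until the closer thread actually runs the task, so accesses made in between still succeed.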
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228215031 From aph at openjdk.org Mon Jul 15 11:03:50 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 15 Jul 2024 11:03:50 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <A7LqCA84i3ml2kFafMJr2_ENuyn9yW-KjBViIryuKBU=.8efd29b0-3636-4ef7-aa2c-dc92228cefc5@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. 
> > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 Marked as reviewed by aph (Reviewer). This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together. ------------- PR Review: https://git.openjdk.org/jdk/pull/20157#pullrequestreview-2177415307 PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2228232139 From aph at openjdk.org Mon Jul 15 11:03:51 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 15 Jul 2024 11:03:51 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <h9Apat3UK9nQkjKkieLkCq2ZhNNk73JI8sELbHsEHZk=.086a01f3-b0e5-4d8f-9e34-ca88084a07d8@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com> <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com> <h9Apat3UK9nQkjKkieLkCq2ZhNNk73JI8sELbHsEHZk=.086a01f3-b0e5-4d8f-9e34-ca88084a07d8@github.com> Message-ID: <573t-GxA0Cutej_7Ei1rOdp90v3tpBsEgyMXr3STUPk=.d6791101-87d2-4fab-964d-a620a2be4d24@github.com> On Mon, 15 Jul 2024 10:50:32 GMT, Fei Gao <fgao at openjdk.org> wrote: >> I think it's the other way around. Isn't this code saying that if the address is an intptr + a nonzero offset, then the returned type is bottom, ie nothing? What effect does this change have? > > Thanks for review! Yeah, this code says if the address is an `intptr` + a nonzero offset, then return `TypeRawPtr::BOTTOM`. 
Then it continues [the verification](https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/matcher.cpp#L1834). > > Without the change here, an `intptr` + a `zero` offset would fail to assert on next lines, https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/machnode.cpp#L409-L413 > > AFAIK, the return value `TypeRawPtr::BOTTOM` represents `raw access to memory` here. And an `intptr` + a `zero` offset is also a valid `raw access`, so I unblock it here. WDYT? Thanks. I learn something every day, I guess. It's been a long while since I looked, but I expected "pointer to anything" to be TOP, not BOTTOM. Thinking some more, a `TypeRawPtr` must be casted to a usable physical type in order to use it, so "pointer to nothing" makes more sense than "pointer to anything". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1677648436 From jvernee at openjdk.org Mon Jul 15 11:33:30 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 15 Jul 2024 11:33:30 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> Message-ID: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> > This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. 
>
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always. That is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>
>
> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1  avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: improve benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20158/files - new: https://git.openjdk.org/jdk/pull/20158/files/d1266b53..6d0b9b57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=01-02 Stats: 28 lines in 1 file changed: 14 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158 PR: https://git.openjdk.org/jdk/pull/20158 From forax at openjdk.org Mon Jul 15 11:33:30 2024 From: forax at openjdk.org (=?UTF-8?B?UsOpbWk=?= Forax) Date: Mon, 15 Jul 2024 11:33:30 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com> Message-ID: <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com> On Mon, 15 Jul 2024 10:50:34 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: > This is what I was thinking of as well. close() on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory. 
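
The idea in the quoted message, handing `close()` off to a dedicated executor, could be sketched roughly as below. This is only an illustration under stated assumptions: `DeferredCloser`, `DeferredCloseSketch` and `runDemo` are hypothetical names, and a plain `AutoCloseable` stands in for the object returned by `Arena.ofShared()` so that the sketch does not depend on a particular JDK release.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class DeferredCloseSketch {

    // Hypothetical wrapper: close() returns immediately and the real close
    // runs later on a dedicated "closer" executor.
    static final class DeferredCloser implements AutoCloseable {
        private final AutoCloseable delegate;
        private final ExecutorService closer;

        DeferredCloser(AutoCloseable delegate, ExecutorService closer) {
            this.delegate = delegate;
            this.closer = closer;
        }

        @Override
        public void close() {
            // Only schedules the close; the caller does not wait for it.
            closer.execute(() -> {
                try {
                    delegate.close();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
        }
    }

    // Returns true once the stand-in resource has really been closed.
    static boolean runDemo() {
        ExecutorService closer = Executors.newSingleThreadExecutor();
        AtomicBoolean closed = new AtomicBoolean(false);
        // Stand-in for a resource that is expensive to close (e.g. a shared arena).
        AutoCloseable resource = () -> closed.set(true);

        try (DeferredCloser scope = new DeferredCloser(resource, closer)) {
            // work with the resource here
        }
        // At this point the real close may not have happened yet.

        closer.shutdown(); // already queued close tasks still run
        try {
            closer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println("closed after executor drained: " + runDemo());
    }
}
```

The wrapper keeps try-with-resources usable, but the resource stays open for some time after the block exits, until the executor gets around to running the close task.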
This delays both the closing of the Arena and the freeing of the segments, so bugs may not be discovered if the arena is accessed between the time the thread pool is notified and the time close() is actually called. And you lose the structured part of the API: you cannot use a try-with-resources anymore. I think that part can be fixed using a wrapper on top of Arena.ofShared().

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228280373

From jvernee at openjdk.org Mon Jul 15 11:49:53 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 11:49:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]
In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
Message-ID: <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>

On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: namely, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10 ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   improve benchmark

I've updated the benchmark to run with 3 separate threads: 1 thread that is just creating and closing shared arenas in a loop, 1 that is accessing memory using the FFM API, and 1 that is accessing a `byte[]`.
Current:

    Benchmark                                        Mode  Cnt   Score    Error  Units
    ConcurrentClose.sharedClose                      avgt   10  50.093 ±  6.200  us/op
    ConcurrentClose.sharedClose:closing              avgt   10  46.269 ±  0.786  us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  98.072 ± 19.061  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10   5.938 ±  0.058  us/op

I do see a pretty big difference on the memory segment accessing thread when I remove deoptimization altogether:

    Benchmark                                        Mode  Cnt   Score    Error  Units
    ConcurrentClose.sharedClose                      avgt   10  22.664 ±  0.409  us/op
    ConcurrentClose.sharedClose:closing              avgt   10  45.351 ±  1.554  us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  16.671 ±  0.251  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10   5.969 ±  0.089  us/op

When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:

    Benchmark                                        Mode  Cnt    Score    Error  Units
    ConcurrentClose.sharedClose                      avgt   10   52.745 ±  1.071  us/op
    ConcurrentClose.sharedClose:closing              avgt   10   46.670 ±  0.453  us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ±  3.430  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ±  0.109  us/op

I think, to really test the effect of the `has_scoped_access` check, we need to look at a more realistic scenario.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228311368

From mcimadamore at openjdk.org Mon Jul 15 12:02:51 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 12:02:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]
In-Reply-To: <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
Message-ID: <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>

On Mon, 15 Jul 2024 11:47:43 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> I've updated the benchmark to run with 3 separate threads: 1 thread that is just creating and closing shared arenas in a loop, 1 that is accessing memory using the FFM API, and 1 that is accessing a `byte[]`.
>
> Current:
>
> ```
> Benchmark                                        Mode  Cnt   Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10  50.093 ±  6.200  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  46.269 ±  0.786  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  98.072 ± 19.061  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.938 ±  0.058  us/op
> ```
>
> I do see a pretty big difference on the memory segment accessing thread when I remove deoptimization altogether:
>
> ```
> Benchmark                                        Mode  Cnt   Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10  22.664 ±  0.409  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  45.351 ±  1.554  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  16.671 ±  0.251  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.969 ±  0.089  us/op
> ```
>
> When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:
>
> ```
> Benchmark                                        Mode  Cnt    Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   52.745 ±  1.071  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.670 ±  0.453  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ±  3.430  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ±  0.109  us/op
> ```
>
> I think, to really test the effect of the `has_scoped_access` check, we need to look at a more realistic scenario.

Interesting benchmark. What is the baseline here? E.g. can we also compare against the same benchmark using a confined arena to do the closing?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228335857 From mcimadamore at openjdk.org Mon Jul 15 12:12:53 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 12:12:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> Message-ID: <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> On Mon, 15 Jul 2024 12:00:31 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). 
I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same: To put this into perspective, once the underlying bug with reachability fences is addressed, then we should see the numbers for this benchmark align with the ones where you removed deopt completely (as we won't deopt threads that don't have the target arena in their oopmap) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228353326 From mcimadamore at openjdk.org Mon Jul 15 12:17:51 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 12:17:51 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> Message-ID: <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> On Mon, 15 Jul 2024 12:10:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same: This is quite strange, as the code involved should be quite similar to those with memory segments (e.g. you go through a method handle pointing to some helper class). I would have said this would have provided a fairly good comparison. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228359381 From mcimadamore at openjdk.org Mon Jul 15 12:17:52 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 12:17:52 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> Message-ID: <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> On Mon, 15 Jul 2024 12:13:23 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > > I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same: > > This is quite strange, as the code involved should be quite similar to those with memory segments (e.g. you go through a method handle pointing to some helper class). I would have said this would have provided a fairly good comparison. Ah! I had `arrayElementVarHandle` in mind - maybe you can try that? 
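
For readers unfamiliar with the two access paths being compared, here is a minimal sketch of reading a `byte[]` through `MethodHandles::arrayElementGetter` and through `MethodHandles::arrayElementVarHandle`. The class name `ArrayAccessSketch` and its helper methods are hypothetical and are not taken from the `ConcurrentClose` benchmark; the sketch only shows the shape of the two calls, not how the benchmark uses them.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class ArrayAccessSketch {

    // Method handle of type (byte[], int) -> byte that reads one element.
    static final MethodHandle GETTER = MethodHandles.arrayElementGetter(byte[].class);

    // Var handle over byte[] elements; coordinates are (byte[] array, int index).
    static final VarHandle ELEMENTS = MethodHandles.arrayElementVarHandle(byte[].class);

    static int sumWithGetter(byte[] data) {
        int sum = 0;
        try {
            for (int i = 0; i < data.length; i++) {
                // invokeExact needs the call site to match (byte[], int) -> byte exactly.
                sum += (byte) GETTER.invokeExact(data, i);
            }
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return sum;
    }

    static int sumWithVarHandle(byte[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            // Plain (non-volatile) read through the var handle.
            sum += (byte) ELEMENTS.get(data, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        System.out.println(sumWithGetter(data));
        System.out.println(sumWithVarHandle(data));
    }
}
```

Both handles are kept in `static final` fields, the usual way to let the JIT treat them as constants.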
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228361950

From jvernee at openjdk.org Mon Jul 15 12:36:53 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 12:36:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]
In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
Message-ID: <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com>

On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: namely, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10 ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   improve benchmark

This is the baseline if I change `closing` to use a confined arena:

    Benchmark                                        Mode  Cnt   Score   Error  Units
    ConcurrentClose.sharedClose                      avgt   10   8.089 ± 0.006  us/op
    ConcurrentClose.sharedClose:closing              avgt   10   0.001 ± 0.001  us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  20.046 ± 0.019  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10   4.220 ± 0.002  us/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228401517

From jvernee at openjdk.org Mon Jul 15 12:49:56 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 12:49:56 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]
In-Reply-To: <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
Message-ID: <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>

On Mon, 15 Jul 2024 12:14:52 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> Ah! I had `arrayElementVarHandle` in mind - maybe you can try that?

Even with `arrayElementVarHandle` it's about the same

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228425705

From mcimadamore at openjdk.org Mon Jul 15 13:01:52 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 13:01:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]
In-Reply-To: <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com>
Message-ID: <_PDgnriMr5GoRUoTpxJnhZjIqEcjdF2kscNx94ScPlc=.b035d8ac-e218-46ed-86d9-a08368c63dc5@github.com>

On Mon, 15 Jul 2024 12:34:37 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> This is the baseline if I change `closing` to use a confined arena:
>
> ```
> Benchmark                                        Mode  Cnt   Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10   8.089 ± 0.006  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   0.001 ± 0.001  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  20.046 ± 0.019  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   4.220 ± 0.002  us/op
> ```

This is promising. Effectively, once all the issues surrounding reachability fences are addressed, we should be able to achieve numbers similar to the above even in the case of shared close. The only thing that would be slower in that case is the closing thread itself.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228448722 From mcimadamore at openjdk.org Mon Jul 15 13:11:52 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 13:11:52 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> Message-ID: <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> On Mon, 15 Jul 2024 12:47:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: > Even with `arrayElementVarHandle` it's about the same This is very odd, and I don't have a good explanation as to why that is the case. What does the baseline (confined arena) look like for `arrayElementVarHandle` ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228469162 From mcimadamore at openjdk.org Mon Jul 15 13:24:20 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 13:24:20 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v9] In-Reply-To: <CZGbki5iFdCGPPagE-ya-_L8Nkgf7OunywApOsxC548=.2cee82fa-a5d2-4165-b012-30982a85030a@github.com> References: <CZGbki5iFdCGPPagE-ya-_L8Nkgf7OunywApOsxC548=.2cee82fa-a5d2-4165-b012-30982a85030a@github.com> Message-ID: <SNuYItiKx3q_g0ZUVt50iHQw67J-RmNEAP2c5M-kGeY=.bef589ad-fb18-40ff-b291-5da29d4d7451@github.com> > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. > > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. 
This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into restricted_jni - Address review comments - Add note on --illegal-native-access default value in the launcher help - Address review comment - Refine warning text for JNI method binding - Address review comments Improve warning for JNI methods, similar to what's described in JEP 472 Beef up tests - Address review comments - Fix another typo - Fix typo - Add more comments - ... and 2 more: https://git.openjdk.org/jdk/compare/2ced23fe...ff51ac6a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/789bdf48..ff51ac6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=07-08 Stats: 168976 lines in 3271 files changed: 114666 ins; 38249 del; 16061 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From mcimadamore at openjdk.org Mon Jul 15 13:24:20 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 13:24:20 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: <vWr2PdgTv6vEllxW8820KO3aZ3tR3xvMhCxD2k7QpS0=.8bfd6c47-6c93-4fc3-aced-7079889aa6a2@github.com> References: 
<CZGbki5iFdCGPPagE-ya-_L8Nkgf7OunywApOsxC548=.2cee82fa-a5d2-4165-b012-30982a85030a@github.com> <vWr2PdgTv6vEllxW8820KO3aZ3tR3xvMhCxD2k7QpS0=.8bfd6c47-6c93-4fc3-aced-7079889aa6a2@github.com> Message-ID: <1f6cPvfYhyTzqeYoeA6uQi2WULB_Bq49AhF_RoEVWDQ=.9577a65e-b626-43fd-ab03-09783b978d94@github.com> On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments keep alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2228489298 From jvernee at openjdk.org Mon Jul 15 13:52:53 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 15 Jul 2024 13:52:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> Message-ID: <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> On Mon, 15 Jul 2024 13:09:21 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > > Even with `arrayElementVarHandle` it's about the same > > This is very odd, and I don't have a good explanation as to why that is the case. 
What does the baseline (confined arena) look like for `arrayElementVarHandle` ? Pretty much exactly the same ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228555214 From mcimadamore at openjdk.org Mon Jul 15 14:04:52 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 14:04:52 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> Message-ID: <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com> On Mon, 15 Jul 2024 13:49:57 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: > > > Even with `arrayElementVarHandle` it's about the same > > > > > > This is very odd, and I don't have a good explanation as to why that is the case. What does the baseline (confined arena) look like for `arrayElementVarHandle` ? 
> > Pretty much exactly the same

So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228582926

From mli at openjdk.org Mon Jul 15 14:45:54 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 15 Jul 2024 14:45:54 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com> Message-ID: <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>

On Wed, 10 Jul 2024 10:48:19 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.

Do you suggest copying the whole sleef source repo into the JDK?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2228672993 From pchilanomate at openjdk.org Mon Jul 15 14:56:58 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 15 Jul 2024 14:56:58 GMT Subject: RFR: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom [v3] In-Reply-To: <WuesF9Q5ft_qBS-SToSKAHFbJKj_LXZkUp-bEfmoUcQ=.a0952d22-9988-45dc-82e3-e4c0cb69e250@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com> <WuesF9Q5ft_qBS-SToSKAHFbJKj_LXZkUp-bEfmoUcQ=.a0952d22-9988-45dc-82e3-e4c0cb69e250@github.com> Message-ID: <nJcAz6i5sFgUQ9F5r_GkBBYEKu3lAG-f5WndZYakCXA=.38a29967-99c9-497b-9c38-028128394298@github.com> On Tue, 9 Jul 2024 16:46:02 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename test to ThreadPollOnYield.java > > Marked as reviewed by alanb (Reviewer). Thanks for the reviews and comments @AlanBateman, @dholmes-ora and @dougxc! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20016#issuecomment-2228695102 From pchilanomate at openjdk.org Mon Jul 15 14:56:59 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 15 Jul 2024 14:56:59 GMT Subject: Integrated: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com> Message-ID: <JtTWrVvbZ2mDPUZpS140uz5grsKiSzx8Kl9Z7LF-k1E=.20bc6ef6-b8fa-44a0-a117-c6b7e27174b1@github.com> On Wed, 3 Jul 2024 19:54:46 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: > Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler. > > I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3. > > Thanks, > Patricio This pull request has now been integrated. 
Changeset: 000de306 Author: Patricio Chilano Mateo <pchilanomate at openjdk.org> URL: https://git.openjdk.org/jdk/commit/000de306286bb75bbdad2f572ce6dafd4184680e Stats: 84 lines in 2 files changed: 84 ins; 0 del; 0 mod 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom Reviewed-by: dholmes, alanb ------------- PR: https://git.openjdk.org/jdk/pull/20016 From uschindler at openjdk.org Mon Jul 15 15:08:54 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 15:08:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> Message-ID: <AMZu30w-TvEZXJCUyAOIwf1ul52k4wdFxNnJKSe-n1w=.7b4ac77f-3989-43d8-b586-ffacb4b52a5a@github.com> On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. 
Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   improve benchmark

This all looks promising! Together with making sure that Apache Solr and Elasticsearch/Opensearch close indexes one-by-one in a separate thread (with the PR https://github.com/apache/lucene/pull/13570 in place, too), the issues should be fixed.

What is the issue with memory fences?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228726262

From jvernee at openjdk.org Mon Jul 15 15:24:54 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 15 Jul 2024 15:24:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com> <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com> Message-ID: <go4TYWdHjDfdqUBwKYn11NxzEsvupNIgu6CNLLvvcKg=.a18dca9d-078a-4f94-b4e0-b0a0652dcd2c@github.com>

On Mon, 15 Jul 2024 11:29:49 GMT, Rémi Forax <forax at openjdk.org> wrote:

> > This is what I was thinking of as well. close() on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory.
>
> This delays both the closing of the Arena and the freeing of the segments, so bugs may not be discovered if the arena is accessed in between the time the thread pool is notified and the time the close() is effectively called.

Closing the arena is what requires the handshake, which is where the majority of the cost is. I don't see the point in closing synchronously, but then freeing the memory asynchronously, since the latter is relatively cheap.
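
As a rough illustration of the dedicated closing thread idea quoted above, here is a minimal sketch. It is not code from this PR: the class `ArenaCloserSketch`, the executor `CLOSER` and the method `closeOnDedicatedThread` are hypothetical names, and it assumes JDK 22 or later, where `java.lang.foreign` is a final API. The caller still waits for the close here, so the handshake cost is only moved to another thread, not hidden.

```java
import java.lang.foreign.Arena;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ArenaCloserSketch {
    // Hypothetical single daemon thread dedicated to closing shared arenas.
    private static final ExecutorService CLOSER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "arena-closer");
        t.setDaemon(true);
        return t;
    });

    // Submits the close to the dedicated thread and waits for it to finish.
    // Returns true if the arena's scope is no longer alive afterwards.
    static boolean closeOnDedicatedThread(Arena arena) {
        try {
            // A shared arena may be closed by any thread.
            CLOSER.submit((Runnable) arena::close).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while closing arena", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("arena close failed", e.getCause());
        }
        return !arena.scope().isAlive();
    }

    public static void main(String[] args) {
        Arena arena = Arena.ofShared();
        arena.allocate(16);
        System.out.println(closeOnDedicatedThread(arena));
    }
}
```

If the caller did not wait on the returned future, the arena would stay accessible until the closing thread gets to it, which is the delayed-detection concern raised in the quoted reply.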
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228756598 From uschindler at openjdk.org Mon Jul 15 15:24:54 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 15:24:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2] In-Reply-To: <go4TYWdHjDfdqUBwKYn11NxzEsvupNIgu6CNLLvvcKg=.a18dca9d-078a-4f94-b4e0-b0a0652dcd2c@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com> <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com> <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com> <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com> <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com> <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com> <go4TYWdHjDfdqUBwKYn11NxzEsvupNIgu6CNLLvvcKg=.a18dca9d-078a-4f94-b4e0-b0a0652dcd2c@github.com> Message-ID: <qcosrYz-ELNTFcuHX5ngZMCxrFAeNfcB1F4_VNDfMyM=.b8d19ff9-7ee9-4cff-829a-3f77706d500d@github.com> On Mon, 15 Jul 2024 15:18:20 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: > > > This is what I was thinking of as well. close() on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory. > > > > > > This delays both the closing of the Arena and the freeing of the segments, so bugs may be not discovered if the arena is accessed in between the time the thread pool is notified and the time the close() is effectively called. > > Closing the arena is what requires the handshake, which is where the majority of the cost is. 
I don't see the point in closing synchronously, but then freeing the memory asynchronously, since the latter is relatively cheap.

I think the idea is to trigger the handshake async and then close after the handshake (in a callback when the handshake finishes). This is only a problem if you, for example, want to delete a mmapped file on Windows. The delete won't work as long as the memory is mmapped, but in all other cases this is fine. So there should be the option to allow async close() [if the client supports it], but the default should be synchronous. I think this is what @forax suggested.

But anyways: Using a separate extra thread is a good idea. I proposed this for Apache Solr, and the Elasticsearch people are checking their code at the moment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228760782

From mcimadamore at openjdk.org Mon Jul 15 16:34:00 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 16:34:00 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
<05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com> Message-ID: <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>

On Mon, 15 Jul 2024 14:02:27 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd?

I did some more analysis of the benchmark. I first eliminated the closing thread, and started with two simple benchmarks:

    @Benchmark
    public int memorySegmentAccess() {
        int sum = 0;
        for (int i = 0; i < segment.byteSize(); i++) {
            sum += segment.get(JAVA_BYTE, i);
        }
        return sum;
    }

and

    @Benchmark
    public int otherAccess() {
        int sum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += (byte)BYTE_HANDLE.get(array, i);
        }
        return sum;
    }

where the setup code is as follows:

    static final int SIZE = 10_000;

    MemorySegment segment;
    byte[] array;

    static final VarHandle BYTE_HANDLE = MethodHandles.arrayElementVarHandle(byte[].class);

    @Setup
    public void setup() {
        array = new byte[SIZE];
        segment = MemorySegment.ofArray(array);
    }

With this, I obtained the following results:

    Benchmark                            Mode  Cnt   Score   Error  Units
    ConcurrentClose.memorySegmentAccess  avgt   10  13.879 ± 0.478  us/op
    ConcurrentClose.otherAccess          avgt   10   2.256 ± 0.017  us/op

Ugh. It seems like C2 "blows up" at the third iteration:

    # Run progress: 0.00% complete, ETA 00:05:00
    # Fork: 1 of 1
    # Warmup Iteration   1: 6.712 us/op
    # Warmup Iteration   2: 5.756 us/op
    # Warmup Iteration   3: 13.267 us/op
    # Warmup Iteration   4: 13.267 us/op
    # Warmup Iteration   5: 13.274 us/op

This might be a bug/regression. But, let's move on. I then tweaked the induction variable of the memory segment loop to be `long`, not `int` and I got:

    Benchmark                            Mode  Cnt  Score   Error  Units
    ConcurrentClose.memorySegmentAccess  avgt   10  2.764 ± 0.016  us/op
    ConcurrentClose.otherAccess          avgt   10  2.240 ± 0.016  us/op

Far more respectable! And now we have a good baseline, since both workloads take about the same time, so we can use them to draw interesting comparisons. So, let's add back a thread that does a shared arena close:

    Benchmark                                        Mode  Cnt   Score   Error  Units
    ConcurrentClose.sharedClose                      avgt   10  12.001 ± 0.061  us/op
    ConcurrentClose.sharedClose:closing              avgt   10  19.281 ± 0.323  us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   9.802 ± 0.314  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10   6.921 ± 0.151  us/op

This is with vanilla JDK. If I apply the changes in this PR, I get this:

    Benchmark                                        Mode  Cnt   Score   Error  Units
    ConcurrentClose.sharedClose                      avgt   10  10.837 ± 0.241  us/op
    ConcurrentClose.sharedClose:closing              avgt   10  20.337 ± 1.674  us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   8.672 ± 0.993  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10   3.501 ± 0.162  us/op

This is good. Note how `otherAccess` improved almost 2x, as the code is no longer redundantly de-optimized. Now, we know that, even for memory segment access, we can avoid redundant deopt once JDK-8290892 is fixed. To simulate that, I've dropped the lines which apply the conservative deoptimization in `scopedMemoryAccess.cpp` and ran the bench again:

    Benchmark                                        Mode  Cnt   Score   Error  Units
    ConcurrentClose.sharedClose                      avgt   10   8.957 ± 0.089  us/op
    ConcurrentClose.sharedClose:closing              avgt   10  18.898 ± 0.338  us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   4.403 ± 0.054  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10   3.571 ± 0.042  us/op

Ok, now both accessor threads seem faster. If I swap the shared arena close with a confined arena close I get this:

    Benchmark                                        Mode  Cnt  Score   Error  Units
    ConcurrentClose.sharedClose                      avgt   10  1.760 ± 0.008  us/op
    ConcurrentClose.sharedClose:closing              avgt   10  ? 10??         us/op
    ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  2.912 ± 0.016  us/op
    ConcurrentClose.sharedClose:otherAccess          avgt   10  2.367 ± 0.009  us/op

Summing up:

* there is some issue involving segment access with `int` induction variable which we should investigate separately
* this PR significantly improves performance of threads that are not touching memory segments, even under heavy shared arena close loads
* performance of unrelated memory segment access is still affected by concurrent shared arena close. This is due to conservative deoptimization which will be removed once JDK-8290892 is fixed
* once all fixes are applied, the performance of the accessing threads gets quite close to ideal, but not 100% there. The loss seems in the acceptable range, given that this benchmark is closing shared arenas in a loop, arguably the worst possible case.
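
For anyone who wants to look at the two loop shapes outside of JMH, here is a minimal standalone sketch. It only checks that the `int` and `long` induction variable variants compute the same sum; it says nothing about their timing, which needs a proper JMH run. The class name `InductionVariableSketch` and its method names are made up for illustration, and the sketch assumes JDK 22 or later, where `java.lang.foreign` is final.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class InductionVariableSketch {
    static final int SIZE = 10_000;

    // Loop shape of the first benchmark: int induction variable,
    // implicitly widened to long for the comparison and for get().
    static int sumWithIntIndex(MemorySegment segment) {
        int sum = 0;
        for (int i = 0; i < segment.byteSize(); i++) {
            sum += segment.get(ValueLayout.JAVA_BYTE, i);
        }
        return sum;
    }

    // Tweaked loop shape: long induction variable, no widening needed.
    static int sumWithLongIndex(MemorySegment segment) {
        int sum = 0;
        for (long i = 0; i < segment.byteSize(); i++) {
            sum += segment.get(ValueLayout.JAVA_BYTE, i);
        }
        return sum;
    }

    // Heap segment backed by a byte[] filled with the pattern 0,1,...,6,0,1,...
    static MemorySegment filledSegment() {
        byte[] array = new byte[SIZE];
        for (int i = 0; i < SIZE; i++) {
            array[i] = (byte) (i % 7);
        }
        return MemorySegment.ofArray(array);
    }

    public static void main(String[] args) {
        MemorySegment segment = filledSegment();
        System.out.println(sumWithIntIndex(segment) == sumWithLongIndex(segment));
    }
}
```

Both methods are functionally identical, so any difference seen in a benchmark comes from how the JIT compiles the loop, not from the values read.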
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228916752 From uschindler at openjdk.org Mon Jul 15 16:37:53 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 16:37:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1Cmdl CsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com> <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com> Message-ID: <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com> On Mon, 15 Jul 2024 16:30:11 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >>> > > Even with `arrayElementVarHandle` it's about the same >>> > >>> > >>> > This is very odd, and I don't have a good explanation as to why that is the case. 
What does the baseline (confined arena) look like for `arrayElementVarHandle` ? >>> >>> Pretty much exactly the same >> >> So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd? > >> So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd? > > I did some more analyis of the benchmark. I first eliminated the closing thread, and started with two simple benchmarks: > > > @Benchmark > public int memorySegmentAccess() { > int sum = 0; > for (int i = 0; i < segment.byteSize(); i++) { > sum += segment.get(JAVA_BYTE, i); > } > return sum; > } > > > and > > > @Benchmark > public int otherAccess() { > int sum = 0; > for (int i = 0; i < array.length; i++) { > sum += (byte)BYTE_HANDLE.get(array, i); > } > return sum; > } > > > where the setup code is as follows: > > > static final int SIZE = 10_000; > > MemorySegment segment; > byte[] array; > > static final VarHandle BYTE_HANDLE = MethodHandles.arrayElementVarHandle(byte[].class); > > @Setup > public void setup() { > array = new byte[SIZE]; > segment = MemorySegment.ofArray(array); > } > > > With this, I obtained the following results: > > > Benchmark Mode Cnt Score Error Units > ConcurrentClose.memorySegmentAccess avgt 10 13.879 ? 0.478 us/op > ConcurrentClose.otherAccess avgt 10 2.256 ? 0.017 us/op > > > Ugh. It seems like C2 "blows up" at the third iteration: > > > # Run progress: 0.00% complete, ETA 00:05:00 > # Fork: 1 of 1 > # Warmup Iteration 1: 6.712 us/op > # Warmup Iteration 2: 5.756 us/op > # Warmup Iteration 3: 13.267 us/op > # Warmup Iteration 4: 13.267 us/op > # Warmup Iteration 5: 13.274 us/op > > > This might be a bug/regression. But, let's move on. I then tweaked the induction variable of the memory segment loop to be `long`, not `int` and I got: > > > Benchmark Mode Cnt Score Error Units > ConcurrentClose.memorySegmentAccess avgt 10 2.764 ? 0.016 us/op > ConcurrentClose.otherAccess avgt 10 2.240 ? 
0.016 us/op
>
>
> Far more respectable! And now we have a good baseline, since both workloads take about the same time, so we can use them to draw interesting comparisons. So, let's add back a thread that does a shared arena close:
>
>
> Benchmark                                        Mode  Cnt   Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10  12.001 ± 0.061  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  19.281 ± 0.323  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   9.802 ± 0.314  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   1...

Thanks @mcimadamore, this sounds great! I am so happy that we at least reduced the overhead for non-memory segment threads. This will also be the case for Lucene/Solr because we do not read from segments all the time; we also have other code sometimes executed between reads from memory segments :-)

So +1 to merge this and hopefully backport it at least to 21? This would be great, but as it is not a bug, it is not strictly necessary.

We should open issues for the int problem and work on JDK-8290892.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228926251 From mcimadamore at openjdk.org Mon Jul 15 16:44:54 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 15 Jul 2024 16:44:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1Cmdl CsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com> <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com> <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com> Message-ID: <O6xuVpHQIshoOQNyhroVBRHYI_6xhIFSx_HnH6s89Zg=.a2c7cfc3-c081-4c8f-a640-36bad0dc3d03@github.com> On Mon, 15 Jul 2024 16:35:26 GMT, Uwe Schindler <uschindler at openjdk.org> wrote: > So +1 to merge this and hopefully backport it at least to 21? 
Backport to 21 is difficult, given the handshake code there is different (and, FFM is preview there). But, might be more possible for 22. I have notified Roland re. the `int` problem, will update once I know more about the nature of this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228939812 From uschindler at openjdk.org Mon Jul 15 16:44:56 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Mon, 15 Jul 2024 16:44:56 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <O6xuVpHQIshoOQNyhroVBRHYI_6xhIFSx_HnH6s89Zg=.a2c7cfc3-c081-4c8f-a640-36bad0dc3d03@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1Cmdl CsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com> <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com> <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com> 
<O6xuVpHQIshoOQNyhroVBRHYI_6xhIFSx_HnH6s89Zg=.a2c7cfc3-c081-4c8f-a640-36bad0dc3d03@github.com> Message-ID: <1lPwGcVzUtreIzS-ieJpTrtRyHM9PElbHN31NqFCDNI=.c44ff158-4c82-4dc3-9166-01393330db40@github.com> On Mon, 15 Jul 2024 16:40:06 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > > So +1 to merge this and hopefully backport it at least to 21? > > Backport to 21 is difficult, given the handshake code there is different (and, FFM is preview there). But, might be more possible for 22. I have notified Roland re. the `int` problem, will update once I know more about the nature of this issue. Ah I remember: the tristate! All fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228943554 From forax at openjdk.org Mon Jul 15 17:02:55 2024 From: forax at openjdk.org (=?UTF-8?B?UsOpbWk=?= Forax) Date: Mon, 15 Jul 2024 17:02:55 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> Message-ID: <YA4PQgaaD2hop3iZs3pV6hFjQwAps0cKjCf54NBl9eA=.a24956f6-84bf-4530-aeb8-e1d77621e9cf@github.com> On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. 
>> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `ouputStream*` rather than always printing to `tty`. >> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case. >> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas. >> >> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared cases balloons up in time spent quickly when there are more than 4 threads: >> >> >> Benchmark Threads Mode Cnt Score Error Units >> ConcurrentClose.sharedAccess 32 avgt 10 9017.397 ? 202.870 us/op >> ConcurrentClose.sharedAccess 24 avgt 10 5178.214 ? 164.922 us/op >> ConcurrentClose.sharedAccess 16 avgt 10 2224.420 ? 165.754 us/op >> ConcurrentClose.sharedAccess 8 avgt 10 593.828 ? 8.321 us/op >> ConcurrentClose.sharedAccess 7 avgt 10 470.700 ? 22.511 us/op >> ConcurrentClose.sharedAccess 6 avgt 10 386.697 ? 59.170 us/op >> ConcurrentClose.sharedAccess 5 avgt 10 291.157 ? 7.023 us/op >> ConcurrentClose.sharedAccess 4 avgt 10 209.178 ? 5.802 us/op >> ConcurrentClose.sharedAccess 1 avgt 10 ... 
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   improve benchmark

Even if the int vs long issue is fixed for this case, I think we should recommend calling `withInvokeExactBehavior()` after creating any VarHandle so all the auto-conversions are treated as runtime errors.

This is what I do with my students (when using compareAndSet) and it makes this kind of perf issue easy to find and easy to fix.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228979913

From aph at openjdk.org Mon Jul 15 17:02:56 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 15 Jul 2024 17:02:56 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com> <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com> Message-ID: <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com>

On Mon, 15 Jul 2024 14:42:45 GMT, Hamlin Li <mli at openjdk.org> wrote:

> > I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.
>
> Do you suggest to copy the whole sleef source repo into jdk?

I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files that are not used at all and could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2228979596

From mcimadamore at openjdk.org Mon Jul 15 17:09:53 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 17:09:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]
In-Reply-To: <YA4PQgaaD2hop3iZs3pV6hFjQwAps0cKjCf54NBl9eA=.a24956f6-84bf-4530-aeb8-e1d77621e9cf@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <YA4PQgaaD2hop3iZs3pV6hFjQwAps0cKjCf54NBl9eA=.a24956f6-84bf-4530-aeb8-e1d77621e9cf@github.com>
Message-ID: <0a3-qVymtC5HI4wh8FQiNeebnG_-8Ax1seA80JhCQLY=.c24d9e87-b34f-45c7-aa7c-e63e294249d4@github.com>

On Mon, 15 Jul 2024 17:00:24 GMT, Rémi Forax <forax at openjdk.org> wrote:

> Even if the int vs long issue is fixed for this case, I think we should recommend calling `withInvokeExactBehavior()` after creating any VarHandle so all the auto-conversions are treated as runtime errors.
>
> This is what I do with my students (when using compareAndSet) and it makes this kind of perf issue easy to find and easy to fix.

Note that this has nothing to do with implicit conversion, as the memory segment var handle is called by our implementation, with the correct type (a long). This is likely an issue with bounds check elimination with "long loops".
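
For readers unfamiliar with the `withInvokeExactBehavior()` suggestion quoted above, the following is a minimal sketch of what it changes. It does not reproduce the bounds-check issue described in the reply. The class name, the method name, and the choice of an array-element handle over `long[]` are assumptions made for illustration only, and Java 16 or later is assumed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class ExactVarHandleDemo {
    // Hypothetical handles for illustration: element access on a long[].
    // LOOSE uses the default "invoke" behavior and adapts argument types.
    public static final VarHandle LOOSE =
            MethodHandles.arrayElementVarHandle(long[].class);
    // EXACT requires the call-site types to match (long[], int, long)void.
    public static final VarHandle EXACT = LOOSE.withInvokeExactBehavior();

    /** Returns true if storing an int value through the handle is rejected. */
    public static boolean rejectsIntValue(VarHandle handle) {
        long[] data = new long[2];
        try {
            // The literal 42 is an int, so the call-site type is
            // (long[], int, int)void rather than (long[], int, long)void.
            handle.set(data, 0, 42);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long[] data = new long[2];
        EXACT.set(data, 1, 42L);                    // exact match: accepted
        System.out.println(rejectsIntValue(LOOSE)); // prints "false"
        System.out.println(rejectsIntValue(EXACT)); // prints "true"
    }
}
```

With the default behavior the `int` argument is silently widened to `long`; with exact behavior the same call fails fast, which is what makes accidental conversions easy to spot.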
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228992252 From luhenry at openjdk.org Mon Jul 15 17:38:52 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 15 Jul 2024 17:38:52 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com> <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com> <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com> Message-ID: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com> On Mon, 15 Jul 2024 17:00:13 GMT, Andrew Haley <aph at openjdk.org> wrote: > > > I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. Thhis has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages. > > > > > > Do you suggest to copy the whole sleef source repo into jdk? > > I think so, along with scripting that generates the preprocessed file we use. 
It might be the case that there are some sleef files not used at all they could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system: 1. take a dependency on cmake in order to build the Sleef headers 2. write a custom build system for Sleef to integrate into OpenJDK Neither approach sound good to me as a mandatory option. However, if we are to allow the person building OpenJDK to _optionally_ generate the headers from a Sleef source checkout (provided by the user with a `--with-sleef-src=/path/to/sleef`), we can then more easily take the assumption that the user has installed the necessary dependencies. That would also be in line with how binutils is being built and integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229040615 From pchilanomate at openjdk.org Mon Jul 15 18:36:17 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 15 Jul 2024 18:36:17 GMT Subject: [jdk23] RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 Message-ID: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com> Hi all, This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev. 
Thanks ------------- Commit messages: - Backport 7ab96c74e2c39f430a5c2f65a981da7314a2385b Changes: https://git.openjdk.org/jdk/pull/20185/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20185&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335409 Stats: 55 lines in 3 files changed: 6 ins; 20 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/20185.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20185/head:pull/20185 PR: https://git.openjdk.org/jdk/pull/20185 From shade at openjdk.org Mon Jul 15 18:36:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 15 Jul 2024 18:36:17 GMT Subject: [jdk23] RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 In-Reply-To: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com> References: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com> Message-ID: <g51dGtyAfFu3y_uVGE7KBXcGPIkb1KICznIibbbDoLs=.202e218b-83ca-4b29-840c-0ae03949620a@github.com> On Mon, 15 Jul 2024 18:13:53 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: > Hi all, > > This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev. > > Thanks Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20185#pullrequestreview-2178383297 From szaldana at openjdk.org Mon Jul 15 19:41:14 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 15 Jul 2024 19:41:14 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size [v2] In-Reply-To: <7kTS7aOEGu5r0uCYvKrIb7nvf1-MBkuCngFWHxNzj2E=.1d2e2913-d442-429f-afc1-0732171cb514@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> <7kTS7aOEGu5r0uCYvKrIb7nvf1-MBkuCngFWHxNzj2E=.1d2e2913-d442-429f-afc1-0732171cb514@github.com> Message-ID: <54XZwy3Z2ZIeHVMruRRbvsHd750jRJT7zvj-HVkojbM=.9d6b1185-2c9d-472c-aede-97a595d53ca0@github.com> On Thu, 11 Jul 2024 07:36:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > It looks cautiously okay. Small nits remain. > > Please make sure the tests pass for both 64-bit and 32-bit (to test 32-build, simplest way is to build on a x64 linux as normal, but to specify --with-target-bits=32 when configuring). I made some updates based on feedback. Apologies for the delay - I was figuring out how to verify the 32-bit build. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20039#issuecomment-2229245477 From szaldana at openjdk.org Mon Jul 15 19:41:13 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 15 Jul 2024 19:41:13 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size [v2] In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> Message-ID: <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com> > Hi all, > > This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
> > Testing: > - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. > > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with two additional commits since the last revision: - Hard coding values and adding Unit class - whitebox changes based on feedback. Using is_aligned and asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20039/files - new: https://git.openjdk.org/jdk/pull/20039/files/5dcc6c9e..7c0138ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=00-01 Stats: 83 lines in 6 files changed: 42 ins; 19 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/20039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20039/head:pull/20039 PR: https://git.openjdk.org/jdk/pull/20039 From mikael at openjdk.org Mon Jul 15 20:42:53 2024 From: mikael at openjdk.org (Mikael Vidstedt) Date: Mon, 15 Jul 2024 20:42:53 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> Message-ID: <Ebg73qVwNWOzw_TEZ37GgRIeV2AIIWfsSa6EExexRtk=.43e71880-42c8-4d29-b333-76b91563d428@github.com> On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. 
>>
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time.
>> Besides this change, also modify the previous changes accordingly, e.g. remove some unnecessary files or changes, especially in the make dir of jdk.
>>
>> Besides the code changes, one important task is to handle the legal process.
>>
>> Thanks!
>>
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18
| 2.876 >> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936 >> Float128Vector.ATAN... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > skip TANH If we want the traceability (which I agree is good) of the SLEEF source code but want to avoid having it in the jdk repo itself (adding unnecessary "bloat" for everybody), perhaps we can consider having it in a separate repository somewhere in/under `openjdk`? It's not immediately clear to me that we need to have support in the JDK build system (configure/make) itself for building/updating the header files, as long as there's a simple, documented way of doing so. I like to think the `createSleef.sh` script is that, but I recognize that I'm biased because I wrote it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229380066 From mli at openjdk.org Mon Jul 15 21:00:54 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 15 Jul 2024 21:00:54 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com> <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com> <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com> <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com> Message-ID: 
<Sbebz83QoFGDL33tqAZROLgJsrJCaH05-ic6q8B9Q_Q=.892d3a14-e0e8-4f31-8068-bda6c5891880@github.com>

On Mon, 15 Jul 2024 17:35:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

> I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files not used at all they could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge

Currently,
* https://github.com/openjdk/jdk/pull/19185 generates the sleef inline headers from sleef 3.6.1, which is tagged in the sleef repo.
* With the script in https://github.com/openjdk/jdk/pull/19185, anyone with access to the sleef repo can regenerate these inline headers themselves (in fact, anyone can generate the inline headers from sleef from scratch without using the scripts in https://github.com/openjdk/jdk/pull/19185; our script just makes future maintenance easier), so it's easy for anyone to verify the inline header files used in jdk.

With these two points, the traceability seems fine to me; please kindly point out if I have missed something. Maybe we can add clearer and more specific information to the README or createSleef.sh in https://github.com/openjdk/jdk/pull/19185 to indicate which version of the sleef source we're using in jdk.

I'm also fine with your suggestion to add the whole sleef repo to jdk (maybe we can remove some of the files, but we can ignore that difference for now in the discussion here). To copy the sleef repo into jdk, we would still need to pre-generate the inline header files and check them into jdk along with the sleef repo; I think you agree with that too (without checking in these inline headers, we would have to bring extra dependencies into jdk and increase the compilation time when building jdk). But from a traceability point of view, it seems to me that this does not bring any extra benefit over the current https://github.com/openjdk/jdk/pull/19185.
If someone wants to verify the pre-generated inline headers in jdk, they would still need to verify the sleef source in jdk first, and then the pre-generated sleef inline headers.

What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229421715

From mikael at openjdk.org Mon Jul 15 21:19:55 2024
From: mikael at openjdk.org (Mikael Vidstedt)
Date: Mon, 15 Jul 2024 21:19:55 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11]
In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
Message-ID: <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com>

On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>>
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time.
>> Besides this change, also modify the previous changes accordingly, e.g. remove some unnecessary files or changes, especially in the make dir of jdk.
>>
>> Besides the code changes, one important task is to handle the legal process.
>>
>> Thanks!
>>
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> skip TANH

I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.
The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when keeping in mind that if we `#include` it in the jdk repo it's going to be there for every cloned repo in every project/branch and very few will actually care about it. I agree that we'd still have to include the pre-generated header files. Hence my suggestion to consider putting it under our control, but in a separate `openjdk` controlled repository. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229457499 From psandoz at openjdk.org Mon Jul 15 23:31:54 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 15 Jul 2024 23:31:54 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. 
`@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in its initialized state and folds it, or it sees a default value for the component and leaves it alone.
>
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
>
> The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved on the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
>
> C1 did not do anything special for `@Stable` fields at all; fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
>
> Additional testing:
> - [x] New IR tests
> - [x] Linux x86_64 server fastdebug, `all`
> - [x] Linux AArch64 server fastdebug, `all`
> - [x] Linux AArch64 server fastdebug, jcstre...

IIUC this means we can remove the explicit fence here:

    public ConstantCallSite(MethodHandle target) {
        super(target);
        isFrozen = true;
        UNSAFE.storeStoreFence(); // properly publish isFrozen update
    }

?
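
As background for the question above, here is a minimal sketch of the publication pattern under discussion, written against the public `VarHandle.storeStoreFence()` rather than the JDK-internal `Unsafe` used in `ConstantCallSite`. The class `FrozenHolder` and its fields are hypothetical and are not taken from the JDK. The sketch only shows where such a fence sits; it does not demonstrate a reordering, and it takes no position on whether the real fence can be removed.

```java
import java.lang.invoke.VarHandle;

public class FrozenHolder {
    // Final field: already covered by final-field semantics.
    private final Object target;
    // Non-final field, standing in for a @Stable field written in a constructor.
    private boolean frozen;

    public FrozenHolder(Object target) {
        this.target = target;
        this.frozen = true;
        // Stores before the fence are not reordered with stores after it,
        // so a later store that publishes `this` cannot pass the write
        // to `frozen`.
        VarHandle.storeStoreFence();
    }

    public boolean isFrozen() {
        return frozen;
    }

    public Object target() {
        return target;
    }
}
```

If `frozen` were `final`, the constructor would get an equivalent guarantee from final-field semantics without the explicit fence.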

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2229615130

From liach at openjdk.org Tue Jul 16 03:07:58 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 16 Jul 2024 03:07:58 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
Message-ID: <jkFv8H1REe7218LdmB3Bwa5k0r7Aj_fWqO_hd6VT3IE=.4ee32eac-ce76-4bdf-938c-26672366cd83@github.com>

On Mon, 15 Jul 2024 23:29:37 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> IIUC this means we can remove the explicit fence here

`ConstantCallSite` is non-sealed, and we probably wish to read `isFrozen == true` when we can read anything initialized by the subclasses, especially if a malicious subclass leaks itself into some multithreaded environment before returning from the constructor.

That said, I think we can change this to a StoreStore or Release: https://github.com/openjdk/jdk/blob/8feabc849ba2f617c8c6dbb2ec5074297beb6437/src/java.base/share/classes/java/lang/invoke/MutableCallSite.java#L277

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2229913291

From liach at openjdk.org Tue Jul 16 03:50:17 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 16 Jul 2024 03:50:17 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to Executable
Message-ID: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>

Move fields common to Method and Constructor to `Executable`, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.
------------- Commit messages: - Inline some common ctor + method fields to executable Changes: https://git.openjdk.org/jdk/pull/20188/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20188&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336275 Stats: 451 lines in 11 files changed: 77 ins; 238 del; 136 mod Patch: https://git.openjdk.org/jdk/pull/20188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20188/head:pull/20188 PR: https://git.openjdk.org/jdk/pull/20188 From dholmes at openjdk.org Tue Jul 16 04:58:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 16 Jul 2024 04:58:52 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com> <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com> <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com> Message-ID: <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com> On Mon, 15 Jul 2024 09:15:02 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> Okay, such a change in behaviour was unexpected for a "cleanup" PR. I'm looking into it now. Perhaps @mlchung can comment? > > Yeah, this is not really a cleanup (behaviors stay the same) change. For this particular hunk, keeping the old behavior seems to be unnecessary work. 
Note that we are also changing the behavior in C2: both in `do_exits` we no longer emit the barriers for `static final` stores in `clinits`, plus EA does not care about `clinits` anymore as well. Those are also behavioral changes. > > If you prefer, I can turn this PR into a behaviorally similar cleanup, and do the behavior changes separately. I certainly would not want those C2 changes to hidden behind what looks like a cleanup on the surface, so please do separate things out. BTW Mandy is away for a while so we can't get her input on the original intent here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1678756832 From aph at openjdk.org Tue Jul 16 07:50:55 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 16 Jul 2024 07:50:55 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com> <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com> <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com> <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com> Message-ID: <2YOVweTWkX1_HY8VRJktqfgY9gMsEqfEpov0qdhpTQM=.5472511d-7d82-4f32-98be-d998e2fee617@github.com> On Mon, 15 Jul 2024 17:35:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: > Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system I 
don't think that anyone is proposing to do that, so we can discount it altogether. > However, if we are to allow the person building OpenJDK to _optionally_ generate the headers from a Sleef source checkout (provided by the user with a `--with-sleef-src=/path/to/sleef`), we can then more easily take the assumption that the user has installed the necessary dependencies. That would also be in line with how binutils is being built and integrated. Mmm, but we don't need to do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230245021 From aph at openjdk.org Tue Jul 16 08:23:57 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 16 Jul 2024 08:23:57 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com> Message-ID: <brYLslKpd_pgvj7HsK527bi1vXS-7FuzMBusHLpZ25I=.e205b93e-ac3b-4213-bda1-bab72946f206@github.com> On Mon, 15 Jul 2024 21:17:03 GMT, Mikael Vidstedt <mikael at openjdk.org> wrote: > I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control. Right. We should adopt best practice, both from an Open Source compliance point of view and (from a security, traceability, and binary reproduceability point of view) with regard to the xz backdoor hack. > The SLEEF source code looks to be around 7.5MB, give or take. 
That's not enormous, but it's not exactly small when keeping in mind that if we `#include` it in the jdk repo it's going to be there for every cloned repo in every project/branch and very few will actually care about it. I agree that we'd still have to include the pre-generated header files.
>
> Hence my suggestion to consider putting it under our control, but in a separate `openjdk` controlled repository.

That ticks many of the boxes, as long as we can be sure to tag everything. But from a space point of view I'm not sure it's compelling. After all, we've recently decided to use branches rather than separate repos for releases, which is a good idea because it keeps everything together, but it does increase the repo size for everyone. It would be very nice if Git allowed a subset of the repo to be checked out, but as far as I can see it doesn't.

Before checkout, the OpenJDK repo is 1.4G. After checkout that's 2.1G. So, about 0.7G of that is the JDK source code, if you include the file system overhead. 7.5MB doesn't sound excessive when you consider that SLEEF potentially provides vectorized routines for many OpenJDK targets. It's not just about AArch64.

This is starting to sound like we need a policy decision, because we don't want to re-hash this discussion every time the question comes up, as it surely will.

For me, the fact that supplying preprocessed source code without the real source is known bad practice, even to the extent of being expressly forbidden in the open source definition, is a slam-dunk argument. But clearly that argument doesn't work for everyone. Maybe something to be discussed at the workshop?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230304980 From aph at openjdk.org Tue Jul 16 08:37:58 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 16 Jul 2024 08:37:58 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> Message-ID: <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. >> >> Besides of the code changes, one important task is to handle the legal process. >> >> Thanks! 
>>
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Float
>> data
>>
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   skip TANH

> Currently,
>
> * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo.
> > * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself Right, but think about package builders. This isn't about J Random Hacker doing it by hand. When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves. > ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk. That script must be checked in to the OpenJDK tree. > With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk. > > I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too Yes. > (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. 
For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers. You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is digitally signed, not just by the SLEEF maintainers, _but by OpenJDK as well._ This is not a small thing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230332083 From mli at openjdk.org Tue Jul 16 09:40:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 16 Jul 2024 09:40:01 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> Message-ID: <6LI53-1gh5fncS7RCdJvtGKUjiFtEj3v0quJmzZUbNw=.49250609-57b1-4901-a7ad-8323771f94c7@github.com> On Tue, 16 Jul 2024 08:35:25 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> skip TANH > >> Currently, >> >> * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo. >> >> * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself > > Right, but think about package builders. This isn't about J Random Hacker doing it by hand. > > When a package gets built, the builder machine unpacks source code. 
If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves. > >> ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk. > > That script must be checked in to the OpenJDK tree. > >> With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk. >> >> I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too > > Yes. > >> (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers. > > You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is di... @theRealAph Thanks for clarification. 
I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood. 1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors. 2. jdk daily development. This is about to modify, build, run/test jdk daily by jdk developers. For the package builders, original sleef source is necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary. The script to pre-generate sleef inline headers is only triggerred by package builders (and I think it involves some scripts which are not part of jdk source ? e.g. the script to trigger pre-generating script), but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230456463 From aph at openjdk.org Tue Jul 16 09:51:57 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 16 Jul 2024 09:51:57 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> Message-ID: <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com> On Tue, 16 Jul 2024 08:35:25 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> skip TANH > >> Currently, >> >> * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers 
from sleef 3.6.1, which is tagged in sleef repo. >> >> * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself > > Right, but think about package builders. This isn't about J Random Hacker doing it by hand. > > When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves. > >> ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk. > > That script must be checked in to the OpenJDK tree. > >> With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk. >> >> I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too > > Yes. > >> (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). 
But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers.
>
> You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is di...

> @theRealAph Thanks for clarification.
>
> I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood.
>
> 1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors.
>
> 2. jdk daily development. This is about to modify, build, run/test jdk daily by jdk developers.
>
> For the package builders, original sleef source is

may be

> necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary.

Yes, most of the time. Some devs will want to be more thorough.

> The script to pre-generate sleef inline headers is only triggered by package builders (and I think it involves some scripts which are not part of jdk source? e.g. the script to trigger pre-generating script),

No: all of the scripts to generate the preprocessed source from the SLEEF source must be in the OpenJDK source.

> but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above?

Yes, most of the time.

Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230478845

From shade at openjdk.org Tue Jul 16 09:56:57 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 16 Jul 2024 09:56:57 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
Message-ID: <xi5HafA-_m5iDGFCKkFvkqUtOe9vzbxc1Ix6m9EyPVU=.ac3930b3-7aa0-493e-9331-7706f2224f6a@github.com>

On Mon, 15 Jul 2024 23:29:37 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> IIUC this means we can remove the explicit fence here:
>
> ```
> public ConstantCallSite(MethodHandle target) {
>     super(target);
>     isFrozen = true;
>     UNSAFE.storeStoreFence(); // properly publish isFrozen update
> }
> ```

I think so, but there is more to it: there are other fences around the `CallSite`-s that might be simplified. I would prefer not to make any of those usage changes in this PR.

Separately, I tried to benchmark `new ConstantCallSite(MH)` just to see if these barriers are merged, and quickly realized that a bunch of `MethodHandleNatives$CallSiteContext` objects with `Cleaners` gets instantiated for every `CCS` created, which completely dominates any wins we get from removing this fence.
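
For readers outside the JDK code base, the publication pattern under discussion can be sketched roughly as follows. This is a minimal illustration only, not the actual `ConstantCallSite` code: `FrozenHolder` and its members are hypothetical names, a plain field stands in for the `@Stable` one (that annotation is JDK-internal), and the public `VarHandle.storeStoreFence()` is used in place of the internal `UNSAFE.storeStoreFence()` call from the quoted snippet.

```java
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for a class that sets a flag in its constructor
// and wants that store ordered before any store that publishes the object.
final class FrozenHolder {
    private final Object target;
    private boolean isFrozen; // plain field; stands in for a @Stable field

    FrozenHolder(Object target) {
        this.target = target;
        this.isFrozen = true;
        // Stores before this fence are not reordered with stores after it,
        // so a later store of the reference cannot pass the isFrozen store.
        VarHandle.storeStoreFence();
    }

    Object target() {
        return target;
    }

    boolean isFrozen() {
        return isFrozen;
    }
}
```

If the barrier emitted at constructor exit also covers such fields, which is what this PR appears to aim for, the explicit fence in a class like this would be redundant; the sketch only shows where the fence sits today.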
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2230489134 From shade at openjdk.org Tue Jul 16 10:21:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Jul 2024 10:21:51 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com> <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com> <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com> <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com> Message-ID: <igvhqHUrjhPCzZuCcU3N27827N5KRF3m6N8PuqXJdAc=.5b174ff0-0044-4618-af75-5cbbf3021f7d@github.com> On Tue, 16 Jul 2024 04:55:52 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Yeah, this is not really a cleanup (behaviors stay the same) change. For this particular hunk, keeping the old behavior seems to be unnecessary work. Note that we are also changing the behavior in C2: both in `do_exits` we no longer emit the barriers for `static final` stores in `clinits`, plus EA does not care about `clinits` anymore as well. Those are also behavioral changes. >> >> If you prefer, I can turn this PR into a behaviorally similar cleanup, and do the behavior changes separately. > > I certainly would not want those C2 changes to hidden behind what looks like a cleanup on the surface, so please do separate things out. 
> > BTW Mandy is away for a while so we can't get her input on the original intent here. Fine by me. I am splitting out C2 parts here: https://bugs.openjdk.org/browse/JDK-8336465 https://bugs.openjdk.org/browse/JDK-8336466 I'll probably fork `get_flags` change as a separate bug as well. This PR would then be only the final non-behavioral cleanups that would eliminate the remnant uses of `is_initializer`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1679144326 From adinn at openjdk.org Tue Jul 16 10:33:53 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 16 Jul 2024 10:33:53 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> Message-ID: <5jNrh7ZNo8EcvpWN0AFr73R-TpkZif8iRxM8zzgz458=.a408857e-1642-4a49-98a1-fc6322697115@github.com> On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. 
remove some unnecessary files or changes, especially in the make dir of jdk.
>>
>> Besides the code changes, one important task is to handle the legal process.
>>
>> Thanks!
>>
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Float
>> data
>>
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   skip TANH

Obviously we need to include pre-generated sources in the repo so that most people can just build the library using *sanctioned* code without needing to regenerate anything.

I absolutely agree with @theRealAph that we need to have all relevant SLEEF header build scripts in the OpenJDK repo so that anyone who wants to rebuild the headers can do so. I don't believe it is just packagers who will want to do that, and it is good open source practice to allow anyone to do so and, where possible, to make it easy.

Given the size of the original SLEEF sources I also agree with @theRealAph that it is no great burden to include them in the jdk repo. However, I am not averse to @vidmik's alternative of putting the sources in an openjdk/sleef repo. That would be fine so long as the openjdk repo includes SLEEF build scripts that pull a determinate hash to generate the headers.

Likewise, I agree with @vidmik that omitting the extra packages the SLEEF generate step requires from the standard configure/make scripts would be fine, so long as the SLEEF build scripts prompt users on what to install. We don't want to force everyone to install packages that they don't need. But we do still need to make it straightforward for those who do want to regenerate the sources to achieve that goal.
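
To make the "determinate hash" idea concrete, here is a minimal sketch of one way a regeneration step could refuse to run on unexpected input. It assumes the pin is a SHA-256 digest of a fetched source archive; pinning a Git commit hash, as the comment above may equally intend, is another option. `PinnedSourceCheck`, `sha256Hex` and `matchesPin` are hypothetical names and are not part of any OpenJDK or SLEEF build script.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares fetched source bytes against a pinned digest.
final class PinnedSourceCheck {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True only if the data hashes to the pinned value (hex, any case).
    static boolean matchesPin(byte[] data, String pinnedHex) {
        return MessageDigest.isEqual(
            sha256Hex(data).getBytes(StandardCharsets.US_ASCII),
            pinnedHex.toLowerCase().getBytes(StandardCharsets.US_ASCII));
    }
}
```

A script built around such a check would regenerate the headers only when `matchesPin` returns true, so the checked-in headers can always be traced back to one known source snapshot.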
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230559757 From duke at openjdk.org Tue Jul 16 10:38:58 2024 From: duke at openjdk.org (Stewart X Addison) Date: Tue, 16 Jul 2024 10:38:58 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <brYLslKpd_pgvj7HsK527bi1vXS-7FuzMBusHLpZ25I=.e205b93e-ac3b-4213-bda1-bab72946f206@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com> <brYLslKpd_pgvj7HsK527bi1vXS-7FuzMBusHLpZ25I=.e205b93e-ac3b-4213-bda1-bab72946f206@github.com> Message-ID: <yWw7g4vjHgTC-zflIyxoc7_f18EcKeoAtJ8KHV1f76Y=.925a5e88-2fb7-4d74-b500-9d9d400b6dfb@github.com> On Tue, 16 Jul 2024 08:21:04 GMT, Andrew Haley <aph at openjdk.org> wrote: > This is starting to sound like we need a policy decision, because we don't want to re-hash this discussion every time the question comes up, as it surely will. +1 to this if we don't already have one While I haven't read through every comment in this thread in this specific case I generally agree with what @theRealAph has said in some of his earlier comments. My primary concern is that the generated code in there is currently effectively unreviewable in terms of checking for potential vulnerabilities so I also feel it's best to check in the whole (reviewable) source if this PR is to be accepted. Much as I dislike repository bloat I think it's a fairly easy decision in this case IMHO with SLEEF being 7.5MB in size when the openjdk codebase is so large. An alternative "absolute minimum" would be to reference the GitHub SHA of the SLEEF source and include the process for regenerating it reproducibly so that this information is available to anyone who wanted to verify it. 
With my distributor (Temurin) hat on, either of those solutions would mean we have the original source referenced for inclusion in the product SBOM to track the supply chain.

I'll also note that I'm making an assumption here that the generated code from SLEEF is reproducible and not sensitive to the build environment like the CDS archives - I have not tried building them myself to verify, but I feel that is important to understand before merging the generated code.

As a project we should also consider the whole issue of ensuring that we have sufficient trust from a supply-chain perspective in the SLEEF source ... I have no specific reason to distrust it, but it might be good to understand how well reviewed it is before doing this, as it's not a project I'm personally familiar with.

_On a slightly separate note (and I see @luhenry is in this comment thread too and has contributed to SLEEF) it will be good if this can be used to enhance the performance on RISC-V too in the future ;-)_

------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230569814

From mli at openjdk.org Tue Jul 16 10:38:59 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 16 Jul 2024 10:38:59 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11]
In-Reply-To: <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>
Message-ID: <Sd8-4UxWAFuL8h7xMKkLAwF1InMGxN-raDX6HjKNFSY=.4fb8f96a-cb91-417f-802d-837ed28e6266@github.com>

On Tue, 16 Jul 2024 09:48:55 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> Currently,
>>>
>>> * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo. >>> >>> * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself >> >> Right, but think about package builders. This isn't about J Random Hacker doing it by hand. >> >> When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves. >> >>> ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk. >> >> That script must be checked in to the OpenJDK tree. >> >>> With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk. >>> >>> I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too >> >> Yes. 
>> >>> (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers. >> >> You don't need to verify the pre-generated inline headers, just overwrite them. The ... > >> @theRealAph Thanks for clarification. >> >> I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood. >> >> 1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors. >> >> 2. jdk daily development. This is about to modify, build, run/test jdk daily by jdk developers. >> >> For the package builders, original sleef source is > > may be > >> necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary. > > Yes, most of the time. Some devs will want to be more thorough. > >> The script to pre-generate sleef inline headers is only triggerred by package builders (and I think it involves some scripts which are not part of jdk source ? e.g. the script to trigger pre-generating script), > > No: all of the scripts to generate the preprocessed source from the SLEEF source must in the OpenJDK source. > >> but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above? > > Yes, most of the time. > > Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations. @theRealAph I see, I think now I understand the whole picture of your concerns. Thanks! 
> I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.
> The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when keeping in mind that if we #include it in the jdk repo it's going to be there for every cloned repo in every project/branch and very few will actually care about it. I agree that we'd still have to include the pre-generated header files.
> Hence my suggestion to consider putting it under our control, but in a separate openjdk controlled repository.

Based on @vidmik's previous comments, I think we all agree that the original sleef source should be added to jdk, along with the pre-generated sleef inline headers. The only difference of opinion is about how to include the sleef source: one option is to add it to the jdk repo itself, the other is to put it in a separate repo under the control of jdk. Please kindly correct me if I misunderstood.

I have no particular preference for either option. My only concern is how long it will take to make that decision. If it could take a rather long time, can we take several incremental steps to achieve the final goal? e.g.

1. add pre-generated sleef inline headers into jdk, which is done by https://github.com/openjdk/jdk/pull/19185
2. support vector math in jdk, which is done by this pr.
3. add sleef source into either the jdk repo itself or another repo under control of jdk.

I think we have plenty of time to achieve the final goal in jdk-24. What do you think about it?
@theRealAph @vidmik @luhenry @magicus @erikj79 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230571779 From mli at openjdk.org Tue Jul 16 10:44:56 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 16 Jul 2024 10:44:56 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v9] In-Reply-To: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com> <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com> <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com> <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com> <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com> <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com> Message-ID: <t8i-uWYm_zkQs-I5gD9oW0hXKwcyqv9Q3knUox47A6k=.2ec13bf7-a79d-4438-9cc6-1b840c65c29b@github.com> On Mon, 15 Jul 2024 17:35:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >>> > I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. Thhis has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages. >>> >>> Do you suggest to copy the whole sleef source repo into jdk? >> >> I think so, along with scripting that generates the preprocessed file we use. 
It might be the case that there are some sleef files not used at all that could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge
>
>> > > I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.
>> >
>> >
>> > Do you suggest to copy the whole sleef source repo into jdk?
>>
>> I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files not used at all that could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge
>
> Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system:
> 1. take a dependency on cmake in order to build the Sleef headers
> 2. write a custom build system for Sleef to integrate into OpenJDK
>
> Neither approach sounds good to me as a mandatory option.
>
> However, if we are to allow the person building OpenJDK to _optionally_ generate the headers from a Sleef source checkout (provided by the user with a `--with-sleef-src=/path/to/sleef`), we can then more easily take the assumption that the user has installed the necessary dependencies. That would also be in line with how binutils is being built and integrated.

> _On a slightly separate note (and I see @luhenry is in this comment thread too and has contributed to SLEEF) it will be good if this can be used to enhance the performance on RISC-V too in the future ;-)_

We already had a prototype which depends on this pr, and the performance gain is promising.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230579500 From aph at openjdk.org Tue Jul 16 10:44:57 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 16 Jul 2024 10:44:57 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com> Message-ID: <GvXWotWLspc5hS8zeLHxCKLcFdqzeH4D-1r6Ju4lYIw=.c8070480-d31d-48c2-8d0f-9be57a0441de@github.com> On Tue, 16 Jul 2024 09:48:55 GMT, Andrew Haley <aph at openjdk.org> wrote: > @theRealAph Thanks for clarification. > > I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood. > > 1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors. > > 2. jdk daily development. This is about to modify, build, run/test jdk daily by jdk developers. > > For the package builders, original sleef source is may be > necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary. Yes, most of the time. Some devs will want to be more thorough. > The script to pre-generate sleef inline headers is only triggerred by package builders (and I think it involves some scripts which are not part of jdk source ? e.g. the script to trigger pre-generating script), No: all of the scripts to generate the preprocessed source from the SLEEF source must in the OpenJDK source. 
> but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above? Yes, most of the time. Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations. > I have not particular preference which options to take. My only concern is how long it will take to make that decision. If it could take rather long time, can we take several incremental steps to achieve the final goal? e.g. We're only a couple of weeks away from the summit. What would be a long time? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230581736 From mli at openjdk.org Tue Jul 16 10:50:55 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 16 Jul 2024 10:50:55 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <GvXWotWLspc5hS8zeLHxCKLcFdqzeH4D-1r6Ju4lYIw=.c8070480-d31d-48c2-8d0f-9be57a0441de@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com> <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com> <GvXWotWLspc5hS8zeLHxCKLcFdqzeH4D-1r6Ju4lYIw=.c8070480-d31d-48c2-8d0f-9be57a0441de@github.com> Message-ID: <XA-SfmwxqKp1L0Ca3fIalhVzhaukZAEgUEOQ1UDD8Cw=.1346d19a-5dfd-4146-bc45-c593e834d14f@github.com> On Tue, 16 Jul 2024 10:42:24 GMT, Andrew Haley <aph at openjdk.org> wrote: > We're only a couple of weeks away from the summit. What would be a long time? OK, then let's wait for it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230591233 From shade at openjdk.org Tue Jul 16 11:31:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Jul 2024 11:31:55 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v2] In-Reply-To: <igvhqHUrjhPCzZuCcU3N27827N5KRF3m6N8PuqXJdAc=.5b174ff0-0044-4618-af75-5cbbf3021f7d@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com> <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com> <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com> <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com> <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com> <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com> <igvhqHUrjhPCzZuCcU3N27827N5KRF3m6N8PuqXJdAc=.5b174ff0-0044-4618-af75-5cbbf3021f7d@github.com> Message-ID: <leWZFcvd_U0fheNtw2jg5UNqI3FWde3kWDouz-ysr_w=.cd791610-3d70-45c3-b7bd-1e21b62596d4@github.com> On Tue, 16 Jul 2024 10:18:53 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > I'll probably fork get_flags change as a separate bug as well. 
Now part of: https://bugs.openjdk.org/browse/JDK-8336468 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1679228645 From shade at openjdk.org Tue Jul 16 11:37:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Jul 2024 11:37:51 GMT Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers [v3] In-Reply-To: <swBWpqAm_k6hHjGcwdNBowWfdBpksxtD63PiGp0KI1c=.ad02279c-ed66-40a0-9b01-379d4410a16c@github.com> References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com> <swBWpqAm_k6hHjGcwdNBowWfdBpksxtD63PiGp0KI1c=.ad02279c-ed66-40a0-9b01-379d4410a16c@github.com> Message-ID: <oTI3X6WBWYwOAhQtv9O2saeOZUMZ0X1eo1cVPd2ojvw=.c531e3e5-dd9e-4793-8152-dafce6162b86@github.com> On Fri, 12 Jul 2024 09:17:22 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> All around Hotspot, we have calls to `method->is_initializer()`. That methods test for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths. >> >> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touch up assert messages Putting back to draft until the behavioral changes are done in separate sub-tasks. 
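
The distinction drawn in the quoted description, between a broad "is this any initializer" test and separate tests for `<init>` and `<clinit>`, can be sketched in plain Java. This is only an illustrative model under stated assumptions: `MethodInfo`, its fields, and the three predicate names are hypothetical stand-ins, not HotSpot's `Method` class, and the real VM checks may involve more than a name and a static flag.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of method metadata; NOT HotSpot code.
final class MethodInfo {
    final String name;
    final boolean isStatic;

    MethodInfo(String name, boolean isStatic) {
        this.name = name;
        this.isStatic = isStatic;
    }

    // Broad predicate: true for both <init> and <clinit>, so a caller that
    // really wants "constructor" must remember to filter out static methods.
    boolean isInitializer() {
        return isObjectInitializer() || isStaticInitializer();
    }

    // Sharper predicate: instance initializer (constructor) only.
    boolean isObjectInitializer() {
        return name.equals("<init>");
    }

    // Sharper predicate: static (class) initializer only.
    boolean isStaticInitializer() {
        return name.equals("<clinit>") && isStatic;
    }
}

final class InitializerChecks {
    // Counts constructors by asking the precise question directly.
    static long countConstructors(List<MethodInfo> methods) {
        return methods.stream().filter(MethodInfo::isObjectInitializer).count();
    }

    // Counts with the broad predicate; also picks up <clinit>.
    static long countAnyInitializer(List<MethodInfo> methods) {
        return methods.stream().filter(MethodInfo::isInitializer).count();
    }

    public static void main(String[] args) {
        List<MethodInfo> methods = Arrays.asList(
            new MethodInfo("<init>", false),
            new MethodInfo("<clinit>", true),
            new MethodInfo("toString", false));
        System.out.println("constructors: " + countConstructors(methods));
        System.out.println("any initializer: " + countAnyInitializer(methods));
    }
}
```

With the broad predicate, a call site that means "constructor" silently includes the static initializer unless it adds its own `!isStatic` filter; with the two narrow predicates the intent is visible at each call site.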
------------- PR Comment: https://git.openjdk.org/jdk/pull/20120#issuecomment-2230671038 From shade at openjdk.org Tue Jul 16 12:09:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Jul 2024 12:09:14 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks Message-ID: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. @mlchung, you probably want to look at this more closely. 
Additional testing: - [x] Linux x86_64 server fastdebug, `tier1` - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Remove unnecessary handle-izing - Fix - Fix Changes: https://git.openjdk.org/jdk/pull/20192/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336468 Stats: 38 lines in 5 files changed: 14 ins; 10 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20192/head:pull/20192 PR: https://git.openjdk.org/jdk/pull/20192 From rkennke at openjdk.org Tue Jul 16 12:46:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 16 Jul 2024 12:46:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. 
Different caching techniques are used to speed up lookups from compiled code.
>>
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default).
>>
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default).
>>
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>>
>> # Cleanups
>>
>> Cleaned up displaced header usage for:
>> * BasicLock
>> * Contains some Zero changes
>> * Renames one exported JVMCI field
>> * ObjectMonitor
>> * Updates comments and tests consistencies
>>
>> # Refactoring
>>
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>>
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code.
>>
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>>
>> # LightweightSynchronizer
>>
>> Working on adapting and incorporating the following section as a comment in the source code
>>
>> ## Fast Locking
>>
>> CAS on locking bits in markWord.
>> 0b00 (Fast Locked) <--> 0b01 (Unlocked)
>>
>> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>>
>> If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
>
> - Remove try_read
> - Add explicit to single parameter constructors
> - Remove superfluous access specifier
> - Remove unused include
> - Update assert message OMCache::set_monitor
> - Fix indentation
> - Remove outdated comment LightweightSynchronizer::exit
> - Remove logStream include
> - Remove strange comment
> - Fix javaThread include

Another review pass by me. It looks to me like the cache lookup can be improved, see comments below.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 323:

> 321: ldr(t1, Address(t3_t));
> 322: cmp(obj, t1);
> 323: br(Assembler::EQ, monitor_found);

I think the loop could be optimized a bit, if we start with the (cache_address) - 1 in t3, then increment t3 at the start of the loop, and let the success-case fall-through and only branch back to loop-start or to failure-path. Something like:

    bind(loop);
    increment(t3_t, in_bytes(OMCache::oop_to_oop_difference()));
    ldr(t1, Address(t3_t));
    cbnz(t1, loop);
    cmp(obj, t1);
    br(Assembler::NE, loop);
    // Success

Advantage would be that we have no forward-branch in the fast/expected case. CPU static branch prediction tends to not like that. I'm not sure if it makes a difference, though. Also, if you do that, then the unrolled loop also needs corresponding adjustment.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674:

> 672:
> 673: // Search for obj in cache.
> 674: bind(loop);

Same loop transformation would be possible here.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 776:

> 774: movl(top, Address(thread, JavaThread::lock_stack_top_offset()));
> 775:
> 776: if (!UseObjectMonitorTable) {

Why is the mark loaded here in the !UOMT case, but later in the +UOMT case?
------------- PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2179942149 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679210139 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679313050 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679315158 From rkennke at openjdk.org Tue Jul 16 12:46:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 16 Jul 2024 12:46:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com> Message-ID: <men3PDFTArcoKhAknFkS7kJ-OEcYDd8-6oyH4MOx36M=.0e9ee765-ed8e-4c9f-8faf-62a7b489f76e@github.com> On Tue, 16 Jul 2024 12:37:43 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Remove try_read >> - Add explicit to single parameter constructors >> - Remove superfluous access specifier >> - Remove unused include >> - Update assert message OMCache::set_monitor >> - Fix indentation >> - Remove outdated comment LightweightSynchronizer::exit >> - Remove logStream include >> - Remove strange comment >> - Fix javaThread include > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 776: > >> 774: movl(top, Address(thread, JavaThread::lock_stack_top_offset())); >> 775: >> 776: if (!UseObjectMonitorTable) { > > Why is the mark loaded here in the !UOMT case, but later in the +UOMT case? Ah I see, it is because we don't have enough registers. Right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679316824 From stuefe at openjdk.org Tue Jul 16 14:19:56 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 16 Jul 2024 14:19:56 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> Message-ID: <jeGYenDGeG2TuOZuIr_B2ZZBDt4f-HqSm3VmCnI05V0=.dbe1f1cd-641c-4819-85ff-5f5d0e356847@github.com> On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> ### Summary >> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. >> >> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. >> >> **Transparent huge pages:** >> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted. >> >> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. 
Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (This function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>>
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>>
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>
> - Minor cleanup and comments.
> - rename to disclaim_memory and update test

Good. Thanks!

-------------

Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2180397408 From stuefe at openjdk.org Tue Jul 16 14:21:53 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 16 Jul 2024 14:21:53 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size [v2] In-Reply-To: <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com> Message-ID: <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com> On Mon, 15 Jul 2024 19:41:13 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. >> >> Testing: >> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with two additional commits since the last revision: > > - Hard coding values and adding Unit class > - whitebox changes based on feedback. Using is_aligned and asserts Okay, if 32-bit passes. Thanks! test/hotspot/jtreg/runtime/Metaspace/elastic/TestMetaspaceAllocation.java line 56: > 54: MetaspaceTestContext context = new MetaspaceTestContext(); > 55: MetaspaceTestArena arena1 = context.createArena(false, 32L * Unit.valueOf("M").size()); > 56: MetaspaceTestArena arena2 = context.createArena(true, 32L * Unit.valueOf("M").size()); Why not just `Unit.M.size()` ? ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2180404772 PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1679500651 From szaldana at openjdk.org Tue Jul 16 14:39:52 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 16 Jul 2024 14:39:52 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size [v2] In-Reply-To: <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com> <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com> Message-ID: <JcktcaPbfK8cJuEZ5J9ibD5BX0IpFlXY35wVb001QXI=.1550ab2f-8b35-43a7-8964-ff7b7de95249@github.com> On Tue, 16 Jul 2024 14:19:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > Okay, if 32-bit passes. Thanks! Correct! I made a GHA job to verify with the builds there. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20039#issuecomment-2231088158 From pchilanomate at openjdk.org Tue Jul 16 14:40:55 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 Jul 2024 14:40:55 GMT Subject: [jdk23] RFR: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 In-Reply-To: <g51dGtyAfFu3y_uVGE7KBXcGPIkb1KICznIibbbDoLs=.202e218b-83ca-4b29-840c-0ae03949620a@github.com> References: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com> <g51dGtyAfFu3y_uVGE7KBXcGPIkb1KICznIibbbDoLs=.202e218b-83ca-4b29-840c-0ae03949620a@github.com> Message-ID: <vvHbxZrKVLQPJKy78pXaJCQwncehdQFwtKoT1UnxLiQ=.084fe4ff-7ae1-4d6d-b9d8-8b726e814b34@github.com> On Mon, 15 Jul 2024 18:23:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> Hi all, >> >> This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev. >> >> Thanks > > Marked as reviewed by shade (Reviewer). Thanks for the review @shipilev! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20185#issuecomment-2231083921 From pchilanomate at openjdk.org Tue Jul 16 14:40:56 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 Jul 2024 14:40:56 GMT Subject: [jdk23] Integrated: 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 In-Reply-To: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com> References: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com> Message-ID: <LJe1370nGwLg6AfN-sP11NoXuo3gOw_DNBlL4IAGzuo=.aadc127e-0505-4c6f-9ed1-3721f4a5871f@github.com> On Mon, 15 Jul 2024 18:13:53 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: > Hi all, > > This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev. > > Thanks This pull request has now been integrated. 
Changeset: d7b7c172 Author: Patricio Chilano Mateo <pchilanomate at openjdk.org> URL: https://git.openjdk.org/jdk/commit/d7b7c1724d87e611c854c73a9a6140a91f132125 Stats: 55 lines in 3 files changed: 6 ins; 20 del; 29 mod 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 Reviewed-by: shade Backport-of: 7ab96c74e2c39f430a5c2f65a981da7314a2385b ------------- PR: https://git.openjdk.org/jdk/pull/20185 From szaldana at openjdk.org Tue Jul 16 14:42:52 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 16 Jul 2024 14:42:52 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size [v2] In-Reply-To: <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com> <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com> Message-ID: <0K8_t5xuwvqbF5FI0J_LDcSEj6tXJ6m6GEkUXfJHZJw=.a1bce5d3-4a2f-4efb-ac75-bce29583a6a1@github.com> On Tue, 16 Jul 2024 14:18:27 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with two additional commits since the last revision: >> >> - Hard coding values and adding Unit class >> - whitebox changes based on feedback. Using is_aligned and asserts > > test/hotspot/jtreg/runtime/Metaspace/elastic/TestMetaspaceAllocation.java line 56: > >> 54: MetaspaceTestContext context = new MetaspaceTestContext(); >> 55: MetaspaceTestArena arena1 = context.createArena(false, 32L * Unit.valueOf("M").size()); >> 56: MetaspaceTestArena arena2 = context.createArena(true, 32L * Unit.valueOf("M").size()); > > Why not just `Unit.M.size()` ? Good point - direct access is less error prone anyway. I can update it. 
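
To illustrate the point about direct access being less error prone, here is a minimal sketch. It assumes `Unit` is an enum with a `size()` accessor, as the `Unit.M.size()` suggestion implies; the `Unit` type below is a hypothetical stand-in written for this example, not the class added by the PR, and `UnitLookupDemo` and its method names are likewise invented.

```java
// Hypothetical stand-in for the test's Unit class; the real one may differ.
enum Unit {
    K(1024L),
    M(1024L * 1024L),
    G(1024L * 1024L * 1024L);

    private final long size;

    Unit(long size) {
        this.size = size;
    }

    // Size of one unit in bytes.
    long size() {
        return size;
    }
}

final class UnitLookupDemo {
    // String lookup: a misspelled name is only detected at run time.
    static long viaValueOf() {
        return 32L * Unit.valueOf("M").size();
    }

    // Direct constant access: a misspelled name fails to compile.
    static long viaConstant() {
        return 32L * Unit.M.size();
    }

    // Returns true if looking up the given name throws at run time.
    static boolean lookupFails(String name) {
        try {
            Unit.valueOf(name);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("same size: " + (viaValueOf() == viaConstant()));
        System.out.println("valueOf(\"m\") fails: " + lookupFails("m"));
    }
}
```

Both forms compute the same byte count; the difference is only when a typo in the unit name is caught, which is why the constant form is the safer choice in test code.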
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1679543890 From jvernee at openjdk.org Tue Jul 16 14:46:13 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jul 2024 14:46:13 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v4] In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> Message-ID: <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com> > This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. > > Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. > > In this PR: > - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. > - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `ouputStream*` rather than always printing to `tty`. 
> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>
>
> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1  avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: JVMCI support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20158/files - new: https://git.openjdk.org/jdk/pull/20158/files/6d0b9b57..62849aa8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=02-03 Stats: 31 lines in 9 files changed: 29 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158 PR: https://git.openjdk.org/jdk/pull/20158 From szaldana at openjdk.org Tue Jul 16 14:49:27 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 16 Jul 2024 14:49:27 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size [v3] In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> Message-ID: <D7Jfl3uzXBUJgWKE_iu88rhdWpkge5IK4SPs3lt4xUM=.b112d42f-c718-4922-ad69-da4714cf5ecb@github.com> > Hi all, > > This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. > > Testing: > - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
> > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Feedback - updating Unit.valueOf to direct access ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20039/files - new: https://git.openjdk.org/jdk/pull/20039/files/7c0138ca..d6a1155d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=01-02 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20039/head:pull/20039 PR: https://git.openjdk.org/jdk/pull/20039 From duke at openjdk.org Tue Jul 16 14:57:53 2024 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 16 Jul 2024 14:57:53 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> Message-ID: <ra9L9o5yJ32V6_UDWlfpVtQC_3trCI_o5xr_k4jTHGo=.85b235fd-54cb-4366-994a-c9f71c57a72a@github.com> On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> ### Summary >> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
>>
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()`, which reports the size of free memory instead of actually releasing memory.
>>
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>>
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>>
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>>
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS, so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
> > Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision: > > - Minor cleanup and comments. > - rename to disclaim_memory and update test Thank you @tstuefe for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2231122489 From duke at openjdk.org Tue Jul 16 14:57:53 2024 From: duke at openjdk.org (duke) Date: Tue, 16 Jul 2024 14:57:53 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> Message-ID: <YI-psWpho2158NJ2PL0y6TLAI0VxV7GbAniOT4BynbI=.fa804499-0236-4bac-8b9e-a41f3fb5fe0b@github.com> On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> ### Summary >> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. >> >> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. >> >> **Transparent huge pages:** >> `madvise(MADV_DONTNEED)` works with THP. 
As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>>
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>>
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>>
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS, so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>
> - Minor cleanup and comments.
> - rename to disclaim_memory and update test

@roberttoyonaga Your change (at version 6c9e6d5c385740e140b800113ec8d2b4d0a93e82) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2231126734 From dnsimon at openjdk.org Tue Jul 16 15:02:54 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 Jul 2024 15:02:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v4] In-Reply-To: <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com> Message-ID: <cjBvDhr4WYpomLhE1f2PFYk68zD-maQIeFE0Juhld00=.e20ac51b-98e5-4d4d-bc0a-698623c18f6f@github.com> On Tue, 16 Jul 2024 14:46:13 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. 
That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   JVMCI support

src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java line 62:

> 60:
> 61:     /**
> 62:      * Returns true if this method has a {@code Scoped} annotation.

Can you please make this a qualified name: `jdk.internal.misc.ScopedMemoryAccess.Scoped`. That makes it easier for someone not familiar with the code base to find.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1679575238 From jvernee at openjdk.org Tue Jul 16 15:02:53 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jul 2024 15:02:53 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v4] In-Reply-To: <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com> Message-ID: <BK3p94TURpN85760p7WQitpA4OmHhbPud0w1woLOYzg=.90c85018-3eda-4db7-9c3c-2da61a7a1f31@github.com> On Tue, 16 Jul 2024 14:46:13 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. 
That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support Added JVMCI/Graal support, courtesy of @c-refice ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2231133607 From jvernee at openjdk.org Tue Jul 16 15:12:15 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jul 2024 15:12:15 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v5] In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> Message-ID: <gmVrODnXkmeQEvYVo4lI7ETijuxh0FXdsV6U14keGR0=.428ee145-c159-4efc-a705-b13439e92d97@github.com> > This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. > > Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. > > In this PR: > - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. > - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. 
That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>
>
> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1  avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: clarify javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20158/files - new: https://git.openjdk.org/jdk/pull/20158/files/62849aa8..cd5f290e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158 PR: https://git.openjdk.org/jdk/pull/20158 From jvernee at openjdk.org Tue Jul 16 15:12:15 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jul 2024 15:12:15 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v4] In-Reply-To: <cjBvDhr4WYpomLhE1f2PFYk68zD-maQIeFE0Juhld00=.e20ac51b-98e5-4d4d-bc0a-698623c18f6f@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com> <cjBvDhr4WYpomLhE1f2PFYk68zD-maQIeFE0Juhld00=.e20ac51b-98e5-4d4d-bc0a-698623c18f6f@github.com> Message-ID: <OY61zjgZBuXUaCZFohHoJY_Xw5Q9rB2ChH6IVMPWeqo=.083cfde3-3d70-4050-a7c1-cb7a8bf9de9b@github.com> On Tue, 16 Jul 2024 15:00:04 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java line 62: > >> 60: >> 61: /** >> 62: * Returns true if this method has a {@code Scoped} annotation. > > Can you please make this a qualified name: `jdk.internal.misc.ScopedMemoryAccess.Scoped`. > That makes it easier for someone not familiar with the code base to find. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1679589138 From stuefe at openjdk.org Tue Jul 16 15:22:52 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 16 Jul 2024 15:22:52 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <ra9L9o5yJ32V6_UDWlfpVtQC_3trCI_o5xr_k4jTHGo=.85b235fd-54cb-4366-994a-c9f71c57a72a@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> <ra9L9o5yJ32V6_UDWlfpVtQC_3trCI_o5xr_k4jTHGo=.85b235fd-54cb-4366-994a-c9f71c57a72a@github.com> Message-ID: <mkuk1TTS0tu1oaF10kVz3QT-565CvFW8uQasZ1_wUUo=.ba47496c-7024-4252-aa38-d80433b59c7d@github.com> On Tue, 16 Jul 2024 14:53:00 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision: >> >> - Minor cleanup and comments. >> - rename to disclaim_memory and update test > > Thank you @tstuefe for the review! Hi @roberttoyonaga, unfortunately you'll need a second reviewer (standard rule for hotspot changes). @MBaesken maybe? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2231212584 From psandoz at openjdk.org Tue Jul 16 15:43:54 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 16 Jul 2024 15:43:54 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <xi5HafA-_m5iDGFCKkFvkqUtOe9vzbxc1Ix6m9EyPVU=.ac3930b3-7aa0-493e-9331-7706f2224f6a@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com> <xi5HafA-_m5iDGFCKkFvkqUtOe9vzbxc1Ix6m9EyPVU=.ac3930b3-7aa0-493e-9331-7706f2224f6a@github.com> Message-ID: <I0QvjXd3fIHwBOpDG3kRuX0S8D_8oFOoGS42FnOvsVU=.06dfa82c-051e-44fb-a4a2-62d6d7052d63@github.com> On Tue, 16 Jul 2024 09:53:55 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > > IIUC this means we can remove the explicit fence here: > > ``` > > public ConstantCallSite(MethodHandle target) { > > super(target); > > isFrozen = true; > > UNSAFE.storeStoreFence(); // properly publish isFrozen update > > } > > ``` > > I think so, but there is more to it: there are other fences around the `CallSite`-s that might be related to this. I would prefer not to do it any of usage changes in this PR. Agreed, just wanted to test my understanding. 
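As a rough sketch of the explicit-fence pattern in the quoted constructor, the following uses the public `VarHandle.storeStoreFence()` in place of the internal `UNSAFE.storeStoreFence()`. The `FrozenHolder` class and its members are hypothetical and exist only for illustration; this is not the JDK's `ConstantCallSite`:

```java
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration only; not the JDK's ConstantCallSite.
final class FrozenHolder {
    private final Object target;
    // Non-final field that is written exactly once, in the constructor.
    private boolean isFrozen;

    FrozenHolder(Object target) {
        this.target = target;
        this.isFrozen = true;
        // Keeps the stores above from being reordered with stores that come
        // after the fence, such as a later store that publishes this object.
        VarHandle.storeStoreFence();
    }

    boolean isFrozen() {
        return isFrozen;
    }

    Object target() {
        return target;
    }
}
```

Whether such a fence can be dropped for a given field depends on the barriers the VM already emits for that field, which is the question raised in the exchange above.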
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2231262373 From shade at openjdk.org Tue Jul 16 17:12:08 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Jul 2024 17:12:08 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v2] In-Reply-To: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: <gkGtD5v4Ll6St0x5ABZpmL3wMVi1SreY-vNtIuaN-90=.a9282c6b-b8c1-46ea-8a89-ff26badcd949@github.com> > This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). > > There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. > > I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. > > @mlchung, you probably want to look at this more closely. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [ ] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into JDK-8336468-reflection-init-checks
 - Remove unnecessary handle-izing
 - Fix
 - Fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20192/files
  - new: https://git.openjdk.org/jdk/pull/20192/files/ac4fbcbf..6e35634b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=00-01

  Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20192.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20192/head:pull/20192

PR: https://git.openjdk.org/jdk/pull/20192

From jvernee at openjdk.org Tue Jul 16 18:09:20 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 16 Jul 2024 18:09:20 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v6]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <Kq6Xf3hnyRVLinNi7rm0oPm34BtiW1-qIqvahxWvXv0=.d44f3b37-e903-4274-aad6-820ec269fc8d@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>
> In this PR:
> - Deoptimization is now only done when it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`) that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`) that stresses the cases where many threads are accessing and closing shared arenas.
>
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>
>
> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1  avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Revert JVMCI changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20158/files - new: https://git.openjdk.org/jdk/pull/20158/files/cd5f290e..138fba42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=04-05 Stats: 31 lines in 9 files changed: 0 ins; 29 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158 PR: https://git.openjdk.org/jdk/pull/20158 From jvernee at openjdk.org Tue Jul 16 18:09:21 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jul 2024 18:09:21 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v5] In-Reply-To: <gmVrODnXkmeQEvYVo4lI7ETijuxh0FXdsV6U14keGR0=.428ee145-c159-4efc-a705-b13439e92d97@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <gmVrODnXkmeQEvYVo4lI7ETijuxh0FXdsV6U14keGR0=.428ee145-c159-4efc-a705-b13439e92d97@github.com> Message-ID: <dqB_uPkMWEf9t9WG1Fd-A8yIHDxiGucSjnSqmM2zEAw=.b1898fd7-8630-4a97-a520-d97750a8a2b4@github.com> On Tue, 16 Jul 2024 15:12:15 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. 
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10 ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > clarify javadoc As discussed offline, JVMCI/Graal changes will be handled by a followup PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2231508565 From sviswanathan at openjdk.org Tue Jul 16 18:25:51 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 16 Jul 2024 18:25:51 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> Message-ID: <jc55k6HOXEz5yz6Pk0mDv4x-kCPfexkW1QNZ4B8vQaw=.bdb3b2d3-553a-4b56-9490-451092439e65@github.com> On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: > Enabling test with explicit feature checks for x86 target. > Removing from test/hotspot/jtreg/ProblemList.txt > > Best Regards, > Jatin @jatin-bhateja There was also a suggestion from @eme64 as part of https://github.com/openjdk/jdk/pull/20062 to remove @requires vm.compiler2.enabled from the test. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2231538140 From sviswanathan at openjdk.org Tue Jul 16 18:32:55 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 16 Jul 2024 18:32:55 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> Message-ID: <QLczQ2tsKGcHLl1_3_4X7o2OC2CSNpp9gEdcYC9OD0c=.2e7d7e07-ac30-4be4-803c-c8ac49789eac@github.com> On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: > Enabling test with explicit feature checks for x86 target. > Removing from test/hotspot/jtreg/ProblemList.txt > > Best Regards, > Jatin src/hotspot/cpu/x86/vm_version_x86.hpp line 838: > 836: > 837: // For AVX CPUs only since it needs VEX encoding which is missing on SSE targets, > 838: // thus f16c support is disabled if UseAVX == 0. This comment is somewhat or very confusing. The code for supports_float16() by itself is very clear. I am wondering why we need this explanation in the comment at all. Let us remove it altogether.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20160#discussion_r1679872428 From dholmes at openjdk.org Tue Jul 16 21:57:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 16 Jul 2024 21:57:51 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable In-Reply-To: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> Message-ID: <8cVBN_0pZKGqYGrjKoXi3Rda7wzJJHFU3uui8PSdUFI=.1d65c77a-db23-4278-9ab3-16608b19f0aa@github.com> On Tue, 16 Jul 2024 03:45:36 GMT, Chen Liang <liach at openjdk.org> wrote: > Move fields common to Method and Field to executable s/Field/Constructor I was a bit confused about executable fields for a moment. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20188#issuecomment-2231889229 From liach at openjdk.org Tue Jul 16 22:16:51 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 16 Jul 2024 22:16:51 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v2] In-Reply-To: <gkGtD5v4Ll6St0x5ABZpmL3wMVi1SreY-vNtIuaN-90=.a9282c6b-b8c1-46ea-8a89-ff26badcd949@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> <gkGtD5v4Ll6St0x5ABZpmL3wMVi1SreY-vNtIuaN-90=.a9282c6b-b8c1-46ea-8a89-ff26badcd949@github.com> Message-ID: <KeewYhuYEmnBNOOvt6jYqMyaCyjKa1WQao734RKOXwU=.3962ee19-ff17-45bc-9ae3-67a368165476@github.com> On Tue, 16 Jul 2024 17:12:08 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). >> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. 
There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. >> >> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8336468-reflection-init-checks > - Remove unnecessary handle-izing > - Fix > - Fix src/hotspot/share/runtime/reflection.cpp line 769: > 767: > 768: oop Reflection::new_method(const methodHandle& method, bool for_constant_pool_access, TRAPS) { > 769: // Allow sun.reflect.ConstantPool to refer to <clinit> methods as java.lang.reflect.Methods. 
Not quite related, but it's jdk.internal.reflect.ConstantPool now :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20192#discussion_r1680121069 From dholmes at openjdk.org Tue Jul 16 22:27:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 16 Jul 2024 22:27:52 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable In-Reply-To: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> Message-ID: <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com> On Tue, 16 Jul 2024 03:45:36 GMT, Chen Liang <liach at openjdk.org> wrote: > Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. Hotspot changes look good. Core-libs do too but I will leave that for libs folk to approve src/java.base/share/classes/java/lang/reflect/Executable.java line 54: > 52: public abstract sealed class Executable extends AccessibleObject > 53: implements Member, GenericDeclaration permits Constructor, Method { > 54: // fields injected by hotspot If a field is listed here then it is NOT injected by hotspot. src/java.base/share/classes/java/lang/reflect/Method.java line 73: > 71: */ > 72: public final class Method extends Executable { > 73: // fields injected by hotspot Again not injected ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20188#pullrequestreview-2181384669 PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680112370 PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680113161 From liach at openjdk.org Tue Jul 16 22:43:51 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 16 Jul 2024 22:43:51 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable In-Reply-To: <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com> Message-ID: <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com> On Tue, 16 Jul 2024 22:00:49 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. > > src/java.base/share/classes/java/lang/reflect/Executable.java line 54: > >> 52: public abstract sealed class Executable extends AccessibleObject >> 53: implements Member, GenericDeclaration permits Constructor, Method { >> 54: // fields injected by hotspot > > If a field is listed here then it is NOT injected by hotspot. What would be the terminology for a final field that's set by hotspot, against the regular java constructor rules?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680139439 From cjplummer at openjdk.org Wed Jul 17 00:04:01 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 17 Jul 2024 00:04:01 GMT Subject: RFR: 8336587: failure_handler lldb command times out on macosx-aarch64 core file Message-ID: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out: ---------------------------------------- [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip> ---------------------------------------- (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643" WARNING: tool timed out: killed process after 20000 ms ---------------------------------------- [2024-07-15 05:16:07] exit code: -2 time: 20163 ms ---------------------------------------- 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13gb. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduced the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount. ------------- Commit messages: - Fix comment. - Give lldb a much longer timeout.
Changes: https://git.openjdk.org/jdk/pull/20206/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20206&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336587 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20206.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20206/head:pull/20206 PR: https://git.openjdk.org/jdk/pull/20206 From dlong at openjdk.org Wed Jul 17 00:49:00 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Jul 2024 00:49:00 GMT Subject: RFR: 8336587: failure_handler lldb command times out on macosx-aarch64 core file In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> Message-ID: <tu_swVP6kVSj4xAVpfM1-RGBVU0wBxh1_isozGXv12E=.48299cc0-93cf-489b-9b36-a3f91dd08f26@github.com> On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: > I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out: > > > ---------------------------------------- > [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip> > ---------------------------------------- > (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643" > WARNING: tool timed out: killed process after 20000 ms > ---------------------------------------- > [2024-07-15 05:16:07] exit code: -2 time: 20163 ms > ---------------------------------------- > > > 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13gb. On my MBPro it took 30 seconds. 
I bumped up the timeout to 60 seconds and reproduced the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20206#pullrequestreview-2181547441 From liach at openjdk.org Wed Jul 17 03:03:23 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 17 Jul 2024 03:03:23 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v2] In-Reply-To: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> Message-ID: <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com> > Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Redundant transient; Update the comments to be more accurate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20188/files - new: https://git.openjdk.org/jdk/pull/20188/files/dbe59a5f..184e8a4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20188&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20188&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20188/head:pull/20188 PR: https://git.openjdk.org/jdk/pull/20188 From liach at openjdk.org Wed Jul 17 03:03:23 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 17 Jul 2024 03:03:23 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v2] In-Reply-To: <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com> <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com> Message-ID: <khPrRqxhRoqdOvCPQIQosNrarHwJCdVujMaghVWNB7U=.e94ade82-3de8-45dc-aa6e-bb6ca18a0602@github.com> On Tue, 16 Jul 2024 22:41:40 GMT, Chen Liang <liach at openjdk.org> wrote: >> src/java.base/share/classes/java/lang/reflect/Executable.java line 54: >> >>> 52: public abstract sealed class Executable extends AccessibleObject >>> 53: implements Member, GenericDeclaration permits Constructor, Method { >>> 54: // fields injected by hotspot >> >> If a field is listed here then it is NOT injected by hotspot. > > What would be the terminology for a final field that's set by hotspot, against the regular java constructor rules?
I have chosen the wording "all final fields are used by the VM" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680341545 From kvn at openjdk.org Wed Jul 17 03:42:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 03:42:07 GMT Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI Message-ID: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> Citing David Holmes from bug report: "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing." I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR. Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms. Note: building VM without JVMTI is not an officially supported feature. We are not testing it and such failures (missing guards) are not unexpected. A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in the bug report. I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards. I ran 2 rounds of testing: First, only **tier1** with VM built without JVMTI to see if builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all expected to fail without JVMTI).
Second round of testing with JVMTI in VM: tier1-4 ------------- Commit messages: - 8335921: Fix HotSpot VM build without JVMTI Changes: https://git.openjdk.org/jdk/pull/20209/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20209&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335921 Stats: 20 lines in 8 files changed: 7 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20209/head:pull/20209 PR: https://git.openjdk.org/jdk/pull/20209 From dholmes at openjdk.org Wed Jul 17 04:59:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 04:59:51 GMT Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> Message-ID: <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com> On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > Citing David Holmes from bug report: > "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing." > > I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR. Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms. 
> > Note: building VM without JVMTI is not an officially supported feature. We are not testing it and such failures (missing guards) are not unexpected. > > A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in the bug report. > I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards. > > I ran 2 rounds of testing: > > First, only **tier1** with VM built without JVMTI to see if builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all expected to fail without JVMTI). > > Second round of testing with JVMTI in VM: tier1-4 This seems reasonable to me. It highlights the problem we have with optional components in that you either have to work things so that semantically we have a do-nothing implementation of that component, or else you have to put the guards around every piece of code that would normally interact with it. Thanks. src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.hpp line 35: > 33: JfrJvmtiAgent(); > 34: ~JfrJvmtiAgent(); > 35: static bool create() NOT_JVMTI_RETURN_(true); It initially seemed odd to return `true` here, but looking through the JFR code that interacts with the Agent it seems the right way to view this is that without JVMTI we have a no-op agent. ------------- Marked as reviewed by dholmes (Reviewer).
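[Editor's illustration, not part of the original message.] The "no-op agent" reading of `NOT_JVMTI_RETURN_(true)` can be sketched in a few lines of standalone C++. Everything below is hypothetical: the `SKETCH_` macros and the `SketchAgent` class are local stand-ins written for this note, and are assumed to mirror the general shape of the HotSpot feature macros rather than being copied from them.

```cpp
// Assumption: the optional feature is compiled out unless the build says otherwise.
#ifndef SKETCH_INCLUDE_JVMTI
#define SKETCH_INCLUDE_JVMTI 0
#endif

#if SKETCH_INCLUDE_JVMTI
// Feature present: the macro expands to nothing, leaving a plain declaration
// whose definition lives in a separate translation unit (or further below).
#define SKETCH_NOT_JVMTI_RETURN_(code)
#else
// Feature absent: the macro supplies an inline body that returns `code`,
// so callers link and run against a do-nothing implementation.
#define SKETCH_NOT_JVMTI_RETURN_(code) { return code; }
#endif

// Hypothetical stand-in for an agent class guarded by the feature macro.
class SketchAgent {
 public:
  static bool create() SKETCH_NOT_JVMTI_RETURN_(true);
};

#if SKETCH_INCLUDE_JVMTI
// Real setup would go here when the feature is built in.
bool SketchAgent::create() { return true; }
#endif

// Caller code needs no guard of its own: without the feature, create()
// reports success and the rest of the start-up path proceeds unchanged.
bool start_recording() {
  if (!SketchAgent::create()) {
    return false;
  }
  return true;
}
```

Returning `true` from the stub is what lets the caller stay free of `#if` guards, which is the first of the two options described in the review comment above; returning `false` would instead make every caller treat a missing feature as a start-up failure.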
PR Review: https://git.openjdk.org/jdk/pull/20209#pullrequestreview-2181875380 PR Review Comment: https://git.openjdk.org/jdk/pull/20209#discussion_r1680403451 From thartmann at openjdk.org Wed Jul 17 05:08:51 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jul 2024 05:08:51 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> Message-ID: <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com> On Fri, 12 Jul 2024 02:17:46 GMT, David Holmes <dholmes at openjdk.org> wrote: > Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. > > The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. > > The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. > > If a string's length exceeds `max_length` then we print it as follows: > > "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) > > For example if we print "ABCDE" with a max_length of 4 then the output is literally: > > "AB ... DE" (abridged) > > The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). 
> > For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. > > Testing: > - new test added for validation purposes > - tiers 1 - 3 as sanity testing > > Thanks Looks good to me otherwise. Thanks for fixing this! src/hotspot/share/runtime/globals.hpp line 1310: > 1308: "maximum number of characters to print for a java.lang.String " \ > 1309: "in the VM. If exceeded, an abridged version of the string is " \ > 1310: "print with the middle of the string is elided.") \ Suggestion: "printed with the middle of the string is elided.") \ ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20150#pullrequestreview-2181885422 PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680410176 From thartmann at openjdk.org Wed Jul 17 05:08:51 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jul 2024 05:08:51 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed In-Reply-To: <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com> Message-ID: <gABpAo4f_ZUdhuir197iOpkbO_uk-T_BjZzUyfw2f-0=.1ac5353a-d43e-432e-9a46-3583c24b8f11@github.com> On Wed, 17 Jul 2024 05:03:28 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. >> >> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. 
>> >> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. >> >> If a string's length exceeds `max_length` then we print it as follows: >> >> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) >> >> For example if we print "ABCDE" with a max_length of 4 then the output is literally: >> >> "AB ... DE" (abridged) >> >> The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). >> >> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. >> >> Testing: >> - new test added for validation purposes >> - tiers 1 - 3 as sanity testing >> >> Thanks > > src/hotspot/share/runtime/globals.hpp line 1310: > >> 1308: "maximum number of characters to print for a java.lang.String " \ >> 1309: "in the VM. If exceeded, an abridged version of the string is " \ >> 1310: "print with the middle of the string is elided.") \ > > Suggestion: > > "printed with the middle of the string is elided.") \ I think it should also be "... of the string elided" (without the "is"). 
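[Editor's illustration, not part of the original message.] The abridging rule quoted above (first `max_length/2` characters, then ` ... `, then the last `max_length/2` characters, followed by `(abridged)`) can be sketched as a small standalone function. This is not the `java_lang_String::print` code from the PR: the helper name `abridge` is hypothetical, it works on a `std::string` rather than a Java string oop, and printing short strings in plain double quotes is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: render `s` the way the PR description says an
// over-long string is shown, keeping max_length/2 characters from each end.
std::string abridge(const std::string& s, std::size_t max_length) {
  if (s.size() <= max_length) {
    // Short enough: print in full (assumed to be wrapped in quotes).
    return "\"" + s + "\"";
  }
  const std::size_t half = max_length / 2;
  // Long string: head, an ellipsis marker, tail, then the "(abridged)" tag.
  return "\"" + s.substr(0, half) + " ... " + s.substr(s.size() - half) +
         "\" (abridged)";
}
```

With the example from the description, `abridge("ABCDE", 4)` yields `"AB ... DE" (abridged)`, while a string of length 4 or less is returned in full. Note that with an odd `max_length` the integer division keeps one character fewer than the limit.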
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680410977 From dholmes at openjdk.org Wed Jul 17 05:18:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:18:51 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v2] In-Reply-To: <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com> Message-ID: <s7x0E7F-pTzuxRKXrxsSwAVzE4I5-IxZWhrCWmSM_UQ=.60f9529b-8a94-453f-96ba-5dd32beab06c@github.com> On Wed, 17 Jul 2024 03:03:23 GMT, Chen Liang <liach at openjdk.org> wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Redundant transient; Update the comments to be more accurate Marked as reviewed by dholmes (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20188#pullrequestreview-2181897055 From dholmes at openjdk.org Wed Jul 17 05:18:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:18:51 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v2] In-Reply-To: <khPrRqxhRoqdOvCPQIQosNrarHwJCdVujMaghVWNB7U=.e94ade82-3de8-45dc-aa6e-bb6ca18a0602@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com> <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com> <khPrRqxhRoqdOvCPQIQosNrarHwJCdVujMaghVWNB7U=.e94ade82-3de8-45dc-aa6e-bb6ca18a0602@github.com> Message-ID: <zFD7Vl2l5pukfvrtJM0ZrjWBbM9kdORK-S90hig6tc4=.957fdb58-12f2-4750-8634-8572087a7226@github.com> On Wed, 17 Jul 2024 02:57:51 GMT, Chen Liang <liach at openjdk.org> wrote: >> What would be the terminology for a final field that's set by hotspot, against the regular java constrcutor rules? > > I have chosen the wording "all final fields are used by the VM" I don't know of any specific terminology - we typically just add a comment saying the field is set and/or read by the VM. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680417581 From dholmes at openjdk.org Wed Jul 17 05:25:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:25:52 GMT Subject: RFR: 8336587: failure_handler lldb command times out on macosx-aarch64 core file In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> Message-ID: <o7Pft4WpyCuNlKDRObyzzyRiAxrNHRnQJO9CLyxsx84=.e24057cf-8b5c-4963-8e7e-365245099bb4@github.com> On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: > I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out: > > > ---------------------------------------- > [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip> > ---------------------------------------- > (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643" > WARNING: tool timed out: killed process after 20000 ms > ---------------------------------------- > [2024-07-15 05:16:07] exit code: -2 time: 20163 ms > ---------------------------------------- > > > 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13gb. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduce the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount. Marked as reviewed by dholmes (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20206#pullrequestreview-2181903818 From stuefe at openjdk.org Wed Jul 17 05:27:53 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Jul 2024 05:27:53 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> Message-ID: <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com> On Fri, 12 Jul 2024 02:17:46 GMT, David Holmes <dholmes at openjdk.org> wrote: > Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. > > The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. > > The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. > > If a string's length exceeds `max_length` then we print it as follows: > > "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) > > For example if we print "ABCDE" with a max_length of 4 then the output is literally: > > "AB ... DE" (abridged) > > The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). 
> > For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. > > Testing: > - new test added for validation purposes > - tiers 1 - 3 as sanity testing > > Thanks src/hotspot/share/classfile/javaClasses.cpp line 785: > 783: index = length - (max_length / 2); > 784: abridge = false; // only do this once > 785: } Instead of the trailing "abridged", in similar cases I printed out the number of omitted characters. E.g. "Very long long long ... (53 characters omitted) ... long long string" This makes it obvious how much has been cut, and there is no danger of confusing the ellipsis with naturally occurring dots. Additionally, I would only do this if length > max_length + X, with X being at least as long as the middle part (3 characters if you only print an ellipsis). You end up with printed strings that may be slightly longer than max_length, but OTOH the output is clearer. Otherwise you may indicate omission where none happened (if length == max_length). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680421675 From stuefe at openjdk.org Wed Jul 17 05:29:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Jul 2024 05:29:54 GMT Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte size [v3] In-Reply-To: <D7Jfl3uzXBUJgWKE_iu88rhdWpkge5IK4SPs3lt4xUM=.b112d42f-c718-4922-ad69-da4714cf5ecb@github.com> References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com> <D7Jfl3uzXBUJgWKE_iu88rhdWpkge5IK4SPs3lt4xUM=.b112d42f-c718-4922-ad69-da4714cf5ecb@github.com> Message-ID: <BeeNYBIpm-clapTYjHIWxZVVPvrb7QG0K-Gv0Tl8yaQ=.7f5f8fb6-0b10-48a6-98bb-9dd0b8f6e214@github.com> On Tue, 16 Jul 2024 14:49:27 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to
words. >> >> Testing: >> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Feedback - updating Unit.valueOf to direct access Still good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2181909030 From dholmes at openjdk.org Wed Jul 17 05:32:26 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:32:26 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v2] In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> Message-ID: <SBNE7wMZ0WMp1bQzyK9EASI6EOXgtVPSJw1uWCqRFko=.947c9a5d-ec2e-450a-a7ca-6272470827ae@github.com> > Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. > > The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. > > The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. > > If a string's length exceeds `max_length` then we print it as follows: > > "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) > > For example if we print "ABCDE" with a max_length of 4 then the output is literally: > > "AB ... DE" (abridged) > > The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. 
Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). > > For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. > > Testing: > - new test added for validation purposes > - tiers 1 - 3 as sanity testing > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fixed grammar ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20150/files - new: https://git.openjdk.org/jdk/pull/20150/files/7b155abc..39256bd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150 PR: https://git.openjdk.org/jdk/pull/20150 From dholmes at openjdk.org Wed Jul 17 05:32:26 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:32:26 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v2] In-Reply-To: <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com> Message-ID: <wTjWL_BESpfR0JfYBkUXZCVp2g5KAS0CiPQmS4h_BRw=.6ad07592-cc78-446b-9b57-dff79e4900a9@github.com> On Wed, 17 Jul 2024 05:05:56 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed grammar > > Looks good to me otherwise. Thanks for fixing this! 
Thanks for the review @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2232449368 From dholmes at openjdk.org Wed Jul 17 05:32:26 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:32:26 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v2] In-Reply-To: <gABpAo4f_ZUdhuir197iOpkbO_uk-T_BjZzUyfw2f-0=.1ac5353a-d43e-432e-9a46-3583c24b8f11@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com> <gABpAo4f_ZUdhuir197iOpkbO_uk-T_BjZzUyfw2f-0=.1ac5353a-d43e-432e-9a46-3583c24b8f11@github.com> Message-ID: <JlguV-ZjQ8cUjVfzfqKI7f84IhxgHHBoJZE2CG-cRVA=.bbd81084-0818-4589-88e1-96b0b6feb368@github.com> On Wed, 17 Jul 2024 05:04:45 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> src/hotspot/share/runtime/globals.hpp line 1310: >> >>> 1308: "maximum number of characters to print for a java.lang.String " \ >>> 1309: "in the VM. If exceeded, an abridged version of the string is " \ >>> 1310: "print with the middle of the string is elided.") \ >> >> Suggestion: >> >> "printed with the middle of the string is elided.") \ > > I think it should also be "... of the string elided" (without the "is"). Fixed. Don't know how I missed that. 
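For readers following the 8325945 thread, the abridging rule from the PR description and the alternative raised in review can be sketched as below. This is only an illustration under stated assumptions: `abridge_for_print` and `abridge_with_count` are hypothetical names, they operate on `std::string` rather than on a Java string oop, and they are not the code in `java_lang_String::print`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not the java_lang_String::print code.
// Mirrors the format described in the PR: when the string is longer than
// max_length, keep the first and last max_length/2 characters.
std::string abridge_for_print(const std::string& s, std::size_t max_length) {
  if (s.size() <= max_length) {
    return "\"" + s + "\"";  // short enough: print in full
  }
  const std::size_t half = max_length / 2;
  return "\"" + s.substr(0, half) + " ... " + s.substr(s.size() - half) +
         "\" (abridged)";
}

// Hypothetical variant following the review suggestion: instead of a trailing
// "(abridged)" marker, report how many characters were left out.
std::string abridge_with_count(const std::string& s, std::size_t max_length) {
  if (s.size() <= max_length) {
    return "\"" + s + "\"";
  }
  const std::size_t half = max_length / 2;
  const std::size_t omitted = s.size() - 2 * half;
  return "\"" + s.substr(0, half) + " ... (" + std::to_string(omitted) +
         " characters omitted) ... " + s.substr(s.size() - half) + "\"";
}
```

With the PR's own example, `abridge_for_print("ABCDE", 4)` yields `"AB ... DE" (abridged)`. Neither function applies the extra `max_length + X` threshold discussed in review; that would only change the initial length check.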
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680426244 From jbhateja at openjdk.org Wed Jul 17 05:37:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 Jul 2024 05:37:03 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings [v2] In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> Message-ID: <w77DS-gliOwhxLiidSa1fqG0-aq-7dax2Lcndxk-uLs=.98d75a7d-e0f7-4817-9e5d-1dec22f64c3b@github.com> > Enabling test with explicit feature checks for x86 target. > Removing from test/hotspot/jtreg/ProblemList.txt > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20160/files - new: https://git.openjdk.org/jdk/pull/20160/files/16ebbbaa..fa68e6ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20160.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20160/head:pull/20160 PR: https://git.openjdk.org/jdk/pull/20160 From jbhateja at openjdk.org Wed Jul 17 05:37:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 Jul 2024 05:37:04 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> Message-ID: 
<Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com> On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: > Enabling test with explicit feature checks for x86 target. > Removing from test/hotspot/jtreg/ProblemList.txt > > Best Regards, > Jatin > @jatin-bhateja There was also a suggestion from @eme64 as part of #20062 to remove requires vm.compiler2.enabled from the test. The test only validates a specific C2 IR pattern, and the framework makes sure to compile @Test annotated methods with the top tier (c2 : default) compiler using the Whitebox mechanism. So the @requires flag looks redundant here. Agree. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2232454739 From jbhateja at openjdk.org Wed Jul 17 05:37:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 Jul 2024 05:37:04 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings [v2] In-Reply-To: <QLczQ2tsKGcHLl1_3_4X7o2OC2CSNpp9gEdcYC9OD0c=.2e7d7e07-ac30-4be4-803c-c8ac49789eac@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> <QLczQ2tsKGcHLl1_3_4X7o2OC2CSNpp9gEdcYC9OD0c=.2e7d7e07-ac30-4be4-803c-c8ac49789eac@github.com> Message-ID: <cHOK-juVOMTDpwPxfKcjn7TCBI8k0hBKyTDQgv4PAb8=.2fd4d184-fb95-4b63-9a51-e8c8adcd3a0f@github.com> On Tue, 16 Jul 2024 18:30:35 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review suggestions incorporated > > src/hotspot/cpu/x86/vm_version_x86.hpp line 838: > >> 836: >> 837: // For AVX CPUs only since it needs VEX encoding which is missing on SSE targets, >> 838: // thus f16c support is disabled if UseAVX == 0. > > This comment is somewhat or very confusing. The code for supports_float16() by itself is very clear.
I am wondering why do we need this explanation in the comment at all? Let us remove it altogether. Sure, FTR my comment was in relation to following check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L1056 FP16-FP32 conversions are VEX encoded instructions and do not have SSE flavor. Thus, only works with UseAVX > 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20160#discussion_r1680430908 From jpai at openjdk.org Wed Jul 17 05:37:51 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 17 Jul 2024 05:37:51 GMT Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06 Message-ID: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> This brings the CPU24_07 changes into jdk23 branch. ------------- Commit messages: - 8323390: Enhance mask blit functionality - 8320097: Improve Image transformations - 8327413: Enhance compilation efficiency - 8324559: Improve 2D image handling - 8325600: Better symbol storage - 8319859: Better symbol storage - 8314794: Improve UTF8 String supports - 8320548: Improved loop handling - 8323231: Improve array management The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. 
Changes: https://git.openjdk.org/jdk/pull/20212/files Stats: 162 lines in 14 files changed: 98 ins; 4 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/20212.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20212/head:pull/20212 PR: https://git.openjdk.org/jdk/pull/20212 From jpai at openjdk.org Wed Jul 17 05:37:53 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 17 Jul 2024 05:37:53 GMT Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988 Message-ID: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> This brings in CPU24_07 changes into master branch ------------- Commit messages: - 8323390: Enhance mask blit functionality - 8320097: Improve Image transformations - 8327413: Enhance compilation efficiency - 8324559: Improve 2D image handling - 8325600: Better symbol storage - 8319859: Better symbol storage - 8314794: Improve UTF8 String supports - 8320548: Improved loop handling - 8323231: Improve array management The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. 
Changes: https://git.openjdk.org/jdk/pull/20211/files Stats: 162 lines in 14 files changed: 98 ins; 4 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/20211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20211/head:pull/20211 PR: https://git.openjdk.org/jdk/pull/20211 From dholmes at openjdk.org Wed Jul 17 05:40:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:40:53 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v2] In-Reply-To: <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com> Message-ID: <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com> On Wed, 17 Jul 2024 05:21:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed grammar > > src/hotspot/share/classfile/javaClasses.cpp line 785: > >> 783: index = length - (max_length / 2); >> 784: abridge = false; // only do this once >> 785: } > > Instead of the trailing "abridged", in similar cases I printed out the number of omitted characters. E.g. > > "Very long long long ... (53 characters omitted) ... long long string" > > Makes it obvious how much has been cut, and no danger of confusing the ellipse with naturally occurring dots. > > Additionally, I would only do this if length > max_length + X, with X being at least as long as the middle part (3 characters if you only print an ellipse). You end up with printed strings that may be slightly longer than maxlen, but OTOH the output is clearer. Otherwise you may indicate omission where none happened (if length == max_length) @tstuefe - thanks for looking at this Thomas. 
I don't get your second point. First I only abridge when length > max_length. Second adding in the X fudge factor just means max_length should have been set differently. To your first point, there will be similar changes to this coming so it would be good to standardise on how to report them. I like the idea you propose, but I couldn't find the code you mention. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680433575 From kvn at openjdk.org Wed Jul 17 05:41:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 05:41:52 GMT Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> Message-ID: <7TC1wAE-NN6af0pg5dEJxInkJxhIU0mq0RJ8NDK_c3U=.84572ae7-b0d0-44e4-89dc-df7bd73a58ea@github.com> On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > Citing David Holmes from the bug report: > "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing." > > I added `INCLUDE_JVMTI` guards in two places where they were missing: JVMCI and JFR. The affected code was added recently, in the past year. After that I was able to build the VM on all supported platforms. > > Note: building the VM without JVMTI is not an officially supported feature.
We are not testing it and such failures (missing guards) are not unexpected. > > A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in bug report. > I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards. > > I ran 2 rounds of testing: > > First, only **tier1** with VM built without JVMTI to see if builds passed and which tests affected. I wrote comment in bug report which tests failed (all expected to fail without JVMTI). > > Second round of testing with JVMTI in VM: tier1-4 Thank you, David, for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20209#issuecomment-2232459481 From kvn at openjdk.org Wed Jul 17 05:41:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 05:41:53 GMT Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI In-Reply-To: <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com> References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com> Message-ID: <ir04Nyodej1X5r5JqWJhK7pAqeLhVFaZD2nE7A0iJBI=.4e0163f0-831d-430c-a5f3-f1e8c3c1c31c@github.com> On Wed, 17 Jul 2024 04:52:35 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Citing David Holmes from bug report: >> "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. 
However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing." >> >> I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR. Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms. >> >> Note: building VM without JVMTI is not officially supported feature. We are not testing it and such failures (missing guards) are not unexpected. >> >> A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in bug report. >> I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards. >> >> I ran 2 rounds of testing: >> >> First, only **tier1** with VM built without JVMTI to see if builds passed and which tests affected. I wrote comment in bug report which tests failed (all expected to fail without JVMTI). >> >> Second round of testing with JVMTI in VM: tier1-4 > > src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.hpp line 35: > >> 33: JfrJvmtiAgent(); >> 34: ~JfrJvmtiAgent(); >> 35: static bool create() NOT_JVMTI_RETURN_(true); > > It initially seemed odd to return `true` here, but looking through the JFR code that interacts with the Agent it seems the right way to view this is that without JVMTI we have a no-op agent. Right. 
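The "no-op agent" reading of `create() NOT_JVMTI_RETURN_(true)` in the 8335921 review can be illustrated with a small stand-alone sketch of the guard-macro pattern. All names here (`SKETCH_INCLUDE_JVMTI`, `SKETCH_NOT_JVMTI_RETURN_`, `SketchAgent`) are hypothetical stand-ins chosen for this example; the real HotSpot macros and `JfrJvmtiAgent` are assumed to behave similarly but are not reproduced here.

```cpp
// Hypothetical feature switch; defaults to "feature compiled out".
#ifndef SKETCH_INCLUDE_JVMTI
#define SKETCH_INCLUDE_JVMTI 0
#endif

#if SKETCH_INCLUDE_JVMTI
// Feature present: expands to nothing, so the line below is a plain
// declaration and the real definition lives in a source file.
#define SKETCH_NOT_JVMTI_RETURN_(code)
#else
// Feature absent: expands to an inline stub body returning `code`.
#define SKETCH_NOT_JVMTI_RETURN_(code) { return code; }
#endif

class SketchAgent {
 public:
  // Without the feature this becomes `static bool create() { return true; };`
  // so callers see a successful, do-nothing agent.
  static bool create() SKETCH_NOT_JVMTI_RETURN_(true);
};

#if SKETCH_INCLUDE_JVMTI
// Stand-in for the real initialisation when the feature is built in.
bool SketchAgent::create() {
  return true;
}
#endif
```

Callers can invoke `SketchAgent::create()` unconditionally in either configuration, which is why returning `true` from the stub keeps the surrounding start-up logic unchanged when the feature is left out.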
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20209#discussion_r1680433885 From djelinski at openjdk.org Wed Jul 17 05:43:54 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Wed, 17 Jul 2024 05:43:54 GMT Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988 In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> Message-ID: <-RlBW1WFIiQyUpGqtLzAdpPq_uVik1dne-34b9lxOUY=.ff17a5c9-db80-4d41-8e7d-9c81b2d665a4@github.com> On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > This brings in CPU24_07 changes into master branch Marked as reviewed by djelinski (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20211#pullrequestreview-2181924404 From djelinski at openjdk.org Wed Jul 17 05:44:52 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Wed, 17 Jul 2024 05:44:52 GMT Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06 In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> Message-ID: <purSX-2YubBpPXCReHDrjW4IVHXozzq_f7EQX95xwD0=.d0b80e1c-4970-47dd-9582-b79b44f14a99@github.com> On Wed, 17 Jul 2024 05:33:54 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > This brings the CPU24_07 changes into jdk23 branch. Marked as reviewed by djelinski (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20212#pullrequestreview-2181924794 From jbhateja at openjdk.org Wed Jul 17 05:45:05 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 Jul 2024 05:45:05 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings [v3] In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> Message-ID: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com> > Enabling test with explicit feature checks for x86 target. > Removing from test/hotspot/jtreg/ProblemList.txt > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Restoring earlier comment - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860 - Review suggestions incorporated - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings ------------- Changes: https://git.openjdk.org/jdk/pull/20160/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=02 Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20160.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20160/head:pull/20160 PR: https://git.openjdk.org/jdk/pull/20160 From dholmes at openjdk.org Wed Jul 17 05:56:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:56:50 GMT Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988 In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> Message-ID: 
<IR7yERVG_xEnc_ZksWMfirmU73e9o1U69H_A_e3tZ9Y=.e446bbeb-a58d-4fd6-9db6-60fd32a17f1e@github.com> On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > This brings in CPU24_07 changes into master branch Hotspot looks good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20211#pullrequestreview-2181940719 From dholmes at openjdk.org Wed Jul 17 05:57:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 05:57:50 GMT Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06 In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> Message-ID: <VPJ1zpP40QbHN3TIVUxtdjgp-Pg9ixM7rHUrcvjhmTY=.0d02a69d-cfe7-425a-add0-0504840a55d8@github.com> On Wed, 17 Jul 2024 05:33:54 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > This brings the CPU24_07 changes into jdk23 branch. Hotspot looks good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20212#pullrequestreview-2181941981 From jpai at openjdk.org Wed Jul 17 06:09:08 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 17 Jul 2024 06:09:08 GMT Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988 [v2] In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> Message-ID: <7GKxRhSM3KmXxcXQWeiB_FjMI845R58e06v8z3LPYRg=.f1103e22-f12d-42dc-9528-8cab49e7620e@github.com> > This brings in CPU24_07 changes into master branch Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20211/files - new: https://git.openjdk.org/jdk/pull/20211/files/13341ca7..13341ca7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20211&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20211&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20211/head:pull/20211 PR: https://git.openjdk.org/jdk/pull/20211 From jpai at openjdk.org Wed Jul 17 06:09:09 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 17 Jul 2024 06:09:09 GMT Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988 In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> Message-ID: <jIdXM_bLVOKm5AhfkbA5EtUys_ALwbTgNA3-m86eFMg=.7df0a878-dbb9-4ec6-9eb9-feeae3e44dad@github.com> On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > This brings in CPU24_07 changes into master branch Thank you David and Daniel for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20211#issuecomment-2232494499 From jpai at openjdk.org Wed Jul 17 06:09:09 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 17 Jul 2024 06:09:09 GMT Subject: Integrated: Merge 13341ca70276c891add2e4652b6e1e8020610988 In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com> Message-ID: <hFXcLRnvb1hk3kN5O6Fx0EsNNC9dR-d-XKFVBTI1zkU=.6ce13b48-d962-46da-ad55-c51fa8d669e9@github.com> On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > This brings in CPU24_07 changes into master branch This pull request has now been integrated. 
Changeset: d90c20c0 Author: Jaikiran Pai <jpai at openjdk.org> URL: https://git.openjdk.org/jdk/commit/d90c20c0c728ced94493e0e58956153f6f61f898 Stats: 162 lines in 14 files changed: 98 ins; 4 del; 60 mod Merge Reviewed-by: djelinski, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/20211 From jpai at openjdk.org Wed Jul 17 06:09:25 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 17 Jul 2024 06:09:25 GMT Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06 [v2] In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> Message-ID: <0K-SFe5PeIBtOlrtjWuWi-iNf0kcjadeDHT3Ir_XCrg=.4051dd76-12ff-4ea8-8fe0-a7918d2142b4@github.com> > This brings the CPU24_07 changes into jdk23 branch. Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20212/files - new: https://git.openjdk.org/jdk/pull/20212/files/d876cacf..d876cacf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20212&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20212&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20212.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20212/head:pull/20212 PR: https://git.openjdk.org/jdk/pull/20212 From jpai at openjdk.org Wed Jul 17 06:09:25 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 17 Jul 2024 06:09:25 GMT Subject: [jdk23] Integrated: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06 In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com> Message-ID: <1_DJAHbifu2iXhbt0f4md0YF8Ym8i3t_b-qHM6j2BpU=.c70ecfc8-db8b-469a-a07f-1ff437a97024@github.com> On Wed, 17 Jul 2024 05:33:54 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > This brings the CPU24_07 changes into jdk23 branch. This pull request has now been integrated. 
Changeset: 7afb958e Author: Jaikiran Pai <jpai at openjdk.org> URL: https://git.openjdk.org/jdk/commit/7afb958e8d30221456a7b4634de0200dfe074950 Stats: 162 lines in 14 files changed: 98 ins; 4 del; 60 mod Merge Reviewed-by: djelinski, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/20212 From stuefe at openjdk.org Wed Jul 17 06:28:52 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Jul 2024 06:28:52 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v2] In-Reply-To: <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com> <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com> Message-ID: <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com> On Wed, 17 Jul 2024 05:38:41 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 785: >> >>> 783: index = length - (max_length / 2); >>> 784: abridge = false; // only do this once >>> 785: } >> >> Instead of the trailing "abridged", in similar cases I printed out the number of omitted characters. E.g. >> >> "Very long long long ... (53 characters omitted) ... long long string" >> >> Makes it obvious how much has been cut, and no danger of confusing the ellipse with naturally occurring dots. >> >> Additionally, I would only do this if length > max_length + X, with X being at least as long as the middle part (3 characters if you only print an ellipse). You end up with printed strings that may be slightly longer than maxlen, but OTOH the output is clearer. Otherwise you may indicate omission where none happened (if length == max_length) > > @tstuefe - thanks for looking at this Thomas. 
I don't get your second point. First I only abridge when length > max_length. Second adding in the X fudge factor just means max_length should have been set differently. > > To your first point, there will be similar changes to this coming so it would be good to standardise on how to report them. I like the idea you propose, but I couldn't find the code you mention. ?? Hi David, > I don't get your second point. First I only abridge when length > max_length. Second adding in the X fudge factor just means max_length should have been set differently. AFAICS you start abridging if length is exactly max_length. So, you could have: "Hallo David" maxlen=11 => "Hallo ... omitted 0 characters ... David" which is just strange. Additionally, it may seem just as strange to replace a small inner portion with a larger "omitted" text, because then the text plus replacement is larger than the original text, e.g. "Hallo David" maxlen=10 => "Hallo ... omitted 1 characters ... David" So my proposal would be to have a stretch zone the size of the omission text (roughly, modulo variable digits in text) and to print the text unabridged as long as it exceeds maxlen by less than the stretch zone length. E.g. "Hallo David" maxlen=10 => "Hallo David" "Hallo 0123456789012345678901234567890123456789 David" => "Hallo 01234 ... omitted 30 characters ... 56789 David" It's mainly for optics. > To your first point, there will be similar changes to this coming so it would be good to standardise on how to report them. I like the idea you propose, but I couldn't find the code you mention. ?? Oh, sorry, this was not in the OpenJDK; this was for a different product I worked on at SAP. But that omission text form served us well.
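For illustration, a minimal sketch of the stretch-zone rule described in this message. This is not the code under review in javaClasses.cpp: the helper name `abridge`, its signature, and the exact marker wording are assumptions made here, and the sketch uses the length of the marker itself as the stretch zone.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only.
// Keeps roughly max_length characters (split between head and tail) and
// replaces the middle with a marker stating how many characters were dropped.
// The string is returned unchanged whenever the marker would be at least as
// long as the text it replaces, so abridging never makes the output longer.
std::string abridge(const std::string& s, std::size_t max_length) {
  if (s.size() <= max_length) {
    return s;  // already fits, nothing to omit
  }
  const std::size_t head = max_length / 2;
  const std::size_t tail = max_length - head;
  const std::size_t omitted = s.size() - head - tail;
  const std::string marker =
      " ... omitted " + std::to_string(omitted) + " characters ... ";
  if (marker.size() >= omitted) {
    return s;  // inside the stretch zone: print the full text instead
  }
  return s.substr(0, head) + marker + s.substr(s.size() - tail);
}
```

With this rule, `abridge("Hallo David", 10)` returns the string unchanged, matching the "Hallo David" maxlen=10 case above, while a string that exceeds the limit by more than the marker length is cut in the middle.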
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680478197 From dholmes at openjdk.org Wed Jul 17 06:37:57 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 06:37:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
>> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include src/hotspot/share/runtime/basicLock.hpp line 44: > 42: // a sentinel zero value indicating a recursive stack-lock. 
> 43: // * For LM_LIGHTWEIGHT > 44: // Used as a cache the ObjectMonitor* used when locking. Must either The first sentence doesn't read correctly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680492976 From dholmes at openjdk.org Wed Jul 17 06:42:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 06:42:53 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... 
> > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include src/hotspot/share/runtime/basicLock.hpp line 46: > 44: // Used as a cache the ObjectMonitor* used when locking. Must either > 45: // be nullptr or the ObjectMonitor* used when locking. > 46: volatile uintptr_t _metadata; The displaced header/markword terminology was very well known to people, whereas "metadata" is really abstract - people will always need to go and find out what it actually refers to. Could we not define a union here to support the legacy and lightweight modes more explicitly and keep the existing terminology for the setters/getters for the code that uses it? 
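To make the suggestion concrete, here is a rough sketch of what such a union could look like. It is not the code in the PR: `BasicLockSketch`, `MarkWordBits` and `ObjectMonitorStub` are stand-ins invented here so that the fragment compiles outside HotSpot, the accessor names are only illustrative, and the sketch ignores the volatile/atomic access the real field needs.

```cpp
#include <cstdint>

// Stand-ins for HotSpot types, invented for this sketch only.
struct ObjectMonitorStub {};
using MarkWordBits = std::uintptr_t;

class BasicLockSketch {
  // One word of storage, interpreted according to the locking mode.
  union {
    MarkWordBits _displaced_header;     // legacy mode: displaced mark word
    ObjectMonitorStub* _monitor_cache;  // lightweight mode: cached monitor or nullptr
  };

 public:
  BasicLockSketch() : _displaced_header(0) {}

  // Legacy-mode accessors keep the well-known terminology.
  MarkWordBits displaced_header() const { return _displaced_header; }
  void set_displaced_header(MarkWordBits header) { _displaced_header = header; }

  // Lightweight-mode accessors name the cached ObjectMonitor explicitly.
  ObjectMonitorStub* object_monitor_cache() const { return _monitor_cache; }
  void set_object_monitor_cache(ObjectMonitorStub* monitor) { _monitor_cache = monitor; }
  void clear_object_monitor_cache() { _monitor_cache = nullptr; }
};

// The union must not make the lock slot larger than the single word it replaces.
static_assert(sizeof(BasicLockSketch) == sizeof(std::uintptr_t),
              "lock slot must stay one word");
```

Each accessor pair reads only the member it last wrote, so a caller has to know which locking mode is active; the union documents the two interpretations without changing the layout.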
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680496495 From dholmes at openjdk.org Wed Jul 17 06:42:54 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 06:42:54 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com> Message-ID: <U3cg8IdnKu5Eeg-52muJuU0vEGJTRaX4jhKCOB3DVtk=.a1acc8fc-c3b7-4d38-ace8-dd39eff6c139@github.com> On Wed, 17 Jul 2024 06:39:14 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Remove try_read >> - Add explicit to single parameter constructors >> - Remove superfluous access specifier >> - Remove unused include >> - Update assert message OMCache::set_monitor >> - Fix indentation >> - Remove outdated comment LightweightSynchronizer::exit >> - Remove logStream include >> - Remove strange comment >> - Fix javaThread include > > src/hotspot/share/runtime/basicLock.hpp line 46: > >> 44: // Used as a cache the ObjectMonitor* used when locking. Must either >> 45: // be nullptr or the ObjectMonitor* used when locking. >> 46: volatile uintptr_t _metadata; > > The displaced header/markword terminology was very well known to people, whereas "metadata" is really abstract - people will always need to go and find out what it actually refers to. Could we not define a union here to support the legacy and lightweight modes more explicitly and keep the existing terminology for the setters/getters for the code that uses it? I should have read ahead. 
I see you do keep the setters/getters. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680497748 From dholmes at openjdk.org Wed Jul 17 06:46:00 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 06:46:00 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <LftlPgWcCiaMMys4K2eXSS2HNSS2WNh5sImxQ8QHKFY=.df009912-4c6e-4c09-b6c9-c5d308cf5cf1@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
>> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... 
> > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include src/hotspot/share/runtime/deoptimization.cpp line 1641: > 1639: assert(fr.is_deoptimized_frame(), "frame must be scheduled for deoptimization"); > 1640: if (LockingMode == LM_LEGACY) { > 1641: mon_info->lock()->set_displaced_header(markWord::unused_mark()); In the existing code how is this restricted to the LM_LEGACY case?? It appears to be unconditional which suggests you are changing the non-UOMT LM_LIGHTWEIGHT logic. ?? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680500696 From dholmes at openjdk.org Wed Jul 17 06:50:55 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 06:50:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <1FImJurji3MUi1rauLpFYqETg45LmnlxLrRijzXBukg=.7125982a-3507-4711-922e-2c7c9706d87c@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. 
In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. 
There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include src/hotspot/share/runtime/lightweightSynchronizer.cpp line 60: > 58: > 59: // ConcurrentHashTable storing links from objects to ObjectMonitors > 60: class ObjectMonitorWorld : public CHeapObj<MEMFLAGS::mtObjectMonitor> { OMWorld describes the project not the hashtable, this should be called ObjectMonitorTable or some such. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 62: > 60: class ObjectMonitorWorld : public CHeapObj<MEMFLAGS::mtObjectMonitor> { > 61: struct Config { > 62: using Value = ObjectMonitor*; Does this alias really help? We don't state the type that many times and it looks odd to end up with a mix of `Value` and `ObjectMonitor*` in the same code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680508685 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680508801 From dholmes at openjdk.org Wed Jul 17 07:02:55 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 07:02:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <fM04k6Q6c7d_WrQHqLgruy7mpffpZrI6A2o7ZcMAwz0=.5433e04a-1d24-4b20-a126-218f20313cfd@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... 
> > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include src/hotspot/share/runtime/lightweightSynchronizer.cpp line 102: > 100: assert(*value != nullptr, "must be"); > 101: return (*value)->object_is_cleared(); > 102: } The `is_dead` functions seem oddly placed given they do not relate to the object stored in the wrapper. Why are they here? And what is the difference between `object_is_cleared` and `object_is_dead` (as used by `LookupMonitor`) ? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 105: > 103: }; > 104: > 105: class LookupMonitor : public StackObj { I'm not understanding why we need this little wrapper class. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680526331 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680526868 From epeter at openjdk.org Wed Jul 17 07:36:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jul 2024 07:36:51 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings In-Reply-To: <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com> Message-ID: <stZn5gAlPWALwY9tQlWkFg1uZkG7Hqsg6cSPCR_1ZhI=.4b89ed7b-52c0-4cf4-9c75-daf78a052559@github.com> On Wed, 17 Jul 2024 05:34:46 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: >> Enabling test with explicit feature checks for x86 target. >> Removing from test/hotspot/jtreg/ProblemList.txt >> >> Best Regards, >> Jatin > >> @jatin-bhateja There was also a suggestion from @eme64 as part of #20062 to remove requires vm.compiler2.enabled from the test. > > Test only validates specific C2 IR patten, framework makes sure to compile @Test annotated methods with top tier (c2 : default) compiler using Whitebox mechanism. So @require flags looks redundant here. Agree. Hi @jatin-bhateja Are you also going to address the questions for the VM changes from https://github.com/openjdk/jdk/pull/20062? That could be done in a separate RFE, but it would be nice to at least hear what you plan to do or not to do ;) Or would you like to look into the VM changes @jaskarth ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2232629036 From jbhateja at openjdk.org Wed Jul 17 07:40:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 Jul 2024 07:40:52 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings In-Reply-To: <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com> Message-ID: <dcOE47WZPyhKVU_yN3x68rLj0JDpyXGnIFjncthF-Ss=.a6161082-a3e0-42a0-8bd0-82babed1587d@github.com> On Wed, 17 Jul 2024 05:34:46 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: >> Enabling test with explicit feature checks for x86 target. >> Removing from test/hotspot/jtreg/ProblemList.txt >> >> Best Regards, >> Jatin > >> @jatin-bhateja There was also a suggestion from @eme64 as part of #20062 to remove requires vm.compiler2.enabled from the test. > > Test only validates specific C2 IR pattern, framework makes sure to compile @Test annotated methods with top tier (c2 : default) compiler using Whitebox mechanism. So @require flags looks redundant here. Agree. > Hi @jatin-bhateja Are you also going to address the questions for the VM changes from #20062? That could be done in a separate RFE, but it would be nice to at least hear what you plan to do or not to do ;) > > Or would you like to look into the VM changes @jaskarth ? Hi @eme64 , My apologies, I did not find time to respond earlier due to other priority items. Please go ahead and file an RFE and kindly assign it to me. I will handle it as part of that RFE or close it with appropriate justification. This PR is for test enablement.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2232634295 From mbaesken at openjdk.org Wed Jul 17 07:51:52 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 17 Jul 2024 07:51:52 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> Message-ID: <0NDfnOazfhdITpxweiblZT0H-K8El5-xeP40MQ5J4LY=.ab0c4862-84dd-42c4-a103-3e7fa36a808a@github.com> On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> ### Summary >> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. >> >> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. >> >> **Transparent huge pages:** >> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted. >> >> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. 
Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>>
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>>
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>
> - Minor cleanup and comments.
> - rename to disclaim_memory and update test

The comments above and the code look okay to me. I will test it in our CI first. Are you sure there are no side effects, or maybe 'bad' kernel versions where unwanted things occur when using madvise instead?
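For readers following the thread, the small-page behaviour described in the quoted summary can be reproduced outside the JVM. What follows is only a minimal sketch, assuming Linux and a private anonymous mapping backed by small pages; `disclaim_keeps_mapping_usable` is a hypothetical helper written for illustration and is not code from the PR. It touches a mapping, calls `madvise(MADV_DONTNEED)`, and then checks that the range can be read (zero-filled) and written again without being re-mapped.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (illustration only, not HotSpot code).
// Assumes Linux semantics for MADV_DONTNEED on private anonymous memory.
bool disclaim_keeps_mapping_usable(std::size_t pages) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t bytes = pages * page;

  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return false;
  }
  char* base = static_cast<char*>(p);

  // Touch every page so that it is actually backed by physical memory.
  std::memset(base, 0x5A, bytes);

  // Drop the backing pages; the virtual range itself stays mapped.
  bool ok = (madvise(base, bytes, MADV_DONTNEED) == 0);

  // Re-touching is allowed; the pages come back zero-filled on demand.
  for (std::size_t i = 0; ok && i < bytes; i += page) {
    ok = (base[i] == 0);
  }

  // The range is also still writable without any explicit recommit.
  if (ok) {
    base[0] = 1;
    ok = (base[0] == 1);
  }

  munmap(base, bytes);
  return ok;
}
```

Calling `disclaim_keeps_mapping_usable(4)` from a test program should return `true` on Linux. The sketch says nothing about explicit huge pages or about specific kernel versions, which is the open part of the question above.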
------------- PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2232657238 From stuefe at openjdk.org Wed Jul 17 08:44:52 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Jul 2024 08:44:52 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <0NDfnOazfhdITpxweiblZT0H-K8El5-xeP40MQ5J4LY=.ab0c4862-84dd-42c4-a103-3e7fa36a808a@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> <0NDfnOazfhdITpxweiblZT0H-K8El5-xeP40MQ5J4LY=.ab0c4862-84dd-42c4-a103-3e7fa36a808a@github.com> Message-ID: <CW9S4_d61GhdYN2aJuRYiTTB94xZm6tgpjt0d2Rww6A=.555139c2-241f-4d77-a87f-1fb66a3f8fd7@github.com> On Wed, 17 Jul 2024 07:49:23 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > The comments above and coding looks okay to me. Will test it first in our CI . Are you sure there are no side effects or maybe 'bad' kernel versions where unwanted things occur when using madvice instead? @MBaesken Reasonably sure. The same technique has been used by the glibc for C-heap trimming for a long time. Thanks for putting this into SAP's CI! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2232763075 From shade at openjdk.org Wed Jul 17 08:55:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Jul 2024 08:55:51 GMT Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> Message-ID: <gvShf6Szqz7zSWXeUB7SZnzAIN_IFy2KwJJrKQxVuMw=.e83a36f6-b619-4ddc-816e-e3336eb63941@github.com> On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > Citing David Holmes from bug report: > "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing." > > I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR. Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms. > > Note: building VM without JVMTI is not officially supported feature. We are not testing it and such failures (missing guards) are not unexpected. > > A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in bug report. > I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards. 
> > I ran 2 rounds of testing: > > First, only **tier1** with VM built without JVMTI to see if builds passed and which tests affected. I wrote comment in bug report which tests failed (all expected to fail without JVMTI). > > Second round of testing with JVMTI in VM: tier1-4 Looks okay. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20209#pullrequestreview-2182317512 From shade at openjdk.org Wed Jul 17 08:58:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Jul 2024 08:58:52 GMT Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI In-Reply-To: <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com> References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com> Message-ID: <EMXROqwZDbBelzLxDkBNoKLRMwku-GG0bVF7FgJfZU8=.c6d63260-eafa-4e24-aed8-a16e76823001@github.com> On Wed, 17 Jul 2024 04:57:38 GMT, David Holmes <dholmes at openjdk.org> wrote: > It highlights the problem we have with optional components in that you either have to work things so that semantically we have a do-nothing implementation of that component, or else you have to put the guards around every piece of code that would normally interact with it. At some point a few years ago I explored a private testing pipeline that built VM with different combination of options. It worked, but there were so many issues that cropped up continuously that I scratched that off as the lost cause. I gave up even on building Minimal. Fixing the particular build configurations every once in a while -- like this PR -- seems to be a pragmatic compromise. 
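The two options David Holmes describes (a do-nothing implementation of the component, or guards around every interacting piece of code) can be sketched as follows. This is an illustration under stated assumptions: `INCLUDE_TRACER`, the `tracer` namespace, `TRACER_ONLY` and `do_work` are all hypothetical names invented for this example and do not exist in HotSpot.

```cpp
#include <cassert>

// Hypothetical feature switch, defaulting to "feature left out".
#ifndef INCLUDE_TRACER
#define INCLUDE_TRACER 0
#endif

namespace tracer {
#if INCLUDE_TRACER
static int events_posted = 0;
inline void post_event() { ++events_posted; }
#else
// Option 1: a do-nothing implementation keeps callers compiling unchanged.
inline void post_event() {}
#endif
}  // namespace tracer

// Option 2: a guard macro removes code that cannot have a sensible stub.
#if INCLUDE_TRACER
#define TRACER_ONLY(code) code
#else
#define TRACER_ONLY(code)
#endif

int do_work() {
  int result = 42;
  tracer::post_event();                           // compiles either way
  TRACER_ONLY(result += tracer::events_posted;)   // vanishes when disabled
  return result;
}
```

With the default setting, `do_work()` returns 42 and no tracer state is referenced at all. A missed `TRACER_ONLY` around the second statement would fail to compile only in the configuration without the feature, which is the kind of breakage this PR fixes for JVMTI.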
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20209#issuecomment-2232790329

From galder at openjdk.org  Wed Jul 17 09:20:51 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Wed, 17 Jul 2024 09:20:51 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)
In-Reply-To: <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
 <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com>
Message-ID: <MWiyM5dWze8wwUA4nKY5V-TiH98NO5qRlG3UcA3QbKw=.3c1c0d0f-de0c-485b-a5d0-f18c77aa32a4@github.com>

On Wed, 10 Jul 2024 14:24:05 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results.

Thanks for the review. +1 to the IR tests, I'll work on those.

Re: microbenchmark - what do you have exactly in mind? For vectorization performance there is `ReductionPerf` though it's not a microbenchmark per se. Do you want a microbenchmark for the performance of vectorized max/min long?

For non-vectorization performance there is `MathBench`. I would not expect performance differences in `MathBench` because the backend is still the same and this change really benefits vectorization. I've run the min/max long tests on darwin/aarch64 and linux/x64 and indeed I see no difference:

linux/x64

Benchmark          (seed)   Mode  Cnt        Score         Error   Units
MathBench.maxLong       0  thrpt    8  1464197.164 ±   27044.205  ops/ms  # base
MathBench.minLong       0  thrpt    8  1469917.328 ±   25397.401  ops/ms  # base
MathBench.maxLong       0  thrpt    8  1469615.250 ±   17950.429  ops/ms  # patched
MathBench.minLong       0  thrpt    8  1456290.514 ±   44455.727  ops/ms  # patched

darwin/aarch64

Benchmark          (seed)   Mode  Cnt        Score         Error   Units
MathBench.maxLong       0  thrpt    8  1739341.447 ±  210983.444  ops/ms  # base
MathBench.minLong       0  thrpt    8  1659547.649 ±  260554.159  ops/ms  # base
MathBench.maxLong       0  thrpt    8  1660449.074 ±  254534.725  ops/ms  # patched
MathBench.minLong       0  thrpt    8  1729728.021 ±   16327.575  ops/ms  # patched

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2232836799

From jpai at openjdk.org  Wed Jul 17 10:26:50 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 10:26:50 GMT
Subject: RFR: 8336587: failure_handler lldb command times out on macosx-aarch64 core file
In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
Message-ID: <ytnudkg8F2EyXlgAJrGC5F9oTCnH8T8qMGTP7z5b3-0=.1cc62657-6859-409a-a721-0560d544bfb6@github.com>

On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:
>
>
> ----------------------------------------
> [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
> ----------------------------------------
> (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
> WARNING: tool timed out: killed process after 20000 ms
> ----------------------------------------
> [2024-07-15 05:16:07] exit code: -2 time: 20163 ms
> ----------------------------------------
>
>
> 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13gb. On my MBPro it took 30 seconds.
I bumped up the timeout to 60 seconds and reproduce the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount. Marked as reviewed by jpai (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20206#pullrequestreview-2182516025 From chegar at openjdk.org Wed Jul 17 13:26:57 2024 From: chegar at openjdk.org (Chris Hegarty) Date: Wed, 17 Jul 2024 13:26:57 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v6] In-Reply-To: <Kq6Xf3hnyRVLinNi7rm0oPm34BtiW1-qIqvahxWvXv0=.d44f3b37-e903-4274-aad6-820ec269fc8d@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <Kq6Xf3hnyRVLinNi7rm0oPm34BtiW1-qIqvahxWvXv0=.d44f3b37-e903-4274-aad6-820ec269fc8d@github.com> Message-ID: <q4Hqz-Tv4gYMfJm2HwDRBzVZXJeneqSxkugIzh1u9bI=.32d426a7-3fa7-48df-8210-4430dfa1d430@github.com> On Tue, 16 Jul 2024 18:09:20 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've a create this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. 
That is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert JVMCI changes

Thanks for the discussion and changes in this PR - it's super helpful (in what we can do as a workaround), as well as a great improvement for the future.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2233313234 From chegar at openjdk.org Wed Jul 17 13:26:58 2024 From: chegar at openjdk.org (Chris Hegarty) Date: Wed, 17 Jul 2024 13:26:58 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <_PDgnriMr5GoRUoTpxJnhZjIqEcjdF2kscNx94ScPlc=.b035d8ac-e218-46ed-86d9-a08368c63dc5@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com> <_PDgnriMr5GoRUoTpxJnhZjIqEcjdF2kscNx94ScPlc=.b035d8ac-e218-46ed-86d9-a08368c63dc5@github.com> Message-ID: <Rbwpfm2sVsTEvbqEw3rOepj7QkXEX66XiOoaT1-RvLg=.e4cbf7c7-3a14-4b75-86c2-b90afab04f6c@github.com> On Mon, 15 Jul 2024 12:59:27 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > Effectively, once all the issues surrounding reachability fences will be addressed, we should be able to achieve numbers similar to above even in the case of shared close. Is there an issue where I can follow this? [ EDIT: oh! it's [JDK-8290892](https://bugs.openjdk.org/browse/JDK-8290892) ] ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2233317727 From szaldana at openjdk.org Wed Jul 17 13:52:05 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 17 Jul 2024 13:52:05 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID Message-ID: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Hi all, This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 

This PR addresses the following diagnostic commands:
- [x] Compiler.perfmap
- [x] GC.heap_dump
- [x] System.dump_map
- [x] Thread.dump_to_file
- [x] VM.cds

Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure`, `JFR.dump`, `JFR.start` and `JFR.stop`).

I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,


filename (Optional) Name of the file to which the flight recording data is
written when the recording is stopped. If no filename is given, a
filename is generated from the PID and the current date and is
placed in the directory where the process was started. The
filename may also be a directory in which case, the filename is
generated from the PID and the current date in the specified
directory. (STRING, no default value)

Note: If a filename is given, '%p' in the filename will be
replaced by the PID, and '%t' will be replaced by the time in
'yyyy_MM_dd_HH_mm_ss' format.


Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.

Testing:

- [x] Added test case passes.
- [x] Modified existing VM.cds tests to also check for `%p` filenames.

Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).

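To make the `%p` substitution described above concrete, here is a minimal standalone sketch. It is an assumption-laden illustration, not the HotSpot routine used by the PR: `expand_pid` and its signature are hypothetical, it handles only `%p` (not `%t`), and it reports failure instead of truncating when the output buffer is too small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (illustration only): copy `in` to `out`, replacing
// every "%p" with the decimal value of `pid`. Returns false if the result,
// including the terminating NUL, does not fit into `out_size` bytes.
bool expand_pid(const char* in, char* out, std::size_t out_size, long pid) {
  if (in == nullptr || out == nullptr || out_size == 0) {
    return false;
  }
  std::size_t used = 0;
  for (const char* p = in; *p != '\0'; ++p) {
    if (p[0] == '%' && p[1] == 'p') {
      const std::size_t room = out_size - used;
      const int n = std::snprintf(out + used, room, "%ld", pid);
      if (n < 0 || static_cast<std::size_t>(n) >= room) {
        return false;  // PID digits plus NUL would not fit
      }
      used += static_cast<std::size_t>(n);
      ++p;  // skip the 'p' of "%p"
    } else {
      if (used + 1 >= out_size) {
        return false;  // no room for this character plus NUL
      }
      out[used++] = *p;
    }
  }
  out[used] = '\0';
  return true;
}
```

For example, with `char buf[64];`, the call `expand_pid("/tmp/perf-%p.map", buf, sizeof(buf), 1234)` succeeds and leaves `/tmp/perf-1234.map` in `buf`, while the same call with an 8-byte buffer returns `false`. Passing `sizeof(buf)` only works because `buf` is an array; for a `char*` it would yield the pointer size.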
Cheers, Sonia ------------- Commit messages: - 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID Changes: https://git.openjdk.org/jdk/pull/20198/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334492 Stats: 130 lines in 5 files changed: 116 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Wed Jul 17 14:02:31 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 17 Jul 2024 14:02:31 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v2] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it?s done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. 
If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can?t address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). > > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Updating copyright headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/ee46dab5..eea54f6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From stuefe at openjdk.org Wed Jul 17 14:24:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Jul 2024 14:24:54 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v2] In-Reply-To: <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com> References: 
<8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com> Message-ID: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com> On Wed, 17 Jul 2024 14:02:31 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it?s done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can?t address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating copyright headers First cursory review. That is a useful feature - In all cases: please, in case of an error, don't THROW, don't do `warning`. Instead, just print to the `output()` of the DCmd. You want an error to appear to the user of the dcmd - so, to stdout or stderr of the jcmd process issuing the command. Not an exception in the target JVM process, nor a warning in the target JVM stderr stream - Can you give us a variant of `Arguments::copy_expand_pid` that receives a zero-terminated const char* as input so that we can avoid having to pass in the length of the input each time? - when passing in output buffers to functions, try to use sizeof(buffer) instead of repeating the buffer size. Then, one can change the size of the buffer array without having to modify using calls (but aware: pitfall, sizeof(char[]) vs sizeof(char*)) src/hotspot/share/code/codeCache.cpp line 1796: > 1794: // Perf expects to find the map file at /tmp/perf-<pid>.map > 1795: // if the file name is not specified. > 1796: char fname[JVM_MAXPATHLEN]; Good to see this gone, the old code implicitly relied on: max pid len -2147483647 = 11 chars, + length of "/tmp/perf-.map" not overflowing 32, which cuts a bit close to the bone. src/hotspot/share/code/codeCache.cpp line 1800: > 1798: jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map", > 1799: os::current_process_id()); > 1800: } Arguably one could just do constexpr char[] filename_default = "/tmp/perf-%p.map"; Arguments::copy_expand_pid(filename == nullptr ? 
filename_default : filename, .....); src/hotspot/share/services/diagnosticCommand.cpp line 525: > 523: stringStream msg; > 524: msg.print("Invalid file path name specified: %s", _filename.value()); > 525: THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), msg.base()); Why throw? Why not just print an error message to the output() stream and return? src/hotspot/share/services/diagnosticCommand.cpp line 1059: > 1057: fileh = java_lang_String::create_from_str(fname, CHECK); > 1058: } else { > 1059: warning("Invalid file path name specified, fall back to default name"); `warning` prints a warning to the stdout of the JVM process. You don't want that; you want a warning to the issuer of the dcmd, which is another - possibly even remote - process. Write errors to `output()`, instead. src/hotspot/share/services/diagnosticCommand.cpp line 1138: > 1136: stringStream msg; > 1137: msg.print("Invalid file path name specified: %s", _filepath.value()); > 1138: THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), msg.base()); write to output() and return instead of throwing ------------- PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2183023385 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681109673 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681115247 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681118969 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681124783 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681125914 From stuefe at openjdk.org Wed Jul 17 14:24:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Jul 2024 14:24:54 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v2] In-Reply-To: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com> References: 
<8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com> <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com> Message-ID: <TwN9BcK3S4jrx7Kxu6GcBdXJULEURmkE3JF3JbG1sF8=.7d0ec11e-86be-4e30-b385-9b2bc6659c74@github.com> On Wed, 17 Jul 2024 14:02:01 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright headers > > src/hotspot/share/code/codeCache.cpp line 1800: > >> 1798: jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map", >> 1799: os::current_process_id()); >> 1800: } > > Arguably one could just do > > constexpr char[] filename_default = "/tmp/perf-%p.map"; > Arguments::copy_expand_pid(filename == nullptr ? filename_default : filename, .....); This pattern can be followed in all cases where we have default filenames ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681149193 From fgao at openjdk.org Wed Jul 17 15:04:53 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 17 Jul 2024 15:04:53 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <Kvsq4IVjcpSDlBwrMxfnv_p2OWa9sgPkqCe6ZOjrb0Y=.7978edc8-6d5c-4840-86ec-9e728729bb0d@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating 
a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. > > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 >> #20159 is also to fix the same issue. Please feel free to review the draft PR. Thanks. > This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together. Thanks for approval, @theRealAph. I'll test jcstress on my local. Could you help review and test these two PRs with higher tiers please? @TobiHartmann @vnkozlov Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2233542851 From jvernee at openjdk.org Wed Jul 17 15:19:18 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 17 Jul 2024 15:19:18 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v7] In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> Message-ID: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> > This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. > > Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them. > > In this PR: > - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. > - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case. > - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas. > > I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads: > > > Benchmark Threads Mode Cnt Score Error Units > ConcurrentClose.sharedAccess 32 avgt 10 9017.397 ± 202.870 us/op > ConcurrentClose.sharedAccess 24 avgt 10 5178.214 ± 164.922 us/op > ConcurrentClose.sharedAccess 16 avgt 10 2224.420 ± 165.754 us/op > ConcurrentClose.sharedAccess 8 avgt 10 593.828 ± 8.321 us/op > ConcurrentClose.sharedAccess 7 avgt 10 470.700 ± 22.511 us/op > ConcurrentClose.sharedAccess 6 avgt 10 386.697 ± 59.170 us/op > ConcurrentClose.sharedAccess 5 avgt 10 291.157 ± 7.023 us/op > ConcurrentClose.sharedAccess 4 avgt 10 209.178 ± 5.802 us/op > ConcurrentClose.sharedAccess 1 avgt 10 52.042 ± 0.630 us/op > ConcurrentClose.conf... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: benchmark review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20158/files - new: https://git.openjdk.org/jdk/pull/20158/files/138fba42..2019289b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=05-06 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158 PR: https://git.openjdk.org/jdk/pull/20158 From mcimadamore at openjdk.org Wed Jul 17 15:43:54 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 17 Jul 2024 15:43:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v7] In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> Message-ID: <ohN1Vo69B9dR4rAyFJlhXTcvVdmy9lznmdQ-TMZZgKQ=.e9ec24bb-fcb4-4af0-90d1-72bad69488b3@github.com> On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. 
>> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`. >> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case. >> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas. >> >> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads: >> >> >> Benchmark Threads Mode Cnt Score Error Units >> ConcurrentClose.sharedAccess 32 avgt 10 9017.397 ± 202.870 us/op >> ConcurrentClose.sharedAccess 24 avgt 10 5178.214 ± 164.922 us/op >> ConcurrentClose.sharedAccess 16 avgt 10 2224.420 ± 165.754 us/op >> ConcurrentClose.sharedAccess 8 avgt 10 593.828 ± 8.321 us/op >> ConcurrentClose.sharedAccess 7 avgt 10 470.700 ± 22.511 us/op >> ConcurrentClose.sharedAccess 6 avgt 10 386.697 ± 59.170 us/op >> ConcurrentClose.sharedAccess 5 avgt 10 291.157 ± 7.023 us/op >> ConcurrentClose.sharedAccess 4 avgt 10 209.178 ± 5.802 us/op >> ConcurrentClose.sharedAccess 1 avgt 10 ... 
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > benchmark review comments Changes in scopedMemoryAccess and benchmark look good ------------- Marked as reviewed by mcimadamore (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183305234 From sviswanathan at openjdk.org Wed Jul 17 15:55:55 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 17 Jul 2024 15:55:55 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings [v3] In-Reply-To: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com> Message-ID: <lJPvlapZrnp_aOnL4jWaveFV-FdbNEVMFJ67cZ47S6Y=.45d2da93-caf4-47f3-a4dc-aaa8dc1ff7ab@github.com> On Wed, 17 Jul 2024 05:45:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: >> Enabling test with explicit feature checks for x86 target. >> Removing from test/hotspot/jtreg/ProblemList.txt >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Restoring earlier comment > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860 > - Review suggestions incorporated > - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20160#pullrequestreview-2183342654 From liach at openjdk.org Wed Jul 17 16:17:51 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 17 Jul 2024 16:17:51 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v2] In-Reply-To: <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com> References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com> <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com> Message-ID: <UBLc0-jkwNqLXnWhNf-LCRwJQu2G0x3TrrOLGZzmpwE=.1202e9cb-a3d9-4f93-92ab-a1d65c648e38@github.com> On Wed, 17 Jul 2024 03:03:23 GMT, Chen Liang <liach at openjdk.org> wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. >> >> Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Redundant transient; Update the comments to be more accurate Just noted that I removed some abstract flags from `Executable` methods, which requires a CSR archive entry. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20188#issuecomment-2233696779 From alanb at openjdk.org Wed Jul 17 16:30:52 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 17 Jul 2024 16:30:52 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v2] In-Reply-To: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com> <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com> Message-ID: <70ENucevFdhpARyQni4sy3uUh93-FFQz038hJYswZqE=.76860fb0-94a3-473e-bb3c-9c9733a850b2@github.com> On Wed, 17 Jul 2024 14:07:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright headers > > src/hotspot/share/services/diagnosticCommand.cpp line 1138: > >> 1136: stringStream msg; >> 1137: msg.print("Invalid file path name specified: %s", _filepath.value()); >> 1138: THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), msg.base()); > > write to output() and return instead of throwing Yes, the error needs to be written to the stream so that it is printed by the tool (at the other end of the pipe). 
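A minimal sketch of the two review points on this PR (substituting `%p` with the PID, and writing the error to the output stream and returning instead of throwing), in standalone C++ rather than HotSpot code. `expand_pid` and its parameters are hypothetical names chosen for illustration; HotSpot's own helper mentioned earlier in the thread, `Arguments::copy_expand_pid`, is not used here, and the rule that only an empty pattern is rejected is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not a HotSpot API): copies `pattern` into `result`,
// replacing every "%p" with the decimal value of `pid`.
// On invalid input it reports the problem on `out` and returns false,
// so the caller can return early instead of throwing.
bool expand_pid(const std::string& pattern, long pid,
                std::string& result, std::ostream& out) {
  assert(pid >= 0);  // assumption: PIDs are non-negative
  if (pattern.empty()) {
    // The message goes to the command's output stream, so the tool at the
    // other end of the pipe prints it.
    out << "Invalid file path name specified: (empty)\n";
    return false;
  }
  std::ostringstream buf;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      buf << pid;  // substitute the process id
      ++i;         // skip the 'p'
    } else {
      buf << pattern[i];
    }
  }
  result = buf.str();
  return true;
}
```

With this shape, a default such as `/tmp/perf-%p.map` and a user-supplied name can go through the same call, which is the pattern suggested for all default filenames.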
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681352690 From kvn at openjdk.org Wed Jul 17 16:36:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 16:36:52 GMT Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> Message-ID: <0ta5_zgRm_HBLh0jhqZm807Qe5sYsSb3rNTEF9j2p2Q=.a303a8ac-b957-4727-832a-36d834d165a4@github.com> On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > Citing David Holmes from bug report: > "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing." > > I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR. Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms. > > Note: building VM without JVMTI is not an officially supported feature. We are not testing it and such failures (missing guards) are not unexpected. > > A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in bug report. > I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards. 
> > I ran 2 rounds of testing: > > First, only **tier1** with VM built without JVMTI to see if builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all expected to fail without JVMTI). > > Second round of testing with JVMTI in VM: tier1-4 Thank you, Aleksey, for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20209#issuecomment-2233729056 From kvn at openjdk.org Wed Jul 17 16:48:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 16:48:54 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v7] In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> Message-ID: <lt202f35FQwPU8wzVkgqeaebZINd2UIna8B-XJCcl_M=.0d2e9a2c-72d3-4557-9f54-04334e8f77d7@github.com> On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them. >> >> In this PR: >> - Deoptimizing is now only done in cases where it's needed, instead of always. 
Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. >> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`. >> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case. >> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas. >> >> I've done several benchmark runs with different amounts of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads: >> >> >> Benchmark Threads Mode Cnt Score Error Units >> ConcurrentClose.sharedAccess 32 avgt 10 9017.397 ± 202.870 us/op >> ConcurrentClose.sharedAccess 24 avgt 10 5178.214 ± 164.922 us/op >> ConcurrentClose.sharedAccess 16 avgt 10 2224.420 ± 165.754 us/op >> ConcurrentClose.sharedAccess 8 avgt 10 593.828 ± 8.321 us/op >> ConcurrentClose.sharedAccess 7 avgt 10 470.700 ± 22.511 us/op >> ConcurrentClose.sharedAccess 6 avgt 10 386.697 ± 59.170 us/op >> ConcurrentClose.sharedAccess 5 avgt 10 291.157 ± 7.023 us/op >> ConcurrentClose.sharedAccess 4 avgt 10 209.178 ± 5.802 us/op >> ConcurrentClose.sharedAccess 1 avgt 10 ... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > benchmark review comments I am fine with compiler and CI changes - it is just marking nmethod as having scoped access. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183464779 From aph at openjdk.org Wed Jul 17 17:17:52 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 17 Jul 2024 17:17:52 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> Message-ID: <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com> On Thu, 11 Jul 2024 23:47:51 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> src/hotspot/share/oops/klass.cpp line 284: >> >>> 282: // which doesn't zero out the memory before calling the constructor. >>> 283: Klass::Klass(KlassKind kind) : _kind(kind), >>> 284: _bitmap(SECONDARY_SUPERS_BITMAP_FULL), >> >> I like the idea, but what are the benefits of initializing `_bitmap` separately from `_secondary_supers`? > > Another observation while browsing the code: `_secondary_supers_bitmap` would be a better name. (Same considerations apply to `_hash_slot`.) This is because the C++ runtime does secondary super cache lookups even before the bitmap has been calculated and the hash table sorted. In this case the bitmap is zero, so the search thinks there are no secondary supers. Setting _bitmap to SECONDARY_SUPERS_BITMAP_FULL forces a linear search. I guess there might be a better way to do this. Perhaps a comment is needed? I agree about `_secondary_supers_bitmap` name. 
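To make the point about the initial bitmap value concrete, here is a deliberately simplified sketch. It is not the HotSpot implementation: `ToyKlass`, `compute_bitmap`, `hash_slot`, and the integer type ids are hypothetical, and the bitmap is used only as a reject filter in front of a linear scan, whereas the real table also uses the bitmap to index a packed, sorted array. The part it is meant to show is that a bitmap of zero would answer "no secondary supers" for every query, while an all-ones bitmap sends every query to the linear scan, which stays correct before hashing has happened.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a class with secondary supertypes (not HotSpot's Klass).
struct ToyKlass {
  static constexpr uint64_t BITMAP_FULL = ~uint64_t{0};

  // Starts out "full", so lookups made before compute_bitmap() runs
  // skip the filter and fall back to a plain linear scan.
  uint64_t bitmap = BITMAP_FULL;
  std::vector<int> secondary_supers;  // ids of secondary supertypes

  // Assumed toy hash: the low 6 bits of the id select one of 64 slots.
  static unsigned hash_slot(int id) { return static_cast<unsigned>(id) & 63u; }

  // Sets one bit per occupied hash slot.
  void compute_bitmap() {
    uint64_t bits = 0;
    for (int id : secondary_supers) {
      unsigned slot = hash_slot(id);
      assert(slot < 64);
      bits |= uint64_t{1} << slot;
    }
    bitmap = bits;
  }

  bool has_secondary_super(int id) const {
    // Fast reject: a clear bit proves the id is absent.
    // A full bitmap can never reject, so it always reaches the scan below.
    if (bitmap != BITMAP_FULL && ((bitmap >> hash_slot(id)) & 1u) == 0) {
      return false;
    }
    for (int s : secondary_supers) {
      if (s == id) return true;
    }
    return false;
  }
};
```

In this toy model, a lookup on an object whose supers are populated but whose bitmap has not been computed yet still finds its entries; had the bitmap started at zero, the same lookup would be rejected by the filter.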
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681410651 From aph at openjdk.org Wed Jul 17 17:17:53 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 17 Jul 2024 17:17:53 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> Message-ID: <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> On Fri, 5 Jul 2024 22:37:34 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. 
There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > src/hotspot/share/oops/klass.cpp line 469: > >> 467: #endif >> 468: >> 469: bitmap = hash_secondary_supers(secondary_supers, /*rewrite=*/true); // rewrites freshly allocated array > > I like that hashing is performed unconditionally now. > > Looks like you can remove `UseSecondarySupersTable`-specific CDS support (in `filemap.cpp`). CDS archive should unconditionally contain hashed tables. Sure. 
> src/hotspot/share/oops/klass.inline.hpp line 122: > >> 120: return true; >> 121: >> 122: bool result = lookup_secondary_supers_table(k); > > Should `UseSecondarySupersTable` affect `Klass::search_secondary_supers` as well? I think not. It'd complicate C++ runtime for no useful reason. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681411088 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681412747 From kvn at openjdk.org Wed Jul 17 17:21:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 17:21:52 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings [v3] In-Reply-To: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com> Message-ID: <juaTQskdYFNW5cFDGPY8zr01qgqRwvflxIuw9ZjVVzM=.7f2ebf55-5214-4d11-a2f9-3d3ae755d7ec@github.com> On Wed, 17 Jul 2024 05:45:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: >> Enabling test with explicit feature checks for x86 target. >> Removing from test/hotspot/jtreg/ProblemList.txt >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Restoring earlier comment > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860 > - Review suggestions incorporated > - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings Looks good. I submitted testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2233813722 From liach at openjdk.org Wed Jul 17 18:08:51 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 17 Jul 2024 18:08:51 GMT Subject: RFR: 8334772: Change Class::protectionDomain and signers to explicit fields Message-ID: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com> Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields. Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor. Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either. ------------- Commit messages: - Tests rely on Class ctor - Move class protectionDomain and signers fields to be explicit Changes: https://git.openjdk.org/jdk/pull/20221/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20221&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334772 Stats: 145 lines in 15 files changed: 25 ins; 90 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/20221.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20221/head:pull/20221 PR: https://git.openjdk.org/jdk/pull/20221 From alanb at openjdk.org Wed Jul 17 18:38:38 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 17 Jul 2024 18:38:38 GMT Subject: RFR: 8334772: Change Class::protectionDomain and signers to explicit fields In-Reply-To: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com> References: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com> Message-ID: 
<Qnq3ypuxY7aa3QpAXP072UDegGz3vEM36DZg0_ql6mQ=.8ea26223-bd99-45f5-9dcf-aa0ee7d1a8c2@github.com> On Wed, 17 Jul 2024 17:47:11 GMT, Chen Liang <liach at openjdk.org> wrote: > Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields. > > Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor. > > Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either. Offline discussion with Chen and I think the advice is to drop all the changes for ProtectionDomain for now. This area will change significantly as part of the SecurityManager removal work. src/java.base/share/classes/jdk/internal/access/JavaLangAccess.java line 430: > 428: * {@link Class#getProtectionDomain()} > 429: */ > 430: ProtectionDomain protectionDomain(Class<?> c, boolean raw); I don't think we should expose this outside of java.lang. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20221#issuecomment-2233996684 PR Review Comment: https://git.openjdk.org/jdk/pull/20221#discussion_r1681559624 From liach at openjdk.org Wed Jul 17 18:45:38 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 17 Jul 2024 18:45:38 GMT Subject: RFR: 8334772: Change Class::protectionDomain and signers to explicit fields In-Reply-To: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com> References: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com> Message-ID: <J1wdWdHGNcBaJsnGL43q9iJkvz4pTkx0O0IJXrTClJM=.fe47dd18-67f1-4d84-a2d8-a006518ba37d@github.com> On Wed, 17 Jul 2024 17:47:11 GMT, Chen Liang <liach at openjdk.org> wrote: > Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields. > > Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor. > > Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either. The migration of signers will be in a new PR. This patch will be kept so people will know the extra test updates related to migration of protectionDomain. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20221#issuecomment-2234008622 From liach at openjdk.org Wed Jul 17 18:45:38 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 17 Jul 2024 18:45:38 GMT Subject: Withdrawn: 8334772: Change Class::protectionDomain and signers to explicit fields In-Reply-To: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com> References: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com> Message-ID: <OgugPiJxEziwY9rRp_GkDdQ3K5Adye7kO-qPeQGzRYs=.f4d29f5e-ef1c-43ca-b946-924eac809a87@github.com> On Wed, 17 Jul 2024 17:47:11 GMT, Chen Liang <liach at openjdk.org> wrote: > Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields. > > Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor. > > Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/20221 From vlivanov at openjdk.org Wed Jul 17 18:48:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 17 Jul 2024 18:48:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com> Message-ID: <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com> On Wed, 17 Jul 2024 17:13:49 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Another observation while browsing the code: `_secondary_supers_bitmap` would be a better name. (Same considerations apply to `_hash_slot`.) > > This is because the C++ runtime does secondary super cache lookups even before the bitmap has been calculated and the hash table sorted. In this case the bitmap is zero, so teh search thinks there are no secondary supers. Setting _bitmap to SECONDARY_SUPERS_BITMAP_FULL forces a linear search. > > I guess there might be a better way to do this. Perhaps a comment is needed? > > I agree about `_secondary_supers_bitmap` name. Now it starts to sound concerning... `Klass::set_secondary_supers()` initializes both `_secondary_supers` and `_bitmap` which implies that `Klass::is_subtype_of()` may be called on not yet initialized Klass. If that's the case, it does look like a bug on its own. How is it expected to work when `_secondary_supers` hasn't been set yet? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681571806 From kvn at openjdk.org Wed Jul 17 18:48:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 18:48:35 GMT Subject: Integrated: 8335921: Fix HotSpot VM build without JVMTI In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com> Message-ID: <cA66XRsCeAiEtD8Z4E4OJUAeIZRA1AFtG53dK_NpdrY=.351d8e9b-f6a6-4331-83b3-0d00b8675203@github.com> On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > Citing David Holmes from bug report: > "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing." > > I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR. Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms. > > Note: building VM without JVMTI is not officially supported feature. We are not testing it and such failures (missing guards) are not unexpected. > > A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in bug report. > I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards. 
> > I ran 2 rounds of testing: > > First, only **tier1** with VM built without JVMTI to see if builds passed and which tests affected. I wrote comment in bug report which tests failed (all expected to fail without JVMTI). > > Second round of testing with JVMTI in VM: tier1-4 This pull request has now been integrated. Changeset: bcb5e695 Author: Vladimir Kozlov <kvn at openjdk.org> URL: https://git.openjdk.org/jdk/commit/bcb5e69505f6cc8a4f323924cd2c58e630595fc0 Stats: 20 lines in 8 files changed: 7 ins; 0 del; 13 mod 8335921: Fix HotSpot VM build without JVMTI Reviewed-by: dholmes, shade ------------- PR: https://git.openjdk.org/jdk/pull/20209 From shade at openjdk.org Wed Jul 17 18:52:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Jul 2024 18:52:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v2] In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> Message-ID: <TIMHU8ZaFhaLG2GUXBkb0hrvMsUz8Orays0vZlsYO4k=.39157f2c-f720-43d7-b1d8-6900551e5237@github.com> > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - More precise barriers - Tests work - More touchups - Fixing the conditions, fixing the tests - Crude prototype, still failing the tests ------------- Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=01 Stats: 329 lines in 13 files changed: 328 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From kbarrett at openjdk.org Wed Jul 17 18:52:13 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 Jul 2024 18:52:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com> <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com> <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com> Message-ID: <1Y2PaVuIsawmIC7NnLuk4I7WLDmHC55dAamEe3M_gOM=.12267ab4-ecaf-4cd7-8f80-b1c6cbf57507@github.com> On Fri, 12 Jul 2024 13:19:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > > The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. > > Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, [...] Reference.refersTo has similar issues. 
See refersToImpl and refersTo0 in both Reference and PhantomReference. I think you should be able to model on those and the intrinsic implementation for refersTo to get what you want. One additional complication is that Reference.enqueue intentionally calls clear0. If implementing clear similarly to refersTo, then enqueue should be changed to call clearImpl. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2228872926 From kbarrett at openjdk.org Wed Jul 17 18:52:13 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 Jul 2024 18:52:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <1Y2PaVuIsawmIC7NnLuk4I7WLDmHC55dAamEe3M_gOM=.12267ab4-ecaf-4cd7-8f80-b1c6cbf57507@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com> <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com> <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com> <1Y2PaVuIsawmIC7NnLuk4I7WLDmHC55dAamEe3M_gOM=.12267ab4-ecaf-4cd7-8f80-b1c6cbf57507@github.com> Message-ID: <bK0RXMOH98fi_QKz0ueMaiEh0GotFD6fHA9D-RBDXoM=.982af5c2-25fe-47bf-afa3-e08ca5291661@github.com> On Mon, 15 Jul 2024 16:09:39 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > > Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, [...] > > Reference.refersTo has similar issues. See refersToImpl and refersTo0 in both Reference and PhantomReference. I think you should be able to model on those and the intrinsic implementation for refersTo to get what you want. > > One additional complication is that Reference.enqueue intentionally calls clear0. 
If implementing clear similarly to refersTo, then enqueue should be changed to call clearImpl. I should have read what I was replying to more carefully, rather than focusing on what was further up in the thread. Looks like you (@shipilev) already spotted the refersTo stuff. But the enqueue => clear0 could have easily been missed, so perhaps not an entirely unneeded suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2231464762 From shade at openjdk.org Wed Jul 17 18:52:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Jul 2024 18:52:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com> <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com> <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com> <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com> Message-ID: <45m_iZZsJLn9OJowCePyhoipmHYfepPVN7GyrTgaaz0=.52fcb527-3318-4eb6-91e6-09b868e9ea32@github.com> On Fri, 12 Jul 2024 13:19:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >>> > The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference. >>> >>> You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In intrinsic, we are storing with the same decorators that `JVM_ReferenceClear` is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment. 
>> >> The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. Because there hasn't really been a use case for it, other than clearing a Reference. That's the precise reason why we do not have a clear intrinsic; it would have to add that infrastructure. > >> The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. > > Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, I see it just does pre-barriers when it is not sure what strongness the store is. Hrmpf. OK, let me see what can be done here. It might be just easier to further specialize `Reference.clear` in subclasses and carry down the actual strongness, like we do with `refersTo0` currently. This would still require C2 backend adjustments to handle `AS_NO_KEEPALIVE` on stores, but at least we would not have to guess about the strongness type in C2 intrinsic. > I should have read what I was replying to more carefully, rather than focusing on what was further up in the thread. Looks like you (@shipilev) already spotted the refersTo stuff. But the enqueue => clear0 could have easily been missed, so perhaps not an entirely unneeded suggestion. Yeah, thanks. The `enqueue => clear0` was indeed easy to miss. Pushed the crude prototype that follows `refersTo` example and drills some new `AS_NO_KEEPALIVE` holes in C2 Access API to cover this intrinsic case. Super untested. IR tests are still failing, I'll take more in-depth look there. (Perhaps it would not be possible to clearly match the absence of pre-barrier in IR tests, we'll see.) 
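
The shape described above, following the existing `refersTo` / `refersToImpl` / `refersTo0` split, can be sketched roughly as follows. This is a stand-alone model with hypothetical classes (`ModelReference`, `ModelPhantomReference`), not the real `java.lang.ref` sources or the code in the PR. In the JDK, `clear0` would be a native, intrinsified method; here it only records which strength-specific entry point was taken. The names `clearImpl` and `clear0` come from the discussion; everything else is assumed for illustration.

```java
// Illustrative model only: not the java.lang.ref implementation.
class ModelReference<T> {
    protected T referent;
    String lastClearPath = "none";

    ModelReference(T referent) {
        this.referent = referent;
    }

    public T get() {
        return referent;
    }

    public void clear() {
        clearImpl();
    }

    // Subclasses override this to route to their own strength-specific clear0.
    void clearImpl() {
        clear0();
    }

    // Stand-in for the native method that would clear a weak/soft referent.
    private void clear0() {
        referent = null;
        lastClearPath = "weak-clear0";
    }

    // Goes through clearImpl (not clear0 directly) so that subclasses keep
    // their own strength. Queue handling is omitted in this model.
    public boolean enqueue() {
        clearImpl();
        return true;
    }
}

final class ModelPhantomReference<T> extends ModelReference<T> {
    ModelPhantomReference(T referent) {
        super(referent);
    }

    @Override
    void clearImpl() {
        clear0();
    }

    // Stand-in for the native method that would clear a phantom referent.
    private void clear0() {
        referent = null;
        lastClearPath = "phantom-clear0";
    }
}
```

The point of the indirection is that the strength is known statically at each `clear0` site, so an intrinsic would not have to guess it.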
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2231613218 From shade at openjdk.org Wed Jul 17 18:52:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Jul 2024 18:52:14 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> Message-ID: <QoMe48MGvE2JlplJD5N6KP-LU5Tbk8XNavLLySPeP-Q=.cbb17fb0-9496-472c-95df-f844452dfc9b@github.com> On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Split out the `refersTo` test to https://github.com/openjdk/jdk/pull/20215. Yeah, so this version seems to work well on tests. I am being extra paranoid about only accepting `null` stores, since `AS_NO_KEEPALIVE` means all other barriers like inter-generational post-barriers in G1 should still work. G1 barrier set delegates the stores to `CardTable/ModRefBarrierSet`, which: a) does not know which barriers can be bypassed by `AS_NO_KEEPALIVE`; b) calls back `G1BarrierSet` for prebarrier generation, but already loses the decorators. 
So the simplest way to deal with this is to handle this special case specially. I think this is insanely sane, given how sharp-edged `AS_NO_KEEPALIVE` is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2233066721 PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2234005550 From vlivanov at openjdk.org Wed Jul 17 18:57:32 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 17 Jul 2024 18:57:32 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com> <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com> Message-ID: <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com> On Wed, 17 Jul 2024 18:46:11 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> This is because the C++ runtime does secondary super cache lookups even before the bitmap has been calculated and the hash table sorted. In this case the bitmap is zero, so the search thinks there are no secondary supers. Setting _bitmap to SECONDARY_SUPERS_BITMAP_FULL forces a linear search. >> >> I guess there might be a better way to do this. Perhaps a comment is needed? >> >> I agree about `_secondary_supers_bitmap` name. > > Now it starts to sound concerning... `Klass::set_secondary_supers()` initializes both `_secondary_supers` and `_bitmap` which implies that `Klass::is_subtype_of()` may be called on not yet initialized Klass.
If that's the case, it does look like a bug on its own. How is it expected to work when `_secondary_supers` hasn't been set yet?

On second thought, the following setter may be the culprit:

    void Klass::set_secondary_supers(Array<Klass*>* secondaries) {
      assert(!UseSecondarySupersTable || secondaries == nullptr, "");
      set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY);
    }

It should be adjusted to set `SECONDARY_SUPERS_BITMAP_FULL` instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681581587 From uschindler at openjdk.org Wed Jul 17 19:03:34 2024 From: uschindler at openjdk.org (Uwe Schindler) Date: Wed, 17 Jul 2024 19:03:34 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v7] In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> Message-ID: <gYDpM3O_h6F4SdlH8Yo4kCW47pKh6GxM__8P6FOJI9A=.0d4eebbb-d38e-48f8-94be-7de417701071@github.com> On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. >> >> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   benchmark review comments

Marked as reviewed by uschindler (Author).
------------- PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183805940 From liach at openjdk.org Wed Jul 17 19:54:00 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 17 Jul 2024 19:54:00 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field Message-ID: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. ------------- Commit messages: - 8334772: Change Class::signers to an explicit field Changes: https://git.openjdk.org/jdk/pull/20223/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334772 Stats: 71 lines in 6 files changed: 7 ins; 53 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20223.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20223/head:pull/20223 PR: https://git.openjdk.org/jdk/pull/20223 From szaldana at openjdk.org Wed Jul 17 19:59:10 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 17 Jul 2024 19:59:10 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v3] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com> > Hi 
all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and replace it with the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Updates based on feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/eea54f6d..3bb774d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=01-02 Stats: 35 lines in 4 files changed: 0 ins; 12 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From vlivanov at openjdk.org Wed Jul 17 20:37:35 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 17 Jul 2024 20:37:35 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v7] In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com> Message-ID: <AOvJNfA2r8_pXv_UzMA3z48O91R1lDwXKTkjw2rqCMY=.cf0bcda3-f0da-4d39-b2dc-ae847729c27e@github.com> On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. 
>>
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>>
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>>
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>>
>>
>> Benchmark                     Threads  Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1  avgt   10  ...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > benchmark review comments Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183984622 From kvn at openjdk.org Wed Jul 17 21:16:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jul 2024 21:16:33 GMT Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings [v3] In-Reply-To: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com> Message-ID: <4duGDJ3x3AOCD4JWUpJ8RQQn-gV4DjQlZcDupfgYaH0=.5a57d7d2-7eac-4e66-a2e3-43454abefc00@github.com> On Wed, 17 Jul 2024 05:45:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: >> Enabling test with explicit feature checks for x86 target. >> Removing from test/hotspot/jtreg/ProblemList.txt >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Restoring earlier comment > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860 > - Review suggestions incorporated > - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings My tier1-3 testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20160#pullrequestreview-2184074033 From dholmes at openjdk.org Wed Jul 17 22:01:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 Jul 2024 22:01:33 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <po0gx6r_TqVL3Dv1kD5KNDaU6xsqfzFfK0h3sQsgB7E=.aee9ed05-0345-4b0e-ae87-566fe0bdeebf@github.com> On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote: > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. The relocation of the field to Java looks good. But I am concerned that the field is now exposed to reflection. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2184157888 From jkarthikeyan at openjdk.org Wed Jul 17 22:50:31 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 17 Jul 2024 22:50:31 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: <MWiyM5dWze8wwUA4nKY5V-TiH98NO5qRlG3UcA3QbKw=.3c1c0d0f-de0c-485b-a5d0-f18c77aa32a4@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com> <MWiyM5dWze8wwUA4nKY5V-TiH98NO5qRlG3UcA3QbKw=.3c1c0d0f-de0c-485b-a5d0-f18c77aa32a4@github.com> Message-ID: <idYp67bpPuveqjhqTLQOL2T1DotalBWz_m8iRayFsts=.f8334e0e-2061-458e-86f9-c009444748e6@github.com> On Wed, 17 Jul 2024 09:18:31 GMT, Galder Zamarreño <galder at openjdk.org> wrote: > Do you want a microbenchmark for the performance of vectorized max/min long? Yeah, I think a simple benchmark that tests for long min/max vectorization and reduction would be good. I worry that checking performance manually like in `ReductionPerf` can lead to harder-to-interpret results than with a microbenchmark, especially with VM warmup. Thanks for looking into this!
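
For context, the loop shapes such a microbenchmark would presumably time look roughly like the sketch below: an element-wise `Math.max` / `Math.min` over `long[]` arrays (a vectorization candidate) and a running reduction. The class and method names are made up for illustration, and this is not the benchmark from the PR. In a real JMH benchmark each kernel would be wrapped in a `@Benchmark` method so that warmup and measurement are handled by the harness rather than by hand.

```java
// Illustrative kernels only: names are hypothetical, no timing is done here.
final class LongMinMaxKernels {
    // Element-wise maximum: out[i] = max(a[i], b[i]).
    static void maxElementwise(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Element-wise minimum: out[i] = min(a[i], b[i]).
    static void minElementwise(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
    }

    // Reduction: largest element of the array (Long.MIN_VALUE if empty).
    static long maxReduction(long[] a) {
        long result = Long.MIN_VALUE;
        for (long v : a) {
            result = Math.max(result, v);
        }
        return result;
    }

    // Reduction: smallest element of the array (Long.MAX_VALUE if empty).
    static long minReduction(long[] a) {
        long result = Long.MAX_VALUE;
        for (long v : a) {
            result = Math.min(result, v);
        }
        return result;
    }
}
```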
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2234515775 From cjplummer at openjdk.org Thu Jul 18 00:23:36 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 18 Jul 2024 00:23:36 GMT Subject: RFR: 8336587: failure_handler lldb command times out on macosx-aarch64 core file In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> Message-ID: <gbd7tPdXO0cC1Xj4UGVIhoviY8yd53OtsE1GL4YKcfs=.9d3a4d34-17c1-4b6c-84a1-4c3fab1c545c@github.com> On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: > I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out: > > > ---------------------------------------- > [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip> > ---------------------------------------- > (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643" > WARNING: tool timed out: killed process after 20000 ms > ---------------------------------------- > [2024-07-15 05:16:07] exit code: -2 time: 20163 ms > ---------------------------------------- > > > 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13gb. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduce the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount. Thanks for the reviews Jai, Dean, and David! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20206#issuecomment-2234871521 From cjplummer at openjdk.org Thu Jul 18 00:23:36 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 18 Jul 2024 00:23:36 GMT Subject: Integrated: 8336587: failure_handler lldb command times out on macosx-aarch64 core file In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> Message-ID: <4uOQrbD6VqzPOFKV6XZmmVO65XfBiNIlC10HFLpIhzk=.5d9c634e-d546-458c-8e8c-8457ea825fac@github.com> On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: > I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out: > > > ---------------------------------------- > [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip> > ---------------------------------------- > (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643" > WARNING: tool timed out: killed process after 20000 ms > ---------------------------------------- > [2024-07-15 05:16:07] exit code: -2 time: 20163 ms > ---------------------------------------- > > > 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13gb. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduce the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount. This pull request has now been integrated. 
Changeset: 21a6cf84 Author: Chris Plummer <cjplummer at openjdk.org> URL: https://git.openjdk.org/jdk/commit/21a6cf848da00c795d833f926f831c7aea05dfa3 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8336587: failure_handler lldb command times out on macosx-aarch64 core file Reviewed-by: dlong, dholmes, jpai ------------- PR: https://git.openjdk.org/jdk/pull/20206 From lmesnik at openjdk.org Thu Jul 18 00:49:34 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 18 Jul 2024 00:49:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v3] In-Reply-To: <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com> Message-ID: <-1Ejx0t2f_Q0Hl9nKsil_C2hweWnB9pTdvbID9_OMtQ=.fa91c0d0-fb81-4e12-b946-a9cdc1125c6c@github.com> On Wed, 17 Jul 2024 19:59:10 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. 
If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updates based on feedback Changes requested by lmesnik (Reviewer). src/hotspot/share/code/codeCache.cpp line 1791: > 1789: > 1790: #ifdef LINUX > 1791: void CodeCache::write_perf_map(const char* filename, outputStream* out) { I don't think this is the right place to expand arguments. I think it is more consistent to do it in diagnosticCommand.cpp for any jcmd command, so that the logic is the same for any _filename processing. Moreover, it would be better to add a 'FileArgument', like 'MemorySizeArgument', that correctly parses patterns like %p for all file arguments, is used by all commands, and is extensible for new macros if needed. test/jdk/sun/tools/jcmd/TestJcmdPIDSubstitution.java line 32: > 30: * @modules java.base/jdk.internal.misc > 31: * java.management > 32: * @run main/othervm TestJcmdPIDSubstitution Why is othervm needed here? I suggest using driver mode instead. 
test/jdk/sun/tools/jcmd/TestJcmdPIDSubstitution.java line 59: > 57: do { > 58: path = Paths.get("myfile%d".formatted(pid)); > 59: } while(Files.exists(path)); Why is this do/while loop needed? Each test should have its own empty scratch directory. ------------- PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2184333180 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681931084 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681921063 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681920781 From dholmes at openjdk.org Thu Jul 18 03:44:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 Jul 2024 03:44:31 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v2] In-Reply-To: <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com> <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com> <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com> Message-ID: <0dEcF8Fq29rrS4qhGPuhmWoU8Cpw9vN7FO8yWLtXlTo=.08a889de-2632-4b9c-b814-1414bec74f4d@github.com> On Wed, 17 Jul 2024 06:25:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > AFAICS you start abridging if length is exactly max_length Where do you see that? I have: bool abridge = length > max_length; > it may seem just strange to replace a small inner portion with a larger "omitted" text, because then the text plus replacement is larger than the original text, That is not a practical concern as max_length is expected to be much larger than the replacement text. We don't need the added complexity of the "stretch" zone. 
I'm open to changing the omitted text portion though, to include the number of characters omitted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682067434 From stuefe at openjdk.org Thu Jul 18 05:03:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 Jul 2024 05:03:31 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v3] In-Reply-To: <-1Ejx0t2f_Q0Hl9nKsil_C2hweWnB9pTdvbID9_OMtQ=.fa91c0d0-fb81-4e12-b946-a9cdc1125c6c@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com> <-1Ejx0t2f_Q0Hl9nKsil_C2hweWnB9pTdvbID9_OMtQ=.fa91c0d0-fb81-4e12-b946-a9cdc1125c6c@github.com> Message-ID: <doTMEY5xsducmkODj09ef6yA6eG_GzsEyPpvjKo_Yzo=.b8e03eec-59cb-466e-bb14-d9f881fce861@github.com> On Thu, 18 Jul 2024 00:45:24 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates based on feedback > > src/hotspot/share/code/codeCache.cpp line 1791: > >> 1789: >> 1790: #ifdef LINUX >> 1791: void CodeCache::write_perf_map(const char* filename, outputStream* out) { > > I don't think this is the right place to expand arguments. I think it is more consistent to do it in diagnosticCommand.cpp for any jcmd command, so that the logic is the same for any _filename processing. Moreover, it would be better to add a 'FileArgument', like 'MemorySizeArgument', that correctly parses patterns like %p for all file arguments, is used by all commands, and is extensible for new macros if needed. That's a good idea. 
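For illustration, the `%p` substitution under discussion can be sketched in a few lines of Java. This is only a sketch under stated assumptions, not the HotSpot implementation: the class `PidPatternSketch` and the method `expandPid` are hypothetical names, and the real change lives in the VM's C++ code.

```java
// Hypothetical illustration of replacing "%p" in an output filename with a PID.
// Names are invented for this sketch; this is not the HotSpot code.
final class PidPatternSketch {

    // Replace every literal "%p" in the filename with the decimal PID.
    static String expandPid(String filename, long pid) {
        return filename.replace("%p", Long.toString(pid));
    }

    public static void main(String[] args) {
        // ProcessHandle is available since Java 9.
        long pid = ProcessHandle.current().pid();
        System.out.println(expandPid("heap_%p.hprof", pid));
    }
}
```

Doing the expansion once, in whatever component parses file arguments, would be in line with the reviewer's point that every command taking a filename should behave the same way.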
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1682118709 From dholmes at openjdk.org Thu Jul 18 06:46:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 Jul 2024 06:46:03 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v3] In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> Message-ID: <A9iMU-hV7SpgO7s8zSdIi-FeoH21jRYIQvcDDqmY860=.20798082-5958-4338-aebe-245b9153f269@github.com> > Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. > > The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. > > The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. > > If a string's length exceeds `max_length` then we print it as follows: > > "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) > > For example if we print "ABCDE" with a max_length of 4 then the output is literally: > > "AB ... DE" (abridged) > > The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). > > For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. 
> > Testing: > - new test added for validation purposes > - tiers 1 - 3 as sanity testing > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Change output format to show number of characters ommitted as a suggested by @tstuefe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20150/files - new: https://git.openjdk.org/jdk/pull/20150/files/39256bd3..6d445dbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=01-02 Stats: 19 lines in 2 files changed: 11 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150 PR: https://git.openjdk.org/jdk/pull/20150 From dholmes at openjdk.org Thu Jul 18 06:52:44 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 Jul 2024 06:52:44 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> Message-ID: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> > Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. > > The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. > > The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. 
> > If a string's length exceeds `max_length` then we print it as follows: > > "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) > > For example if we print "ABCDE" with a max_length of 4 then the output is literally: > > "AB ... DE" (abridged) > > The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). > > For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. > > Testing: > - new test added for validation purposes > - tiers 1 - 3 as sanity testing > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Update comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20150/files - new: https://git.openjdk.org/jdk/pull/20150/files/6d445dbf..e45fb904 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150 PR: https://git.openjdk.org/jdk/pull/20150 From dholmes at openjdk.org Thu Jul 18 06:52:44 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 Jul 2024 06:52:44 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v2] In-Reply-To: <SBNE7wMZ0WMp1bQzyK9EASI6EOXgtVPSJw1uWCqRFko=.947c9a5d-ec2e-450a-a7ca-6272470827ae@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <SBNE7wMZ0WMp1bQzyK9EASI6EOXgtVPSJw1uWCqRFko=.947c9a5d-ec2e-450a-a7ca-6272470827ae@github.com> Message-ID: 
<e-kCgxAGG_3lxzWCjgLpyPjsbyzPKgenDF0MrBgfUZg=.a02583e4-e7ce-4d38-bf7c-e1d9b96f0822@github.com> On Wed, 17 Jul 2024 05:32:26 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. >> >> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. >> >> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. >> >> If a string's length exceeds `max_length` then we print it as follows: >> >> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) >> >> For example if we print "ABCDE" with a max_length of 4 then the output is literally: >> >> "AB ... DE" (abridged) >> >> The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). >> >> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. >> >> Testing: >> - new test added for validation purposes >> - tiers 1 - 3 as sanity testing >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fixed grammar I've updated per Thomas's suggestion to show the number of characters omitted. However, I kept the " (abridged)" part as well, since with actual very long strings you are more likely to spot that at the end than the "... (N characters omitted) ..." in the middle. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2235749703 From stuefe at openjdk.org Thu Jul 18 07:28:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 Jul 2024 07:28:33 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <0dEcF8Fq29rrS4qhGPuhmWoU8Cpw9vN7FO8yWLtXlTo=.08a889de-2632-4b9c-b814-1414bec74f4d@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com> <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com> <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com> <0dEcF8Fq29rrS4qhGPuhmWoU8Cpw9vN7FO8yWLtXlTo=.08a889de-2632-4b9c-b814-1414bec74f4d@github.com> Message-ID: <DRT4whSKMivdfabSUnQmrmPKcWXpDjouRmZ3J-DnymY=.16e8cff6-2cac-4728-a215-c83565da1da2@github.com> On Thu, 18 Jul 2024 03:42:13 GMT, David Holmes <dholmes at openjdk.org> wrote: > That is not a practical concern as max_length is expected to be much larger than the replacement text. We don't need the added complexity of the "stretch" zone. > I'm open to changing the omitted text portion though, to include the number of characters omitted. Okay, that is fine. I tend to overengineer :) Thanks for taking my proposal about the omit text form. 
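The abridged format discussed in this thread can be sketched as follows. This is a minimal Java illustration, not the code in `java_lang_String::print`: the class and method names are hypothetical, and the exact wording of the omitted-characters marker is an assumption based on the messages in this thread.

```java
// Hypothetical sketch of the abridging rule described in the PR:
// print the first and last max_length/2 characters and note how many were omitted.
final class AbridgeSketch {

    static String abridge(String s, int maxLength) {
        if (maxLength < 2) {
            throw new IllegalArgumentException("maxLength must be at least 2");
        }
        // Abridge only when the string is strictly longer than the limit.
        if (s.length() <= maxLength) {
            return "\"" + s + "\"";
        }
        int half = maxLength / 2;
        int omitted = s.length() - 2 * half;
        // Marker text is assumed for illustration; the real output may differ.
        return "\"" + s.substring(0, half)
                + " ... (" + omitted + " characters omitted) ... "
                + s.substring(s.length() - half) + "\" (abridged)";
    }

    public static void main(String[] args) {
        System.out.println(abridge("ABCDE", 4));
        System.out.println(abridge("ABCD", 4));
    }
}
```

With a realistic limit such as the proposed default of 256, the marker is far shorter than the text it replaces, which is why the thread concludes that no extra "stretch" zone is needed.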
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682336874 From stuefe at openjdk.org Thu Jul 18 07:28:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 Jul 2024 07:28:35 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> Message-ID: <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com> On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. >> >> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. >> >> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. >> >> If a string's length exceeds `max_length` then we print it as follows: >> >> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) >> >> For example if we print "ABCDE" with a max_length of 4 then the output is literally: >> >> "AB ... DE" (abridged) >> >> The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. 
Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). >> >> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. >> >> Testing: >> - new test added for validation purposes >> - tiers 1 - 3 as sanity testing >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Update comment src/hotspot/share/classfile/javaClasses.cpp line 799: > 797: if (length > max_length) { > 798: st->print(" (abridged) "); > 799: } Do we still need this marker? src/hotspot/share/runtime/globals.hpp line 1312: > 1310: "printed with the middle of the string elided.") \ > 1311: range(2, O_BUFLEN) \ > 1312: \ This would make sense as a product diagnostic switch. You want to be able to increase this at a customer if needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682332121 PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682334543 From dholmes at openjdk.org Thu Jul 18 07:38:38 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 Jul 2024 07:38:38 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com> Message-ID: <L5vy6_40BG_rUhPcrgefCcMpv4SwmLa02iPC3emzAhk=.ad0e7c0e-ab6a-433b-996b-6230d36a4586@github.com> On Thu, 18 Jul 2024 07:23:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> David Holmes has updated the pull request 
incrementally with one additional commit since the last revision: >> >> Update comment > > src/hotspot/share/classfile/javaClasses.cpp line 799: > >> 797: if (length > max_length) { >> 798: st->print(" (abridged) "); >> 799: } > > Do we still need this marker? See my comment above: https://github.com/openjdk/jdk/pull/20150#issuecomment-2235749703 > src/hotspot/share/runtime/globals.hpp line 1312: > >> 1310: "printed with the middle of the string elided.") \ >> 1311: range(2, O_BUFLEN) \ >> 1312: \ > > This would make sense as a product diagnostic switch. You want to be able to increase this at a customer if needed. This is modelled after the `MaxElementPrintSize` that precedes it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682353932 PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682354725 From stuefe at openjdk.org Thu Jul 18 07:45:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 Jul 2024 07:45:31 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> Message-ID: <5XGcYDWXJwui2ftrbZRG26kv_VqiFX79_rdrMP0sMdU=.4da6620f-b3f1-47f7-80ee-a0af0a154030@github.com> On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. >> >> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. 
I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. >> >> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. >> >> If a string's length exceeds `max_length` then we print it as follows: >> >> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) >> >> For example if we print "ABCDE" with a max_length of 4 then the output is literally: >> >> "AB ... DE" (abridged) >> >> The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). >> >> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. >> >> Testing: >> - new test added for validation purposes >> - tiers 1 - 3 as sanity testing >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Update comment Looks good. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20150#pullrequestreview-2184984870 From stuefe at openjdk.org Thu Jul 18 07:45:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 Jul 2024 07:45:33 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <L5vy6_40BG_rUhPcrgefCcMpv4SwmLa02iPC3emzAhk=.ad0e7c0e-ab6a-433b-996b-6230d36a4586@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com> <L5vy6_40BG_rUhPcrgefCcMpv4SwmLa02iPC3emzAhk=.ad0e7c0e-ab6a-433b-996b-6230d36a4586@github.com> Message-ID: <vgYER8TAjO_r2qrS16kO2CG1yoLg54AbJuxRbmI8Vas=.cef75cd4-1575-4cde-aef3-714d19f4ab0e@github.com> On Thu, 18 Jul 2024 07:35:03 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 799: >> >>> 797: if (length > max_length) { >>> 798: st->print(" (abridged) "); >>> 799: } >> >> Do we still need this marker? > > See my comment above: > https://github.com/openjdk/jdk/pull/20150#issuecomment-2235749703 Oh that makes sense. Okay! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682363411 From dnsimon at openjdk.org Thu Jul 18 07:54:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 18 Jul 2024 07:54:37 GMT Subject: RFR: 8336587: failure_handler lldb command times out on macosx-aarch64 core file In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com> Message-ID: <Z78N6x85yK7RtN89rqajPHdYGclCuLPcXEEpp7FSCuo=.9524b644-b456-4789-91a6-a39602447852@github.com> On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: > I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out: > > > ---------------------------------------- > [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip> > ---------------------------------------- > (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643" > WARNING: tool timed out: killed process after 20000 ms > ---------------------------------------- > [2024-07-15 05:16:07] exit code: -2 time: 20163 ms > ---------------------------------------- > > > 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13gb. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduce the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount. 
Thanks for this change - thread dumps are often crucial for investigating timeouts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20206#issuecomment-2235856819 From fyang at openjdk.org Thu Jul 18 08:28:36 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 18 Jul 2024 08:28:36 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3] In-Reply-To: <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com> References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com> <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com> <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com> Message-ID: <DJY55PklmzAYqbYNmhh4j-F6BJPVRd8O0aiqLDPdqEE=.30c393c8-c801-4625-b903-1db0a2e509ff@github.com> On Tue, 9 Jul 2024 05:28:13 GMT, Fei Yang <fyang at openjdk.org> wrote: >> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: >> >> Left a note on a side effect of generate_vle32_pack2 > > Changes requested by fyang (Reviewer). > As for comparison with the openssl version: first of all, thanks for the sources, @RealFYang! The main difference that I see is that they introduced three different versions of encryption depending on the key sizes, which allows them to skip a couple of instructions, like when I did `vaesem_vv(res, vzero)` followed by `vxor_vv(res, res, vtemp1)`. So I thought it'll be more efficient to replace the current version with something openssl-like. The only problem I see is increasing code size a bit. Please let me know if we are not interested in this change for some reason. Does `vaesz_vs` help in any way? And what about the `generate_aescrypt_decryptBlock`? 
[1] [1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkned.pl#L451 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2235925757 From alanb at openjdk.org Thu Jul 18 09:43:33 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 18 Jul 2024 09:43:33 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <O9uAoVHWx7skQYM26-2L-6lKyUj1qxOnfIHvddTaHcI=.d0934aab-1113-4eb7-ab4e-7ff0cee26019@github.com> On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote: > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. Signers dates back to JDK 1.1, touching it now will shine light on long standing technical debt and other issues. I think the main thing that is jumping out is that ClassLoader.setSigners (protected final method so may be called in subclasses) isn't fully specified and it is also missing a number of important checks. The method doesn't specify that the method ignores array classes or class objects for primitives. It doesn't say anything about the elements that aren't a Certificate are ignored. It doesn't specify null behavior either and doesn't say anything that the signers can change at any time. 
What's worse is that in the hands of a cowboy builder, it can be used to set or clear the signers for any Class, or keep a reference to the signers array and muck with them mid-flight. So lots of issues there. Technical debt aside, I think the transformation looks okay, just a bit confusing to have signers declared under a comment "Set by VM"; it's not clear that the comment applies only to the classData before it. At some point I think we should put a question mark on the JVMTI JVMTI_HEAP_REFERENCE_SIGNERS heap ref and the HPROF heap dump CLASS record where there is a slot for the signers array. I don't see any need for these in 2024 and they could potentially be null'ed in the future (would require a JVM TI spec change of course). On David's comment about exposing the field to code using Class.getDeclaredField or Class.getDeclaredFields: nosy code can use these methods to get a reference to the Field but it's not accessible by default. For now, code can use sun.misc.Unsafe but that is temporary and will go away. I don't have a strong opinion and no objection to adding it to the filter (which I think is what David is wondering about). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2236065493 From jvernee at openjdk.org Thu Jul 18 11:03:40 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 18 Jul 2024 11:03:40 GMT Subject: Integrated: 8335480: Only deoptimize threads if needed when closing shared arena In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> Message-ID: <GQmh0WOGyJZqSLpgnp5PdHpifM2VG1Q0Mz_Hlm6qKzo=.1a25625e-3cf4-4aa4-8041-69429fa3b803@github.com> On Fri, 12 Jul 2024 13:57:23 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: > This PR limits the number of cases in which we deoptimize frames when closing a shared Arena.
The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark. > > Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them. > > In this PR: > - Deoptimizing is now only done in cases where it's needed, instead of always. That is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it. > - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`. > - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case. > - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas. > > I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>
> Benchmark                     Threads  Mode  Cnt     Score      Error  Units
> ConcurrentClose.sharedAccess       32  avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24  avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16  avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8  avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7  avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6  avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5  avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4  avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1  avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...

This pull request has now been integrated. Changeset: 7bf53132 Author: Jorn Vernee <jvernee at openjdk.org> URL: https://git.openjdk.org/jdk/commit/7bf531324404419e7de3e83e245d351e1a4e4499 Stats: 482 lines in 20 files changed: 393 ins; 19 del; 70 mod 8335480: Only deoptimize threads if needed when closing shared arena Reviewed-by: mcimadamore, kvn, uschindler, vlivanov, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/20158 From jbhateja at openjdk.org Thu Jul 18 11:25:42 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 18 Jul 2024 11:25:42 GMT Subject: Integrated: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com> Message-ID: <RBZe07a04W-62A_HLu0MiEFv69JDrjujQbvHQ1EecMk=.397a3105-dc57-4028-b4d3-0285d458b6b2@github.com> On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: > Enabling test with explicit feature checks for x86 target. > Removing from test/hotspot/jtreg/ProblemList.txt > > Best Regards, > Jatin This pull request has now been integrated.
Changeset: 35df48e1 Author: Jatin Bhateja <jbhateja at openjdk.org> URL: https://git.openjdk.org/jdk/commit/35df48e1b321d16f44ba924065143af67143cf95 Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20160 From rkennke at openjdk.org Thu Jul 18 11:33:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 18 Jul 2024 11:33:40 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <OCv6QKq_A8dUaKUbnzSdEnlEqrMIcb6pUyLfObBFq-o=.1d78e62f-151c-403d-a291-fbab38c5f4d6@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and a `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to...
> > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include src/hotspot/share/runtime/lightweightSynchronizer.cpp line 77: > 75: using ConcurrentTable = ConcurrentHashTable<Config, MEMFLAGS::mtObjectMonitor>; > 76: > 77: ConcurrentTable* _table; So you have a class ObjectMonitorWorld, which references the ConcurrentTable, which, internally also has its actual table. This is 3 dereferences to get to the actual table, if I counted correctly. I'd try to eliminate the outermost ObjectMonitorWorld class, or at least make it a global flat structure instead of a reference to a heap-allocated object. I think, because this is a structure that is global and would exist throughout the lifetime of the Java program anyway, it might be worth figuring out how to do the actual ConcurrentHashTable flat in the global structure, too. 
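
For readers following the quoted PR description, the scheme it outlines (locking touches only the lock bits of the mark word, and inflation records the monitor in a side table instead of displacing the header) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SketchObject`, `SketchMonitor`, `try_fast_lock`, `try_fast_unlock` and `inflate` are hypothetical names, a mutex-guarded `std::unordered_map` stands in for HotSpot's `ConcurrentHashTable`, and ownership tracking, spinning and deflation are omitted. The table is kept as a flat function-local static, loosely in the spirit of the review comment above about avoiding extra indirections.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Lock-bit encodings as quoted in the PR description.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Hypothetical stand-in for an object header: the upper bits carry
// unrelated data (hash, age, ...) that locking must never overwrite.
struct SketchObject {
  std::atomic<std::uintptr_t> mark{(std::uintptr_t{0x1234} << 2) | kUnlocked};
};

// Hypothetical stand-in for ObjectMonitor.
struct SketchMonitor {
  int contentions = 0;
};

// CAS the lock bits from `from` to `to`, leaving all other bits untouched.
static bool cas_lock_bits(SketchObject& o, std::uintptr_t from, std::uintptr_t to) {
  std::uintptr_t old_mark = o.mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != from) {
    return false;  // already fast-locked or inflated; the caller takes a slow path
  }
  const std::uintptr_t new_mark = (old_mark & ~kLockMask) | to;
  return o.mark.compare_exchange_strong(old_mark, new_mark, std::memory_order_acq_rel);
}

bool try_fast_lock(SketchObject& o)   { return cas_lock_bits(o, kUnlocked, kFastLocked); }
bool try_fast_unlock(SketchObject& o) { return cas_lock_bits(o, kFastLocked, kUnlocked); }

// Inflate: look up or create the monitor in a side table keyed by the object,
// then flip only the lock bits to "inflated". No header data is displaced.
SketchMonitor* inflate(SketchObject& o) {
  static std::mutex table_lock;                                         // flat global state,
  static std::unordered_map<const SketchObject*, SketchMonitor> table;  // no wrapper object
  std::lock_guard<std::mutex> guard(table_lock);
  SketchMonitor* monitor = &table[&o];  // element addresses stay stable across rehashing
  std::uintptr_t old_mark = o.mark.load(std::memory_order_relaxed);
  while (!o.mark.compare_exchange_weak(old_mark, (old_mark & ~kLockMask) | kInflated,
                                       std::memory_order_acq_rel)) {
    // old_mark was refreshed by the failed CAS; retry with the new value.
  }
  return monitor;
}
```

In this sketch a second `try_fast_lock` on an already locked object simply fails, which is the point where the real code would spin or inflate; the upper bits of the mark are the same before and after `inflate`.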
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1682682065 From mbaesken at openjdk.org Thu Jul 18 11:37:35 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 18 Jul 2024 11:37:35 GMT Subject: RFR: 8330144: Revise os::free_memory() [v2] In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com> Message-ID: <hsyl5GR2ddiGPaY1gNEbkRT0zbLsALHg1ILn2bGzwAg=.6c3c54d2-f080-4cf3-8148-f9c69724149f@github.com> On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote: >> ### Summary >> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. >> >> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. >> >> **Transparent huge pages:** >> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted. >> >> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. 
Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. >> >> **Explicit huge pages:** >> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter. >> >> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp... > > Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision: > > - Minor cleanup and comments. > - rename to disclaim_memory and update test Marked as reviewed by mbaesken (Reviewer).
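
The small-page behavior described in the quoted summary (pages are discarded, the mapping stays valid, and the addresses can be touched again without recommitting) can be reproduced outside the JVM with a short Linux-only sketch. This is not the HotSpot code: `disclaim_memory_sketch` and `disclaim_roundtrip_reads_zero` are hypothetical names, and the sketch only covers a private anonymous mapping backed by regular pages, not the transparent or explicit huge page cases discussed above.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not the HotSpot function: ask the kernel to drop the
// physical pages backing [addr, addr + bytes) while keeping the mapping.
bool disclaim_memory_sketch(char* addr, std::size_t bytes) {
  assert(addr != nullptr && bytes > 0);
  return ::madvise(addr, bytes, MADV_DONTNEED) == 0;
}

// Maps a private anonymous region, dirties it, disclaims it, and then checks
// that the region reads back as zero-filled and can be written again without
// any explicit recommit. Returns false if any step fails.
bool disclaim_roundtrip_reads_zero() {
  const std::size_t bytes = 4 * static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
  void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return false;
  }
  char* addr = static_cast<char*>(p);
  std::memset(addr, 0x5A, bytes);                  // make every page resident and dirty

  bool ok = disclaim_memory_sketch(addr, bytes);   // pages dropped, mapping kept
  for (std::size_t i = 0; ok && i < bytes; i++) {
    ok = (addr[i] == 0);                           // fresh zero-fill pages on re-access
  }
  if (ok) {
    addr[0] = 1;                                   // retouch without recommitting
    ok = (addr[0] == 1);
  }
  ::munmap(addr, bytes);
  return ok;
}
```

The zero-fill check relies on the documented Linux semantics of `MADV_DONTNEED` for private anonymous mappings; on other mapping types or other operating systems the contents after the call may differ.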
------------- PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2185565822 From thartmann at openjdk.org Thu Jul 18 11:54:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 18 Jul 2024 11:54:32 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> Message-ID: <gEMfMw0RxHxCxbpXk8zf59ELEBGuY6T4j5xrk8iaq7I=.b0434363-0f67-40d4-9724-864a4cdbdaae@github.com> On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s, printed in full, obscure other things in the log. >> >> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. >> >> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. >> >> If a string's length exceeds `max_length` then we print it as follows: >> >> "<first max_length/2 characters> ... <last max_length/2 characters>" (abridged) >> >> For example if we print "ABCDE" with a max_length of 4 then the output is literally: >> >> "AB ... DE" (abridged) >> >> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length.
Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). >> >> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. >> >> Testing: >> - new test added for validation purposes >> - tiers 1 - 3 as sanity testing >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Update comment Still looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20150#pullrequestreview-2185596550 From thartmann at openjdk.org Thu Jul 18 11:54:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 18 Jul 2024 11:54:38 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <VjZBnADJJJzZtRFXXdS90tBzYVzRuS_3W9q1iBNex9k=.151572e1-8a4d-47a7-a195-e5bf31a2a8ac@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:
> In cases like:
>
>     UNSAFE.putLong(address + off1 + 1030, lseed);
>     UNSAFE.putLong(address + 1023, lseed);
>     UNSAFE.putLong(address + off2 + 1001, lseed);
>
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
>
>     ldr R10, [R15, #120] # int ! Field: address
>     ldr R11, [R16, #136] # int ! Field: off1
>     ldr R12, [R16, #144] # int ! Field: off2
>     add R11, R11, R10
>     mov R11, R11 # long -> ptr
>     add R12, R12, R10
>     mov R10, R10 # long -> ptr
>     add R11, R11, #1030 # ptr
>     str R17, [R11] # int
>     add R10, R10, #1023 # ptr
>     str R17, [R10] # int
>     mov R10, R12 # long -> ptr
>     add R10, R10, #1001 # ptr
>     str R17, [R10] # int
>
> On aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. In the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], so we get this assembly:
>
>     ldr x10, [x15,#120]
>     ldp x11, x12, [x16,#136]
>     add x11, x11, x10
>     add x12, x12, x10
>     add x11, x11, #0x406
>     str x17, [x11]
>     add x10, x10, #0x3ff
>     str x17, [x10]
>     mov x10, x12 <--- extra register copy
>     add x10, x10, #0x3e9
>     str x17, [x10]
>
> There is still one extra register copy, which we're trying to remove in this patch.
>
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
>
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
>
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

Sure, I'll run this through our testing and report back once it has passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2236307911 From coleenp at openjdk.org Thu Jul 18 12:32:31 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 18 Jul 2024 12:32:31 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <E6_2baf-dUa28dZyZdQlfDmCqJ7sPoIGdgsfJLxPYaU=.2c9e818f-ebfb-4ae8-8d42-0ecd860089a9@github.com> On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote: > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. I thought we moved this already. There's a change in the heapDumper.cpp that probably has to change also, because I think we're now dumping signers twice (two lines). The one in jvmtiTagMap.cpp reports the SIGNERS tag so that has to stay. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2185676651 From liach at openjdk.org Thu Jul 18 12:40:32 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Jul 2024 12:40:32 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <5gvvLeGQfl_OU3uY4P2QTYTXDxZTt5-INzv-Yt4mpRM=.a49d4d77-793a-46f0-90cb-d219af508f37@github.com> On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote: > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. The `native` flag is not rendered in API spec, so indeed we can drop without a CSR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2236359362 From alanb at openjdk.org Thu Jul 18 12:46:31 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 18 Jul 2024 12:46:31 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field In-Reply-To: <E6_2baf-dUa28dZyZdQlfDmCqJ7sPoIGdgsfJLxPYaU=.2c9e818f-ebfb-4ae8-8d42-0ecd860089a9@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> <E6_2baf-dUa28dZyZdQlfDmCqJ7sPoIGdgsfJLxPYaU=.2c9e818f-ebfb-4ae8-8d42-0ecd860089a9@github.com> Message-ID: <MbTytLxlPVZ_kNtGqlQ9nciUEmNcyZfd_QM1eDRARRE=.cff0b65e-8339-4dc0-b3dc-52944299002c@github.com> On Thu, 18 Jul 2024 12:30:24 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > There's a change in the heapDumper.cpp that probably has to change also, because I think we're now dumping signers twice (two lines). The HPROF heap dump has a slot for signers so have to keep that to avoid breakage. So yes, it means two refs as the signers will be in the instance fields list too. The HPROF format could be rev'ed but may not be worth it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2236404312 From coleenp at openjdk.org Thu Jul 18 13:09:34 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 18 Jul 2024 13:09:34 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <z6428CN4RgF6LTYx8R-MuHuPu_jerimhS5-XTjVvE6A=.b3b927f7-e7ca-4a90-afdb-2cee94d105f6@github.com> On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote: > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. 
(The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. Ok. It's not obvious from the code but I don't think it's worth commenting. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2185765767 From liach at openjdk.org Thu Jul 18 13:31:02 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Jul 2024 13:31:02 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field [v2] In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <Btsn_5ZHvYuNbW8Pjyyy43sSSz4TjVlW4tfyU1tUza4=.00cd3fb4-6074-484e-bead-3cfb7a3569b6@github.com> > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. 
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Reorder comment of classData to avoid misunderstanding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20223/files - new: https://git.openjdk.org/jdk/pull/20223/files/86b3a248..dd62b9d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20223.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20223/head:pull/20223 PR: https://git.openjdk.org/jdk/pull/20223 From alanb at openjdk.org Thu Jul 18 13:36:33 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 18 Jul 2024 13:36:33 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field [v2] In-Reply-To: <Btsn_5ZHvYuNbW8Pjyyy43sSSz4TjVlW4tfyU1tUza4=.00cd3fb4-6074-484e-bead-3cfb7a3569b6@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> <Btsn_5ZHvYuNbW8Pjyyy43sSSz4TjVlW4tfyU1tUza4=.00cd3fb4-6074-484e-bead-3cfb7a3569b6@github.com> Message-ID: <2ZVM5wAKjhLfFx4CFBSyQ7yND6VIsMcNTxRubvcuXps=.c511f719-746a-4c01-b744-82f1f2b3619f@github.com> On Thu, 18 Jul 2024 13:31:02 GMT, Chen Liang <liach at openjdk.org> wrote: >> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) >> >> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. 
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Reorder comment of classData to avoid misunderstanding Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2185842171 From duke at openjdk.org Thu Jul 18 13:38:38 2024 From: duke at openjdk.org (Robert Toyonaga) Date: Thu, 18 Jul 2024 13:38:38 GMT Subject: Integrated: 8330144: Revise os::free_memory() In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com> Message-ID: <AOtzhjQz_eSZz92AbEtMHyuvEkyUat-9Mjp1yDZa7A4=.480e5df1-5c17-4aa7-bab5-1daa028dff02@github.com> On Mon, 8 Jul 2024 17:33:41 GMT, Robert Toyonaga <duke at openjdk.org> wrote: > ### Summary > On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. > > `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory. > > **Transparent huge pages:** > `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted. 
> > To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. > > **Explicit huge pages:** > `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter. > > To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an improvement upon existing behav... This pull request has now been integrated.
Changeset: 4a73ed44 Author: Robert Toyonaga <rtoyonag at redhat.com> Committer: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/4a73ed44f1af4ea3e53b1e1a6acfca1ba6b636c3 Stats: 44 lines in 10 files changed: 24 ins; 6 del; 14 mod 8330144: Revise os::free_memory() Reviewed-by: stuefe, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/20080 From liach at openjdk.org Thu Jul 18 13:48:06 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Jul 2024 13:48:06 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field [v3] In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers - Reorder comment of classData to avoid misunderstanding - 8334772: Change Class::signers to an explicit field ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20223/files - new: https://git.openjdk.org/jdk/pull/20223/files/dd62b9d2..5d742e34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=01-02 Stats: 779 lines in 28 files changed: 676 ins; 29 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/20223.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20223/head:pull/20223 PR: https://git.openjdk.org/jdk/pull/20223 From rriggs at openjdk.org Thu Jul 18 15:11:33 2024 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 18 Jul 2024 15:11:33 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field [v3] In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> Message-ID: <Fkuk4y6oSyWvTF0jU0i8HV8W2f0bEzzU9ebCGYWsW7M=.843ca90f-1247-4772-9c9c-36d983b44203@github.com> On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote: >> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) >> >> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. 
Reviewers please help review the associated CSR. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers > - Reorder comment of classData to avoid misunderstanding > - 8334772: Change Class::signers to an explicit field lgtm ------------- Marked as reviewed by rriggs (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2186132979 From mcimadamore at openjdk.org Thu Jul 18 15:27:45 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 18 Jul 2024 15:27:45 GMT Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3] In-Reply-To: <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com> References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com> <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com> <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com> <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com> <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com> <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com> <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com> <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com> <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com> <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com> <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1Cmdl 
CsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com> <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com> Message-ID: <AXRsHfMntYQvmuTs1Uw8ZsVl6BSuY0300CtoLcvRKXw=.2a15f486-829d-48ee-8972-a2e0fed76c13@github.com> On Mon, 15 Jul 2024 16:30:11 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: > * there is some issue involving segment access with `int` induction variable which we should investigate separately This issue is tracked here: https://bugs.openjdk.org/browse/JDK-8336759 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2236855896 From aph at openjdk.org Thu Jul 18 16:39:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 18 Jul 2024 16:39:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> Message-ID: <K2b9CDiX4T2gCntyr6LF2q04W4ztFfyhMddDmpZoJqI=.bbce5e35-fb80-408c-adbf-920c6ed9ee72@github.com> On Thu, 11 Jul 2024 23:22:19 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433: >> >>> 1431: >>> 1432: // Don't check secondary_super_cache >>> 1433: if (super_check_offset.is_register() >> >> Do you see any effects from this particular change? >> >> It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks primary or secondary super). Moreover, the very same check is performed after primary super slot is checked. 
>> >> Unless `_secondary_super_cache` field is removed, unconditionally checking the slot at `super_check_offset` is benign. > > BTW `MacroAssembler::check_klass_subtype_fast_path` deserves a cleanup: `super_check_offset` can be safely turned into `Register` thus eliminating the code guarded by `super_check_offset.is_register() == false`. > Do you see any effects from this particular change? > > It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks primary or secondary super). Moreover, the very same check is performed after primary super slot is checked. OK. I think this was more for testing, but you make sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683178664 From aph at openjdk.org Thu Jul 18 16:39:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 18 Jul 2024 16:39:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com> <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com> <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com> Message-ID: <I-RsEHCmrbxEX17LborYwpe-VJr3VbpHeCUyLooPsoo=.31ddede4-8af9-439f-bfe9-7a5273363b85@github.com> On Wed, 17 Jul 2024 18:54:32 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Now it starts to sound concerning... 
`Klass::set_secondary_supers()` initializes both `_secondary_supers` and `_bitmap`, which implies that `Klass::is_subtype_of()` may be called on a not-yet-initialized Klass. If that's the case, it does look like a bug on its own. How is it expected to work when `_secondary_supers` hasn't been set yet? > > On second thought the following setter may be the culprit: > > void Klass::set_secondary_supers(Array<Klass*>* secondaries) { > assert(!UseSecondarySupersTable || secondaries == nullptr, ""); > set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY); > } > > It should be adjusted to set `SECONDARY_SUPERS_BITMAP_FULL` instead. I've spent a while trying to reproduce the problem but I can't. I was seeing a problem where `Klass::is_subtype_of(vmClasses::Cloneable_klass())` was being called before the bitmap had been set. I'm not sure what to think, really. Maybe I should just back out this change to see what happens. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683174591 From aph at openjdk.org Thu Jul 18 16:44:32 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 18 Jul 2024 16:44:32 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> Message-ID: <P-SlUxusbtJqhV2MGXwLS2u9P4Yq3aQFJW664g1fOug=.e610cb90-b511-43e3-9caa-e9293a25fa5c@github.com> On Thu, 11 Jul 2024 23:39:11 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. 
>> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. 
Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1040: > >> 1038: >> 1039: // Secondary subtype checking >> 1040: void lookup_secondary_supers_table(Register sub_klass, > > While browsing the code, I noticed that it's far from evident at call sites which overload is used (especially with so many arguments). Does it make sense to avoid method overloads here and use distinct method names instead? So I confess: this is surely true, but I failed to think of a name for the known- and unknown-at-compile-time versions. maybe `check_const_klass_subtype_slow_path_table` and `check_var_klass_subtype_slow_path_table` ? > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1981: > >> 1979: __ load_klass(r19_klass, copied_oop);// query the object klass >> 1980: >> 1981: BLOCK_COMMENT("type_check:"); > > Why don't you move it inside `generate_type_check`? Sorry, what? Do you mean move just this block comment? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683182967 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683184664 From aph at openjdk.org Thu Jul 18 17:43:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 18 Jul 2024 17:43:29 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <M4ZQME975c-3u4MqZN8p8Mhg5g2NbSpNAdiFqgA4OSk=.13a84093-bff4-4ab3-9812-83e309c45328@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. 
All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. 
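For readers unfamiliar with the idea of a bitmap-indexed hashed lookup that the description above relies on, the following is a minimal sketch of the general technique, not HotSpot's actual code. It assumes a 64-slot table with linear probing, a bitmap marking occupied slots, and a packed array indexed by the popcount of the bitmap bits below the slot. The names `SecondaryTable`, `slot_of`, and `popcount64`, and the multiplicative hash, are hypothetical and chosen only for illustration; the software popcount stands in for the emulation mentioned for default x86-64 builds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable software popcount (Kernighan's method): clears the lowest set
// bit on each iteration, so it loops once per set bit.
inline unsigned popcount64(uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Hypothetical stand-in for a per-class hash: maps an id to a slot in [0, 63].
inline unsigned slot_of(uint32_t id) {
    return (id * 2654435761u) >> 26;
}

// Illustrative model only: ids play the role of secondary superclasses.
struct SecondaryTable {
    uint64_t bitmap = 0;           // bit s set => slot s is occupied
    std::vector<uint32_t> packed;  // occupied slots only, in slot order

    static SecondaryTable build(const std::vector<uint32_t>& ids) {
        // Keep at least one empty slot so that probing always terminates.
        assert(ids.size() < 64);
        uint32_t slots[64] = {};
        SecondaryTable t;
        for (uint32_t id : ids) {
            unsigned s = slot_of(id);
            while ((t.bitmap >> s) & 1) s = (s + 1) & 63;  // linear probing
            t.bitmap |= uint64_t{1} << s;
            slots[s] = id;
        }
        for (unsigned s = 0; s < 64; ++s) {
            if ((t.bitmap >> s) & 1) t.packed.push_back(slots[s]);
        }
        return t;
    }

    bool contains(uint32_t id) const {
        unsigned s = slot_of(id);
        // The number of occupied slots below s is the index of slot s in `packed`.
        unsigned idx = popcount64(bitmap & ((uint64_t{1} << s) - 1));
        while ((bitmap >> s) & 1) {
            if (packed[idx] == id) return true;
            s = (s + 1) & 63;              // probe the next slot, wrapping at 64
            idx = (s == 0) ? 0 : idx + 1;  // packed index wraps with the slot
        }
        return false;  // reached an empty slot: the id is not in the table
    }
};
```

In this model a miss on an empty home slot costs one bit test and no memory access into the packed array, which is the property that makes the approach attractive compared with a linear scan. A table built from a handful of ids answers `contains` for each of them and rejects ids that were never inserted.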
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Negated tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/7d7694cc..bfe9ceed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=00-01 Stats: 23 lines in 3 files changed: 10 ins; 10 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From vlivanov at openjdk.org Thu Jul 18 19:07:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 18 Jul 2024 19:07:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2] In-Reply-To: <P-SlUxusbtJqhV2MGXwLS2u9P4Yq3aQFJW664g1fOug=.e610cb90-b511-43e3-9caa-e9293a25fa5c@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <P-SlUxusbtJqhV2MGXwLS2u9P4Yq3aQFJW664g1fOug=.e610cb90-b511-43e3-9caa-e9293a25fa5c@github.com> Message-ID: <yRd8QN05KfE7K_D63gauu473mUmIY5PoybKfeg0yzdA=.ad06dd5e-d1ce-4cc5-a2bf-179e0602322f@github.com> On Thu, 18 Jul 2024 16:40:47 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1040: >> >>> 1038: >>> 1039: // Secondary subtype checking >>> 1040: void lookup_secondary_supers_table(Register sub_klass, >> >> While browsing the code, I noticed that it's far from evident at call sites which overload is used (especially with so many arguments). Does it make sense to avoid method overloads here and use distinct method names instead? > > So I confess: this is surely true, but I failed to think of a name for the known- and unknown-at-compile-time versions. 
maybe `check_const_klass_subtype_slow_path_table` and `check_var_klass_subtype_slow_path_table` ? Another idea: `lookup_secondary_supers_table_var` vs `lookup_secondary_supers_table_const`. Or `lookup_secondary_supers_table_super_var` vs `lookup_secondary_supers_table_super_const`. >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1981: >> >>> 1979: __ load_klass(r19_klass, copied_oop);// query the object klass >>> 1980: >>> 1981: BLOCK_COMMENT("type_check:"); >> >> Why don't you move it inside `generate_type_check`? > > Sorry, what? Do you mean move just this block comment? No, the whole piece with `if (UseSecondarySupersTable) { ... } else { ... }` included. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683349665 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683348239 From vlivanov at openjdk.org Thu Jul 18 19:59:35 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 18 Jul 2024 19:59:35 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2] In-Reply-To: <I-RsEHCmrbxEX17LborYwpe-VJr3VbpHeCUyLooPsoo=.31ddede4-8af9-439f-bfe9-7a5273363b85@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com> <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com> <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com> <I-RsEHCmrbxEX17LborYwpe-VJr3VbpHeCUyLooPsoo=.31ddede4-8af9-439f-bfe9-7a5273363b85@github.com> Message-ID: <zJHx1UKVSPhz1zoL3CMSYuiI3MPN23AfMraemiDG-8k=.30ff1b8d-37f9-4af0-bf9f-5005f3021596@github.com> On Thu, 18 Jul 2024 
16:35:16 GMT, Andrew Haley <aph at openjdk.org> wrote: >> On a second thought the following setter may be the culprit: >> >> void Klass::set_secondary_supers(Array<Klass*>* secondaries) { >> assert(!UseSecondarySupersTable || secondaries == nullptr, ""); >> set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY); >> } >> >> It should be adjusted to set `SECONDARY_SUPERS_BITMAP_FULL` instead. > > I've spent a while trying to reproduce the problem but I can't. > > I was seeing a problem where `Klass::is_subtype_of(vmClasses::Cloneable_klass())` was being called before the bitmap had been set. I'm not sure what to think, really. Maybe I should just back out this change to see what happens. I'm in favor of backing out this change and adding an assert/guarantee (on `_secondary_supers != nullptr`) in `Klass::is_subtype_of()` to ensure no subtype checks happen on uninitialized Klasses. Then we should be able to spot and fix all the places where problematic checks happen. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683409343 From vlivanov at openjdk.org Thu Jul 18 20:09:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 18 Jul 2024 20:09:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2] In-Reply-To: <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> Message-ID: <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> On Wed, 17 Jul 2024 17:15:32 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/oops/klass.inline.hpp line 122: >> >>> 120: return true; >>> 121: >>> 122: bool 
result = lookup_secondary_supers_table(k); >> >> Should `UseSecondarySupersTable` affect `Klass::search_secondary_supers` as well? > > I think not. It'd complicate C++ runtime for no useful reason. On the other hand, if `-XX:-UseSecondarySupersTable` is intended solely for diagnostic purposes, then handling all possible execution modes uniformly is preferable, since it gives more confidence when troubleshooting seemingly related failures. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683419259 From vlivanov at openjdk.org Thu Jul 18 20:13:32 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 18 Jul 2024 20:13:32 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2] In-Reply-To: <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> Message-ID: <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> On Thu, 18 Jul 2024 20:07:14 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> I think not. It'd complicate C++ runtime for no useful reason. > > On the other hand, if `-XX:-UseSecondarySupersTable` is intended solely for diagnostic purposes, then handling all possible execution modes uniformly is preferable, since it gives more confidence when troubleshooting seemingly related failures. Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array. 
Even though I would very much like to see table lookup written in C++ (accompanying the heavily optimized platform-specific MacroAssembler variants), it would make the C++ runtime even simpler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683423052 From duke at openjdk.org Thu Jul 18 20:52:38 2024 From: duke at openjdk.org (fitzsim) Date: Thu, 18 Jul 2024 20:52:38 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> Message-ID: <YdH1sbYiXMYAeQfEUigdlRCH1rycWckinWAPMt7wmCE=.a79dd884-1ddd-40a1-9f36-0a3af2de9d86@github.com> On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This PR is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> * NOTE: This PR depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. >> >> Compared with the previous PRs, the major change in this PR is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on an external sleef header or lib at build or run time. >> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of jdk. >> >> Besides the code changes, one important task is to handle the legal process. >> >> Thanks!
>>
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>>
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>>
>> ## Performance
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Float
>> data
>>
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > skip TANH It is possible to regenerate `sleefinline_advsimd.h` and `sleefinline_sve.h` with some new OpenJDK build logic and only the following fifteen SLEEF source files: 32K ./src/jdk.incubator.vector/linux/native/sleef/src/arch/helperadvsimd.h 40K ./src/jdk.incubator.vector/linux/native/sleef/src/arch/helpersve.h 8.0K ./src/jdk.incubator.vector/linux/native/sleef/src/common/addSuffix.c 20K ./src/jdk.incubator.vector/linux/native/sleef/src/common/commonfuncs.h 16K ./src/jdk.incubator.vector/linux/native/sleef/src/common/dd.h 20K ./src/jdk.incubator.vector/linux/native/sleef/src/common/df.h 4.0K ./src/jdk.incubator.vector/linux/native/sleef/src/common/estrin.h 12K ./src/jdk.incubator.vector/linux/native/sleef/src/common/keywords.txt 12K ./src/jdk.incubator.vector/linux/native/sleef/src/common/misc.h 4.0K ./src/jdk.incubator.vector/linux/native/sleef/src/common/quaddef.h 4.0K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/funcproto.h 20K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/mkrename.c 116K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefinline_header.h.org 164K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimddp.c 152K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimdsp.c 624K total I was able to extract the shell and C preprocessing steps from the upstream CMake-based build system (by adding `--verbose` to `cmake --build` in `createSleef.sh`) and convert them into an OpenJDK `.gmk` file. 
[This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) shows various approaches; ideas include: - the fifteen source files are checked directly into the OpenJDK repository - a `--regenerate-sleef-headers` configure option that will cause the headers to be rebuilt as their dependencies change - a `make regenerate-sleef-headers` phony target that unconditionally rebuilds the headers - cross-compilation support when `--openjdk-target=aarch64-linux-gnu` is specified on an `x86-64` build machine - a README section with hints on how to maintain the OpenJDK build rules Whenever the OpenJDK SLEEF source code copies were updated, one would also check for changes in the upstream CMake steps. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2237551700 From dholmes at openjdk.org Thu Jul 18 21:40:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 Jul 2024 21:40:35 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field [v3] In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> Message-ID: <hgrXmQamLgNKwuayDeBmFrbpmoJwIblYZjORh-O13tY=.93ca6837-0cab-4ca3-9d5f-7b64255e9bb8@github.com> On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote: >> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) >> >> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. 
The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers > - Reorder comment of classData to avoid misunderstanding > - 8334772: Change Class::signers to an explicit field On the JVMTI side and the heapDumper ... I see that heapDumper explicitly fills in a slot for the classloader, but that is also an explicit field. Does that mean that the classloader appears twice, or does the fact it is filtered by reflection mean that the heapDumper doesn't see it when dumping fields? If the latter then it suggests to me that we should be doing the same for the signers. Otherwise I don't know what the implications might be for having the field listed twice. ------------- PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2186958021 From dholmes at openjdk.org Thu Jul 18 22:18:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 Jul 2024 22:18:32 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field [v3] In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> Message-ID: <wppy0f7ZIf9KFcjbtUtVv-J3g8xgT1Rn-7I432UE_5g=.61853901-2b23-4180-9a87-09cbddc74c1d@github.com> On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote: >> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. 
We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) >> >> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers > - Reorder comment of classData to avoid misunderstanding > - 8334772: Change Class::signers to an explicit field I am not a hprof expert but AFAICS the `HPROF_GC_CLASS_DUMP` contains an explicit id for the classloader, signers, and pd, of the class, and then later a list of all fields declared in the class. AFAICS there is no real connection between these, so it doesn't matter if the classloader/signers/pd is an injected field, a regular Java field, or not a field at all. So in that regard it seems `signers` will now be handled the same way as `classloader` and so that should be fine. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2186999675 From liach at openjdk.org Thu Jul 18 22:25:36 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Jul 2024 22:25:36 GMT Subject: RFR: 8334772: Change Class::signers to an explicit field [v3] In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com> Message-ID: <gNyQrDdFcmXh88F6ECesn4vaXjwtyBvhLqp4i4ioB6o=.3e858f5c-34a4-4e65-82e3-93637c6ade73@github.com> On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote: >> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) >> >> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers > - Reorder comment of classData to avoid misunderstanding > - 8334772: Change Class::signers to an explicit field Thanks for the reviews! I will go ahead and integrate. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2237720701 From liach at openjdk.org Thu Jul 18 22:25:36 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Jul 2024 22:25:36 GMT Subject: Integrated: 8334772: Change Class::signers to an explicit field In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com> Message-ID: <lwMnJtI4qZulSvJespfXJNOsPeNKlA_yrUcludNkqds=.5e05f673-a3d4-433e-ba97-c9da4ab71b24@github.com> On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote: > `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.) > > Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR. This pull request has now been integrated. 
Changeset: 39f44768 Author: Chen Liang <liach at openjdk.org> URL: https://git.openjdk.org/jdk/commit/39f44768131254ee11f723f92e2bac57b0d1ade0 Stats: 72 lines in 6 files changed: 6 ins; 53 del; 13 mod 8334772: Change Class::signers to an explicit field Reviewed-by: dholmes, alanb, rriggs, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20223 From dholmes at openjdk.org Fri Jul 19 06:25:38 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 19 Jul 2024 06:25:38 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> Message-ID: <j2qM-EBPh_DROJGmAJrnHA0z6i8hDvDddqRr8wZZMaQ=.df6479d5-2dbd-46bc-baa4-f3933dc75be7@github.com> On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. >> >> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. >> >> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. >> >> If a string's length exceeds `max_length` then we print it as follows: >> >> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) >> >> For example if we print "ABCDE" with a max_length of 4 then the output is literally: >> >> "AB ... 
DE" (abridged) >> >> The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). >> >> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. >> >> Testing: >> - new test added for validation purposes >> - tiers 1 - 3 as sanity testing >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Update comment Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2238360300 From dholmes at openjdk.org Fri Jul 19 06:25:40 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 19 Jul 2024 06:25:40 GMT Subject: Integrated: 8325945: Error reporting should limit the number of String characters printed In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> Message-ID: <aPbD1pUbidcjt6JTaxo9KShn1xG_-SzcoKEc9rQrk68=.4487bc83-b234-46f2-98b0-7fe20ae0db30@github.com> On Fri, 12 Jul 2024 02:17:46 GMT, David Holmes <dholmes at openjdk.org> wrote: > Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. > > The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. > > The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. 
This allows more direct control if specific call sites want to print full strings regardless. > > If a string's length exceeds `max_length` then we print it as follows: > > "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) > > For example if we print "ABCDE" with a max_length of 4 then the output is literally: > > "AB ... DE" (abridged) > > The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). > > For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. > > Testing: > - new test added for validation purposes > - tiers 1 - 3 as sanity testing > > Thanks This pull request has now been integrated. Changeset: 10fcad70 Author: David Holmes <dholmes at openjdk.org> URL: https://git.openjdk.org/jdk/commit/10fcad70b3894023d65716b42dc67c1a2bda9c03 Stats: 166 lines in 6 files changed: 163 ins; 0 del; 3 mod 8325945: Error reporting should limit the number of String characters printed Reviewed-by: thartmann, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/20150 From mli at openjdk.org Fri Jul 19 07:20:43 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 19 Jul 2024 07:20:43 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <YdH1sbYiXMYAeQfEUigdlRCH1rycWckinWAPMt7wmCE=.a79dd884-1ddd-40a1-9f36-0a3af2de9d86@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <YdH1sbYiXMYAeQfEUigdlRCH1rycWckinWAPMt7wmCE=.a79dd884-1ddd-40a1-9f36-0a3af2de9d86@github.com> Message-ID: 
<LCCK0gVv2r5lhKluIDzAF_WV9dsKwwRo7WVfcj8-NxU=.2f6b2a21-716c-4868-8365-e3fd28bfc8eb@github.com> On Thu, 18 Jul 2024 20:50:14 GMT, fitzsim <duke at openjdk.org> wrote: > It is possible to regenerate `sleefinline_advsimd.h` and `sleefinline_sve.h` with some new OpenJDK build logic and only the following fifteen SLEEF source files: > > ``` > 32K ./src/jdk.incubator.vector/linux/native/sleef/src/arch/helperadvsimd.h > 40K ./src/jdk.incubator.vector/linux/native/sleef/src/arch/helpersve.h > 8.0K ./src/jdk.incubator.vector/linux/native/sleef/src/common/addSuffix.c > 20K ./src/jdk.incubator.vector/linux/native/sleef/src/common/commonfuncs.h > 16K ./src/jdk.incubator.vector/linux/native/sleef/src/common/dd.h > 20K ./src/jdk.incubator.vector/linux/native/sleef/src/common/df.h > 4.0K ./src/jdk.incubator.vector/linux/native/sleef/src/common/estrin.h > 12K ./src/jdk.incubator.vector/linux/native/sleef/src/common/keywords.txt > 12K ./src/jdk.incubator.vector/linux/native/sleef/src/common/misc.h > 4.0K ./src/jdk.incubator.vector/linux/native/sleef/src/common/quaddef.h > 4.0K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/funcproto.h > 20K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/mkrename.c > 116K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefinline_header.h.org > 164K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimddp.c > 152K ./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimdsp.c > 624K total > ``` > > I was able to extract the shell and C preprocessing steps from the upstream CMake-based build system (by adding `--verbose` to `cmake --build` in `createSleef.sh`) and convert them into an OpenJDK `.gmk` file. 
> > [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) shows various approaches; ideas include: > > * the fifteen source files are checked directly into the OpenJDK repository > * a `--regenerate-sleef-headers` configure option that will cause the headers to be rebuilt as their dependencies change > * a `make regenerate-sleef-headers` phony target that unconditionally rebuilds the headers > * cross-compilation support when `--openjdk-target=aarch64-linux-gnu` is specified on an `x86-64` build machine > * a README section with hints on how to maintain the OpenJDK build rules > Really nice work, Thanks! > Whenever the OpenJDK SLEEF source code copies were updated, one would also check for changes in the upstream CMake steps. Compared to the current implementation in https://github.com/openjdk/jdk/pull/19185, my one concern about [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) is the future maintenance effort when we need to update the SLEEF source along with the CMake changes, and also when support for new SLEEF platforms is added to the JDK. On the other hand, I'm not sure whether [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) satisfies the traceability requirement discussed above.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2238531577 From aph at openjdk.org Fri Jul 19 09:20:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 19 Jul 2024 09:20:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> Message-ID: <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com> On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers. >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. >> >> Besides of the code changes, one important task is to handle the legal process. >> >> Thanks! 
>> >> ## Test >> tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Float >> data
>>
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > skip TANH > Compared to current implementation in #19185, my bit concern about [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) is the future maintainence effort when we need to update the sleef source along with the cmake changes, also when new platforms support of sleef are added in jdk. That's a fair point. However, it's probably less work than any adequate alternative proposed thus far. > In another hand, I'm not sure if [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) qualify the traceability requirement discussed above. I'm sure it's fine: we have readable source code in the preferred form, along with a script that generates it from the corresponding SLEEF release. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2238734507 From jiefu at openjdk.org Fri Jul 19 11:56:39 2024 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 19 Jul 2024 11:56:39 GMT Subject: RFR: 8325945: Error reporting should limit the number of String characters printed [v4] In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com> <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com> Message-ID: <yJfZgDUnG3kEsld9PIIiHxaRPfbuDDoY40lYeuHQ_jU=.17c3b559-bc8c-43f0-b716-d7272e096460@github.com> On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log. >> >> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. 
I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN. >> >> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless. >> >> If a string's length exceeds `max_length` then we print it as follows: >> >> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged) >> >> For example if we print "ABCDE" with a max_length of 4 then the output is literally: >> >> "AB ... DE" (abridged) >> >> The message doesn't mention `MaxPrintStringSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive). >> >> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`. >> >> Testing: >> - new test added for validation purposes >> - tiers 1 - 3 as sanity testing >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Update comment runtime/PrintingTests/StringPrinting.java fails with release VMs. Please see https://github.com/openjdk/jdk/pull/20249 Thanks. 
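As an illustration of the abridged format quoted above, here is a minimal Java sketch. It is not the HotSpot implementation (`java_lang_String::print` is C++); the class and method names are hypothetical, and the sketch assumes that a string within the limit is printed in full between double quotes.

```java
public final class AbridgedStringSketch {

    // Hypothetical helper: formats s the way the PR description outlines.
    // Assumption: strings no longer than maxLength are printed in full, in quotes.
    public static String abridge(String s, int maxLength) {
        if (maxLength < 2) {
            // The PR text gives 2 as the lower bound of the flag's range.
            throw new IllegalArgumentException("maxLength must be at least 2");
        }
        if (s.length() <= maxLength) {
            return "\"" + s + "\"";
        }
        int half = maxLength / 2;
        String head = s.substring(0, half);              // first maxLength/2 characters
        String tail = s.substring(s.length() - half);    // last maxLength/2 characters
        return "\"" + head + " ... " + tail + "\" (abridged)";
    }

    public static void main(String[] args) {
        // The example from the PR description: "ABCDE" with a limit of 4.
        System.out.println(abridge("ABCDE", 4));
        // A string that fits within the limit is left untouched.
        System.out.println(abridge("ABCD", 4));
    }
}
```

With an odd limit the integer division means fewer than `maxLength` characters are shown; whether the VM behaves the same way is not stated in the thread.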
------------- PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2238980728 From aturbanov at openjdk.org Fri Jul 19 12:19:37 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 19 Jul 2024 12:19:37 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <vVDog-D2CN7FfbKEkwymEfCwaBoYG0qtLcP67v4ddqk=.3903cae2-4571-4292-910e-44733436b607@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
>> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests for consistency >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to...
> > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java line 126: > 124: int count = getCount(); > 125: if (count != i * THREADS) { > 126: throw new RuntimeException("WaitNotifyTest: Invalid Count " + count + Suggestion: throw new RuntimeException("WaitNotifyTest: Invalid Count " + count + test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java line 136: > 134: int count = getCount(); > 135: if (count != ITERATIONS * THREADS) { > 136: throw new RuntimeException("WaitNotifyTest: Invalid Count " + count); Suggestion: throw new RuntimeException("WaitNotifyTest: Invalid Count " + count); test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java line 217: > 215: int count = getCount(); > 216: if (count != THREADS * ITERATIONS) { > 217: throw new RuntimeException("RandomDepthTest: Invalid Count " + count); Suggestion: throw new RuntimeException("RandomDepthTest: Invalid Count " + count); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1684293578 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1684293811 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1684293954 From szaldana at openjdk.org Fri Jul 19 13:51:17 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 19 Jul 2024 13:51:17 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v4] In-Reply-To: 
<8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
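To make the `%p` behaviour described above concrete, here is a minimal Java sketch of the substitution. It is not the VM's code (the dcmd argument parsing lives in HotSpot's C++ sources); `PidPatternSketch` and `expandPid` are hypothetical names, and the sketch assumes that every `%p` in the file name is replaced by the decimal PID with no other escape handling.

```java
public final class PidPatternSketch {

    // Hypothetical helper: replaces each "%p" in fileName with the given PID.
    // Assumption: no other patterns (such as '%t') and no escaping are handled here.
    public static String expandPid(String fileName, long pid) {
        return fileName.replace("%p", Long.toString(pid));
    }

    public static void main(String[] args) {
        // ProcessHandle is available since Java 9 and reports the current process id.
        long pid = ProcessHandle.current().pid();
        System.out.println(expandPid("heap-%p.hprof", pid));
    }
}
```

On the command line this would correspond to something like passing `heap-%p.hprof` as the file argument of `GC.heap_dump`, so that dumps from different processes do not overwrite each other.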
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision: - Adding tests for file dcmd argument - Updates to test case - Adding FileArgument as a diagnostic argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/3bb774d3..c71cb639 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=02-03 Stats: 146 lines in 11 files changed: 76 ins; 46 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Fri Jul 19 14:07:12 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 19 Jul 2024 14:07:12 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands.
For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Missing copyright header update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/c71cb639..cdf1d457 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Fri Jul 19 14:07:12 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 19 Jul 2024 14:07:12 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v4] In-Reply-To: <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com> Message-ID: <a_uaHVLjX1O1prL33-UPUq7_T8CVQXk0_opluJj0yEI=.de732778-4a50-46bf-bdff-e80555c333b6@github.com> On Fri, 19 Jul 2024 13:51:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision: > > - Adding tests for file dcmd argument > - Updates to test case > - Adding FileArgument as a diagnostic argument Hi folks, I made some updates. Just wanted to note a few things: * I think we can remove `test/jdk/sun/tools/jcmd/TestJcmdPIDSubstitution.java` and the changes to test/hotspot/jtreg/runtime/cds/appcds/jcmd tests. I've added a test case for dcmd file argument parsing which is more general. I've left the old tests in for reference at the moment. * Regarding warnings, I noted we wanted to issue any warnings to the issuer of the dcmd and not the JVM process. However, in `diagnosticArgument.cpp`, they are issuing the warnings directly to the JVM process. 
I tried to stay consistent with how things are done there, but let me know what you think. Thanks for the comments! Sonia ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2239256942 From shade at openjdk.org Fri Jul 19 15:52:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Jul 2024 15:52:14 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com> Message-ID: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Amend the test case to guarantee it works under different compilation regimes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/79ece901..437f2329 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=01-02 Stats: 36 lines in 1 file changed: 18 ins; 7 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From aph at openjdk.org Fri Jul 19 16:51:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 19 Jul 2024 16:51:01 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v3] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <7Kzb5V0WYTDNKGrZ7ugIELsASZhoMMJn3UTU_QFWq7Q=.7da728cd-061f-498d-a3a4-46e62c6020e8@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. 
> > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that the HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. 
Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request incrementally with four additional commits since the last revision: - Review feedback - Review feedback - Review feedback - Cleanup check_klass_subtype_fast_path for AArch64, deleting dead code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/bfe9ceed..98f6b2b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=01-02 Stats: 127 lines in 4 files changed: 6 ins; 46 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From szaldana at openjdk.org Fri Jul 19 18:57:40 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 19 Jul 2024 18:57:40 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() Message-ID: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> Hi all, This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. Testing: - [x] Added test case passes. 
Thanks, Sonia ------------- Commit messages: - 8327054: DiagnosticCommand Compiler.perfmap does not log on output() Changes: https://git.openjdk.org/jdk/pull/20257/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327054 Stats: 16 lines in 5 files changed: 10 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257 PR: https://git.openjdk.org/jdk/pull/20257 From cjplummer at openjdk.org Fri Jul 19 19:23:31 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 19 Jul 2024 19:23:31 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> Message-ID: <HxQcEMgEFzfrvupDsbiTmOutUnDui2kZf7STpf0xw0U=.6c6c3464-3d04-49a3-9ee3-e12d58d0bf9b@github.com> On Fri, 19 Jul 2024 15:07:39 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > Hi all, > > This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. > > Testing: > - [x] Added test case passes. > > Thanks, > Sonia test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124: > 122: output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir)); > 123: output.shouldNotHaveExitValue(0); > 124: Files.deleteIfExists(path); If the file exists, that means the expected error message will not be found, which means an exception will be thrown before you get to the `Files.deleteIfExists(path)` call. If the file doesn't exist, then there is nothing to delete. So as things stand now, this call will never delete anything. 
Maybe put it in a finally block so if the file does exist it will get deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1684890829 From lmesnik at openjdk.org Fri Jul 19 20:10:34 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 19 Jul 2024 20:10:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> Message-ID: <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> On Fri, 19 Jul 2024 14:07:12 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. 
The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Missing copyright header update Thanks for updating the fix. The new version looks mostly good. I added a few small comments. src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132: > 130: } else if (strcmp(type, "FILE") == 0) { > 131: DCmdArgument<FileArgument> *argument = > 132: new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory); Please check indentation. src/hotspot/share/services/diagnosticArgument.cpp line 358: > 356: template <> void DCmdArgument<MemorySizeArgument>::destroy_value() { } > 357: > 358: template <> The common style here is to put `template <>` and the rest of the declaration on a single line. src/hotspot/share/services/diagnosticArgument.cpp line 366: > 364: _value._name = NEW_C_HEAP_ARRAY(char, JVM_MAXPATHLEN, mtInternal); > 365: if (!Arguments::copy_expand_pid(str, len, _value._name, JVM_MAXPATHLEN)) { > 366: fatal("Invalid file path: %s", str); As I understand it, `copy_expand_pid` might fail if a very long line is used, and this crashes the JVM. So there is a possibility that a user might crash the JVM accidentally by invoking a jcmd command. 
It doesn't look safe; I believe it would be better to throw an exception, as for any other invalid command, see " THROW_MSG(vmSymbols::java_lang_IllegalArgumentException()," The `fatal` would make sense only if a failure of `copy_expand_pid` meant some unrecoverable JVM bug. ------------- Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2189044201 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1684887604 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1684892964 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1684923626 From lmesnik at openjdk.org Fri Jul 19 20:10:35 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 19 Jul 2024 20:10:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v4] In-Reply-To: <a_uaHVLjX1O1prL33-UPUq7_T8CVQXk0_opluJj0yEI=.de732778-4a50-46bf-bdff-e80555c333b6@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com> <a_uaHVLjX1O1prL33-UPUq7_T8CVQXk0_opluJj0yEI=.de732778-4a50-46bf-bdff-e80555c333b6@github.com> Message-ID: <hCo8C9dfEUDHystD6OUU6HebS7e_x5Q8Mo9aHaPneak=.1a72d889-2e66-46e0-9545-1aa05da8680f@github.com> On Fri, 19 Jul 2024 14:03:43 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > * Regarding warnings, I noted we wanted to issue any warnings to the issuer of the dcmd and not the JVM process. However, in `diagnosticArgument.cpp`, they are issuing the warnings directly to the JVM process. I tried to stay consistent with how things are done there, but let me know what you think. > It makes sense to file a separate issue for this and keep the current behavior in the fix. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2240037485 From szaldana at openjdk.org Fri Jul 19 20:17:46 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 19 Jul 2024 20:17:46 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v2] In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> Message-ID: <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com> > Hi all, > > This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. > > Testing: > - [x] Added test case passes. > > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Ensuring test case deletes file in case of exception ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20257/files - new: https://git.openjdk.org/jdk/pull/20257/files/484f25eb..6129d87c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=00-01 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257 PR: https://git.openjdk.org/jdk/pull/20257 From szaldana at openjdk.org Fri Jul 19 20:17:46 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 19 Jul 2024 20:17:46 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v2] In-Reply-To: <HxQcEMgEFzfrvupDsbiTmOutUnDui2kZf7STpf0xw0U=.6c6c3464-3d04-49a3-9ee3-e12d58d0bf9b@github.com> References: 
<zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <HxQcEMgEFzfrvupDsbiTmOutUnDui2kZf7STpf0xw0U=.6c6c3464-3d04-49a3-9ee3-e12d58d0bf9b@github.com> Message-ID: <xUkrZDkazLNKwKpT73X8XgL282IU-MdYffjc0AER8hU=.be1e4387-8b0c-4f72-bf0d-ef47eca81022@github.com> On Fri, 19 Jul 2024 19:21:05 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Ensuring test case deletes file in case of exception > > test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124: > >> 122: output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir)); >> 123: output.shouldNotHaveExitValue(0); >> 124: Files.deleteIfExists(path); > > If the file exists, that means the expected error message will not be found, which means an exception will be thrown before you get to the `Files.deleteIfExits(path)` call. If the file doesn't exist, then there is nothing to delete. So as things stand now this call will never delete anything. Maybe put it in a finally block so if the file does exist it will get deleted. Makes sense, I added the finally block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1684933809 From duke at openjdk.org Fri Jul 19 21:30:01 2024 From: duke at openjdk.org (Henry Lin) Date: Fri, 19 Jul 2024 21:30:01 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' Message-ID: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`. 
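The reasoning above (signed overflow is undefined, unsigned wrap-around is not) can be illustrated with a minimal sketch. It assumes nothing about the actual HotSpot macros beyond what the PR text describes: `kBitsPerWord` and `right_n_bits_unsigned` are hypothetical names chosen for this example, not identifiers from `globalDefinitions.hpp`.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical stand-in for HotSpot's BitsPerWord; an assumption for this sketch.
constexpr int kBitsPerWord = sizeof(intptr_t) * CHAR_BIT;

// Mask with the low n bits set, computed entirely in unsigned arithmetic.
// With a signed word, the top bit alone is the minimum value, and subtracting
// 1 from it overflows (undefined behavior, which ubsan reports). With
// uintptr_t the same subtraction wraps modulo 2^N, which is well defined.
constexpr uintptr_t right_n_bits_unsigned(int n) {
  // Build the single bit as an unsigned value; n >= word size yields 0.
  uintptr_t bit = (n >= kBitsPerWord) ? 0 : (uintptr_t{1} << n);
  return bit - 1;  // 0 - 1 wraps to all ones; no signed overflow involved
}
```

For the top bit, `right_n_bits_unsigned(kBitsPerWord - 1)` gives a mask of every bit except the highest one, which is the case the signed expression could not compute without overflow.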
------------- Commit messages: - 8332697: fix ubsan:shenandoahSimpleBitMap.inline.hpp runtime integer overflow Changes: https://git.openjdk.org/jdk/pull/20164/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332697 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164 PR: https://git.openjdk.org/jdk/pull/20164 From cjplummer at openjdk.org Fri Jul 19 21:54:41 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 19 Jul 2024 21:54:41 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v2] In-Reply-To: <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com> Message-ID: <luz785NJsF0evZEELvZeNaG7KbdbbXN1RUqxVaVROXY=.8487e8c5-159c-4df3-bfde-e904ca8834b8@github.com> On Fri, 19 Jul 2024 20:17:46 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Ensuring test case deletes file in case of exception Looks good. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2189401873 From dlong at openjdk.org Fri Jul 19 22:43:36 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Jul 2024 22:43:36 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' In-Reply-To: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> Message-ID: <enP03_lP52TO93btMHFUgUECzw7mvDL2AHCnCH8pz00=.b92d9d89-14b1-4a06-9ff7-8b754aac9c1d@github.com> On Fri, 12 Jul 2024 20:53:04 GMT, Henry Lin <duke at openjdk.org> wrote: > Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`. src/hotspot/share/utilities/globalDefinitions.hpp line 1069: > 1067: // (note: #define used only so that they can be used in enum constant definitions) > 1068: #define nth_bit(n) (((n) >= BitsPerWord) ? 0 : (OneBit << (n))) > 1069: #define right_n_bits(n) ((uintptr_t) nth_bit(n) - 1) This changes the return type of right_n_bits, which could break existing code. If we need an unsigned version, I think it should have a different name. 
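One way a changed return type could affect existing callers is through mixed signed/unsigned comparisons. The sketch below is purely illustrative: both functions are hypothetical callers invented for this example, and it makes no claim that HotSpot contains such a call site.

```cpp
#include <cstdint>

// Hypothetical caller written against a signed mask: a negative offset
// compares as smaller than any non-negative mask.
inline bool below_signed_mask(intptr_t offset, intptr_t mask) {
  return offset < mask;
}

// The same comparison once the mask is unsigned. The explicit cast spells out
// what the usual arithmetic conversions would do implicitly: the negative
// offset becomes a very large unsigned value, so the result flips.
inline bool below_unsigned_mask(intptr_t offset, uintptr_t mask) {
  return static_cast<uintptr_t>(offset) < mask;
}
```

With an offset of -1 and a mask of 7, the signed version returns true and the unsigned version returns false, which is the kind of silent behavior change a separately named unsigned variant would avoid.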
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20164#discussion_r1685072633 From stuefe at openjdk.org Sat Jul 20 08:07:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 20 Jul 2024 08:07:33 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> Message-ID: <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com> On Fri, 19 Jul 2024 14:07:12 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. 
(STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Missing copyright header update Mostly good too. Remarks inline. src/hotspot/share/services/diagnosticArgument.cpp line 384: > 382: _value._name = nullptr; > 383: } > 384: } Whatever this `DCmdArgument<FileArgument>::destroy_value()` is supposed to do, it clearly isn't working, since we leak the memory. src/hotspot/share/services/diagnosticArgument.hpp line 66: > 64: public: > 65: char *_name; > 66: }; Something is off about this. What is the lifetime of this object? You don't free it. Running a command in a loop will consume C-heap (you can check this with NMT: `jcmd VM.native_memory baseline`, then run a command 100 times, then `jcmd VM.native_memory summary.diff` will show you the leak in mtInternal). I would probably just inline the string. E.g. struct FileArgument { char name[max name len] }; FileArgument sits as a member inside DCmdArgument. DCmdArgument or DCmdArgumentWithParser sits as a member in the various XXXDCmd classes. Those are created in DCmdFactory::create_local_DCmd(). Which is what, a static global list? So we only have one global XXXDCmd object instance per command, but for each command invocation re-parse the argument values? What a weird concept. 
Man, this coding is way too convoluted for a little parser engine :( But anyway, inlining the filename array into FileArgument should be probably fine from a size standpoint. I would, however, not use JVM_MAXPATHLEN or anything that depends ultimately on PATH_MAX from system headers. We don't want the object to consume e.g. an MB if some crazy platform defines PATH_MAX as 1MB. Therefore I would use e.g. 1024 as limit for the path name. (Note that PATH_MAX is an illusion anyway, there is never a guarantee that a path is smaller than that limit... See this good article: https://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html) src/hotspot/share/services/diagnosticArgument.hpp line 113: > 111: void to_string(MemorySizeArgument f, char* buf, size_t len) const; > 112: void to_string(StringArrayArgument* s, char* buf, size_t len) const; > 113: void to_string(FileArgument f, char *buf, size_t len) const; Here, and in all other places: Please use 'char* var', not 'char *var'. ------------- PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2189782275 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685301655 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685297940 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685290041 From stuefe at openjdk.org Sat Jul 20 08:07:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 20 Jul 2024 08:07:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> Message-ID: 
<EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com> On Fri, 19 Jul 2024 20:00:28 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing copyright header update > > src/hotspot/share/services/diagnosticArgument.cpp line 366: > >> 364: _value._name = NEW_C_HEAP_ARRAY(char, JVM_MAXPATHLEN, mtInternal); >> 365: if (!Arguments::copy_expand_pid(str, len, _value._name, JVM_MAXPATHLEN)) { >> 366: fatal("Invalid file path: %s", str); > > As I understand the 'copy_expand_pid' might fail if very long line is used. This cause jvm crash., > So there is possibility that user might crash jvm accidentally invoking jcmd command. > It doesn't look safe, I believe it would be better to throw Exception like for any other invalid command, see > " THROW_MSG(vmSymbols::java_lang_IllegalArgumentException()," > > The 'fatal" owuld make sense only if failing of 'copy_expand_pid' means some unrecoverable jvm bug. Yes. In this file, other commands use `fatal` only where reading the hard-coded default values - in the various `init_...` functions. Hard-coded values should be valid, obviously, otherwise the JVM developer messed up. Other values are passed in by the end user via jcmd and should not crash the JVM. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685299871

From stuefe at openjdk.org Sat Jul 20 08:07:34 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 08:07:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5]
In-Reply-To: <EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com>
Message-ID: <SDkUb2GIxTALba_5Dc-t3qK_zov3SEzBIMWzOsb2T0M=.1f420a00-732a-4917-a111-a0124240d4f6@github.com>

On Sat, 20 Jul 2024 07:38:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/services/diagnosticArgument.cpp line 366:
>>
>>> 364:   _value._name = NEW_C_HEAP_ARRAY(char, JVM_MAXPATHLEN, mtInternal);
>>> 365:   if (!Arguments::copy_expand_pid(str, len, _value._name, JVM_MAXPATHLEN)) {
>>> 366:     fatal("Invalid file path: %s", str);
>>
>> As I understand it, `copy_expand_pid` might fail if a very long line is used. This causes a JVM crash.
>> So there is a possibility that a user might crash the JVM accidentally by invoking a jcmd command.
>> It doesn't look safe. I believe it would be better to throw an exception like for any other invalid command, see
>> `THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),`
>>
>> The `fatal` would make sense only if a failure of `copy_expand_pid` meant some unrecoverable JVM bug.
>
> Yes. In this file, other commands use `fatal` only where reading the hard-coded default values - in the various `init_...` functions. Hard-coded values should be valid, obviously, otherwise the JVM developer messed up. Other values are passed in by the end user via jcmd and should not crash the JVM.

I see the prevalent way to deal with runtime parse errors is to throw a java exception. That exception later is caught in the command processing loop at the entrance of the attach listener thread.

So, @SoniaZaldana, I would do this here too - when in Rome...

But is this not unnecessarily complex? It requires the AttachListener to be a java thread when in fact it does need no java - we just misuse java exception handling as a way to pass error information up the stack, with the simple ultimate goal of writing error information into the outputStream to be sent to the caller. We might just as well pass the outputStream* to the parse_xxx functions as third argument, and write directly and return some error code. This would make the attach listener thread a lot less dependent on Java and more robust - at least for jcmds that don't need Java (which jcmds need java?).

After all, the attach listener is supposed to be super robust and always work even if the JVM misbehaves. @dholmes-ora @lmesnik what do you guys think, should we change that?
(obviously in a different RFE) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685304522 From stuefe at openjdk.org Sat Jul 20 08:07:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 20 Jul 2024 08:07:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com> Message-ID: <CxwbysRpQzuTNFd7180Kh5neBkx9mMJOcct-eqCzrIQ=.5393f979-4fe1-445f-a99d-a49515eec5fe@github.com> On Sat, 20 Jul 2024 07:30:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing copyright header update > > src/hotspot/share/services/diagnosticArgument.hpp line 66: > >> 64: public: >> 65: char *_name; >> 66: }; > > Something is off about this. What is the lifetime of this object? > > You don't free it. Running a command in a loop will consume C-heap (you can check this with NMT: `jcmd VM.native_memory baseline`, then run a command 100 times, then `jcmd VM.native_memory summary.diff` will show you the leak in mtInternal. > > I would probably just inline the string. E.g. > > > struct FileArgument { > char name[max name len] > }; > > > FileArguments sits as member inside DCmdArgument. DCmdArgument or DCmdArgumentWithParser sits as member in the various XXXDCmd classes. > > Those are created in DCmdFactory::create_local_DCmd(). Which is what, a static global list? 
So we only have one global XXXDCmd object instance per command, but for each command invocation re-parse the argument values? What a weird concept. > > Man, this coding is way too convoluted for a little parser engine :( > > But anyway, inlining the filename array into FileArgument should be probably fine from a size standpoint. I would, however, not use JVM_MAXPATHLEN or anything that depends ultimately on PATH_MAX from system headers. We don't want the object to consume e.g. an MB if some crazy platform defines PATH_MAX as 1MB. Therefore I would use e.g. 1024 as limit for the path name. > > (Note that PATH_MAX is an illusion anyway, there is never a guarantee that a path is smaller than that limit... See this good article: https://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html) Note that the reason for the leak is probably the fact that you don't clear old values on parse_value. See e.g. how char* does it. However, since you allocate with a constant size anyway, the buffer size never changes, you could just as well either follow my advice above (inlining), or just re-use the existing pointer. 
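
A minimal sketch of the inlining idea, assuming a fixed limit of 1024 bytes as suggested above. `InlineFileArgument`, `set_name` and `kMaxFileArgLen` are hypothetical names used only for illustration; this is standalone C++, not the actual `FileArgument` or `parse_value` code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical limit for illustration; deliberately not derived from PATH_MAX.
constexpr size_t kMaxFileArgLen = 1024;

// Hypothetical stand-in for FileArgument with the buffer stored inline.
// There is no heap allocation, so re-parsing a value has nothing to leak.
struct InlineFileArgument {
  char name[kMaxFileArgLen];
};

// Overwrites arg.name with str[0..len) and NUL-terminates it.
// Returns false, leaving arg.name untouched, if the value does not fit.
bool set_name(InlineFileArgument& arg, const char* str, size_t len) {
  assert(str != nullptr);
  if (len >= sizeof(arg.name)) {
    return false;
  }
  std::memcpy(arg.name, str, len);
  arg.name[len] = '\0';
  return true;
}
```

Because each parse simply overwrites the same storage, invoking the command in a loop should not grow the C-heap, which is the property the NMT baseline/diff check above is meant to confirm.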
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685308132 From stuefe at openjdk.org Sat Jul 20 10:52:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 20 Jul 2024 10:52:31 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v2] In-Reply-To: <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com> Message-ID: <NvPlnC4JRx-09-a8I0k1n3G917Z8cWMjKQDQ4yKyUTM=.88931077-7c70-4b10-9ef3-70ad66812890@github.com> On Fri, 19 Jul 2024 20:17:46 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Ensuring test case deletes file in case of exception src/hotspot/share/code/codeCache.hpp line 226: > 224: static void print_summary(outputStream* st, bool detailed = true); // Prints a summary of the code cache usage > 225: static void log_state(outputStream* st); > 226: LINUX_ONLY(static void write_perf_map(const char* filename, outputStream* st);) Please add a comment about what the stream `st` is supposed to be. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1685374253

From jsjolen at openjdk.org Sat Jul 20 11:22:32 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Sat, 20 Jul 2024 11:22:32 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5]
In-Reply-To: <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
Message-ID: <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>

On Fri, 19 Jul 2024 19:17:54 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Missing copyright header update
>
> src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132:
>
>> 130:   } else if (strcmp(type, "FILE") == 0) {
>> 131:       DCmdArgument<FileArgument> *argument =
>> 132:           new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory);
>
> Please check indentation.

On top: We hug the `*`s next to the type in Hotspot, not next to the var name. So `DCmdArgument<FileArgument>* argument`. This is something to check for all new code.

Pre-existing: The indentation of the if-block is wrong.

Also, @SoniaZaldana, would you mind changing the code to this (does not include your change), the repetition just made me cringe.
```c++
// One pointer of the common base type, assigned in every branch, so that the
// registration code below runs for all argument types (not only STRING).
GenDCmdArgument* argument = nullptr;
if (strcmp(type, "STRING") == 0) {
  argument = new DCmdArgument<char*>(name, desc, "STRING", mandatory, default_value);
} else if (strcmp(type, "NANOTIME") == 0) {
  argument = new DCmdArgument<NanoTimeArgument>(name, desc, "NANOTIME", mandatory, default_value);
} else if (strcmp(type, "JLONG") == 0) {
  argument = new DCmdArgument<jlong>(name, desc, "JLONG", mandatory, default_value);
} else if (strcmp(type, "BOOLEAN") == 0) {
  argument = new DCmdArgument<bool>(name, desc, "BOOLEAN", mandatory, default_value);
} else if (strcmp(type, "MEMORYSIZE") == 0) {
  argument = new DCmdArgument<MemorySizeArgument>(name, desc, "MEMORY SIZE", mandatory, default_value);
} else if (strcmp(type, "STRINGARRAY") == 0) {
  argument = new DCmdArgument<StringArrayArgument*>(name, desc, "STRING SET", mandatory);
}

if (argument != nullptr) {
  if (isarg) {
    parser->add_dcmd_argument(argument);
  } else {
    parser->add_dcmd_option(argument);
  }
}
```

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685384131

From stuefe at openjdk.org Sat Jul 20 12:09:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 12:09:31 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5]
In-Reply-To: <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
Message-ID:
<ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com>

On Sat, 20 Jul 2024 11:18:34 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132:
>>
>>> 130:   } else if (strcmp(type, "FILE") == 0) {
>>> 131:       DCmdArgument<FileArgument> *argument =
>>> 132:           new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory);
>>
>> Please check indentation.
>
> On top: We hug the `*`s next to the type in Hotspot, not next to the var name. So `DCmdArgument<FileArgument>* argument`. This is something to check for all new code.
>
> Pre-existing: The indentation of the if-block is wrong.
>
> Also, @SoniaZaldana, would you mind changing the code to this (does not include your change), the repetition just made me cringe.
>
> ```c++
> GenDCmdArgument* argument = nullptr;
> if (strcmp(type, "STRING") == 0) {
>   argument = new DCmdArgument<char*>(name, desc, "STRING", mandatory, default_value);
> } else if (strcmp(type, "NANOTIME") == 0) {
>   argument = new DCmdArgument<NanoTimeArgument>(name, desc, "NANOTIME", mandatory, default_value);
> } else if (strcmp(type, "JLONG") == 0) {
>   argument = new DCmdArgument<jlong>(name, desc, "JLONG", mandatory, default_value);
> } else if (strcmp(type, "BOOLEAN") == 0) {
>   argument = new DCmdArgument<bool>(name, desc, "BOOLEAN", mandatory, default_value);
> } else if (strcmp(type, "MEMORYSIZE") == 0) {
>   argument = new DCmdArgument<MemorySizeArgument>(name, desc, "MEMORY SIZE", mandatory, default_value);
> } else if (strcmp(type, "STRINGARRAY") == 0) {
>   argument = new DCmdArgument<StringArrayArgument*>(name, desc, "STRING SET", mandatory);
> }
>
> if (argument != nullptr) {
>   if (isarg) {
>     parser->add_dcmd_argument(argument);
>   } else {
>     parser->add_dcmd_option(argument);
>   }
> }
> ```

@jdksjolen

> Also, @SoniaZaldana, would you mind changing the code to this

Even simpler (did not test, but you get my drift):


    #define ALL_TYPES_DO_XX(what) \
      what(char*, "STRING") \
      what(NanoTimeArgument, "NANOTIME") \
      what(jlong, "JLONG")
    ... etc

then


    #define XX(TYPE, NAME) \
      if (strcmp(type, NAME) == 0) { \
        DCmdArgument<TYPE>* argument = new DCmdArgument<TYPE>(name, desc, NAME, mandatory, default_value); \
      }
    ALL_TYPES_DO_XX(XX)
    #undef XX


;-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685411741

From aph at openjdk.org Sun Jul 21 08:18:37 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 21 Jul 2024 08:18:37 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int'
In-Reply-To: <enP03_lP52TO93btMHFUgUECzw7mvDL2AHCnCH8pz00=.b92d9d89-14b1-4a06-9ff7-8b754aac9c1d@github.com>
References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
 <enP03_lP52TO93btMHFUgUECzw7mvDL2AHCnCH8pz00=.b92d9d89-14b1-4a06-9ff7-8b754aac9c1d@github.com>
Message-ID: <wI7-7g5DlDaZ0yH29BAWRAvWSARzUWb0S_s_2VzcPkg=.6a1b3f63-d600-4c81-a459-08c00d8373bf@github.com>

On Fri, 19 Jul 2024 22:41:11 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.
>
> src/hotspot/share/utilities/globalDefinitions.hpp line 1069:
>
>> 1067: // (note: #define used only so that they can be used in enum constant definitions)
>> 1068: #define nth_bit(n)        (((n) >= BitsPerWord) ? 0 : (OneBit << (n)))
>> 1069: #define right_n_bits(n)   ((uintptr_t) nth_bit(n) - 1)
>
> This changes the return type of right_n_bits, which could break existing code. If we need an unsigned version, I think it should have a different name.
And this is at best a partial fix: `OneBit << 63` overflows.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20164#discussion_r1685678905

From mdoerr at openjdk.org Sun Jul 21 08:39:39 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Sun, 21 Jul 2024 08:39:39 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]
In-Reply-To: <sEEh7ndeGQznpxAqNtapGJU6dT96EXBNoS3QyVcOn_g=.e0a3525d-690c-4f2c-aca1-48c4975bfb65@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sEEh7ndeGQznpxAqNtapGJU6dT96EXBNoS3QyVcOn_g=.e0a3525d-690c-4f2c-aca1-48c4975bfb65@github.com>
Message-ID: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com>

On Thu, 20 Jun 2024 04:17:30 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags I have looked at the x86 implementation and I have some performance tuning ideas. Please take a look. I guess at least some of your code is performance critical. src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 86: > 84: // an indirect memory operand) to reduce C2's scheduling and register > 85: // allocation pressure (fewer Mach nodes). 
The same holds for g1StoreN and > 86: // g1EncodePAndStoreN. I'm not convinced that this is beneficial. We're wasting a temp register just for an addition? src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123: > 121: if ((barrier_data() & G1C2BarrierPost) != 0) { > 122: __ movl($tmp2$$Register, $src$$Register); > 123: if ((barrier_data() & G1C2BarrierPostNotNull) == 0) { `decode_heap_oop` contains a null check in some cases which makes some of your code redundant. Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`. src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182: > 180: $tmp2$$Register /* pre_val */, > 181: $tmp3$$Register /* tmp */, > 182: RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */); The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required. src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301: > 299: RegSet::of($mem$$Register, $newval$$Register) /* preserve */); > 300: __ movq($tmp1$$Register, $newval$$Register); > 301: __ xchgq($newval$$Register, Address($mem$$Register, 0)); Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This allows avoiding loading the old value twice. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2190271351 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685680587 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685682308 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685683332 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685683768 From jsjolen at openjdk.org Sun Jul 21 08:58:35 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 21 Jul 2024 08:58:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com> <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com> Message-ID: <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com> On Sat, 20 Jul 2024 12:06:33 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> On top: We hug the `*`s next to the type in Hotspot, not next to the var name. So `DCmdArgument<FileArgument>* argument`. This is something to check for all new code. >> >> Pre-existing: The indentation of the if-block is wrong. >> >> Also, @SoniaZaldana, would you mind changing the code to this (does not include your change), the repetition just made me cringe ?. 
>> >> ```c++ >> DCmdArgument<char*>* argument = nullptr; >> if (strcmp(type, "STRING") == 0) { >> argument = new DCmdArgument<char*>(name, desc, "STRING", mandatory, default_value); >> } else if (strcmp(type, "NANOTIME") == 0) { >> DCmdArgument<NanoTimeArgument>* argument = new DCmdArgument<NanoTimeArgument>(name, desc, "NANOTIME", mandatory, default_value); >> } else if (strcmp(type, "JLONG") == 0) { >> DCmdArgument<jlong>* argument = new DCmdArgument<jlong>(name, desc, "JLONG", mandatory, default_value); >> } else if (strcmp(type, "BOOLEAN") == 0) { >> DCmdArgument<bool>* argument = new DCmdArgument<bool>(name, desc, "BOOLEAN", mandatory, default_value); >> } else if (strcmp(type, "MEMORYSIZE") == 0) { >> DCmdArgument<MemorySizeArgument>* argument = new DCmdArgument<MemorySizeArgument>(name, desc, "MEMORY SIZE", mandatory, default_value); >> } else if (strcmp(type, "STRINGARRAY") == 0) { >> DCmdArgument<StringArrayArgument*>* argument = new DCmdArgument<StringArrayArgument*>(name, desc, "STRING SET", mandatory); >> } >> >> if (argument != nullptr) { >> if (isarg) { >> parser->add_dcmd_argument(argument); >> } else { >> parser->add_dcmd_option(argument); >> } >> } > > @jdksjolen > >> Also, @SoniaZaldana, would you mind changing the code to this > > Even simpler (did not test, but you get my drift): > > > #define ALL_TYPES_DO_XX(what) \ > what(char*, "STRING") \ > what(NanoTimeArgument, NANOTIME) \ > what(jlong, "JLONG") > ... 
etc > > then > > > #define XX(TYPE, NAME) \ > if (strcmp(type, NAME) == 0) { \ > DCmdArgument<TYPE>* argument = new DCmdArgument<TYPE>(name, desc, NAME, mandatory, mandatory, default_value); \ > } > ALL_TYPES_DO_XX(XX) > #undef XX > > > ;-) Sonia, my bad if you already know this stuff but since it's fairly esoteric knowledge nowadays I'd like to help you out in advance: Thomas is proposing the usage of a X macro https://en.wikipedia.org/wiki/X_macro These can be found throughout Hotspot, you can find an example definition and usage in `logTag.hpp` and `logTag.cpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685688287 From stuefe at openjdk.org Sun Jul 21 10:11:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 21 Jul 2024 10:11:31 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com> <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com> <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com> Message-ID: <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com> On Sun, 21 Jul 2024 08:55:35 GMT, Johan Sj?len <jsjolen at openjdk.org> wrote: >> @jdksjolen >> >>> Also, @SoniaZaldana, would you mind changing the code to this >> >> Even simpler (did not test, but you get my drift): >> >> >> #define ALL_TYPES_DO_XX(what) \ >> what(char*, "STRING") \ >> 
what(NanoTimeArgument, NANOTIME) \ >> what(jlong, "JLONG") >> ... etc >> >> then >> >> >> #define XX(TYPE, NAME) \ >> if (strcmp(type, NAME) == 0) { \ >> DCmdArgument<TYPE>* argument = new DCmdArgument<TYPE>(name, desc, NAME, mandatory, mandatory, default_value); \ >> } >> ALL_TYPES_DO_XX(XX) >> #undef XX >> >> >> ;-) > > Sonia, my bad if you already know this stuff but since it's fairly esoteric knowledge nowadays I'd like to help you out in advance: Thomas is proposing the usage of a X macro https://en.wikipedia.org/wiki/X_macro > > These can be found throughout Hotspot, you can find an example definition and usage in `logTag.hpp` and `logTag.cpp`. @SoniaZaldana Note that this is very much optional. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685701779 From dholmes at openjdk.org Mon Jul 22 01:23:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 22 Jul 2024 01:23:39 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <SDkUb2GIxTALba_5Dc-t3qK_zov3SEzBIMWzOsb2T0M=.1f420a00-732a-4917-a111-a0124240d4f6@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> <EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com> <SDkUb2GIxTALba_5Dc-t3qK_zov3SEzBIMWzOsb2T0M=.1f420a00-732a-4917-a111-a0124240d4f6@github.com> Message-ID: <lciD2DICObBi9SOnTBJY35-VpJUtAJqLzPc_Ciei5ac=.9ac61a07-bb47-4d91-8d93-365a9d3c4f05@github.com> On Sat, 20 Jul 2024 07:50:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Yes. In this file, other commands use `fatal` only where reading the hard-coded default values - in the various `init_...` functions. 
Hard-coded values should be valid, obviously, otherwise the JVM developer messed up. Other values are passed in by the end user via jcmd and should not crash the JVM. > > I see the prevalent way to deal with runtime parse errors is to throw a java exception. That exception later is caught in the command processing loop at the entrance of the attach listener thread. > > So, @SoniaZaldana, I would do this here too - when in Rome... > > But is this not unnecessarily complex? It requires the AttachListener to be a java thread when in fact it does need no java - we just misuse java exception handling as a way to pass error information up the stack, with the simple ultimate goal of writing error information into the outputStream to be sent to the caller. We might just as well pass the outputStream* to the parse_xxx functions as third argument, and write directly and return some error code. This would make the attach listener thread a lot less dependent on Java and more robust - at least for jcmds that don't need Java (which jcmds need java?). > > After all, the attach listener is supposed to be super robust and always work even if the JVM misbehaves. @dholmes-ora @lmesnik what do you guys think, should we change that? (obviously in a different RFE) If the attach listener thread doesn't actually need to be a Java thread then you could look into changing that. Not sure it would really buy us that much in terms of added robustness though. 
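
For reference, a rough sketch of the alternative floated above, where the parse function receives the reply stream and returns a status instead of throwing. Everything here is a hypothetical standalone illustration: `ParseResult` and `parse_jlong` are made-up names, `std::ostream` stands in for HotSpot's `outputStream`, and the error text is invented.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <ostream>
#include <sstream>
#include <string>

enum class ParseResult { ok, error };

// Hypothetical parse function: on bad input it writes the diagnostic straight
// to the reply stream and returns an error code. No exception machinery is
// involved, so the caller does not need to be a Java thread for this step.
ParseResult parse_jlong(const std::string& str, long long& value, std::ostream& out) {
  errno = 0;
  char* end = nullptr;
  long long v = std::strtoll(str.c_str(), &end, 10);
  if (str.empty() || *end != '\0' || errno == ERANGE) {
    out << "Integer parsing error in command argument '" << str << "'\n";
    return ParseResult::error;
  }
  value = v;  // only written on success
  return ParseResult::ok;
}
```

The command loop would then check the returned status and send whatever was written to the stream back to the jcmd client. Whether that gains much robustness in practice is exactly the open question raised in the reply above.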
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685861533 From dholmes at openjdk.org Mon Jul 22 01:27:42 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 22 Jul 2024 01:27:42 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> Message-ID: <63Fx1KzzGDy5E1pMf0HOb3P9o6gD6Rtps3YJYu-MLyY=.539c62f1-c36a-4fe4-8c57-ab44d932d4cb@github.com> On Fri, 19 Jul 2024 14:07:12 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it?s done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. 
(STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can?t address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Missing copyright header update src/hotspot/share/services/diagnosticArgument.cpp line 361: > 359: void DCmdArgument<FileArgument>::parse_value(const char *str, size_t len, > 360: TRAPS) { > 361: if (str == NULL) { s/NULL/nullptr src/hotspot/share/services/diagnosticArgument.cpp line 372: > 370: > 371: template <> void DCmdArgument<FileArgument>::init_value(TRAPS) { > 372: if (has_default() && _default_string != NULL) { s/NULL/nullptr ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685862082 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685862613 From mli at openjdk.org Mon Jul 22 07:06:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Jul 2024 07:06:38 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4] In-Reply-To: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com> Message-ID: <T_c96_qdjC0dGxSe6JWkdvzfBOy3fiszOcTowErU2TQ=.398cdb88-cf71-4a45-ab22-7a34947db7e0@github.com> On Tue, 2 Jul 2024 
14:16:35 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>>
>> I'm also working on a base64 decode intrinsic, but there is some performance regression in some cases, and decode and encode are totally independent of each other, so I will send out the decode review in another PR once I have fixed the performance regression in it.
>>
>> Thanks.
>>
>> ## Test
>> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
>>
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>>
>> The best one is the combination of m2+m1; it has the best performance for all source sizes.
>>
>> ### K230
>>
>> this implementation (m2+m1)
>>
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>>
>> vector with only m2
>> ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   move label

Hi, Can I get another review of this pr? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2242234520

From shade at openjdk.org Mon Jul 22 08:49:12 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 22 Jul 2024 08:49:12 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>

> See bug for more discussion.
>
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field.
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for the `@Stable` field write, while only a `StoreStore` for the `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent of the `Release` barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of the stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: performance is improved in the one path that matters, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all; fixed that to match C2.
Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstre... Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into JDK-8333791-stable-field-barrier - Variant 2: Only final-field like semantics for stable inits - Variant 3: Handle everything, including reads by compilers ------------- Changes: https://git.openjdk.org/jdk/pull/19635/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19635&range=01 Stats: 1063 lines in 16 files changed: 1023 ins; 20 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/19635.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19635/head:pull/19635 PR: https://git.openjdk.org/jdk/pull/19635 From shade at openjdk.org Mon Jul 22 08:49:12 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Jul 2024 08:49:12 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> Message-ID: <a7AAzVU7g5n4oMfJNoSjp3xbQsJ9GavCWtaurdpdrWA=.083ab17f-52ad-4141-b672-a4ca5ab960d0@github.com> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for the `@Stable` field write, while only a `StoreStore` for the `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent of the `Release` barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of the stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: performance is improved in the one path that matters, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all; fixed that to match C2.
Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstre... Still waiting for formal reviews, please :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2242415734 From thartmann at openjdk.org Mon Jul 22 11:31:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Jul 2024 11:31:32 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <LQaSi1l1PrOsNt4-4DWLguJQH9wxiywwTCh8TkGSo4U=.6b642f32-f804-4aad-8e07-aaea9ed23cc1@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:
> In the cases like:
>
>     UNSAFE.putLong(address + off1 + 1030, lseed);
>     UNSAFE.putLong(address + 1023, lseed);
>     UNSAFE.putLong(address + off2 + 1001, lseed);
>
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
>
>     ldr R10, [R15, #120] # int ! Field: address
>     ldr R11, [R16, #136] # int ! Field: off1
>     ldr R12, [R16, #144] # int ! Field: off2
>     add R11, R11, R10
>     mov R11, R11 # long -> ptr
>     add R12, R12, R10
>     mov R10, R10 # long -> ptr
>     add R11, R11, #1030 # ptr
>     str R17, [R11] # int
>     add R10, R10, #1023 # ptr
>     str R17, [R10] # int
>     mov R10, R12 # long -> ptr
>     add R10, R10, #1001 # ptr
>     str R17, [R10] # int
>
> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly:
>
>     ldr x10, [x15,#120]
>     ldp x11, x12, [x16,#136]
>     add x11, x11, x10
>     add x12, x12, x10
>     add x11, x11, #0x406
>     str x17, [x11]
>     add x10, x10, #0x3ff
>     str x17, [x10]
>     mov x10, x12 <--- extra register copy
>     add x10, x10, #0x3e9
>     str x17, [x10]
>
> There is still one extra register copy, which we're trying to remove in this patch. > > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 All tests passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2242730776 From szaldana at openjdk.org Mon Jul 22 13:45:49 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 13:45:49 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v3] In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> Message-ID: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> > Hi all, > > This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. > > Testing: > - [x] Added test case passes. > > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Adding comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20257/files - new: https://git.openjdk.org/jdk/pull/20257/files/6129d87c..237f751a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257 PR: https://git.openjdk.org/jdk/pull/20257 From aph at openjdk.org Mon Jul 22 14:03:34 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 14:03:34 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v3] In-Reply-To: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> 
<5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> Message-ID: <cfKy-VTUht4Fbtb5-paKJZvCVAar1mq6Y0d0pDbkFQE=.1aa56b36-3067-4357-89ed-d1d8c3f64426@github.com> On Thu, 11 Jul 2024 22:53:42 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 1410: >> >>> 1408: return nullptr; >>> 1409: } else if (num_extra_slots == 0) { >>> 1410: if (num_extra_slots == 0 && interfaces->length() <= 1) { >> >> Since `secondary_supers` are hashed unconditionally now, is `interfaces->length() <= 1` check still needed? > > Also, `num_extra_slots == 0` check is redundant. > Since `secondary_supers` are hashed unconditionally now, is `interfaces->length() <= 1` check still needed? I don't think so, no. Our incoming `transitive_interfaces` is formed by concatenating the interface lists of our superclasses and superinterfaces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686607068 From aph at openjdk.org Mon Jul 22 14:18:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 14:18:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v3] In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> Message-ID: <WOY02iAdeWi-IgqSfHkkydfPyRxH1TpsYPYvFD8sRv0=.befb015d-0622-492a-87ab-fe52d0b1fa64@github.com> On Fri, 5 Jul 2024 22:30:09 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with four additional commits since the last revision: >> >> - Review feedback >> - Review feedback >> - Review feedback >> - 
Cleanup check_klass_subtype_fast_path for AArch64, deleting dead code > > src/hotspot/share/oops/klass.cpp line 175: > >> 173: if (secondary_supers()->at(i) == k) { >> 174: if (UseSecondarySupersCache) { >> 175: ((Klass*)this)->set_secondary_super_cache(k); > > Does it make sense to assert `UseSecondarySupersCache` in `Klass::set_secondary_super_cache()`? I kinda hate this because we're casting away `const`, which is UB. I think I'd just take it out, but once I do that, I don't think anything sets `_secondary_super_cache`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686631030 From kevinw at openjdk.org Mon Jul 22 14:32:33 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 22 Jul 2024 14:32:33 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v3] In-Reply-To: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> Message-ID: <f9HcTrrpOw3OimtNWfyRqIXWKwBm8E8Ft0xFv_8d6jc=.481ec6f2-95a3-4eeb-9744-3cfefdf9cb84@github.com> On Mon, 22 Jul 2024 13:45:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. 
>> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Adding comment src/hotspot/share/code/codeCache.cpp line 1804: > 1802: fileStream fs(filename, "w"); > 1803: if (!fs.is_open()) { > 1804: st->print_cr("Failed to create %s for perf map", filename); Hi -- as this used to be "log_warning" and print something like: [1129077.636s][warning][codecache] Failed to create /x for perf map ..it should probably say: Warning: Failed to...etc... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686652944 From kevinw at openjdk.org Mon Jul 22 14:35:32 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 22 Jul 2024 14:35:32 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v3] In-Reply-To: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> Message-ID: <AVNqzuDJA0GcYQgHi3y8FhuPH30ZD0dKdOFrWj6Iaxc=.bcb850de-a604-4441-9977-fd720118bd7b@github.com> On Mon, 22 Jul 2024 13:45:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Adding comment Looks good. Yes the DCmds in diagnosticCommand.cpp tend to use outputStream* output (or out) as a param, this may help make it more obvious what the stream is for. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20257#issuecomment-2243113330 From kevinw at openjdk.org Mon Jul 22 14:43:33 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 22 Jul 2024 14:43:33 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v3] In-Reply-To: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> Message-ID: <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com> On Mon, 22 Jul 2024 13:45:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Adding comment test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124: > 122: OutputAnalyzer output = new JMXExecutor().execute("Compiler.perfmap %s".formatted(path)); > 123: output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir)); > 124: output.shouldNotHaveExitValue(0); I'm curious if this exit value check works, as jcmd failures like this show "Command executed successfully" and return 0 for success. These compiler tests have chosen JMXExecutor and PidJcmdExecutor which might be relevant. Interested to know if JMXExecutor returns a non-zero exit value for this? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686674484 From aph at openjdk.org Mon Jul 22 14:59:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 14:59:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v3] In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> Message-ID: <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com> On Thu, 11 Jul 2024 23:07:43 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with four additional commits since the last revision: >> >> - Review feedback >> - Review feedback >> - Review feedback >> - Cleanup check_klass_subtype_fast_path for AArch64, deleting dead code > > src/hotspot/share/oops/klass.inline.hpp line 117: > >> 115: } >> 116: >> 117: inline bool Klass::search_secondary_supers(Klass *k) const { > > I see you moved `Klass::search_secondary_supers` in `klass.inline.hpp`, but I'm not sure how it interacts with `Klass::is_subtype_of` (the sole caller) being declared in `klass.hpp`. > > Will the inlining still happen if `Klass::is_subtype_of()` callers include `klass.hpp`? Presumably this question applies to every function in `klass.inline.hpp`? Practically everything does `#include "oops/klass.inline.hpp"`. It's inlined in about 120 files, as far as I can see everywhere such queries are made. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686697935 From aph at openjdk.org Mon Jul 22 15:08:44 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 15:08:44 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v3] In-Reply-To: <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> Message-ID: <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> On Thu, 18 Jul 2024 20:11:03 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array. > > Even though I very much like to see table lookup written in C++ (accompanying heavily optimized platform-specific MacroAssembler variants), it would make C++ runtime even simpler. It would, but there is something to be said for being able to provide a fast "no" answer for interface membership. I'll agree it's probably not a huge difference. I guess `is_cloneable_fast()` exists only because searching the interfaces is slow. Also, if table lookup is written in C++ but not used, it will rot. Also also, `Klass::is_subtype_of()` is used for C1 runtime. 
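To illustrate what a fast "no" answer means here, the following is a deliberately simplified sketch, not the table layout used by the patch. It assumes each type has a 6-bit hash slot and each class keeps a 64-bit bitmap of occupied slots; `TypeSketch`, `ClassSketch` and their members are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a type with a precomputed hash slot.
struct TypeSketch {
  uint8_t hash_slot;  // assumed to be in [0, 63]
};

// Hypothetical class model: a bitmap with one bit per occupied hash slot,
// plus the plain list of secondary supertypes.
struct ClassSketch {
  uint64_t bitmap = 0;
  std::vector<const TypeSketch*> secondary_supers;

  void add_secondary_super(const TypeSketch* t) {
    bitmap |= (uint64_t{1} << t->hash_slot);
    secondary_supers.push_back(t);
  }

  bool has_secondary_super(const TypeSketch* t) const {
    // Fast "no": if the slot bit is clear, t cannot be in the list.
    if ((bitmap & (uint64_t{1} << t->hash_slot)) == 0) {
      return false;
    }
    // Bit set: this may be a hash collision, so confirm with a linear scan.
    for (const TypeSketch* s : secondary_supers) {
      if (s == t) return true;
    }
    return false;
  }
};
```

In this model a negative query costs one bit test in the common case, whereas an unconditional linear search always walks the whole list before answering "no".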
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686707925 From szaldana at openjdk.org Mon Jul 22 15:36:47 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 15:36:47 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v4] In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> Message-ID: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> > Hi all, > > This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. > > Testing: > - [x] Added test case passes. > > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Updating warning message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20257/files - new: https://git.openjdk.org/jdk/pull/20257/files/237f751a..a2e46173 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257 PR: https://git.openjdk.org/jdk/pull/20257 From szaldana at openjdk.org Mon Jul 22 15:36:47 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 15:36:47 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v3] In-Reply-To: <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com> References: 
<zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com> Message-ID: <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com> On Mon, 22 Jul 2024 14:41:16 GMT, Kevin Walls <kevinw at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding comment > > test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124: > >> 122: OutputAnalyzer output = new JMXExecutor().execute("Compiler.perfmap %s".formatted(path)); >> 123: output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir)); >> 124: output.shouldNotHaveExitValue(0); > > I'm curious if this exit value check works, as jcmd failures like this show "Command executed successfully" and return 0 for success. > These compiler tests have chosen JMXExecutor and PidJcmdExecutor which might be relevant. Interested to know if JMXExecutor returns a non-zero exit value for this? Hi Kevin, Yes, I noticed the test was exiting with a non-zero value when I was testing. After giving it a bit more thought, that check is not the main purpose of the test and I'm not entirely sure why the JMXExecutor behaves that way. I'll just remove the exit value check to avoid confusion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686750734 From fgao at openjdk.org Mon Jul 22 16:01:38 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 22 Jul 2024 16:01:38 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <LQaSi1l1PrOsNt4-4DWLguJQH9wxiywwTCh8TkGSo4U=.6b642f32-f804-4aad-8e07-aaea9ed23cc1@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <LQaSi1l1PrOsNt4-4DWLguJQH9wxiywwTCh8TkGSo4U=.6b642f32-f804-4aad-8e07-aaea9ed23cc1@github.com> Message-ID: <urZOIbDXPwEJSdpYPYCaDsqbB87cm3sfeSAVMFHAUeU=.6dd2957d-7f07-4880-8864-32249a5e74bc@github.com> On Mon, 22 Jul 2024 11:29:07 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > All tests passed. @TobiHartmann thanks for your testing! Jcstress also passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2243302556 From kevinw at openjdk.org Mon Jul 22 16:34:32 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 22 Jul 2024 16:34:32 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v3] In-Reply-To: <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com> <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com> Message-ID: <z_BmsLBrn2IUlx6mTNBXLAnwLwRit04MVeLcmBxz3Eg=.d451fe73-e101-439e-98d8-b3c891a37b38@github.com> On Mon, 22 Jul 2024 15:33:27 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124: >> >>> 122: 
OutputAnalyzer output = new JMXExecutor().execute("Compiler.perfmap %s".formatted(path)); >>> 123: output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir)); >>> 124: output.shouldNotHaveExitValue(0); >> >> I'm curious if this exit value check works, as jcmd failures like this show "Command executed successfully" and return 0 for success. >> These compiler tests have chosen JMXExecutor and PidJcmdExecutor which might be relevant. Interested to know if JMXExecutor returns a non-zero exit value for this? > > Hi Kevin, > > Yes, I noticed the test was exiting with a non-zero value when I was testing. After giving it a bit more thought, that check is not the main purpose of the test and I'm not entirely sure why the JMXExecutor behaves that way. I'll just remove the exit value check to avoid confusion. I think it was returning an "exit value" of -1, the test/lib/jdk/test/lib/process/OutputBuffer.java default. JMXExecutor doesn't set one as there isn't an exit code... That could be clearer, actually there must be various issues in that area. But yes just don't check exit code for a JMX Executor, and jcmd would return zero but we don't want to embed that as a requirement, it's really a failure. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686829542 From aph at openjdk.org Mon Jul 22 16:39:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 16:39:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v3] In-Reply-To: <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> Message-ID: <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com> On Mon, 22 Jul 2024 15:03:12 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array. >> >> Even though I very much like to see table lookup written in C++ (accompanying heavily optimized platform-specific MacroAssembler variants), it would make C++ runtime even simpler. > >> Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array. >> >> Even though I very much like to see table lookup written in C++ (accompanying heavily optimized platform-specific MacroAssembler variants), it would make C++ runtime even simpler. > > It would, but there is something to be said for being able to provide a fast "no" answer for interface membership. I'll agree it's probably not a huge difference. 
I guess `is_cloneable_fast()` exists only because searching the interfaces is slow. > Also, if table lookup is written in C++ but not used, it will rot. > Also also, `Klass::is_subtype_of()` is used for C1 runtime. Thinking about it some more, I don't really mind. There may be some virtue to moving lookup_secondary_supers_table() to a comment in the back end(s), and the expansion of population_count() is rather bloaty. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686835253 From aph at openjdk.org Mon Jul 22 16:50:47 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 16:50:47 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v4] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <uHjma_H_iMAeOvm_sfZVc0ifNLwfGdgoV5JyIJTl7uA=.68facdd5-8681-4264-975c-d1a4b6a8eef4@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. 
> > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas.
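To make the popcount discussion above concrete, here is a minimal sketch of a software popcount and of how a bitmap plus popcount can give a fast "no" answer for membership. It is an illustration only, not the code in this patch: the names `software_popcount` and `PackedTable`, the 64-slot layout, and the assumption that no two entries share a slot are all hypothetical simplifications (a real table has to deal with collisions).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable bit-counting fallback (SWAR) for builds that cannot assume a
// hardware popcount instruction.
inline unsigned software_popcount(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                            // 8-bit sums
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);       // add all bytes
}

// Hypothetical packed table: bit i of `bitmap` is set when an entry occupies
// slot i, and the entries themselves are stored densely in slot order.
struct PackedTable {
  uint64_t bitmap = 0;
  std::vector<uint32_t> entries;

  // Simplifying assumption: each slot holds at most one entry.
  void insert(unsigned slot, uint32_t value) {
    assert(slot < 64);
    assert(((bitmap >> slot) & 1) == 0);
    const uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
    entries.insert(entries.begin() + software_popcount(below), value);
    bitmap |= uint64_t{1} << slot;
  }

  bool contains(unsigned slot, uint32_t value) const {
    assert(slot < 64);
    if (((bitmap >> slot) & 1) == 0) {
      return false;  // fast "no": nothing was ever placed in this slot
    }
    // The number of occupied slots below this one is the dense index.
    const uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
    return entries[software_popcount(below)] == value;
  }
};
```

The point of the sketch is that a miss usually costs one bit test, and a hit costs one popcount plus one compare, which is why the quality of the popcount implementation matters at all.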
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Review comments - Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/98f6b2b7..c252efcb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=02-03 Stats: 41 lines in 10 files changed: 9 ins; 17 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Mon Jul 22 16:50:48 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 16:50:48 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v4] In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> Message-ID: <QRaDiTIgGhnhvj1km2MOoIDYXKGjnzC04OoEkYgUrxU=.cdd1b266-2380-4c72-884a-163ef267be74@github.com> On Thu, 11 Jul 2024 23:57:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Review comments >> - Review comments > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4810: > >> 4808: Label* L_success, >> 4809: Label* L_failure) { >> 4810: // NB! Callers may assume that, when temp2_reg is a valid register, > > Oh, that's a subtle point... Can we make it more evident at call sites? Done. I think the only code that still depends on it is the C2 pattern that uses check_klass_subtype_slow_path_linear in x86_32.ad and x86_64.ad. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686845837 From aph at openjdk.org Mon Jul 22 17:10:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 17:10:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v4] In-Reply-To: <uHjma_H_iMAeOvm_sfZVc0ifNLwfGdgoV5JyIJTl7uA=.68facdd5-8681-4264-975c-d1a4b6a8eef4@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <uHjma_H_iMAeOvm_sfZVc0ifNLwfGdgoV5JyIJTl7uA=.68facdd5-8681-4264-975c-d1a4b6a8eef4@github.com> Message-ID: <LVhS1u3QiLX3D5SyyTrlIIERslY6e1CDagmo0ngb7VE=.bf70305a-383d-4d6d-a4fa-40613767487f@github.com> On Mon, 22 Jul 2024 16:50:47 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. 
There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Review comments > - Review comments All done apart from the questions of: 1. Should `Klass::linear_search_secondary_supers() const` call `set_secondary_super_cache()`? (Strong no from me. It's UB.) 2. Should we use a straight linear search for secondary C++ supers in the runtime, i.e. not changing it for now?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2243430000 From aph at openjdk.org Mon Jul 22 17:19:46 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jul 2024 17:19:46 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. 
All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/c252efcb..02cfd130 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=03-04 Stats: 18 lines in 6 files changed: 1 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From duke at openjdk.org Mon Jul 22 18:57:00 2024 From: duke at openjdk.org (Henry Lin) Date: Mon, 22 Jul 2024 18:57:00 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v2] In-Reply-To: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> Message-ID: <XwiF_fbIIjpihvHwT3ZjBry0ZvI-SRwNbY5Mp79yEa4=.77273086-6e03-4cef-bbd2-f3f1517f0430@github.com> > Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`. Henry Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into 833269790-overflow - 8332697: fix ubsan:shenandoahSimpleBitMap.inline.hpp runtime integer overflow ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20164/files - new: https://git.openjdk.org/jdk/pull/20164/files/ed2797fe..75d921cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=00-01 Stats: 6383 lines in 291 files changed: 4501 ins; 914 del; 968 mod Patch: https://git.openjdk.org/jdk/pull/20164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164 PR: https://git.openjdk.org/jdk/pull/20164 From szaldana at openjdk.org Mon Jul 22 19:49:08 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 19:49:08 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v6] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <Q1D2x__4N9ElYI5FHhkgxNT9elpOvYcjSyim00C0EfE=.c241f5bd-840c-40ff-8157-9be769e8ef99@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands.
For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
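As an illustration of the `%p` behaviour described above, here is a minimal sketch of the substitution. It is not the HotSpot implementation: `expand_pid` is a hypothetical helper, and it assumes that only the two-character sequence `%p` is special and that every other character, including a lone `%`, is copied through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replace every "%p" in `pattern` with the decimal PID.
std::string expand_pid(const std::string& pattern, long pid) {
  assert(pid >= 0);
  const std::string pid_text = std::to_string(pid);
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    const bool is_pid_token =
        pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p';
    if (is_pid_token) {
      out += pid_text;
      ++i;  // skip the 'p' that belongs to the token
    } else {
      out += pattern[i];
    }
  }
  return out;
}
```

With this sketch, a file name such as `heap-%p.hprof` would expand to `heap-1234.hprof` for a process whose PID is 1234.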
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision: - Fixing memory leak - Fixing pointer style, s/NULL/nullptr, and exception - Cleaning up parserTests.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/cdf1d457..801fc582 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=04-05 Stats: 78 lines in 4 files changed: 9 ins; 16 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Mon Jul 22 20:03:08 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 20:03:08 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v7] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <5UHSCkGbA7jXwwEfE8ou0LzvPd5flc7M9ZwbNhZFFvM=.c677a49b-c98c-42c9-81df-b366379aefa9@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands.
For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Error messaging format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/801fc582..517db0cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=05-06 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Mon Jul 22 20:06:33 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 20:06:33 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com> <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com> <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com> <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com> Message-ID: <_frxtxQ56OXre5svEw_F8AOS7t-bOT0wSP1rcIh_hOI=.b8544cfb-66bd-4431-a63c-0599b2a38f08@github.com> On Sun, 21 Jul 2024 10:08:38 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Sonia, my bad if you already know this stuff but since it's fairly esoteric knowledge nowadays I'd like 
to help you out in advance: Thomas is proposing the usage of an X macro https://en.wikipedia.org/wiki/X_macro >> >> These can be found throughout HotSpot, you can find an example definition and usage in `logTag.hpp` and `logTag.cpp`. > > @SoniaZaldana Note that this is very much optional. Hi folks, thanks for the pointers! I wasn't familiar with X macros and after some time toying around with them, I'm sad to report that I am not a fan (yet!). I implemented it and ended up breaking part of the tests. I quickly realized that debugging these is a bit harder for less experienced C++ developers (like myself). So, just wanted to note: - I cleaned up the indentation in this function as it was all wrong. - I didn't get rid of the repetition. Tried to but quickly realized we can't pull the DCmdArgument out of the if statements as they're different types. And note to self, to keep reviewing X macros because they did shorten the code a lot when I implemented them. Perhaps I'll give it another go in a different RFE. Sorry it's not what either of you hoped for!
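For readers who have not met X macros before, here is a minimal sketch of the technique mentioned above. It is not taken from `logTag.hpp`: the list `DEMO_COMMANDS` and every name derived from it are hypothetical, chosen only to show how a single list can be expanded more than once.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical list of items. Each X(...) entry names one item; the meaning of
// X is chosen separately at every expansion site.
#define DEMO_COMMANDS(X) \
  X(HeapDump)            \
  X(PerfMap)             \
  X(ThreadDump)

// First expansion: one enumerator per entry.
enum DemoCommand {
#define X(name) k##name,
  DEMO_COMMANDS(X)
#undef X
  kDemoCommandCount
};

// Second expansion: a parallel table of printable names, kept in sync with
// the enum automatically because both come from the same list.
static const char* const demo_command_names[] = {
#define X(name) #name,
  DEMO_COMMANDS(X)
#undef X
};

inline const char* demo_command_name(DemoCommand c) {
  assert(c >= 0 && c < kDemoCommandCount);
  return demo_command_names[c];
}
```

The repetition disappears because adding a new entry to the list updates the enum and the name table together. The cost, as noted in the thread, is that the expanded code never appears in the source, which can make debugging harder.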
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687080108 From szaldana at openjdk.org Mon Jul 22 20:06:34 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 20:06:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <CxwbysRpQzuTNFd7180Kh5neBkx9mMJOcct-eqCzrIQ=.5393f979-4fe1-445f-a99d-a49515eec5fe@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com> <CxwbysRpQzuTNFd7180Kh5neBkx9mMJOcct-eqCzrIQ=.5393f979-4fe1-445f-a99d-a49515eec5fe@github.com> Message-ID: <5-itK2a-qgShmcLpo2Dvj1OUds2U1PF9uqgi8eO5Odk=.fc97e659-13d7-4c5e-800a-16ebdcf3e809@github.com> On Sat, 20 Jul 2024 08:02:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> src/hotspot/share/services/diagnosticArgument.hpp line 66: >> >>> 64: public: >>> 65: char *_name; >>> 66: }; >> >> Something is off about this. What is the lifetime of this object? >> >> You don't free it. Running a command in a loop will consume C-heap (you can check this with NMT: `jcmd VM.native_memory baseline`, then run a command 100 times, then `jcmd VM.native_memory summary.diff` will show you the leak in mtInternal. >> >> I would probably just inline the string. E.g. >> >> >> struct FileArgument { >> char name[max name len] >> }; >> >> >> FileArguments sits as member inside DCmdArgument. DCmdArgument or DCmdArgumentWithParser sits as member in the various XXXDCmd classes. >> >> Those are created in DCmdFactory::create_local_DCmd(). Which is what, a static global list? So we only have one global XXXDCmd object instance per command, but for each command invocation re-parse the argument values? What a weird concept. 
>> >> Man, this coding is way too convoluted for a little parser engine :( >> >> But anyway, inlining the filename array into FileArgument should be probably fine from a size standpoint. I would, however, not use JVM_MAXPATHLEN or anything that depends ultimately on PATH_MAX from system headers. We don't want the object to consume e.g. an MB if some crazy platform defines PATH_MAX as 1MB. Therefore I would use e.g. 1024 as limit for the path name. >> >> (Note that PATH_MAX is an illusion anyway, there is never a guarantee that a path is smaller than that limit... See this good article: https://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html) > > Note that the reason for the leak is probably the fact that you don't clear old values on parse_value. See e.g. how char* does it. However, since you allocate with a constant size anyway, the buffer size never changes, you could just as well either follow my advice above (inlining), or just re-use the existing pointer. Hi Thomas, Yes - this was an oversight on my end. I was not directly calling the `destroy_value()` function. I tried to follow more closely what's done for `char*`, as I like the consistency throughout. I did a quick check and I don't see any more leaks with NMT. Does the new change make sense to you as well? Thank you for the feedback!
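The inlining suggestion quoted above can be sketched as follows. This is only an illustration of the idea, not the code that was integrated: `InlineFileArgument`, `kMaxLen`, and `set` are hypothetical names, and the limit of 1024 simply follows the figure mentioned in the review comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical argument type that stores the file name inline, so there is no
// heap allocation to free and re-parsing a value cannot leak.
struct InlineFileArgument {
  static constexpr std::size_t kMaxLen = 1024;  // fixed cap, independent of PATH_MAX
  char name[kMaxLen] = {0};

  // Copies `value` into the inline buffer. Returns false and leaves the
  // previous contents untouched when the value (plus terminator) does not fit.
  bool set(const char* value) {
    assert(value != nullptr);
    const std::size_t len = std::strlen(value);
    if (len >= kMaxLen) {
      return false;
    }
    std::memcpy(name, value, len + 1);  // include the terminating NUL
    return true;
  }
};
```

Because the buffer lives inside the object, repeated command invocations overwrite the same storage. The trade-off is that over-long names have to be rejected explicitly rather than accommodated.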
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687081245 From szaldana at openjdk.org Mon Jul 22 20:08:32 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 22 Jul 2024 20:08:32 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v3] In-Reply-To: <z_BmsLBrn2IUlx6mTNBXLAnwLwRit04MVeLcmBxz3Eg=.d451fe73-e101-439e-98d8-b3c891a37b38@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com> <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com> <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com> <z_BmsLBrn2IUlx6mTNBXLAnwLwRit04MVeLcmBxz3Eg=.d451fe73-e101-439e-98d8-b3c891a37b38@github.com> Message-ID: <qsOX2Vh_5cBb-iHFz68RS1NbphfEmFcjV_bFcjbSArE=.2c6450df-2198-4523-a852-00d29d9a797c@github.com> On Mon, 22 Jul 2024 16:31:32 GMT, Kevin Walls <kevinw at openjdk.org> wrote: > I think it was returning an "exit value" of -1, the test/lib/jdk/test/lib/process/OutputBuffer.java default. Correct, that was the exit value. > But yes just don't check exit code for a JMX Executor, and jcmd would return zero but we don't want to embed that as a requirement, it's really a failure. Agreed. I removed the exit status check. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1687083247 From duke at openjdk.org Mon Jul 22 20:29:03 2024 From: duke at openjdk.org (Henry Lin) Date: Mon, 22 Jul 2024 20:29:03 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v3] In-Reply-To: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> Message-ID: <DZ1Cgjr0Osj6By1c7iu2eROuYThWFRwYgnGgnqmxWmI=.9b961b43-4d57-4081-aeda-723968175328@github.com> > Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`. Henry Lin has updated the pull request incrementally with one additional commit since the last revision: revert right_n_bits and add unsigned right_n_bits to shenandoahSimpleBitMap.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20164/files - new: https://git.openjdk.org/jdk/pull/20164/files/75d921cc..f2011961 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=01-02 Stats: 13 lines in 4 files changed: 6 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164 PR: https://git.openjdk.org/jdk/pull/20164 From duke at openjdk.org Mon Jul 22 21:16:07 2024 From: duke at openjdk.org (Henry Lin) Date: Mon, 22 Jul 2024 21:16:07 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v4] In-Reply-To: 
<rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> Message-ID: <o3jkDOZH-LfQNZkoYyz2t1W1cqYWRdsmJY2L58OLO34=.50d80985-f269-482e-973d-fbc04668bd7a@github.com> > Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`. Henry Lin has updated the pull request incrementally with one additional commit since the last revision: formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20164/files - new: https://git.openjdk.org/jdk/pull/20164/files/f2011961..9b798b37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=02-03 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164 PR: https://git.openjdk.org/jdk/pull/20164 From lmesnik at openjdk.org Mon Jul 22 21:49:31 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 22 Jul 2024 21:49:31 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v4] In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> Message-ID: <_w6qRLOoFJVbJ29HZIxuwJy-ikNOKsLLXfdgUmPQ-6M=.ea3356ed-e677-4aa9-8782-debc015b0084@github.com> On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of 
which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating warning message Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2192606124 From duke at openjdk.org Mon Jul 22 22:09:31 2024 From: duke at openjdk.org (Henry Lin) Date: Mon, 22 Jul 2024 22:09:31 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v4] In-Reply-To: <o3jkDOZH-LfQNZkoYyz2t1W1cqYWRdsmJY2L58OLO34=.50d80985-f269-482e-973d-fbc04668bd7a@github.com> References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com> <o3jkDOZH-LfQNZkoYyz2t1W1cqYWRdsmJY2L58OLO34=.50d80985-f269-482e-973d-fbc04668bd7a@github.com> Message-ID: <pYacwEisUE7HnGgH88lNFIp0zqFph6RsaivyoiOhnLY=.1017984f-a8f6-40e0-a13b-e36fcf82e632@github.com> On Mon, 22 Jul 2024 21:16:07 GMT, Henry Lin <duke at openjdk.org> wrote: >> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`. > > Henry Lin has updated the pull request incrementally with one additional commit since the last revision: > > formatting Thanks for the feedback. I reverted my changes in `globalDefinitions.hpp` and added an unsigned version of `right_n_bits` in `shenandoahSimpleBitMap.hpp`. This new unsigned version replaces the usages that caused undefined overflow behavior in the `shenandoahSimpleBitMap` files. 
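The reasoning behind the unsigned variant can be sketched as follows. This is a minimal illustration, not the helper added in the PR: `low_bits_mask` is a hypothetical name, and the sketch assumes a 64-bit word. With a signed word, setting bit 63 yields the most negative value, and subtracting 1 from it is exactly the overflow quoted in the subject line. The same arithmetic in an unsigned type is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unsigned helper: returns a mask with the low `n` bits set,
// for 0 <= n <= 64.
inline uint64_t low_bits_mask(unsigned n) {
  assert(n <= 64);
  // Shifting a 64-bit value by 64 is undefined, so that case is handled
  // separately. For n == 63 the shift produces 0x8000000000000000, and the
  // unsigned subtraction gives 0x7fffffffffffffff with no overflow.
  return n == 64 ? ~uint64_t{0} : (uint64_t{1} << n) - 1;
}
```

Callers that only ever treat the result as a bit pattern lose nothing by switching to the unsigned form, which is presumably why the change could be confined to the bitmap files.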
------------- PR Comment: https://git.openjdk.org/jdk/pull/20164#issuecomment-2243893659 From stuefe at openjdk.org Tue Jul 23 06:48:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 23 Jul 2024 06:48:31 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v4] In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> Message-ID: <8VD-XmtZQBI9KdQVZfZIRjvQSgyAkXKy4TXK7jdLBa0=.07cc90b9-d704-445a-b4a8-f100e611d0aa@github.com> On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating warning message Looks good ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2193108607 From kevinw at openjdk.org Tue Jul 23 08:03:38 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 23 Jul 2024 08:03:38 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v4] In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> Message-ID: <Ixj2Rbbz2GkbF0CUDu1TdPITELi56npMpMOYjM0-FPI=.5b081457-2b8b-4290-9fb9-2c77011dfa33@github.com> On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating warning message Marked as reviewed by kevinw (Reviewer). Yes looks good. I only waited in case you were changing "outputStream* st" to be called out or output, to address Thomas' comment. (I notice now the added comment.) 
------------- PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2193261838 PR Comment: https://git.openjdk.org/jdk/pull/20257#issuecomment-2244523438 From mbaesken at openjdk.org Tue Jul 23 09:55:05 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 23 Jul 2024 09:55:05 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' Message-ID: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
When running with ubsan-enabled binaries, some tests trigger the following report:
src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
    #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
    #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
    #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
    #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
    #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
    #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
    #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
    #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
It seems that, at least for class SmallRegisterMap, we miss handling nullptr.
------------- Commit messages: - JDK-8333354 Changes: https://git.openjdk.org/jdk/pull/20296/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333354 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296 PR: https://git.openjdk.org/jdk/pull/20296 From mli at openjdk.org Tue Jul 23 11:27:00 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Jul 2024 11:27:00 GMT Subject: RFR: 8335191: RISC-V: verify perf of chacha20 Message-ID: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
Hi, Can you help to review this simple patch? Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings more performance gain than regression only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it brings a regression in all test cases. So we should adjust when the intrinsic is turned on, i.e. only when vlenb == 32.
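The gating rule described here can be sketched as a small predicate. This is only an illustration under stated assumptions, not the actual change in the PR: `RiscvCpuInfo` and `enable_chacha20_intrinsic` are hypothetical names, and the sketch takes the "only when vlenb == 32" wording literally (whether larger vector lengths should also qualify is not stated in this message).

```cpp
#include <cassert>   // only needed for the checks mentioned below

// Hypothetical description of the detected CPU; not a HotSpot type.
struct RiscvCpuInfo {
  bool has_vector_ext;  // assumed flag: vector extension is available
  unsigned vlenb;       // vector register length in bytes
};

// Enable the ChaCha20 intrinsic only for the configuration that was
// measured as a net win (vlenb == 32), and never without vector support.
constexpr bool enable_chacha20_intrinsic(const RiscvCpuInfo& cpu) {
  return cpu.has_vector_ext && cpu.vlenb == 32;
}
```

Under this rule a k230-like configuration (`vlenb == 16`) keeps the intrinsic off, while a bananapi-like one (`vlenb == 32`) turns it on.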
Thanks

## Performance

### on k230 vlenb == 16

Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 32872.633 | 36427.148 | 4339.823 | ns/op | 0.902
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 87398.821 | 96112.498 | 1028.342 | ns/op | 0.909
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 314533.305 | 342115.144 | 13633.382 | ns/op | 0.919
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 12190.039 | 14844.154 | 111.009 | ns/op | 0.821
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 25734.516 | 30267.139 | 326.158 | ns/op | 0.85
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 81007.764 | 90623.578 | 572.987 | ns/op | 0.894
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 308229.077 | 343146.562 | 18801.368 | ns/op | 0.898
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 321267.148 | 340960.217 | 22253.659 | ns/op | 0.942
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 307476.57 | 341029.841 | 13851.386 | ns/op | 0.902

### on bananapi vlenb == 32

Benchmark - on bananapi, vlenb == 32 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | improvement
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4804.517 | 4154.869 | 2.951 | ns/op | 0.865
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 10782.788 | 14604.89 | 19.031 | ns/op | 1.354
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 39502.457 | 57211.53 | 69.436 | ns/op | 1.448
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 166005.925 | 228615.833 | 22.311 | ns/op | 1.377
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 5040.652 | 4389.007 | 60.197 | ns/op | 0.871
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 11176.787 | 14530.768 | 12.192 | ns/op | 1.3
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 40875.87 | 56149.493 | 111.238 | ns/op | 1.374
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 166459.572 | 221221.334 | 1078.792 | ns/op | 1.329
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17781.57 | 14356.974 | 38.96 | ns/op | 0.807
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 26098.932 | 27368.785 | 52.171 | ns/op | 1.049
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 67351.38 | 82535.832 | 111.414 | ns/op | 1.225
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 235767.096 | 295121.502 | 1443.64 | ns/op | 1.252
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 13634.202 | 10476.916 | 21.069 | ns/op | 0.768
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 22209.959 | 24513.545 | 23.072 | ns/op | 1.104
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 62540.238 | 78088.592 | 54.63 | ns/op | 1.249
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 225358.667 | 293718.246 | 314.449 | ns/op | 1.303
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 237810.351 | 295495.242 | 412.976 | ns/op | 1.243
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 230771.689 | 290751.264 | 315.883 | ns/op | 1.26

------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/20298/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20298&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335191 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20298.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20298/head:pull/20298 PR: https://git.openjdk.org/jdk/pull/20298 From jsjolen at openjdk.org Tue Jul 23 12:11:34 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 23 Jul 2024 12:11:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v5] In-Reply-To: <_frxtxQ56OXre5svEw_F8AOS7t-bOT0wSP1rcIh_hOI=.b8544cfb-66bd-4431-a63c-0599b2a38f08@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com> <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com> 
<lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com> <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com> <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com> <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com> <_frxtxQ56OXre5svEw_F8AOS7t-bOT0wSP1rcIh_hOI=.b8544cfb-66bd-4431-a63c-0599b2a38f08@github.com> Message-ID: <RW3IM-8JSTnZHhndCBdY_zqgkf2JUA_GYdZ99DM9ZPw=.7389febf-394a-4ac4-a06a-cc4efe9289bd@github.com> On Mon, 22 Jul 2024 20:02:40 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> @SoniaZaldana Note that this is very much optional. > > Hi folks, thanks for the pointers! I wasn't familiar with X macros and after some time toying around with them, I'm sad to report that I am not a fan (yet!). > > I implemented it and ended up breaking part of the tests. I quickly realized that debugging these is a bit harder for less experienced c++ developers (like myself). > > So, just wanted to note: > - I cleaned up the indentation in this function as it was all wrong. > - I didn't get rid of the repetition. Tried to but quickly realized we can't pull the DCmdArgument out of the if statements as they're different types. > > And note to self, to keep reviewing X macros because they did shorten the code a lot when I implemented them. Perhaps I'll give it another go in a different RFE. > > Sorry it's not what either of you hoped for! That's fine, thanks for having a go! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687949210 From coleenp at openjdk.org Tue Jul 23 12:37:42 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 12:37:42 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com> Message-ID: <7MDa4Z7FtvI5TG3rARV50PQckm3MSqOzBefku_lFwyc=.ead08ce2-1850-4803-a2eb-bd22cdcdd221@github.com> On Mon, 15 Jul 2024 00:44:02 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 1090: >> >>> 1088: >>> 1089: // Step 2 >>> 1090: // If we were to use wait() instead of waitUninterruptibly() then >> >> This is a nice correction (even though, the actual call below is wait_uninterruptibly() ;-) ), but seems totally unrelated. > > I was thinking it was referring to `ObjectSynchronizer::waitUninterruptibly` added the same commit as the comment b3bf31a0a08da679ec2fd21613243fb17b1135a9 git backout restored the old wrong comment. We should fix this separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1687985648 From duke at openjdk.org Tue Jul 23 13:28:36 2024 From: duke at openjdk.org (duke) Date: Tue, 23 Jul 2024 13:28:36 GMT Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() [v4] In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com> Message-ID: <GdyeMf2ruwa49XKOtvjjXcA8OJY9133NasLqD2h8D_c=.d510ba33-b25a-4601-8a65-c48727c9fcf5@github.com> On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message should be going to. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating warning message @SoniaZaldana Your change (at version a2e46173e7d260f8fbc1a9372090ea5867a65a29) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20257#issuecomment-2245252290 From duke at openjdk.org Tue Jul 23 13:57:36 2024 From: duke at openjdk.org (fitzsim) Date: Tue, 23 Jul 2024 13:57:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com> Message-ID: <XSv3vwmQVv2abJLfDCmKsELqrY9d2Ohe_FemTMIzxXw=.3351b8ca-91d0-4a0b-9154-f758c293dcdc@github.com> On Fri, 19 Jul 2024 09:18:13 GMT, Andrew Haley <aph at openjdk.org> wrote: > Compared to the current implementation in #19185, my concern about [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) is the future maintenance effort when we need to update the sleef source along with the cmake changes, and also when support for new sleef platforms is added to the jdk. To check this, I [added](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-2/) the `riscv64` `CMake` steps to `SleefCommon.gmk`. I had intended to factor out `SetupSleefHeader` anyway for `aarch64`, to eliminate copy-n-paste. After that, there was one build step divergence for `riscv64` for the naming of the helper header. 
The two `riscv64` commits are: - [copy `helperrvv.h`](https://github.com/fitzsim/jdk/commit/bcd3813ca97f6308838ee93bcb5c02d9cd37375a) - [add `riscv64` support to `SleefCommon.gmk`](https://github.com/fitzsim/jdk/commit/21e0369682095422f45015d817410d07c711b8c0) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2245327294 From ayang at openjdk.org Tue Jul 23 14:18:58 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 23 Jul 2024 14:18:58 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate Message-ID: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> Simple obsoleting a Parallel GC product flag. ------------- Commit messages: - pgc-obsolete-base-footprint Changes: https://git.openjdk.org/jdk/pull/20299/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337027 Stats: 28 lines in 7 files changed: 0 ins; 25 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20299/head:pull/20299 PR: https://git.openjdk.org/jdk/pull/20299 From szaldana at openjdk.org Tue Jul 23 15:52:37 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 23 Jul 2024 15:52:37 GMT Subject: Integrated: 8327054: DiagnosticCommand Compiler.perfmap does not log on output() In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com> Message-ID: <AJK-0iLZwFOt9O0VhLmlF5ph1dPY5QdDlEn1arTycDA=.d988bcf8-bea4-4f51-b1bc-a10cf870ce3d@github.com> On Fri, 19 Jul 2024 15:07:39 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > Hi all, > > This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054) making `CodeCache::write_perf_map` aware of which output stream errors and warning message 
should be going to. > > Testing: > - [x] Added test case passes. > > Thanks, > Sonia This pull request has now been integrated. Changeset: 8e1f17e3 Author: Sonia Zaldana Calles <szaldana at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8e1f17e351bc7949b318a0542a4a4cb30ead5a97 Stats: 18 lines in 5 files changed: 12 ins; 0 del; 6 mod 8327054: DiagnosticCommand Compiler.perfmap does not log on output() Reviewed-by: lmesnik, stuefe, kevinw, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/20257 From stuefe at openjdk.org Tue Jul 23 15:59:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 23 Jul 2024 15:59:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v7] In-Reply-To: <5UHSCkGbA7jXwwEfE8ou0LzvPd5flc7M9ZwbNhZFFvM=.c677a49b-c98c-42c9-81df-b366379aefa9@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <5UHSCkGbA7jXwwEfE8ou0LzvPd5flc7M9ZwbNhZFFvM=.c677a49b-c98c-42c9-81df-b366379aefa9@github.com> Message-ID: <EtjIbzgWfPfmRdkLLbGD6dv_Cs4vxNGch1qG13lgxAM=.5f15b693-6b52-4252-b46a-d79cb980da64@github.com> On Mon, 22 Jul 2024 20:03:08 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. 
For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Error messaging format
Slowly getting there... :)

src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 79:
> 77: DCmdArgument<char*>* argument = new DCmdArgument<char*>(
> 78: name, desc,
> 79: "STRING", mandatory, default_value);
I would revert all these style-only changes and just keep the functional one (addition of FileArgument handling). Let's keep this for a follow-up.

src/hotspot/share/services/diagnosticArgument.cpp line 376:
> 374: THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), error_msg.base());
> 375: }
> 376: }
The realloc here is a bit pointless since if `_value._name` is set, it already points to a buffer of size JVM_MAXPATHLEN. I would do one of these two:
- either inline the buffer into FileArgument as I wrote earlier; no need to allocate or deallocate then.
- or, in this function, allocate if `_name` is null, and use the existing buffer otherwise.

src/hotspot/share/services/diagnosticCommand.cpp line 524:
> 522: HeapDumper dumper(!_all.value() /* request GC if _all is false*/);
> 523: dumper.dump(_filename.value()._name, output(), (int)level, _overwrite.value(),
> 524: (uint)parallel);
Please revert style-only changes, let's keep those for follow-ups.

src/hotspot/share/services/diagnosticCommand.cpp line 1195:
> 1193:
> 1194: void SystemDumpMapDCmd::execute(DCmdSource source, TRAPS) {
> 1195: const char* name = _filename.value()._name;
This direct access to the member inside _filename is a bit awkward. I would make the buffer private and give the class some getters and setters, possibly like this:

    class FileArgument {
      // private stuff
    public:
      const char* get() const {
        // return internal buffer
      }
      // returns true if parsing succeeded, false if not
      bool parse_value(const char* s, size_t len) {
        // call Arguments::copyexpand, target internal buffer, and return its return value
      }
    }

------------- PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2193132206 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687527542 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1688310305 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1688264948 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1688316839 From szaldana at openjdk.org Tue Jul 23 17:43:51 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 23 Jul 2024 17:43:51 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v8] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: 
<vtqGm4iD9-utSITUJrGmlAo6W8KQvfrKR0GZIaYgyZY=.2780c1f3-1471-44f2-a9b6-2fb6bd1c1d66@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision: - Fixing formatting - Inlining buffer and making field private - Reverting to functional changes in parserTests.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/517db0cd..c898b1cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=06-07 Stats: 80 lines in 4 files changed: 16 ins; 9 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Tue Jul 23 17:57:04 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 23 Jul 2024 17:57:04 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. 
For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). > > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge master - Fixing formatting - Inlining buffer and making field private - Reverting to functional changes in parserTests.cpp - Error messaging format - Fixing memory leak - Fixing pointer style, s/NULL/nullptr, and exception - Cleaning up parserTests.cpp - Missing copyright header update - Adding tests for file dcmd argument - ... 
and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d ------------- Changes: https://git.openjdk.org/jdk/pull/20198/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=08 Stats: 195 lines in 11 files changed: 154 ins; 19 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Tue Jul 23 18:12:32 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 23 Jul 2024 18:12:32 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com> <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com> Message-ID: <_dK88nL0aH7z6iBgH3_pwwFfCfzom8_I5xGJU5L0swo=.11562964-b22b-47c4-8dde-8c3bc42009b4@github.com> On Wed, 17 Jul 2024 14:21:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > * In all cases: please, in case of an error, don't THROW, don't do `warning`. Instead, just print to the `output()` of the DCmd. You want an error to appear to the user of the dcmd - so, to stdout or stderr of the jcmd process issuing the command. Not an exception in the target JVM process, nor a warning in the target JVM stderr stream FYI, I filed [JDK-8337047](https://bugs.openjdk.org/browse/JDK-8337047) to track this. 
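A minimal, self-contained sketch of the two ideas in this thread: expanding `%p` to the PID in an output filename, and reporting a failure as a message meant for the command's own output instead of throwing or warning. This is not the code from the PR. `expand_pid_pattern`, its parameters, and the length limit are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not HotSpot code): replaces every "%p" in `pattern`
// with the decimal `pid`. On success stores the result in `out` and returns
// true. On failure stores a human-readable message in `err`, which the caller
// would print to the command's output, and returns false.
bool expand_pid_pattern(const std::string& pattern, long pid,
                        std::size_t max_len,
                        std::string& out, std::string& err) {
  const std::string pid_str = std::to_string(pid);
  std::string result;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      result += pid_str;
      ++i;  // skip the 'p' that follows '%'
    } else {
      result += pattern[i];
    }
  }
  if (result.size() > max_len) {
    err = "filename too long after %p expansion";
    return false;
  }
  out = result;
  return true;
}
```

A caller would pass something like `"heap-%p.hprof"` with the current PID, then either open `out` or print `err` to the stream that is returned to the jcmd client.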
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2245923827 From vlivanov at openjdk.org Tue Jul 23 19:03:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jul 2024 19:03:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com> Message-ID: <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com> On Mon, 22 Jul 2024 14:56:31 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/oops/klass.inline.hpp line 117: >> >>> 115: } >>> 116: >>> 117: inline bool Klass::search_secondary_supers(Klass *k) const { >> >> I see you moved `Klass::search_secondary_supers` in `klass.inline.hpp`, but I'm not sure how it interacts with `Klass::is_subtype_of` (the sole caller) being declared in `klass.hpp`. >> >> Will the inlining still happen if `Klass::is_subtype_of()` callers include `klass.hpp`? > > Presumably this question applies to every function in `klass.inline.hpp`? > Practically everything does `#include "oops/klass.inline.hpp"`. It's inlined in about 120 files, as far as I can see everywhere such queries are made. My confusion arises from the following: * `Klass::is_subtype_of()` is declared in `klass.hpp` * `Klass::is_subtype_of()` calls `Klass::search_secondary_supers()` * `Klass::search_secondary_supers()` is declared in `klass.inline.hpp` * `klass.inline.hpp` includes `klass.hpp` What happens when users include `klass.hpp`, but not `klass.inline.hpp`? How does it affect generated code? 
I suspect that `Klass::search_secondary_supers()` won't be inlined in such a case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688559463 From vlivanov at openjdk.org Tue Jul 23 19:09:40 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jul 2024 19:09:40 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <cfKy-VTUht4Fbtb5-paKJZvCVAar1mq6Y0d0pDbkFQE=.1aa56b36-3067-4357-89ed-d1d8c3f64426@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com> <cfKy-VTUht4Fbtb5-paKJZvCVAar1mq6Y0d0pDbkFQE=.1aa56b36-3067-4357-89ed-d1d8c3f64426@github.com> Message-ID: <QlxMOE2D1aUYQodcEmvPd_V4oF2H9OXrQSi0du4gIpg=.9d6539a4-20ed-4bb3-b2f1-a9ee9bb816ab@github.com> On Mon, 22 Jul 2024 14:00:35 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Also, `num_extra_slots == 0` check is redundant. > >> Since `secondary_supers` are hashed unconditionally now, is `interfaces->length() <= 1` check still needed? > > I don't think so, no. Our incoming `transitive_interfaces` is formed by concatenating the interface lists of our superclasses and superinterfaces. Right, I forgot the details. It requires us to hash the transitive interfaces list first. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688568321 From vlivanov at openjdk.org Tue Jul 23 19:09:41 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jul 2024 19:09:41 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <WOY02iAdeWi-IgqSfHkkydfPyRxH1TpsYPYvFD8sRv0=.befb015d-0622-492a-87ab-fe52d0b1fa64@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <WOY02iAdeWi-IgqSfHkkydfPyRxH1TpsYPYvFD8sRv0=.befb015d-0622-492a-87ab-fe52d0b1fa64@github.com> Message-ID: <tCRApgUEbhhxlWBKS56kjCZPeUVX2PjbHOFDfu7vyPM=.a6a65521-a4a6-4bdb-b1ca-046ffa8464dd@github.com> On Mon, 22 Jul 2024 14:16:05 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/oops/klass.cpp line 175: >> >>> 173: if (secondary_supers()->at(i) == k) { >>> 174: if (UseSecondarySupersCache) { >>> 175: ((Klass*)this)->set_secondary_super_cache(k); >> >> Does it make sense to assert `UseSecondarySupersCache` in `Klass::set_secondary_super_cache()`? > > I kinda hate this because we're casting away `const`, which is UB. I think I'd just take it out, but once I do that, I don't think anything sets `_secondary_super_cache`. IMO it's OK if C++ runtime omits `_secondary_super_cache` accesses irrespective of whether `UseSecondarySupersCache` is set or not. I'm fine with addressing it separately. 
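For illustration only (not HotSpot code): the discomfort with `((Klass*)this)->set_secondary_super_cache(k)` inside a `const` member function can be shown on a toy class. Whether such a cast is actually undefined depends on whether the object was itself defined `const`. One common way to sidestep the question, sketched here under that assumption, is to declare the cache field `mutable` so a `const` lookup may update it without any cast. `Node`, `search_supers`, and `cached` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class with a one-element lookup cache.
class Node {
 public:
  Node() = default;
  explicit Node(std::vector<const Node*> supers) : _supers(std::move(supers)) {}

  // Const lookup that may still refresh the cache, because `_cache` is mutable.
  bool search_supers(const Node* k) const {
    if (k != nullptr && _cache.load(std::memory_order_relaxed) == k) {
      return true;  // cache hit
    }
    for (const Node* s : _supers) {
      if (s == k) {
        _cache.store(k, std::memory_order_relaxed);  // no const_cast needed
        return true;
      }
    }
    return false;
  }

  const Node* cached() const { return _cache.load(std::memory_order_relaxed); }

 private:
  std::vector<const Node*> _supers;
  mutable std::atomic<const Node*> _cache{nullptr};
};
```

Writing to a `mutable` member is well defined even when the enclosing object is `const`, which is the property the cast-based version cannot guarantee in general.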
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688565818 From coleenp at openjdk.org Tue Jul 23 19:09:43 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 19:09:43 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> Message-ID: <1ivG4ii0OclIXn9-0Ihh4udD4WUu5Oe64ovWDY1xSJ4=.731721e8-806e-4e34-9ec0-3188b81f9f41@github.com> On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. 
>> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include I have some suggestions that hopefully you can click on if you agree. Also, some comments. 
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 67: > 65: } > 66: static void* allocate_node(void* context, size_t size, Value const& value) { > 67: reinterpret_cast<ObjectMonitorWorld*>(context)->inc_table_count(); Suggestion: reinterpret_cast<ObjectMonitorWorld*>(context)->inc_items_count(); src/hotspot/share/runtime/lightweightSynchronizer.cpp line 71: > 69: }; > 70: static void free_node(void* context, void* memory, Value const& value) { > 71: reinterpret_cast<ObjectMonitorWorld*>(context)->dec_table_count(); Suggestion: reinterpret_cast<ObjectMonitorWorld*>(context)->dec_items_count(); src/hotspot/share/runtime/lightweightSynchronizer.cpp line 125: > 123: }; > 124: > 125: void inc_table_count() { Suggestion: void inc_items_count() { src/hotspot/share/runtime/lightweightSynchronizer.cpp line 126: > 124: > 125: void inc_table_count() { > 126: Atomic::inc(&_table_count); Suggestion: Atomic::inc(&_items_count); src/hotspot/share/runtime/lightweightSynchronizer.cpp line 129: > 127: } > 128: > 129: void dec_table_count() { Suggestion: void dec_items_count() { src/hotspot/share/runtime/lightweightSynchronizer.cpp line 130: > 128: > 129: void dec_table_count() { > 130: Atomic::inc(&_table_count); Suggestion: Atomic::inc(&_items_count); src/hotspot/share/runtime/lightweightSynchronizer.cpp line 134: > 132: > 133: double get_load_factor() { > 134: return (double)_table_count/(double)_table_size; Suggestion: return (double)_items_count/(double)_table_size; ------------- PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2193868194 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688563846 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688563501 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688565196 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688565561 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688565947 PR Review 
Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688566411 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688566752 From coleenp at openjdk.org Tue Jul 23 19:09:43 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 19:09:43 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <7MDa4Z7FtvI5TG3rARV50PQckm3MSqOzBefku_lFwyc=.ead08ce2-1850-4803-a2eb-bd22cdcdd221@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com> <7MDa4Z7FtvI5TG3rARV50PQckm3MSqOzBefku_lFwyc=.ead08ce2-1850-4803-a2eb-bd22cdcdd221@github.com> Message-ID: <JrmblfUl8jxfWwZHI8MIO0V5OOIn4a0M0A6sWS6J08Y=.cc047802-93a4-49a4-b646-6201dbd4403b@github.com> On Tue, 23 Jul 2024 12:34:45 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> I was thinking it was referring to `ObjectSynchronizer::waitUninterruptibly` added the same commit as the comment b3bf31a0a08da679ec2fd21613243fb17b1135a9 > > git backout restored the old wrong comment. We should fix this separately. Suggestion: // If we were to use wait() instead of waitInterruptibly() then >> I think I was thinking of the names as a prefix to refer to the `Count of the table` and `Size of the table`. And not the `Number of tables`. But I can see the confusion. >> >> `ConcurrentHashTable` tracks no statistics except for JFR which added some counters directly into the implementation. All statistics are for the users to manage, even if there are helpers for gather these statistics. 
>> >> The current implementation is based on what we do for the StringTable and SymbolTable > > In the other tables, it's called _items_count and it determines the load_factor for triggering concurrent work. We should rename this field items_count to match, and also since it's consistent. Suggestion: volatile size_t _items_count; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1687990861 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688564267 From coleenp at openjdk.org Tue Jul 23 19:09:45 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 19:09:45 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com> <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com> <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com> Message-ID: <C-YWboakmVWLsm8fDywpBlSsKQyiB31SXInOEh2qY5o=.c954ee95-79f4-48cf-8a49-5ebabd1325c7@github.com> On Mon, 15 Jul 2024 00:44:31 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> src/hotspot/share/runtime/basicLock.cpp line 37: >> >>> 35: if (mon != nullptr) { >>> 36: mon->print_on(st); >>> 37: } >> >> I am not sure if we wanted to do this, but we know the owner, therefore we could also look-up the OM from the table, and print it. It wouldn't have all that much to do with the BasicLock, though. > > Yeah maybe it is unwanted. Not sure how we should treat these prints of the frames. My thinking was that there is something in the cache, print it. 
But maybe just treating it as some internal data, maybe print "monitor { <Cached ObjectMonitor* address> }" or similar is better. It seems generally useful to print the monitor in the cache if it's there. I don't think we should do a table search here. I think this looks fine as it is, and might be helpful for debugging if it turns out to be the wrong monitor. >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 80: >> >>> 78: >>> 79: ConcurrentTable* _table; >>> 80: volatile size_t _table_count; >> >> Looks like a misnomer to me. We only have one table, but we do have N entries/nodes. This is counted when new nodes are allocated or old nodes are freed. Consider renaming this to '_entry_count' or '_node_count'? I'm actually a bit surprised if ConcurrentHashTable doesn't already track this... > > I think I was thinking of the names as a prefix to refer to the `Count of the table` and `Size of the table`. And not the `Number of tables`. But I can see the confusion. > > `ConcurrentHashTable` tracks no statistics except for JFR which added some counters directly into the implementation. All statistics are for the users to manage, even if there are helpers for gather these statistics. > > The current implementation is based on what we do for the StringTable and SymbolTable In the other tables, it's called _items_count and it determines the load_factor for triggering concurrent work. We should rename this field items_count to match, and also since it's consistent. >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 159: >> >>> 157: static size_t min_log_size() { >>> 158: // ~= log(AvgMonitorsPerThreadEstimate default) >>> 159: return 10; >> >> Uh wait - are we assuming that threads hold 1024 monitors *on average* ? Isn't this a bit excessive? I would have thought maybe 8 monitors/thread. Yes there are workloads that are bonkers. Or maybe the comment/flag name does not say what I think it says. >> >> Or why not use AvgMonitorsPerThreadEstimate directly? 
> > Maybe that is reasonable. I believe I had that at some point but it had to deal with how to handle extreme values of `AvgMonitorsPerThreadEstimate` as well as what to do when `AvgMonitorsPerThreadEstimate` was disabled `=0`. One 4 / 8 KB allocation seems harmless. > > But this was very arbitrary. This will probably be changed when/if the resizing of the table becomes more synchronised with deflation, allowing for shrinking the table. Shrinking the table is NYI. Maybe we should revisit this initial value then. >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 563: >> >>> 561: assert(locking_thread == current || locking_thread->is_obj_deopt_suspend(), "locking_thread may not run concurrently"); >>> 562: if (_no_safepoint) { >>> 563: ::new (&_nsv) NoSafepointVerifier(); >> >> I'm thinking that it might be easier and cleaner to just re-do what the NoSafepointVerifier does? It just calls thread->inc/dec_no_safepoint_count(). > > I wanted to avoid having to add `NoSafepointVerifier` implementation details in the synchroniser code. I guess `ContinuationWrapper` already does this. > > Simply creating a `NoSafepointVerifier` when you expect no safepoint is more obvious to me, shows the intent better. This looks strange to me also, but it's better than changing the no_safepoint_count directly, since NSV handles the case where the current thread isn't a JavaThread, so you'd have to duplicate that in this VerifyThreadState code too. 
NoSafepointVerifier::NoSafepointVerifier() : _thread(Thread::current()) { if (_thread->is_Java_thread()) { JavaThread::cast(_thread)->inc_no_safepoint_count(); } } >> src/hotspot/share/runtime/lightweightSynchronizer.hpp line 68: >> >>> 66: static void exit(oop object, JavaThread* current); >>> 67: >>> 68: static ObjectMonitor* inflate_into_object_header(Thread* current, JavaThread* inflating_thread, oop object, const ObjectSynchronizer::InflateCause cause); >> >> My IDE flags this with a warning 'Parameter 'cause' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions' *shrugs* > > Yeah. The only effect is has is that you cannot reassign the variable. It was the style taken from [synchronizer.hpp](https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.hpp) where all `InflateCause` parameters are const. Do you get this for inflate_fast_locked_object also? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688011833 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688162915 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688378429 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688385921 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688397480 From coleenp at openjdk.org Tue Jul 23 19:09:47 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 19:09:47 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> 
<Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com> Message-ID: <RmSdsqxnTjwB53zn49RvKRIRkIuz_jiRjOCwwLhEm-g=.6e9238c5-bbbb-4666-82b0-fef3235a12b6@github.com> On Wed, 17 Jul 2024 06:35:34 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Remove try_read >> - Add explicit to single parameter constructors >> - Remove superfluous access specifier >> - Remove unused include >> - Update assert message OMCache::set_monitor >> - Fix indentation >> - Remove outdated comment LightweightSynchronizer::exit >> - Remove logStream include >> - Remove strange comment >> - Fix javaThread include > > src/hotspot/share/runtime/basicLock.hpp line 44: > >> 42: // a sentinel zero value indicating a recursive stack-lock. >> 43: // * For LM_LIGHTWEIGHT >> 44: // Used as a cache the ObjectMonitor* used when locking. Must either > > The first sentence doesn't read correctly. Suggestion: // Used as a cache of the ObjectMonitor* used when locking. Must either > src/hotspot/share/runtime/deoptimization.cpp line 1641: > >> 1639: assert(fr.is_deoptimized_frame(), "frame must be scheduled for deoptimization"); >> 1640: if (LockingMode == LM_LEGACY) { >> 1641: mon_info->lock()->set_displaced_header(markWord::unused_mark()); > > In the existing code how is this restricted to the LM_LEGACY case?? It appears to be unconditional which suggests you are changing the non-UOMT LM_LIGHTWEIGHT logic. ?? Only legacy locking uses the displaced header, I believe, which isn't clear in this code at all. This seems like a fix. We should probably assert that only legacy locking uses this field as a displaced header. 
> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 62: > >> 60: class ObjectMonitorWorld : public CHeapObj<MEMFLAGS::mtObjectMonitor> { >> 61: struct Config { >> 62: using Value = ObjectMonitor*; > > Does this alias really help? We don't state the type that many times and it looks odd to end up with a mix of `Value` and `ObjectMonitor*` in the same code. This alias is present in the other CHT implementations, alas as a typedef in StringTable and SymbolTable so this follows the pattern and allows cut/paste of the allocate_node, get_hash, and other functions. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 102: > >> 100: assert(*value != nullptr, "must be"); >> 101: return (*value)->object_is_cleared(); >> 102: } > > The `is_dead` functions seem oddly placed given they do not relate to the object stored in the wrapper. Why are they here? And what is the difference between `object_is_cleared` and `object_is_dead` (as used by `LookupMonitor`) ? This is a good question. When we look up the Monitor, we don't want to find any that the GC has marked dead, so that's why we call object_is_dead. When we look up with the object to find the Monitor, the object won't be dead (since we're using it to look up). But we don't want to find one that we've cleared because the Monitor was deflated? I don't see where we would clear it though. We clear the WeakHandle in the destructor after the Monitor has been removed from the table. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 105: > >> 103: }; >> 104: >> 105: class LookupMonitor : public StackObj { > > I'm not understanding why we need this little wrapper class. It's a two way lookup. The plain Lookup class is used to lookup the Monitor given the object. This LookupMonitor class is used to lookup the object given the Monitor. The CHT takes these wrapper classes. Maybe we should rename LookupObject to be more clear? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688013308 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688041218 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688051557 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688375335 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688168626 From coleenp at openjdk.org Tue Jul 23 19:09:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 19:09:48 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <U3cg8IdnKu5Eeg-52muJuU0vEGJTRaX4jhKCOB3DVtk=.a1acc8fc-c3b7-4d38-ace8-dd39eff6c139@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com> <U3cg8IdnKu5Eeg-52muJuU0vEGJTRaX4jhKCOB3DVtk=.a1acc8fc-c3b7-4d38-ace8-dd39eff6c139@github.com> Message-ID: <dxKBS7LJqVnSvlkQODDU3JXzgevL9LZ6cYGRZPj8Bmk=.38114de3-d511-43b5-b81a-fd686c13c0b8@github.com> On Wed, 17 Jul 2024 06:40:31 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/runtime/basicLock.hpp line 46: >> >>> 44: // Used as a cache the ObjectMonitor* used when locking. Must either >>> 45: // be nullptr or the ObjectMonitor* used when locking. >>> 46: volatile uintptr_t _metadata; >> >> The displaced header/markword terminology was very well known to people, whereas "metadata" is really abstract - people will always need to go and find out what it actually refers to. Could we not define a union here to support the legacy and lightweight modes more explicitly and keep the existing terminology for the setters/getters for the code that uses it? > > I should have read ahead. I see you do keep the setters/getters. 
When we remove legacy locking in a couple of releases, we could rename this field cached_monitor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688015247 From coleenp at openjdk.org Tue Jul 23 19:09:49 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 19:09:49 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <OCv6QKq_A8dUaKUbnzSdEnlEqrMIcb6pUyLfObBFq-o=.1d78e62f-151c-403d-a291-fbab38c5f4d6@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> <OCv6QKq_A8dUaKUbnzSdEnlEqrMIcb6pUyLfObBFq-o=.1d78e62f-151c-403d-a291-fbab38c5f4d6@github.com> Message-ID: <3m5N_Fh65MVy7vRvO0wq3qFlzxjbCLHhbTBJe8OJorw=.eb61b3bd-5aca-45cd-8e88-389ae86a599b@github.com> On Thu, 18 Jul 2024 11:30:27 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Remove try_read >> - Add explicit to single parameter constructors >> - Remove superfluous access specifier >> - Remove unused include >> - Update assert message OMCache::set_monitor >> - Fix indentation >> - Remove outdated comment LightweightSynchronizer::exit >> - Remove logStream include >> - Remove strange comment >> - Fix javaThread include > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 77: > >> 75: using ConcurrentTable = ConcurrentHashTable<Config, MEMFLAGS::mtObjectMonitor>; >> 76: >> 77: ConcurrentTable* _table; > > So you have a class ObjectMonitorWorld, which references the ConcurrentTable, which, internally also has its actual table. This is 3 dereferences to get to the actual table, if I counted correctly. 
I'd try to eliminate the outermost ObjectMonitorWorld class, or at least make it a global flat structure instead of a reference to a heap-allocated object. I think, because this is a structure that is global and would exist throughout the lifetime of the Java program anyway, it might be worth figuring out how to do the actual ConcurrentHashTable flat in the global structure, too. This is a really good suggestion and might help a lot with the performance problems that we see with the table with heavily contended locking. I think we should change this in a follow-on patch (which I'll work on). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688053792 From vlivanov at openjdk.org Tue Jul 23 19:14:35 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jul 2024 19:14:35 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <QRaDiTIgGhnhvj1km2MOoIDYXKGjnzC04OoEkYgUrxU=.cdd1b266-2380-4c72-884a-163ef267be74@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <QRaDiTIgGhnhvj1km2MOoIDYXKGjnzC04OoEkYgUrxU=.cdd1b266-2380-4c72-884a-163ef267be74@github.com> Message-ID: <UGTJYkZpXOFGtKUPy3EWk2VORIgoblwrEeXaymw_rZ4=.3c014718-7254-40f0-b332-8f5650b0ce9e@github.com> On Mon, 22 Jul 2024 16:45:06 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4810: >> >>> 4808: Label* L_success, >>> 4809: Label* L_failure) { >>> 4810: // NB! Callers may assume that, when temp2_reg is a valid register, >> >> Oh, that's a subtle point... Can we make it more evident at call sites? > > Done. I think the only code that still depends on it is the C2 pattern that uses check_klass_subtype_slow_path_linear in x86_32.ad and x86_64.ad. Thanks. 
I revisited the code and now it seems like `temp2_reg_was_valid` duplicates `set_cond_codes` parameter in the original implementation. Am I missing something important here? Otherwise, why can't we rely on `set_cond_codes` flag instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688579798 From vlivanov at openjdk.org Tue Jul 23 19:18:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jul 2024 19:18:34 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com> Message-ID: <okwRz9n9XLujkQc_Il_J5otkYuUpAGaYRBg5Ln0tZNk=.bb0d230a-c4d3-46eb-a431-213bfc321b28@github.com> On Mon, 22 Jul 2024 17:19:46 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. 
>> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. >> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review comments FYI I did a merge with mainline and submitted the patch for testing. 
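As an aside on the software popcount mentioned in the quoted description: the sketch below shows a portable, bit-parallel population count of the kind a C++ runtime can fall back on when the build target has no popcount instruction. It is only an illustration, not HotSpot's `population_count()`. The names `popcount64_sw` and `rank_below` are hypothetical, and the use of popcount to turn a bitmap position into a packed-array index is an assumption about how bitmap-indexed tables are commonly built, not something stated in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: portable (SWAR) population count for 64-bit values.
// Counts set bits in parallel within 2-, 4- and 8-bit groups, then sums the bytes.
inline unsigned popcount64_sw(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);       // add all bytes
}

// Hypothetical helper: number of set bits strictly below `slot` (0..63).
// In a bitmap-indexed table this would be the index of `slot` in a packed array.
inline unsigned rank_below(uint64_t bitmap, unsigned slot) {
  uint64_t mask = (uint64_t{1} << slot) - 1;  // bits [0, slot)
  return popcount64_sw(bitmap & mask);
}
```

The fallback costs a handful of shifts, masks and one multiply, which is consistent with the remark that the software version is "really not bad".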
------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2246099889 From vlivanov at openjdk.org Tue Jul 23 19:18:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jul 2024 19:18:34 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com> Message-ID: <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com> On Mon, 22 Jul 2024 16:36:25 GMT, Andrew Haley <aph at openjdk.org> wrote: >>> Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array. >>> >>> Even though I very much like to see table lookup written in C++ (accompanying heavily optimized platform-specific MacroAssembler variants), it would make C++ runtime even simpler. >> >> It would, but there is something to be said for being able to provide a fast "no" answer for interface membership. I'll agree it's probably not a huge difference. I guess `is_cloneable_fast()` exists only because searching the interfaces is slow. >> Also, if table lookup is written in C++ but not used, it will rot. >> Also also, `Klass::is_subtype_of()` is used for C1 runtime. 
> > Thinking about it some more, I don't really mind. There may be some virtue to moving lookup_secondary_supers_table() to a comment in the back end(s), and the expansion of population_count() is rather bloaty. > Also also, Klass::is_subtype_of() is used for C1 runtime. Can you elaborate, please? What I'm seeing in `Runtime1::generate_code_for()` for `slow_subtype_check` is a call into `MacroAssembler::check_klass_subtype_slow_path()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688586602 From vlivanov at openjdk.org Tue Jul 23 19:21:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jul 2024 19:21:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com> Message-ID: <aKMZwN8ncRaVRHjYsexOaHM5VCdZJsnOpiuCAbuAxw0=.29e4b6de-883e-40ac-b4d1-a11b414aa1a9@github.com> On Mon, 22 Jul 2024 17:19:46 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. 
>> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. >> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review comments My take on the questions you raised. > Should Klass::linear_search_secondary_supers() const call set_secondary_super_cache()? (Strong no from me. It's UB.) Agree. I'm fine with addressing that separately (as I mentioned earlier). > Should we use a straight linear search for secondary C++ supers in the runtime, i.e.not changing it for now? Slightly in favor of keeping `Klass::is_subtype_of()` simple, but I'm fine with it either way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2246108523 From coleenp at openjdk.org Tue Jul 23 20:24:37 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Jul 2024 20:24:37 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <RmSdsqxnTjwB53zn49RvKRIRkIuz_jiRjOCwwLhEm-g=.6e9238c5-bbbb-4666-82b0-fef3235a12b6@github.com> References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com> <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com> <Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com> <RmSdsqxnTjwB53zn49RvKRIRkIuz_jiRjOCwwLhEm-g=.6e9238c5-bbbb-4666-82b0-fef3235a12b6@github.com> Message-ID: <zYO70HQfI8-f1CcQ_N7H0FtHecJdJ26uGt3DBJeMaYg=.9477511e-59c1-4eb4-9d7f-40c5e0b7555c@github.com> On Tue, 23 Jul 2024 13:12:23 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 1641: >> >>> 1639: assert(fr.is_deoptimized_frame(), "frame must be scheduled for deoptimization"); >>> 1640: if (LockingMode == LM_LEGACY) { >>> 1641: mon_info->lock()->set_displaced_header(markWord::unused_mark()); >> >> In the existing code how is this restricted to the LM_LEGACY case?? It appears to be unconditional which suggests you are changing the non-UOMT LM_LIGHTWEIGHT logic. ?? 
> > Only legacy locking uses the displaced header, I believe, which isn't clear in this code at all. This seems like a fix. We should probably assert that only legacy locking uses this field as a displaced header. Update: yes, this code change does assert if you use BasicLock's displaced header for locking modes other than LM_LEGACY. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688668887 From asmehra at openjdk.org Tue Jul 23 21:50:58 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 23 Jul 2024 21:50:58 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic Message-ID: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) Testing: test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java ------------- Commit messages: - Update comments in tests to reflect new output format - 8337031: Improvements to CompilationMemoryStatistic Changes: https://git.openjdk.org/jdk/pull/20304/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337031 Stats: 173 lines in 6 files changed: 77 ins; 21 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/20304.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20304/head:pull/20304 PR: https://git.openjdk.org/jdk/pull/20304 From asmehra at openjdk.org Tue Jul 23 21:50:58 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 23 Jul 2024 21:50:58 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> Message-ID: 
<Sfs3IHOQoH2YVhJBFpV6XWYC2AaZZQPq1foUBtTVlF0=.2f8dddab-a5bb-4a3b-a758-ec5dbb466f63@github.com> On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: > Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) > > Testing: > test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java > test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java @tstuefe fyi ------------- PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2246372292 From stuefe at openjdk.org Wed Jul 24 06:32:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 24 Jul 2024 06:32:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> Message-ID: <n8iJ1xBSSKiRJDnwa1flyz8itZaVZdPYyT6Dmh0RuQU=.3e4a14b9-a5f2-4d73-8e3b-84c0d6f7012c@github.com> On Tue, 23 Jul 2024 17:57:04 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it?s done for the JFR diagnostic commands. 
For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge master > - Fixing formatting > - Inlining buffer and making field private > - Reverting to functional changes in parserTests.cpp > - Error messaging format > - Fixing memory leak > - Fixing pointer style, s/NULL/nullptr, and exception > - Cleaning up parserTests.cpp > - Missing copyright header update > - Adding tests for file dcmd argument > - ... and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d src/hotspot/share/services/diagnosticArgument.cpp line 365: > 363: if (!_value.parse_value(str, len)) { > 364: stringStream error_msg; > 365: error_msg.print("Invalid file path: %s", str); In all likelihood, the only reason Argument::copy_expand... is ever going to fail is that the expanded string does not fit the buffer in FileArgument. 
I'd consider a clearer warning here, therefore ("File path invalid or too long: ") src/hotspot/share/services/diagnosticCommand.cpp line 1018: > 1016: // of the default, not the actual default. > 1017: FileArgument file_arg = _filename.value(); > 1018: const char *file = _filename.is_set() ? file_arg.get() : nullptr; Style nit: const char*, not const char * src/hotspot/share/services/diagnosticCommand.cpp line 1197: > 1195: void SystemDumpMapDCmd::execute(DCmdSource source, TRAPS) { > 1196: FileArgument file_arg = _filename.value(); > 1197: const char *name = file_arg.get(); Same pointer-style nit here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689204690 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689206254 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689208226 From galder at openjdk.org Wed Jul 24 08:18:32 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 24 Jul 2024 08:18:32 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: <dM6XmjncreYrCU2rzCosr7BNsN96DI1w-qREFtC7h2s=.e73dea1a-0c97-4950-908d-636867153626@github.com> On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <galder at openjdk.org> wrote: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the Java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? 
a : b; > } > > > This patch intrinsifies the calls, replacing the CmpL + Bool nodes with MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. > > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL. > > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... I've been working on some JMH benchmarks and I'm seeing some strange results that I need to investigate further. I will update the PR when I have found the reason(s). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2247189103 From tschatzl at openjdk.org Wed Jul 24 08:38:31 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 24 Jul 2024 08:38:31 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> Message-ID: <m1m4tg8JMMdfUAzclYXGgtfUI58nn18ZJ8u8XZxi0R8=.2413d07f-bd8c-45df-a7bd-3f0769e93e6b@github.com> On Tue, 23 Jul 2024 14:11:20 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Simple obsoleting a Parallel GC product flag. The flag needs to be added to the obsolete flags table too, not only removed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20299#issuecomment-2247230042 From mli at openjdk.org Wed Jul 24 08:44:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 24 Jul 2024 08:44:37 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v11] In-Reply-To: <XSv3vwmQVv2abJLfDCmKsELqrY9d2Ohe_FemTMIzxXw=.3351b8ca-91d0-4a0b-9154-f758c293dcdc@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com> <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com> <XSv3vwmQVv2abJLfDCmKsELqrY9d2Ohe_FemTMIzxXw=.3351b8ca-91d0-4a0b-9154-f758c293dcdc@github.com> Message-ID: <xZpBFKhBc-4qqdhKoAxzHltOd-Pk3AuG6dWScvpqj64=.1feef605-1927-482b-a4a0-fcc6334f9a73@github.com> On Tue, 23 Jul 2024 13:55:06 GMT, fitzsim <duke at openjdk.org> wrote: > To check this, I [added](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-2/) the `riscv64` `CMake` steps to `SleefCommon.gmk`. 
> > I had intended to factor out `SetupSleefHeader` anyway for `aarch64`, to eliminate copy-n-paste. > > After that, there was one build step divergence for `riscv64` for the naming of the helper header. > > The two `riscv64` commits are: > > * [copy `helperrvv.h`](https://github.com/fitzsim/jdk/commit/bcd3813ca97f6308838ee93bcb5c02d9cd37375a) > * [add `riscv64` support to `SleefCommon.gmk`](https://github.com/fitzsim/jdk/commit/21e0369682095422f45015d817410d07c711b8c0) Thanks for your effort, this is much better. I have just one question. If there is no major refactoring in sleef in the future, I think we're fine. If there is such a refactoring of sleef's implementation, the maintenance will not be minor work, since in [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) we need to migrate some of the processing inside sleef into the JDK? But I'm not sure; maybe others can comment on this question. I also think we can move the discussion about [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) to https://github.com/openjdk/jdk/pull/19185, since this part of the code will eventually be pushed into the JDK via that PR (for legal process reasons). I hope the people involved in that PR do not miss the discussion and information here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2247244342 From aph at openjdk.org Wed Jul 24 09:05:38 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 24 Jul 2024 09:05:38 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5] In-Reply-To: <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com> <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com> Message-ID: <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com> On Tue, 23 Jul 2024 19:14:57 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > > Also also, Klass::is_subtype_of() is used for C1 runtime. > > Can you elaborate, please? Sorry, that was rather vague. In C1-compiled code, the Java method `Class::isInstance(Object)` calls `Klass::is_subtype_of()`. In general, I find it difficult to decide how much work, if any, should be done to improve C1 performance. Clearly, if C1 exists only to help with startup time in a tiered compilation system, the answer is "not much". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1689414385 From ayang at openjdk.org Wed Jul 24 09:11:13 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 24 Jul 2024 09:11:13 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v2] In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> Message-ID: <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com> > Simple obsoleting a Parallel GC product flag. Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20299/files - new: https://git.openjdk.org/jdk/pull/20299/files/59f96d13..10720a6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20299/head:pull/20299 PR: https://git.openjdk.org/jdk/pull/20299 From rrich at openjdk.org Wed Jul 24 09:21:30 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 24 Jul 2024 09:21:30 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> Message-ID: <8nHc_TnY9HDJBPodhK-8koS35dtVS7H-dBXZQCosz9A=.6e8ceb41-d795-46b7-8b05-c74416e9a313@github.com> On Tue, 23 Jul 2024 09:49:38 GMT, Matthias 
Baesken <mbaesken at openjdk.org> wrote: > When running with ubsan - enabled binaries, some tests trigger the following report : > > src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' > #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 > #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 > #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 > #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 > #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 > #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 > #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 > #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 > > Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . Hi Matthias, I think this is intended. No instances of SmallRegisterMap are ever created. 
Instead [SmallRegisterMap::instance](https://github.com/openjdk/jdk/blob/5b4824cf9aba297fa6873ebdadc0e9545647e90d/src/hotspot/cpu/x86/smallRegisterMap_x86.inline.hpp#L34C20-L34C36) is used:

```C++
static constexpr SmallRegisterMap* instance = nullptr;
```

The type is the only information that is actually used. I guess you could fix the undefined behavior by replacing all uses of SmallRegisterMap::instance with the address of a stack allocated temporary SmallRegisterMap(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2247337815 From thartmann at openjdk.org Wed Jul 24 10:34:59 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Jul 2024 10:34:59 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 Message-ID: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. 
While testing, I hit the verification code from: V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). 
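To make the nesting check described above concrete, here is a toy sketch of the idea: a growable structure records the mark nesting depth at which it was created and refuses to grow under a different depth. Every name here (`ToyResourceArea`, `ToyResourceMark`, `ToyList`) is hypothetical and the storage is a `std::vector` rather than an arena; this is not the `ReallocMark` or `GrowableArrayNestingCheck` code, and a debug VM would assert rather than return `false`.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a resource area; it only tracks how many marks are active.
struct ToyResourceArea {
  int nesting = 0;
};

// Toy stand-in for a ResourceMark: entering a scope increases the nesting
// depth, leaving it restores the previous depth.
class ToyResourceMark {
  ToyResourceArea& _area;
 public:
  explicit ToyResourceMark(ToyResourceArea& area) : _area(area) { ++_area.nesting; }
  ~ToyResourceMark() { --_area.nesting; }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Toy growable list that remembers the nesting depth at creation time.
class ToyList {
  ToyResourceArea& _area;
  int _alloc_nesting;      // depth under which the list was allocated
  std::vector<int> _data;  // real code would allocate from the resource area
 public:
  explicit ToyList(ToyResourceArea& area)
      : _area(area), _alloc_nesting(area.nesting) {}

  // Verifies on every push, even when no reallocation would be needed.
  // Returns false if called under a different mark depth than at creation.
  bool push(int v) {
    if (_area.nesting != _alloc_nesting) {
      return false;  // memory grown here would be freed when the inner mark ends
    }
    _data.push_back(v);
    return true;
  }

  std::size_t size() const { return _data.size(); }
};
```

Checking on every `push`, not only when capacity is exhausted, mirrors the choice above to verify even if no reallocation is required: a misuse is reported on the first run instead of only when the structure happens to outgrow its storage.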
Thanks, Tobias ------------- Commit messages: - Reverted fix for 8336095 - Small fix + refactoring - First prototype - tier1-3 pass Changes: https://git.openjdk.org/jdk/pull/20311/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20311&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336999 Stats: 56 lines in 14 files changed: 33 ins; 8 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20311/head:pull/20311 PR: https://git.openjdk.org/jdk/pull/20311 From kevinw at openjdk.org Wed Jul 24 10:40:36 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 24 Jul 2024 10:40:36 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> Message-ID: <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com> On Tue, 23 Jul 2024 17:57:04 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. 
For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge master > - Fixing formatting > - Inlining buffer and making field private > - Reverting to functional changes in parserTests.cpp > - Error messaging format > - Fixing memory leak > - Fixing pointer style, s/NULL/nullptr, and exception > - Cleaning up parserTests.cpp > - Missing copyright header update > - Adding tests for file dcmd argument > - ... and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d Is the help output working OK? Do these commands' help outputs show the new %p filename? I think it's good that they would. We should expect users of these commands to implicitly understand a %p although we can still explain it, e.g. in a separate update in the man page. I just think we should be explicit that the help output is changing.
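[Editor's illustration, not part of the original message.] The `%p` expansion discussed in this thread can be sketched roughly as follows. This is a minimal standalone sketch, not the HotSpot code: the function name `expand_pid_pattern`, its signature, the treatment of `%%` as a literal percent sign, and the length limit are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the HotSpot implementation): expands "%p" in a
// file name pattern to the given process id. As an assumption of this sketch,
// "%%" produces a literal '%'. Returns false, leaving 'out' untouched, if the
// expanded name would be longer than max_len characters.
bool expand_pid_pattern(const std::string& pattern, long pid,
                        std::size_t max_len, std::string& out) {
  assert(pid >= 0);
  const std::string pid_str = std::to_string(pid);
  std::string result;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size()) {
      if (pattern[i + 1] == 'p') {   // "%p" -> process id
        result += pid_str;
        ++i;
        continue;
      }
      if (pattern[i + 1] == '%') {   // "%%" -> '%'
        result += '%';
        ++i;
        continue;
      }
    }
    result += pattern[i];            // any other character is copied as-is
  }
  if (result.size() > max_len) {
    return false;                    // would not fit into a fixed-size buffer
  }
  out = result;
  return true;
}
```

With this sketch, a pattern such as `heap_%p.hprof` and PID 1234 would expand to `heap_1234.hprof`. The explicit length limit mirrors the fixed-size name buffer that the review comment on `FileArgument` is about.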
In src/hotspot/share/runtime/java.cpp: if (DumpPerfMapAtExit) { CodeCache::write_perf_map(.... It may need to pass DEFAULT_PERFMAP_FILENAME (and tty). Do you have the change from JDK-8327054 merged into this branch? src/hotspot/share/services/diagnosticArgument.hpp line 65: > 63: class FileArgument { > 64: private: > 65: char _name[1024]; Probably JVM_MAXPATHLEN (which might also be 1024). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2247558224 PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2247560119 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689545224 From stuefe at openjdk.org Wed Jul 24 10:47:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 24 Jul 2024 10:47:32 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> Message-ID: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com> On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: > Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) > > Testing: > test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java > test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java I plan to look at this later this week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2247575688 From thartmann at openjdk.org Wed Jul 24 12:08:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Jul 2024 12:08:32 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Message-ID: <-ivSR7ytOee0BB9h1RzdWBVUZyyW9G2ANy1gLpyFCSE=.598ece24-fcc9-447e-8f0f-bc5df7bfe903@github.com> On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. > > This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. 
> > While testing, I hit the verification code from: > > V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) > V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) > V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) > V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) > V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) > V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) > V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) > > > It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. > > This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. > > We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). > > Thanks, > Tobias Github testing failed because [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) is not yet integrated. Testing with the fix all passed. 
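[Editor's illustration, not part of the original message.] The nesting check described in the PR can be sketched roughly as follows. This is a toy model under stated assumptions, not HotSpot's `ReallocMark`: the names `ToyResourceArea`, `ToyResourceMark` and `ToyBlockArray` are invented here, and `grow` returns `false` where the real verification code would presumably assert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a resource area: it only tracks how many marks are active.
struct ToyResourceArea {
  int nesting = 0;
};

// RAII mark: entering a scope raises the nesting level, leaving lowers it.
class ToyResourceMark {
  ToyResourceArea& _area;
 public:
  explicit ToyResourceMark(ToyResourceArea& area) : _area(area) { ++_area.nesting; }
  ~ToyResourceMark() {
    assert(_area.nesting > 0);
    --_area.nesting;
  }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Growable array that remembers the nesting level at which it was created.
class ToyBlockArray {
  ToyResourceArea& _area;
  int _alloc_nesting;        // nesting level of the original allocation
  std::vector<int> _data;    // storage stand-in; real code would use the arena
 public:
  explicit ToyBlockArray(ToyResourceArea& area)
      : _area(area), _alloc_nesting(area.nesting) {}

  // True only while we are under the same mark as the original allocation.
  bool realloc_allowed() const { return _area.nesting == _alloc_nesting; }

  // The check runs before the size comparison, so it also fires when no
  // reallocation would actually be needed.
  bool grow(std::size_t new_size) {
    if (!realloc_allowed()) {
      return false;
    }
    if (new_size > _data.size()) {
      _data.resize(new_size);
    }
    return true;
  }

  std::size_t size() const { return _data.size(); }
};
```

In this model, growing the array inside a nested mark is rejected, because memory handed out there would be released when the nested mark goes out of scope while the array lives on.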
------------- PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2247742221 From mbaesken at openjdk.org Wed Jul 24 12:57:46 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 24 Jul 2024 12:57:46 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v2] In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> Message-ID: <OMBPXRc-bFS9BXXLhEKgP7VMhvji7SNPi3j-WPV5zx4=.1b8919be-6c92-4328-b4f7-ea5c367a4731@github.com> > When running with ubsan - enabled binaries, some tests trigger the following report : > > src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' > #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 > #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 > #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 > #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 > #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 > #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, 
SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 > #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 > #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 > > Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: avoid the nullptr checks, this causes aserts in jtreg tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20296/files - new: https://git.openjdk.org/jdk/pull/20296/files/6e063a11..436648c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296 PR: https://git.openjdk.org/jdk/pull/20296 From mbaesken at openjdk.org Wed Jul 24 13:14:32 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 24 Jul 2024 13:14:32 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' In-Reply-To: <8nHc_TnY9HDJBPodhK-8koS35dtVS7H-dBXZQCosz9A=.6e8ceb41-d795-46b7-8b05-c74416e9a313@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <8nHc_TnY9HDJBPodhK-8koS35dtVS7H-dBXZQCosz9A=.6e8ceb41-d795-46b7-8b05-c74416e9a313@github.com> Message-ID: 
<hm7KZuwHxMUaRLOqwzOzbxuSasnYmlgDkjzODF92bw4=.6e9ab6d2-8529-4cfc-8a5f-f3e92fb1c6a4@github.com> On Wed, 24 Jul 2024 09:19:22 GMT, Richard Reingruber <rrich at openjdk.org> wrote: > I think this is intended. No instances of SmallRegisterMap are ever created. Probably it is better just to switch off ubsan for the method, if it's intended. On the other hand for SmallRegisterMap in_cont returns always false (so we could at least avoid this `reg_map->in_cont()` call for SmallRegisterMap) cpu/aarch64/smallRegisterMap_aarch64.inline.hpp:80: bool in_cont() const { return false; } cpu/arm/smallRegisterMap_arm.inline.hpp:73: bool in_cont() const { return false; } cpu/ppc/smallRegisterMap_ppc.inline.hpp:79: bool in_cont() const { return false; } cpu/s390/smallRegisterMap_s390.inline.hpp:73: bool in_cont() const { return false; } cpu/x86/smallRegisterMap_x86.inline.hpp:80: bool in_cont() const { return false; } cpu/zero/smallRegisterMap_zero.inline.hpp:73: bool in_cont() const { return false; } cpu/riscv/smallRegisterMap_riscv.inline.hpp:80: bool in_cont() const { return false; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2247888008 From mbaesken at openjdk.org Wed Jul 24 13:59:44 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 24 Jul 2024 13:59:44 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v3] In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> Message-ID: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> > When running with ubsan - enabled binaries, some tests trigger the following report : > > src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call 
on null pointer of type 'const struct SmallRegisterMap' > #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 > #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 > #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 > #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 > #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 > #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 > #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 > #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 > > Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: ATTRIBUTE_NO_UBSAN must be after template typename ... 
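[Editor's illustration, not part of the original message.] The placement rule from the commit message above can be sketched roughly as follows. Everything here is a stand-in: `SKETCH_UBSAN_ENABLED` and `SKETCH_NO_UBSAN` are made-up macros rather than the JDK's definitions, and `ToyMap` merely imitates a register map whose `in_cont()` always returns false. The sketch never passes a null pointer; it only shows where the attribute goes.

```cpp
#include <cassert>

// Assumption: SKETCH_UBSAN_ENABLED stands in for whatever the build defines
// when UBSan is configured. Without it the macro expands to nothing, so
// compilers that do not know the attribute never see it.
#if defined(SKETCH_UBSAN_ENABLED) && (defined(__clang__) || defined(__GNUC__))
#define SKETCH_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define SKETCH_NO_UBSAN
#endif

// Toy register map; in_cont() is constant, as in the per-CPU files listed
// earlier in the thread.
struct ToyMap {
  bool in_cont() const { return false; }
};

// The attribute follows the template header and precedes the declaration.
// Placing it in front of 'template' is rejected by the compiler.
template <typename MapT>
SKETCH_NO_UBSAN
inline bool uses_continuation(const MapT* map) {
  assert(map != nullptr);   // this sketch does not rely on null receivers
  return map->in_cont();
}
```

Guarding the macro on the sanitizer being enabled keeps the attribute away from builds that do not use UBSan at all.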
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20296/files - new: https://git.openjdk.org/jdk/pull/20296/files/436648c2..390a2176 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296 PR: https://git.openjdk.org/jdk/pull/20296 From mbaesken at openjdk.org Wed Jul 24 14:13:31 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 24 Jul 2024 14:13:31 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> Message-ID: <-jpIPJPQJV4xnJ0Oz-wme6x6vmyGwYVN7GtLBbp3YDM=.133e895a-d953-43f5-ad55-1b909294ad23@github.com> On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. > > We find this in the test output > > [STDOUT] > /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory > > The container where the test is executed does not contain the ubsan package; we might skip the test in this case. Hello, any input on this ? Currently the situation with the jtreg tests is not good when running ubsan-enabled binaries . So we have to check for ubsan (see the PR for an example) or install ubsan into the docker (or podman or ...) container . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2248077273 From aph at openjdk.org Wed Jul 24 14:34:09 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 24 Jul 2024 14:34:09 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <ClSaxeO-M405jBL-Y6YELu6xofwt7R8kp6PEdalyw_8=.c23b3958-85f4-4265-9744-125a286d75db@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. 
All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas.
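[Editor's illustration, not part of the original message.] The role of popcount in a hashed secondary-supers lookup can be sketched roughly as follows. This is a simplified model under stated assumptions, not HotSpot's table layout: the class `ToySupersTable`, the multiplicative hash, the 64-slot bitmap and the plain linear probing are choices made for the example, and `popcount64` is a portable software fallback of the kind the description mentions for default x86-64 builds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable software popcount: clears the lowest set bit until none remain.
inline int popcount64(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Toy hashed set of up to 64 distinct ids. A 64-bit bitmap records which
// virtual slots are occupied; the ids themselves are stored densely in slot
// order, so the position of a slot in the packed array is the number of
// occupied slots below it.
class ToySupersTable {
  uint64_t _bitmap = 0;
  std::vector<uint32_t> _packed;

  // Top six bits of a multiplicative hash give a slot in 0..63.
  static unsigned home_slot(uint32_t id) { return (id * 2654435761u) >> 26; }

 public:
  explicit ToySupersTable(const std::vector<uint32_t>& ids) {
    assert(ids.size() <= 64);          // ids are assumed to be distinct
    uint32_t slots[64] = {};
    for (uint32_t id : ids) {
      unsigned s = home_slot(id);
      while ((_bitmap >> s) & 1) {     // linear probing on collision
        s = (s + 1) & 63;
      }
      slots[s] = id;
      _bitmap |= uint64_t(1) << s;
    }
    for (unsigned s = 0; s < 64; ++s) {
      if ((_bitmap >> s) & 1) {
        _packed.push_back(slots[s]);   // pack occupied slots densely
      }
    }
  }

  bool contains(uint32_t id) const {
    unsigned s = home_slot(id);
    for (int probes = 0; probes < 64; ++probes) {
      if (((_bitmap >> s) & 1) == 0) {
        return false;                  // empty slot ends the probe sequence
      }
      uint64_t below = _bitmap & ((uint64_t(1) << s) - 1);
      if (_packed[popcount64(below)] == id) {
        return true;
      }
      s = (s + 1) & 63;
    }
    return false;                      // table full and id not present
  }
};
```

In the common case a lookup costs one bit test, one popcount and one comparison, which is why the availability of a hardware popcount instruction matters for how compact the inlined code can be.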
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/02cfd130..48e80a13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=04-05 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Wed Jul 24 14:34:09 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 24 Jul 2024 14:34:09 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6] In-Reply-To: <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com> <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com> Message-ID: <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com> On Tue, 23 Jul 2024 19:00:02 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > What happens when users include `klass.hpp`, but not `klass.inline.hpp`? How does it affect generated code? > > I suspect that `Klass::search_secondary_supers()` won't be inlined in such case. That is true. I can't tell from this exchange whether you think it should. The same "won't be inlined" fact is also true of everything else in `klass.inline.hpp`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1689927770 From tschatzl at openjdk.org Wed Jul 24 14:38:32 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 24 Jul 2024 14:38:32 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v2] In-Reply-To: <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com> Message-ID: <NHXk2evSAldw2eag8lB8erd2dVVoikl6Dm1dJOgCdu8=.cf0a7fea-6ca1-4ddb-a874-8f8860541127@github.com> On Wed, 24 Jul 2024 09:11:13 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Simple obsoleting a Parallel GC product flag. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20299#pullrequestreview-2196941028 From mbaesken at openjdk.org Wed Jul 24 14:41:32 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 24 Jul 2024 14:41:32 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v3] In-Reply-To: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> Message-ID: <QSl_zKdSoRYcDRQDiF5XlB9VxJWoL9il_rGqEO5ypbA=.56b1949d-5489-4a20-949d-86c3f10c9645@github.com> On Wed, 24 Jul 2024 13:59:44 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, 
OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > ATTRIBUTE_NO_UBSAN must be after template typename ... When using the `ATTRIBUTE_NO_UBSAN` for `frame::oopmapreg_to_location` , we unfortunately run into another similar looking issue (e.g. 
when running jtreg test java/net/vthread/HttpALot.java) /jdk/src/hotspot/share/runtime/stackChunkFrameStream.inline.hpp:286:46: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' #0 0x7febd955502d in void* StackChunkFrameStream<(ChunkFrames)1>::reg_to_loc<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/stackChunkFrameStream.inline.hpp:286 #1 0x7febd955502d in void StackChunkFrameStream<(ChunkFrames)1>::iterate_oops<BarrierClosure<(stackChunkOopDesc::BarrierType)1, true>, SmallRegisterMap>(BarrierClosure<(stackChunkOopDesc::BarrierType)1, true>*, SmallRegisterMap const*) const src/hotspot/share/runtime/stackChunkFrameStream.inline.hpp:373 #2 0x7febd955502d in void stackChunkOopDesc::do_barriers0<(stackChunkOopDesc::BarrierType)1, (ChunkFrames)1, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)1> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:375 #3 0x7febd75a2121 in void stackChunkOopDesc::do_barriers<(stackChunkOopDesc::BarrierType)1, (ChunkFrames)1, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)1> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.inline.hpp:193 #4 0x7febd75a2121 in ThawBase::recurse_thaw_compiled_frame(frame const&, frame&, int, bool) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2246 #5 0x7febd75a1f60 in bool ThawBase::recurse_thaw_java_frame<ContinuationHelper::CompiledFrame>(frame&, int) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2092 #6 0x7febd75a1f60 in ThawBase::recurse_thaw_compiled_frame(frame const&, frame&, int, bool) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2249 #7 0x7febd75a6aca in ThawBase::thaw_slow(stackChunkOopDesc*, bool) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2040 #8 0x7febd75aa156 in Thaw<Config<(oop_kind)0, G1BarrierSet> >::thaw(Continuation::thaw_kind) src/hotspot/share/runtime/continuationFreezeThaw.cpp:1825 #9 0x7febd75aa156 in 
thaw_internal<Config<(oop_kind)0, G1BarrierSet> > src/hotspot/share/runtime/continuationFreezeThaw.cpp:2450 #10 0x7febd75aa156 in Config<(oop_kind)0, G1BarrierSet>::thaw(JavaThread*, Continuation::thaw_kind) src/hotspot/share/runtime/continuationFreezeThaw.cpp:276 #11 0x7febd75aa156 in thaw<Config<(oop_kind)0, G1BarrierSet> > src/hotspot/share/runtime/continuationFreezeThaw.cpp:253 #12 0x7febbb89c526 (<unknown module>) this time it is the map->location call through a nullptr template <ChunkFrames frame_kind> template <typename RegisterMapT> inline void* StackChunkFrameStream<frame_kind>::reg_to_loc(VMReg reg, const RegisterMapT* map) const { assert(!is_done(), ""); return reg->is_reg() ? (void*)map->location(reg, sp()) // see frame::update_map_with_saved_link(&map, link_addr); : (void*)((address)unextended_sp() + (reg->reg2stack() * VMRegImpl::stack_slot_size)); } But SmallRegisterMap::location is for some platforms even UnImplemented so how does this work cross platform ? https://github.com/openjdk/jdk/blob/332df83e7cb1f272c08f8e4955d6abaf3f091ace/src/hotspot/cpu/arm/smallRegisterMap_arm.inline.hpp#L56 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248182106 From aph at openjdk.org Wed Jul 24 15:54:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 24 Jul 2024 15:54:35 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6] In-Reply-To: <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com> <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com> 
<TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com> Message-ID: <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com> On Wed, 24 Jul 2024 14:29:09 GMT, Andrew Haley <aph at openjdk.org> wrote: > I suspect that Klass::search_secondary_supers() won't be inlined in such case. That's true, but it's true of every other function in that file. Is it not deliberate? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690061017 From rrich at openjdk.org Wed Jul 24 16:13:31 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 24 Jul 2024 16:13:31 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v3] In-Reply-To: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> Message-ID: <KbPe2MYqawbLoL6e-pHuh6OtViYi0Uy6UPbOpi5YHMw=.c587eee4-6e5a-401e-b265-16192ea4f893@github.com> On Wed, 24 Jul 2024 13:59:44 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*)
src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > ATTRIBUTE_NO_UBSAN must be after template typename ... Continuations (-XX:+VMContinuations) haven't been ported to Arm32. `SmallRegisterMap` depends on it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248407753 From aph at openjdk.org Wed Jul 24 16:17:34 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 24 Jul 2024 16:17:34 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6] In-Reply-To: <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com> <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com> <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com> <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com> Message-ID: <eMrlgijA4K3kj9F7-cj1RBWRQg0rc9faF13SR9UdEys=.4ca5c8f7-6462-4adf-8160-91c018985822@github.com> On Wed, 24 Jul 2024 15:51:26 GMT, Andrew Haley <aph at openjdk.org> wrote: >>> What happens when users include `klass.hpp`, but not `klass.inline.hpp`? How does it affect generated code? >>> >>> I suspect that `Klass::search_secondary_supers()` won't be inlined in such case. >> >> That is true. I can't tell from this exchange whether you think it should. The same "won't be inlined" fact is also true of everything else in `klass.inline.hpp`. > >> I suspect that Klass::search_secondary_supers() won't be inlined in such case. > > That's true, but it's true of every other function in that file. Is it not deliberate? FYI, somewhat related: AArch64 GCC inlines `lookup_secondary_supers_table()` 237 times (it's only a few instructions.) x86-64 GCC, because it doesn't use a popcount intrinsic, decides that `lookup_secondary_supers_table()` is too big to be worth inlining in all but 3 cases. 
So the right thing happens, I think: where we can profit from fast lookups without bloating the runtime, we do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690096760 From szaldana at openjdk.org Wed Jul 24 17:57:59 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 24 Jul 2024 17:57:59 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com> Message-ID: <tkO2QN5Nzk3njsiyCgolhdy7fzZ26PfDHe44LK3vUf8=.33b8e759-8725-4333-93ae-9d2a14c523b5@github.com> On Wed, 24 Jul 2024 10:35:35 GMT, Kevin Walls <kevinw at openjdk.org> wrote: >> Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge master >> - Fixing formatting >> - Inlining buffer and making field private >> - Reverting to functional changes in parserTests.cpp >> - Error messaging format >> - Fixing memory leak >> - Fixing pointer style, s/NULL/nullptr, and exception >> - Cleaning up parserTests.cpp >> - Missing copyright header update >> - Adding tests for file dcmd argument >> - ... and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d > > src/hotspot/share/services/diagnosticArgument.hpp line 65: > >> 63: class FileArgument { >> 64: private: >> 65: char _name[1024]; > > Probably JVM_MAXPATHLEN (which might also be 1024). 
Hi, I avoided JVM_MAXPATHLEN because of this comment https://github.com/openjdk/jdk/pull/20198#discussion_r1685297940 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1690215958 From szaldana at openjdk.org Wed Jul 24 17:57:58 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 24 Jul 2024 17:57:58 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v10] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <M2iFmQIvcLoIJSBEaHBKzQs8Tku6MDwuJFJtKTYgB9I=.da1c7e91-c22a-47ca-a944-1ad8a489d920@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. 
(STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). > > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: fixing pointer style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/52ca557d..dc1bfe1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=08-09 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Wed Jul 24 18:05:49 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 24 Jul 2024 18:05:49 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v11] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <5KMQ16ZAAx9VI9fpawUyrvQSqll5wzs9lCC1bRL62ow=.34c7def9-ab85-42b7-af4b-a433a5b5cf2c@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` 
pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Updating help text for VM.cds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/dc1bfe1d..34e3f80a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From wkemper at openjdk.org Wed Jul 24 18:12:40 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 Jul 2024 18:12:40 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode Message-ID: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. 
------------- Commit messages: - Remove last vestiges of incremental update mode - Missed test, remove actual IU barrier flag - Remove missed iu_barrier usages for C1 - Update test (all barriers can be enabled now for all modes) - WIP: Remove incremental update mode Changes: https://git.openjdk.org/jdk/pull/20316/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336685 Stats: 1696 lines in 69 files changed: 4 ins; 1658 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/20316.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316 PR: https://git.openjdk.org/jdk/pull/20316 From amenkov at openjdk.org Wed Jul 24 18:36:41 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 24 Jul 2024 18:36:41 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations Message-ID: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23. 
Testing: tier1,tier2,tier3,tier4,hs-tier5-svc ------------- Commit messages: - jcheck - fix Changes: https://git.openjdk.org/jdk/pull/20315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20315&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330427 Stats: 192 lines in 7 files changed: 3 ins; 153 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/20315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20315/head:pull/20315 PR: https://git.openjdk.org/jdk/pull/20315 From szaldana at openjdk.org Wed Jul 24 18:40:16 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 24 Jul 2024 18:40:16 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v12] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. 
If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). > > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Adding default perfmap filename when invoked outside of diagnostic command ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/34e3f80a..d43d90d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=10-11 Stats: 5 lines in 3 files changed: 2 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From stuefe at openjdk.org Wed Jul 24 18:40:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 24 Jul 2024 18:40:16 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v11] In-Reply-To: <5KMQ16ZAAx9VI9fpawUyrvQSqll5wzs9lCC1bRL62ow=.34c7def9-ab85-42b7-af4b-a433a5b5cf2c@github.com> References: 
<8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <5KMQ16ZAAx9VI9fpawUyrvQSqll5wzs9lCC1bRL62ow=.34c7def9-ab85-42b7-af4b-a433a5b5cf2c@github.com> Message-ID: <yWaSEDOfex_SXtjwbWE8A-0RwJQIoICkmdQXOdOaOp0=.5d84ef94-8e21-4933-b3ab-8216c73bdc90@github.com> On Wed, 24 Jul 2024 18:05:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating help text for VM.cds All good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2197495698 From szaldana at openjdk.org Wed Jul 24 18:40:16 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 24 Jul 2024 18:40:16 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com> Message-ID: <8_dPuH2noHgNOFKzsBke96yBSdGoTwhBl0-pyXaoDhA=.e638cdb0-2ea1-42ca-bd8b-88eaf2b719ac@github.com> On Wed, 24 Jul 2024 10:38:01 GMT, Kevin Walls <kevinw at openjdk.org> wrote: > In src/hotspot/share/runtime/java.cpp: if (DumpPerfMapAtExit) { CodeCache::write_perf_map(.... > > It may need to pass DEFAULT_PERFMAP_FILENAME (and tty). > > Do you have the change from JDK-8327054 merged into this branch? Good point - I made an update to cover that invocation. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2248669712 From shade at openjdk.org Wed Jul 24 18:44:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Jul 2024 18:44:31 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> Message-ID: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> On Wed, 24 Jul 2024 18:08:46 GMT, William Kemper <wkemper at openjdk.org> wrote: > We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. > > ## Testing > * hotspot_gc_shenandoah > * dacapo > * diluvian > * extremem > * hyperalloc > * specjbb2015 > * specjvm2008 Good riddance. I have to comb through this more accurately tomorrow, but first pass comments below. src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571: > 569: /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */ > 570: if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) { > 571: if (ShenandoahSATBBarrier) { A bit weird to replace IU with SATB barrier here. src/hotspot/share/opto/classes.hpp line 327: > 325: shmacro(ShenandoahWeakCompareAndSwapN) > 326: shmacro(ShenandoahWeakCompareAndSwapP) > 327: I think this newline is unnecessary. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2197480695 PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690256658 PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690269614 From kdnilsen at openjdk.org Wed Jul 24 18:55:32 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 Jul 2024 18:55:32 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode In-Reply-To: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> Message-ID: <sdmP00bGZEd6m6TWUElgXdt1JL-IbSJNUFDAAcbv3bU=.0ce7b1f6-fc3c-49ed-b771-ca1311b208b0@github.com> On Wed, 24 Jul 2024 18:25:38 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. >> >> ## Testing >> * hotspot_gc_shenandoah >> * dacapo >> * diluvian >> * extremem >> * hyperalloc >> * specjbb2015 >> * specjvm2008 > > src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571: > >> 569: /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */ >> 570: if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) { >> 571: if (ShenandoahSATBBarrier) { > > A bit weird to replace IU with SATB barrier here. Will need_keep_alive_barrier() always be false in absence of IU mode support? can we replace this with an assert? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690285365 From coleenp at openjdk.org Wed Jul 24 18:56:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 24 Jul 2024 18:56:33 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> Message-ID: <EwkVrk4h27-FwYktUeakBXODWJ43gCXl5s68QYLdh-I=.ab04c479-8c24-47ac-b84a-dccd0041ff31@github.com> On Wed, 24 Jul 2024 18:01:15 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23. > > Testing: tier1,tier2,tier3,tier4,hs-tier5-svc This looks really good. I hadn't expected so much code we had to preserve these. Nice cleanup! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2197529968 From kvn at openjdk.org Wed Jul 24 19:00:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Jul 2024 19:00:33 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Message-ID: <YpSBc5RXGIWZ-W1Xu3XYqXedrbX7vnZashVrght5v4k=.62192fc8-77a7-49f5-a185-17317753a521@github.com> On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. 
Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. > > This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. > > While testing, I hit the verification code from: > > V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) > V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) > V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) > V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) > V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) > V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) > V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) > > > It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. > > This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. > > We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). 
> > Thanks, > Tobias Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20311#pullrequestreview-2197536320 From kbarrett at openjdk.org Wed Jul 24 19:02:32 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 24 Jul 2024 19:02:32 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v3] In-Reply-To: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> Message-ID: <t1g-4dP38_LQWzBPFLqZlsHaDKKrLBc_4LzYSuH_Sc8=.8f4f92c1-2e49-40e0-88b1-ca1c37c1ec70@github.com> On Wed, 24 Jul 2024 13:59:44 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 
0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > ATTRIBUTE_NO_UBSAN must be after template typename ... Changes requested by kbarrett (Reviewer). > I think this is intended. No instances of SmallRegisterMap are ever created. > > Instead [SmallRegisterMap::instance](https://github.com/openjdk/jdk/blob/5b4824cf9aba297fa6873ebdadc0e9545647e90d/src/hotspot/cpu/x86/smallRegisterMap_x86.inline.hpp#L34C20-L34C36) is used: > > ```c++ > static constexpr SmallRegisterMap* instance = nullptr; > ``` > > The type is the only information that is actually used. Being intentional doesn't make it any less invalid. Here's an untested change that I think will fix the problem. https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:smallregmap?expand=1 src/hotspot/share/runtime/frame.inline.hpp line 86: > 84: > 85: template <typename RegisterMapT> > 86: ATTRIBUTE_NO_UBSAN That's not good enough. Turning off the ubsan warning doesn't prevent the compiler from doing unexpected and potentially bad things with invalid code. 
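[Editorial sketch, not part of the original message.] The objection above is that calling a member function through a null `instance` pointer is undefined behavior even when the function reads no members, so suppressing the ubsan report does not make the code valid. A simplified illustration follows; all type and function names are hypothetical stand-ins rather than HotSpot classes, and the alternative shown is just one possible shape, not necessarily what the linked change does.

```cpp
// Stand-in for the pattern under discussion: a stateless type whose
// `instance` is a null pointer that is used only for its static type.
struct StatelessMapNull {
  static constexpr StatelessMapNull* instance = nullptr;
  bool update_map() const { return false; }  // touches no data members
};
// StatelessMapNull::instance->update_map();
//   ^ undefined behavior: a member call through a null pointer, regardless
//     of whether the function body ever dereferences `this`.

// One possible alternative (an assumption for illustration): hand out the
// address of a real, empty object so every member call has a valid `this`.
struct StatelessMap {
  static const StatelessMap* instance() {
    static const StatelessMap the_instance{};
    return &the_instance;
  }
  bool update_map() const { return false; }
};

// Generic code templated on the map type keeps working unchanged.
template <typename RegisterMapT>
bool needs_update(const RegisterMapT* map) {
  return map->update_map();
}
```

Here `needs_update(StatelessMap::instance())` is a well-defined call, whereas passing `StatelessMapNull::instance` to the same template would not be. Making the member functions `static` would be another way to avoid needing an object at all, at the cost of changing how generic callers invoke them.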
------------- PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2197514292 PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248704435 PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248706749 PR Review Comment: https://git.openjdk.org/jdk/pull/20296#discussion_r1690277354 From vlivanov at openjdk.org Wed Jul 24 19:12:37 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 24 Jul 2024 19:12:37 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6] In-Reply-To: <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com> <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com> <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com> Message-ID: <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com> On Wed, 24 Jul 2024 09:03:12 GMT, Andrew Haley <aph at openjdk.org> wrote: >>> Also also, Klass::is_subtype_of() is used for C1 runtime. >> >> Can you elaborate, please? What I'm seeing in `Runtime1::generate_code_for()` for `slow_subtype_check` is a call into `MacroAssembler::check_klass_subtype_slow_path()`. 
> >> > Also also, Klass::is_subtype_of() is used for C1 runtime. >> >> Can you elaborate, please? > > Sorry, that was rather vague. In C1-compiled code, the Java method `Class::isInstance(Object)`calls `Klass::is_subtype_of()`. > > In general, I find it difficult to decide how much work, if any, should be done to improve C1 performance. Clearly, if C1 exists only to help with startup time in a tiered compilation system, the answer is "not much". Thanks, now I see that `Class::isInstance(Object)` is backed by `Runtime1::is_instance_of()` which uses `oopDesc::is_a()` to do the job. If it turns out to be performance critical, the intrinsic implementation should be rewritten to exercise existing subtype checking support in C1. As it is implemented now, it's already quite inefficient. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690303731 From wkemper at openjdk.org Wed Jul 24 19:27:40 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 Jul 2024 19:27:40 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode In-Reply-To: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> Message-ID: <iExDnHB_1WSKaVnW8g2usSSiQUTMMQuGuc691noaUnA=.792236b9-a939-43c7-aa55-c200bf5e7d86@github.com> On Wed, 24 Jul 2024 18:25:38 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. 
>> >> ## Testing >> * hotspot_gc_shenandoah >> * dacapo >> * diluvian >> * extremem >> * hyperalloc >> * specjbb2015 >> * specjvm2008 > > src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571: > >> 569: /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */ >> 570: if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) { >> 571: if (ShenandoahSATBBarrier) { > > A bit weird to replace IU with SATB barrier here. @shipilev [The original code](https://github.com/openjdk/jdk/pull/20316/files#diff-cb01b36a8c7017c9e21645a0ff9075897e5bfa67ae37d4f0d69ccc582656ec31L71) used the function `iu_barrier` to emit the pre-write barrier for both SATB and IU modes. This only happened in the `ppc` port. Other platforms just invoke the function to emit the `satb_write_barrier` directly here (see [x86](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L580), for example). @kdnilsen No, `need_keep_alive_barrier` may be true in SATB mode. The use of the barrier here is to make sure weak references that get loaded during mark are added to the SATB buffer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690320078 From wkemper at openjdk.org Wed Jul 24 19:31:04 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 Jul 2024 19:31:04 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v2] In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> Message-ID: <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com> > We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. 
> > ## Testing > * hotspot_gc_shenandoah > * dacapo > * diluvian > * extremem > * hyperalloc > * specjbb2015 > * specjvm2008 William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove unintentional new line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20316/files - new: https://git.openjdk.org/jdk/pull/20316/files/41a2deb8..ec2d6b64 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20316.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316 PR: https://git.openjdk.org/jdk/pull/20316 From wkemper at openjdk.org Wed Jul 24 19:31:05 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 Jul 2024 19:31:05 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v2] In-Reply-To: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> Message-ID: <G2bCQdkWIBL7HfMTvz9gM07jZ2qAme4yFMMaXOW6tSc=.71fce9e0-0cfe-4e3a-bdee-6d138b5a4444@github.com> On Wed, 24 Jul 2024 18:38:02 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unintentional new line > > src/hotspot/share/opto/classes.hpp line 327: > >> 325: shmacro(ShenandoahWeakCompareAndSwapN) >> 326: shmacro(ShenandoahWeakCompareAndSwapP) >> 327: > > I think this newline is unnecessary. I agree. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690322953 From kbarrett at openjdk.org Wed Jul 24 20:57:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 24 Jul 2024 20:57:37 GMT Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v4] In-Reply-To: <zEG6cI-mqmOGle_U_psjrGOlJysZTex7Xw5fal3un-c=.e1beed32-9efa-4dbd-94c2-7ae6f9a24287@github.com> References: <kc_cq_sBCqn-iAwHCEaTqgMVYrnT6tKsk3SZnD_qP-s=.1b5d24dd-a925-4f6d-aefb-67b4df6bddac@github.com> <VScr4niHiJs_5LG0Npi39l3fdOa47am0zftg3jO-IsQ=.0905fa3e-2b66-41a5-b015-2bbd9f7b3940@github.com> <2sW1ZQ333qSkLQRk_e-f4g-sOwvW2bKshy8cszUoDrw=.bca141f1-d832-4f95-ad68-47c5fc3d068b@github.com> <X522HC3ke8LtoI1xQspLXQDc2Drkc_gmYZKap_TS7pQ=.f0a5177a-187e-4f4e-88df-fc84dec67eb2@github.com> <zEG6cI-mqmOGle_U_psjrGOlJysZTex7Xw5fal3un-c=.e1beed32-9efa-4dbd-94c2-7ae6f9a24287@github.com> Message-ID: <RTVs-cHpHTkrIGHu-qQQMNk55e2D-0lA4NJh6u5fBMU=.66ee5a8b-982e-4759-a104-db5721fc2d31@github.com> On Sun, 4 Feb 2024 23:09:33 GMT, David Holmes <dholmes at openjdk.org> wrote: >> I think EXIT_OOM is an implementation detail. The key point is that these methods do not throw. > > Or do I have that backwards ... the key point of `no_except` is that callers of these method must check for `nullptr`, but with EXIT_OOM that is not the case - and we don't want it to appear that the allocation can actually fail and we continue execution! It looks like there are a number of `operator new`s that have nothrow exception specs but shouldn't. CompilationResourceObj in the immediately preceding file, for example. A precursor cleanup (or maybe several) that removed those first would be nice. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15910#discussion_r1690410887 From kbarrett at openjdk.org Wed Jul 24 20:57:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 24 Jul 2024 20:57:37 GMT Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v4] In-Reply-To: <RTVs-cHpHTkrIGHu-qQQMNk55e2D-0lA4NJh6u5fBMU=.66ee5a8b-982e-4759-a104-db5721fc2d31@github.com> References: <kc_cq_sBCqn-iAwHCEaTqgMVYrnT6tKsk3SZnD_qP-s=.1b5d24dd-a925-4f6d-aefb-67b4df6bddac@github.com> <VScr4niHiJs_5LG0Npi39l3fdOa47am0zftg3jO-IsQ=.0905fa3e-2b66-41a5-b015-2bbd9f7b3940@github.com> <2sW1ZQ333qSkLQRk_e-f4g-sOwvW2bKshy8cszUoDrw=.bca141f1-d832-4f95-ad68-47c5fc3d068b@github.com> <X522HC3ke8LtoI1xQspLXQDc2Drkc_gmYZKap_TS7pQ=.f0a5177a-187e-4f4e-88df-fc84dec67eb2@github.com> <zEG6cI-mqmOGle_U_psjrGOlJysZTex7Xw5fal3un-c=.e1beed32-9efa-4dbd-94c2-7ae6f9a24287@github.com> <RTVs-cHpHTkrIGHu-qQQMNk55e2D-0lA4NJh6u5fBMU=.66ee5a8b-982e-4759-a104-db5721fc2d31@github.com> Message-ID: <Q7mS5hx_DMDBevrTHtfZV_uVtg8WsrgXkhrd_baO_sg=.0a20afba-7270-4abc-877b-ee275b89b644@github.com> On Wed, 24 Jul 2024 20:53:04 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Or do I have that backwards ... the key point of `no_except` is that callers of these method must check for `nullptr`, but with EXIT_OOM that is not the case - and we don't want it to appear that the allocation can actually fail and we continue execution! > > It looks like there are a number of `operator new`s that have nothrow exception specs but shouldn't. > CompilationResourceObj in the immediately preceding file, for example. > A precursor cleanup (or maybe several) that removed those first would be nice. It seems that https://bugs.openjdk.org/browse/JDK-8305590 only removed some of them. 
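The point about exception specifications on `operator new` can be sketched in a few lines. This is only an illustration under stated assumptions, not HotSpot code: `NullOnFailure`, `ExitOnFailure`, `fail_next` and `read_or_default` are hypothetical names. When a class-specific `operator new` is declared `noexcept`, a new-expression yields `nullptr` if the allocator returns null, so callers must check the result. An allocator that terminates on out-of-memory never returns null, so declaring it `noexcept` would wrongly suggest that the allocation can fail and execution continue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical class whose allocator reports failure by returning nullptr.
// Because operator new is noexcept, `new NullOnFailure` yields nullptr
// (and skips the constructor) when the allocator returns null.
struct NullOnFailure {
  int value = 42;
  static bool fail_next;  // illustration-only switch that simulates OOM
  static void* operator new(std::size_t size) noexcept {
    return fail_next ? nullptr : std::malloc(size);
  }
  static void operator delete(void* p) noexcept { std::free(p); }
};
bool NullOnFailure::fail_next = false;

// Hypothetical class whose allocator never returns null: it terminates the
// process on failure, so it carries no non-throwing exception specification
// and callers do not need a nullptr check.
struct ExitOnFailure {
  int value = 7;
  static void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) std::abort();
    return p;
  }
  static void operator delete(void* p) noexcept { std::free(p); }
};

// Caller of the noexcept allocator: the nullptr check is mandatory.
int read_or_default(int fallback) {
  NullOnFailure* obj = new NullOnFailure;
  if (obj == nullptr) return fallback;
  int v = obj->value;
  delete obj;
  return v;
}
```

Calling `read_or_default(-1)` with `NullOnFailure::fail_next` set returns the fallback; with it cleared, the call returns the stored value.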
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15910#discussion_r1690413889 From kvn at openjdk.org Wed Jul 24 21:00:32 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Jul 2024 21:00:32 GMT Subject: RFR: 8334230: Optimize C2 classes layout In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> Message-ID: <Cbw6Z7Zoa5krBIgdJ8_MF3IPox99JcSAEckvYStY1vM=.0e7d3fd5-0021-4694-8037-16e1ab91e644@github.com> On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote: > **Notes** > > Rearrange C2 class fields to optimize footprint. > > > **Verification** > > 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. > 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. > > | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding | > | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- | > | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 | > | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 | > | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 | > | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 | > > class ArrayPointer { > const class Node * _pointer; /* 0 8 */ > const class Node * _base; /* 8 8 */ > const jlong _constant_offset; /* 16 8 */ > const class Node * _int_offset; /* 24 8 */ > const class GrowableArray<Node*> * _other_offsets; /* 32 8 */ > const jint _int_offset_shift; /* 40 4 */ > const bool _is_valid; /* 44 1 */ > public: > > > /* size: 48, cachelines: 1, members: 7 */ > /* padding: 3 */ > /* last cacheline: 48 bytes */ > }; > > > > class CallJavaNode : public 
CallNode { > public: > > /* class CallNode <ancestor>; */ /* 0 128 */ > protected: > > /* --- cacheline 2 boundary (128 bytes) --- */ > class ciMethod * _method; /* 128 8 */ > bool _optimized_virtual; /* 136 1 */ > bool _method_handle_invoke; /* 137 1 */ > bool _override_symbolic_info; /* 138 1 */ > bool _arg_escape; /* 139 1 */ > public: > > protected: > > public: > > > /* size: 144, cachelines: 3, members: 6 */ > /* padding: 4 */ > /* last cacheline: 16 bytes */ > > /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */ > }; > > > > class C2Access : public StackObj { > public: > > /* class StackObj <ancestor>; */ /* 0 0 */ > > /* XXX last struct has 1 byte of padding */ > > int ()(void) * * _vptr.C2Access; /* 0 8 */ > protected: > > DecoratorSet _decorators; /* 8 ... Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2197755523 From vlivanov at openjdk.org Wed Jul 24 21:26:39 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 24 Jul 2024 21:26:39 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6] In-Reply-To: <eMrlgijA4K3kj9F7-cj1RBWRQg0rc9faF13SR9UdEys=.4ca5c8f7-6462-4adf-8160-91c018985822@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com> <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com> <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com> <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com> <eMrlgijA4K3kj9F7-cj1RBWRQg0rc9faF13SR9UdEys=.4ca5c8f7-6462-4adf-8160-91c018985822@github.com> 
Message-ID: <FmVMUWy97fFvsqi16zkq3xtZzftZqo6oa9YSxWUIr_E=.1a2e7ab6-1b85-47fe-ae4c-3b8705f65fd3@github.com> On Wed, 24 Jul 2024 16:14:47 GMT, Andrew Haley <aph at openjdk.org> wrote: >>> I suspect that Klass::search_secondary_supers() won't be inlined in such a case. >> >> That's true, but it's true of every other function in that file. Is it not deliberate? > > FYI, somewhat related: AArch64 GCC inlines `lookup_secondary_supers_table()` 237 times (it's only a few instructions.) > x86-64 GCC, because it doesn't use a popcount intrinsic, decides that `lookup_secondary_supers_table()` is too big to be worth inlining in all but 3 cases. So the right thing happens, I think: where we can profit from fast lookups without bloating the runtime, we do. > That's true, but it's true of every other function in that file. Is it not deliberate? IMO the fact that `Klass::search_secondary_supers()` is used in `klass.hpp` makes a difference here. After thinking more about it, I did a small experiment [1] and observed a build failure on AArch64 [2]. I think we don't see any more failures simply because `klass.inline.hpp` is included pervasively. What do you think about moving `Klass::is_subtype_of()` to `klass.inline.hpp`?
[1] diff --git a/test/hotspot/gtest/oops/test_klass.cpp b/test/hotspot/gtest/oops/test_klass.cpp new file mode 100644 index 00000000000..326a70f1f54 --- /dev/null +++ b/test/hotspot/gtest/oops/test_klass.cpp @@ -0,0 +1,9 @@ +#include "precompiled.hpp" +#include "oops/klass.hpp" +#include "unittest.hpp" + +TEST_VM(Klass, is_subtype_of) { + Klass* k = vmClasses::Object_klass(); + ASSERT_TRUE(k->is_subtype_of(k)); +} [2] Undefined symbols for architecture arm64: "Klass::search_secondary_supers(Klass*) const", referenced from: Klass_is_subtype_of_vm_Test::TestBody() in test_klass.o ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690472489 From dholmes at openjdk.org Thu Jul 25 01:20:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 25 Jul 2024 01:20:33 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> Message-ID: <PRFkm94et3LVwe-Rq4m8vZYSzw7sKlvZIRu9ltxu0h4=.91dc77a3-8c50-4a9c-9529-320efcfa78c5@github.com> On Wed, 24 Jul 2024 18:01:15 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23. > > Testing: tier1,tier2,tier3,tier4,hs-tier5-svc Great cleanup - good to see all that complexity go! I think the test can be removed completely - see below. 
Thanks test/jdk/java/lang/instrument/RetransformRecordAnnotation.java line 32: > 30: * @run shell MakeJAR.sh retransformAgent > 31: * @run main/othervm -javaagent:retransformAgent.jar -Xlog:redefine+class=trace RetransformRecordAnnotation > 32: * @run main/othervm -javaagent:retransformAgent.jar -XX:+PreserveAllAnnotations -Xlog:redefine+class=trace RetransformRecordAnnotation This test is described as: `* @summary test that records with invisible annotation can be retransformed`, which suggests to me the test can actually be deleted as it serves no purpose now that there are no invisible annotations. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2198083244 PR Review Comment: https://git.openjdk.org/jdk/pull/20315#discussion_r1690664957 From amenkov at openjdk.org Thu Jul 25 01:53:13 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 25 Jul 2024 01:53:13 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2] In-Reply-To: <PRFkm94et3LVwe-Rq4m8vZYSzw7sKlvZIRu9ltxu0h4=.91dc77a3-8c50-4a9c-9529-320efcfa78c5@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> <PRFkm94et3LVwe-Rq4m8vZYSzw7sKlvZIRu9ltxu0h4=.91dc77a3-8c50-4a9c-9529-320efcfa78c5@github.com> Message-ID: <WqY9zi3sI0V3Fr_MsLNqCube0o2-_2YaktYchKaz43U=.570d227b-9f19-4d2a-a9f2-beb519e3856a@github.com> On Thu, 25 Jul 2024 01:17:14 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> remove test > > test/jdk/java/lang/instrument/RetransformRecordAnnotation.java line 32: > >> 30: * @run shell MakeJAR.sh retransformAgent >> 31: * @run main/othervm -javaagent:retransformAgent.jar -Xlog:redefine+class=trace RetransformRecordAnnotation >> 32: * @run main/othervm -javaagent:retransformAgent.jar -XX:+PreserveAllAnnotations -Xlog:redefine+class=trace
RetransformRecordAnnotation > > This test is described as: > > `* @summary test that records with invisible annotation can be retransformed`, > which suggests to me the test can actually be deleted as it serves no purpose now that there are no invisible annotations. Agree. Removed the test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20315#discussion_r1690681450 From amenkov at openjdk.org Thu Jul 25 01:53:13 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 25 Jul 2024 01:53:13 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2] In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> Message-ID: <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com> > Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23. > > Testing: tier1,tier2,tier3,tier4,hs-tier5-svc Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: remove test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20315/files - new: https://git.openjdk.org/jdk/pull/20315/files/03aa9a76..89c83c60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20315&range=00-01 Stats: 186 lines in 1 file changed: 0 ins; 186 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20315/head:pull/20315 PR: https://git.openjdk.org/jdk/pull/20315 From dholmes at openjdk.org Thu Jul 25 01:57:34 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 25 Jul 2024 01:57:34 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> References:
<ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> Message-ID: <6IlNzo_E9E-FWJrCiQZJRxD6rcTrBZ9pB86OjP0DzMU=.6562dd2f-eed5-4dd8-8fe7-b116e7932a3e@github.com> On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. > > We find this in the test output > > [STDOUT] > /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory > > The container where the test is executed does not contain the ubsan package; we might skip the test in this case. On the one hand this seems like a "Dr Dr it hurts when I do this" kind of problem. On the other hand it only affects the docker testing so I'm inclined to let it in, even though it is a bit of a blunt instrument (what if ubsan is installed in the container and someone wants to run with it enabled there?). ------------- PR Review: https://git.openjdk.org/jdk/pull/19907#pullrequestreview-2198111143 From dholmes at openjdk.org Thu Jul 25 01:58:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 25 Jul 2024 01:58:31 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2] In-Reply-To: <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com> Message-ID: <aIkYNugMcYZSQq6-hQHBvVaojngb68xf8ZJxoMJsy5Y=.3e5ccf25-cde1-4956-8b75-cc80df4a8295@github.com> On Thu, 25 Jul 2024 01:53:13 GMT, Alex Menkov <amenkov at openjdk.org> wrote: >> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23. 
>> >> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > remove test Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2198112296 From fyang at openjdk.org Thu Jul 25 04:11:32 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 25 Jul 2024 04:11:32 GMT Subject: RFR: 8335191: RISC-V: verify perf of chacha20 In-Reply-To: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com> References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com> Message-ID: <PBX_MsEj4hzAW23ya5c1lfHuabJAJD-mKQfhqbz9ZzY=.ae162303-7a4a-456f-9e86-8f3852d6b35c@github.com> On Tue, 23 Jul 2024 11:21:31 GMT, Hamlin Li <mli at openjdk.org> wrote: > Hi, > Can you help to review this simple patch? > > Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it brings a regression in all test cases. > So, we should adjust when to turn on the intrinsic, i.e. only when vlenb == 32.
> > Thanks > > > ## Performance > > ### on k230 > vlenb == 16 > Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89 > o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892 > o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885 > o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929 > o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926 > o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915 > o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912 > o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894 > o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865 > 
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding ChaC | ha20-Poly1... Thanks for carrying out the test. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20298#pullrequestreview-2198265238 From kbarrett at openjdk.org Thu Jul 25 04:29:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 25 Jul 2024 04:29:35 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v2] In-Reply-To: <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com> Message-ID: <14S7Ls4AoVfFjxSOeW4N42KZtcvhsJrIe25S9r5FEjg=.03e5738f-bcb6-42d5-831f-a2dc00c01c86@github.com> On Wed, 24 Jul 2024 09:11:13 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Simple obsoleting a Parallel GC product flag. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. I assume you will be updating copyrights before integration? ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20299#pullrequestreview-2198278577 From thartmann at openjdk.org Thu Jul 25 05:05:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 Jul 2024 05:05:31 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Message-ID: <Z38oUi5mwF0KqgcyfFnDZjuvvQyY1UtIyrKeoOJHjk0=.18e09077-c469-4306-8956-7114e15a4dc7@github.com> On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. > > This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. 
> > While testing, I hit the verification code from: > > V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) > V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) > V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) > V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) > V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) > V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) > V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) > > > It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. > > This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. > > We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). > > Thanks, > Tobias Thanks for the review, Vladimir! 
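The nesting check described in the quoted PR text can be modelled with a toy example. This is a hedged sketch only: `ToyResourceArea`, `ToyResourceMark`, `ToyReallocMark` and `ToyList` are hypothetical stand-ins that track nothing but nesting depth, and they are not the HotSpot `ResourceMark` or `ReallocMark` implementations. The idea shown is that a data structure remembers the mark depth at which it was allocated and refuses to grow under a nested mark, because memory obtained there would be released when the nested mark goes out of scope.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a resource area: it only tracks how many marks are
// currently active, not real memory.
struct ToyResourceArea {
  int nesting = 0;
};

// RAII scope: in the real scheme, everything allocated while the mark is
// alive is released when the mark is destroyed.
class ToyResourceMark {
  ToyResourceArea& _area;
 public:
  explicit ToyResourceMark(ToyResourceArea& area) : _area(area) { ++_area.nesting; }
  ~ToyResourceMark() { --_area.nesting; }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Remembers the nesting depth at which a data structure was first allocated.
class ToyReallocMark {
  int _nesting;
 public:
  explicit ToyReallocMark(const ToyResourceArea& area) : _nesting(area.nesting) {}
  // True when growing now would happen under the same mark as the original
  // allocation.
  bool same_scope(const ToyResourceArea& area) const {
    return _nesting == area.nesting;
  }
};

// Growable container that verifies the scope on every grow request, even
// when no reallocation would be needed.
class ToyList {
  ToyResourceArea& _area;
  ToyReallocMark _mark;
  std::size_t _capacity;
 public:
  ToyList(ToyResourceArea& area, std::size_t capacity)
      : _area(area), _mark(area), _capacity(capacity) {}
  bool grow(std::size_t new_capacity) {
    if (!_mark.same_scope(_area)) return false;  // would dangle after the nested mark ends
    if (new_capacity > _capacity) _capacity = new_capacity;
    return true;
  }
  std::size_t capacity() const { return _capacity; }
};
```

A real check would assert rather than return `false`; the boolean result is used here only so the behaviour can be observed without aborting.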
------------- PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2249434446 From thartmann at openjdk.org Thu Jul 25 05:06:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 Jul 2024 05:06:31 GMT Subject: RFR: 8334230: Optimize C2 classes layout In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> Message-ID: <GrbzBV2RTD9A0wnUCt4Y0w-FescrLUfEvp2Lvzr8zxM=.c88db3dc-34ba-4142-847e-b790c92b557c@github.com> On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote: > **Notes** > > Rearrange C2 class fields to optimize footprint. > > > **Verification** > > 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. > 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. > > | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding | > | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- | > | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 | > | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 | > | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 | > | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 | > > class ArrayPointer { > const class Node * _pointer; /* 0 8 */ > const class Node * _base; /* 8 8 */ > const jlong _constant_offset; /* 16 8 */ > const class Node * _int_offset; /* 24 8 */ > const class GrowableArray<Node*> * _other_offsets; /* 32 8 */ > const jint _int_offset_shift; /* 40 4 */ > const bool _is_valid; /* 44 1 */ > public: > > > /* size: 48, cachelines: 1, members: 7 */ > /* padding: 3 */ > /* last cacheline: 48 bytes */ > }; > > > > class CallJavaNode : public 
CallNode { > public: > > /* class CallNode <ancestor>; */ /* 0 128 */ > protected: > > /* --- cacheline 2 boundary (128 bytes) --- */ > class ciMethod * _method; /* 128 8 */ > bool _optimized_virtual; /* 136 1 */ > bool _method_handle_invoke; /* 137 1 */ > bool _override_symbolic_info; /* 138 1 */ > bool _arg_escape; /* 139 1 */ > public: > > protected: > > public: > > > /* size: 144, cachelines: 3, members: 6 */ > /* padding: 4 */ > /* last cacheline: 16 bytes */ > > /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */ > }; > > > > class C2Access : public StackObj { > public: > > /* class StackObj <ancestor>; */ /* 0 0 */ > > /* XXX last struct has 1 byte of padding */ > > int ()(void) * * _vptr.C2Access; /* 0 8 */ > protected: > > DecoratorSet _decorators; /* 8 ... Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2198317310 From mbaesken at openjdk.org Thu Jul 25 07:32:32 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 25 Jul 2024 07:32:32 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured In-Reply-To: <6IlNzo_E9E-FWJrCiQZJRxD6rcTrBZ9pB86OjP0DzMU=.6562dd2f-eed5-4dd8-8fe7-b116e7932a3e@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> <6IlNzo_E9E-FWJrCiQZJRxD6rcTrBZ9pB86OjP0DzMU=.6562dd2f-eed5-4dd8-8fe7-b116e7932a3e@github.com> Message-ID: <hPNrVZ4OLxXqJFoSYb-rgaJI9buQomNkCUhpD6bM2JQ=.59e8a864-aa01-4c47-b01f-781acb275b74@github.com> On Thu, 25 Jul 2024 01:54:33 GMT, David Holmes <dholmes at openjdk.org> wrote: > what if ubsan is installed in the container and someone wants to run with it enabled there We could also try to install the ubsan package into the test container, at least for the default container setup. Do you prefer that ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2249647308 From ayang at openjdk.org Thu Jul 25 07:44:45 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 25 Jul 2024 07:44:45 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v3] In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> Message-ID: <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com> > Simple obsoleting a Parallel GC product flag. Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20299/files - new: https://git.openjdk.org/jdk/pull/20299/files/10720a6d..def4cff1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=01-02 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20299/head:pull/20299 PR: https://git.openjdk.org/jdk/pull/20299 From mli at openjdk.org Thu Jul 25 07:52:35 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 25 Jul 2024 07:52:35 GMT Subject: RFR: 8335191: RISC-V: verify perf of chacha20 In-Reply-To: <PBX_MsEj4hzAW23ya5c1lfHuabJAJD-mKQfhqbz9ZzY=.ae162303-7a4a-456f-9e86-8f3852d6b35c@github.com> References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com> <PBX_MsEj4hzAW23ya5c1lfHuabJAJD-mKQfhqbz9ZzY=.ae162303-7a4a-456f-9e86-8f3852d6b35c@github.com> Message-ID: <kd3LsB0dmtwoCKHwd_k-UEaEU8Rs0Rhw4e5nEfYtmrk=.4291f5f9-c693-44a2-b4c1-12ad22963dfa@github.com> On Thu, 25 Jul 2024 04:09:10 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> Can you help to review 
this simple patch?
>>
>> Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain rather than a regression only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it brings only regressions in all test cases.
>> So, we should adjust when to turn on the intrinsic, i.e. only when vlenb == 32.
>>
>> Thanks
>>
>>
>> ## Performance
>>
>> ### on k230
>> vlenb == 16
>> Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
>> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
>> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
>>
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
>> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
>> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
>> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decry...
>
> Thanks for carrying out the test.

Thanks @RealFYang for the review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20298#issuecomment-2249676413

From mli at openjdk.org Thu Jul 25 07:52:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Jul 2024 07:52:36 GMT
Subject: RFR: 8335191: RISC-V: verify perf of chacha20
In-Reply-To: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
Message-ID: <vxopaVYaEePjLH3lrz8Ce8qO545mRT8PxyVCVkiLK3k=.039fc662-1b1e-4073-a831-9fc0f34d13fa@github.com>

On Tue, 23 Jul 2024 11:21:31 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple patch?
>
> Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain rather than a regression only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it brings only regressions in all test cases.
> So, we should adjust when to turn on the intrinsic, i.e. only when vlenb == 32.
>
> Thanks
>
>
> ## Performance
>
> ### on k230
> vlenb == 16
> Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
>
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding ChaC | ha20-Poly1...

As the change is minor and straightforward, I'll push it with one review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20298#issuecomment-2249677415

From mli at openjdk.org Thu Jul 25 07:52:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Jul 2024 07:52:36 GMT
Subject: Integrated: 8335191: RISC-V: verify perf of chacha20
In-Reply-To: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
Message-ID: <kSeuny690ejdLCLZxK2PYy1XLPPAMheb-YO0mLGyseI=.02eb4ea0-0888-499f-ae74-9bf64c1072b6@github.com>

On Tue, 23 Jul 2024 11:21:31 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple patch?
>
> Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain rather than a regression only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it brings only regressions in all test cases.
> So, we should adjust when to turn on the intrinsic, i.e. only when vlenb == 32.
>
> Thanks
>
>
> ## Performance
>
> ### on k230
> vlenb == 16
> Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
>
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding ChaC | ha20-Poly1... This pull request has now been integrated. Changeset: 9d879186 Author: Hamlin Li <mli at openjdk.org> URL: https://git.openjdk.org/jdk/commit/9d8791864ec48f3321707d7f7805cd3618fc3b51 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8335191: RISC-V: verify perf of chacha20 Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/20298 From shade at openjdk.org Thu Jul 25 08:33:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 25 Jul 2024 08:33:31 GMT Subject: RFR: 8334230: Optimize C2 classes layout In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> Message-ID: <xE7mDVrT8tLQxaDEmfmQUZT377ixCmnEVteB5paDt_4=.eacd0d55-c65a-4a7c-8d07-482bd8e704a8@github.com> On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote: > **Notes** > > Rearrange C2 class fields to optimize footprint. > > > **Verification** > > 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. > 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
>
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1 -> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet | 32 -> 24 | 1 -> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 |
>
> class ArrayPointer {
>     const class Node * _pointer; /* 0 8 */
>     const class Node * _base; /* 8 8 */
>     const jlong _constant_offset; /* 16 8 */
>     const class Node * _int_offset; /* 24 8 */
>     const class GrowableArray<Node*> * _other_offsets; /* 32 8 */
>     const jint _int_offset_shift; /* 40 4 */
>     const bool _is_valid; /* 44 1 */
> public:
>
>
>     /* size: 48, cachelines: 1, members: 7 */
>     /* padding: 3 */
>     /* last cacheline: 48 bytes */
> };
>
>
>
> class CallJavaNode : public CallNode {
> public:
>
>     /* class CallNode <ancestor>; */ /* 0 128 */
> protected:
>
>     /* --- cacheline 2 boundary (128 bytes) --- */
>     class ciMethod * _method; /* 128 8 */
>     bool _optimized_virtual; /* 136 1 */
>     bool _method_handle_invoke; /* 137 1 */
>     bool _override_symbolic_info; /* 138 1 */
>     bool _arg_escape; /* 139 1 */
> public:
>
> protected:
>
> public:
>
>
>     /* size: 144, cachelines: 3, members: 6 */
>     /* padding: 4 */
>     /* last cacheline: 16 bytes */
>
>     /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
>
>
>
> class C2Access : public StackObj {
> public:
>
>     /* class StackObj <ancestor>; */ /* 0 0 */
>
>     /* XXX last struct has 1 byte of padding */
>
>     int ()(void) * * _vptr.C2Access; /* 0 8 */
> protected:
>
>     DecoratorSet _decorators; /* 8 ...

Looks fine, but I think we want to keep argument list in current order.
Unless there is a good reason to change it, and I just don't see it? src/hotspot/share/gc/shared/c2/barrierSetC2.hpp line 115: > 113: public: > 114: C2Access(DecoratorSet decorators, > 115: Node* base, C2AccessValuePtr& addr, BasicType type) : I think it would be cleaner to leave the argument order alone here, and only change the field order. This would guarantee we do not change anything in APIs, which simplifies future changes and backports. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2198664777 PR Review Comment: https://git.openjdk.org/jdk/pull/19861#discussion_r1691054895 From shade at openjdk.org Thu Jul 25 08:34:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 25 Jul 2024 08:34:34 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v2] In-Reply-To: <iExDnHB_1WSKaVnW8g2usSSiQUTMMQuGuc691noaUnA=.792236b9-a939-43c7-aa55-c200bf5e7d86@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com> <iExDnHB_1WSKaVnW8g2usSSiQUTMMQuGuc691noaUnA=.792236b9-a939-43c7-aa55-c200bf5e7d86@github.com> Message-ID: <Vyt0QUR0Atgv1Jud_L1dNXMCYezne6EmBGmfq8OfqSo=.cbc7d46f-aaed-40bd-8890-18f431caa0e1@github.com> On Wed, 24 Jul 2024 19:25:24 GMT, William Kemper <wkemper at openjdk.org> wrote: >> src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571: >> >>> 569: /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */ >>> 570: if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) { >>> 571: if (ShenandoahSATBBarrier) { >> >> A bit weird to replace IU with SATB barrier here. 
> > @shipilev [The original code](https://github.com/openjdk/jdk/pull/20316/files#diff-cb01b36a8c7017c9e21645a0ff9075897e5bfa67ae37d4f0d69ccc582656ec31L71) used the function `iu_barrier` to emit the pre-write barrier for both SATB and IU modes. This only happened in the `ppc` port. Other platforms just invoke the function to emit the `satb_write_barrier` directly here (see [x86](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L580), for example). > > @kdnilsen No, `need_keep_alive_barrier` may be true in SATB mode. The use of the barrier here is to make sure weak references that get loaded during mark are added to the SATB buffer. Oh, okay. So that weirdness is pre-existing, fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1691057726 From fgao at openjdk.org Thu Jul 25 08:46:32 2024 From: fgao at openjdk.org (Fei Gao) Date: Thu, 25 Jul 2024 08:46:32 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <ULCrYK98jgJuZnte0wUwt15lPOUS1gdnyiIF2EfpMUo=.a6f337ac-9736-4b81-b37e-7e04940e9040@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! 
Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. > > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. 
No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 Can I have a second review please :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2249795560 From chagedorn at openjdk.org Thu Jul 25 08:58:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 Jul 2024 08:58:34 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Message-ID: <WKPBNw86vLLxMZ2PpLH0zqvzVFf1QIIG4YG42PAKPx4=.7cd8981b-17da-4e1d-8ad9-2b516943dbe5@github.com> On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. > > This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. 
> > While testing, I hit the verification code from: > > V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) > V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) > V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) > V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) > V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) > V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) > V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) > > > It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. > > This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. > > We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). > > Thanks, > Tobias Nice verification! Only one minor thing, otherwise, looks good to me, too. src/hotspot/share/opto/block.cpp line 46: > 44: return; // No need to grow > 45: } > 46: assert(i >= Max(), "must be an overflow"); The assert is now not necessary anymore. I guess you can remove it. Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20311#pullrequestreview-2198721155 PR Review Comment: https://git.openjdk.org/jdk/pull/20311#discussion_r1691090588 From shade at openjdk.org Thu Jul 25 09:07:33 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 25 Jul 2024 09:07:33 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v2] In-Reply-To: <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com> Message-ID: <sGgIbbq6E71rOKNm-riTf5bbae_3igRFJFt3e7JR4oA=.b70ed09c-8029-4e05-82c9-871dc4a82f85@github.com> On Wed, 24 Jul 2024 19:31:04 GMT, William Kemper <wkemper at openjdk.org> wrote: >> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. >> >> ## Testing >> * hotspot_gc_shenandoah >> * dacapo >> * diluvian >> * extremem >> * hyperalloc >> * specjbb2015 >> * specjvm2008 > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Remove unintentional new line I like this. Consider another thing to clean up: src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 122: > 120: > 121: ShenandoahMarkRefsClosure<GENERATION> mark_cl(q, rp); > 122: ShenandoahSATBAndRemarkThreadsClosure tc(satb_mq_set, nullptr); Looks like `ShenandoahSATBAndRemarkThreadsClosure` can be considerably simplified, now that we do not pass any closure to it. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2198733772 PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1691098698 From thartmann at openjdk.org Thu Jul 25 09:13:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 Jul 2024 09:13:08 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Message-ID: <S8wy4l_RqxDyolZucDg6cQXaA1jNc9ECij35oVEtl0E=.d4bc4ef0-2ca5-4899-af0c-57812dd76112@github.com> On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. > > This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. 
> > While testing, I hit the verification code from: > > V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) > V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) > V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) > V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) > V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) > V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) > V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) > > > It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. > > This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. > > We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). > > Thanks, > Tobias Thanks for the review, Christian! Updated. 
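
The check described in the quoted change can be pictured with a toy model. The sketch below is not HotSpot code: `Mark`, `GrowableList`, and the `nesting` counter are hypothetical stand-ins for `ResourceMark`, a resource-allocated C2 structure, and the resource area's nesting level. It assumes the verification simply compares the nesting level recorded at allocation with the level seen at grow time, and, as the description says, it verifies on every grow call rather than only when a reallocation is needed.

```java
import java.util.Arrays;

// Toy model only: illustrates "grow must happen under the same mark as the
// original allocation". None of these names exist in HotSpot.
class ReallocCheckModel {
    // Stand-in for the current ResourceMark nesting depth of the thread.
    static int nesting = 0;

    // Stand-in for a scoped ResourceMark: entering deepens the nesting,
    // leaving restores it (and, in the real VM, frees what was allocated inside).
    static final class Mark implements AutoCloseable {
        Mark() { nesting++; }
        @Override public void close() { nesting--; }
    }

    // Stand-in for a resource-allocated growable structure.
    static final class GrowableList {
        private final int allocNesting = nesting; // level at original allocation
        private int[] data = new int[2];
        private int size = 0;

        void push(int value) {
            // Verify on every call, even if the backing array is large enough.
            if (nesting != allocNesting) {
                throw new IllegalStateException("grow under a different mark");
            }
            if (size == data.length) {
                data = Arrays.copyOf(data, size * 2);
            }
            data[size++] = value;
        }

        int size() { return size; }
    }

    // Growing at the allocation's own nesting level is fine.
    static int growAtSameLevel() {
        GrowableList list = new GrowableList();
        for (int i = 0; i < 5; i++) {
            list.push(i);
        }
        return list.size();
    }

    // Growing under a nested mark is flagged: the new backing storage would
    // be released when the inner mark exits while the list lives on.
    static boolean growUnderNestedMarkIsFlagged() {
        GrowableList list = new GrowableList();
        list.push(1);
        try (Mark inner = new Mark()) {
            list.push(2);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("same level size: " + growAtSameLevel());
        System.out.println("nested grow flagged: " + growUnderNestedMarkIsFlagged());
    }
}
```

In this model the pre-growing pattern mentioned for `PhaseCFG::build_dominator_tree` would also be flagged, because the check fires even when no reallocation takes place.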
------------- PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2249844785 From thartmann at openjdk.org Thu Jul 25 09:13:07 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 Jul 2024 09:13:07 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 [v2] In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Message-ID: <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com> > Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. > > This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. 
> > While testing, I hit the verification code from: > > V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) > V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) > V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) > V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) > V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) > V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) > V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) > > > It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. > > This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. > > We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/block.cpp Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20311/files - new: https://git.openjdk.org/jdk/pull/20311/files/b0f839b8..391dd920 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20311&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20311&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20311/head:pull/20311 PR: https://git.openjdk.org/jdk/pull/20311 From adinn at openjdk.org Thu Jul 25 09:40:32 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 25 Jul 2024 09:40:32 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! 
Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. > > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 src/hotspot/cpu/aarch64/aarch64.ad line 4235: > 4233: operand immLOffset() > 4234: %{ > 4235: predicate(n->get_long() >= -256 && n->get_long() <= 65520); Why is this using hard wired constants rather than using Address::offset_ok_for_immed? Also, why is the constant value 65520? 
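
As background for the two bounds in the `immLOffset` predicate quoted above, here is a minimal sketch. It assumes the usual AArch64 load/store immediate forms: an unscaled signed 9-bit byte offset, and an unsigned 12-bit immediate scaled by the access size (1 to 16 bytes). The class and method names are hypothetical and are not HotSpot code. The sketch only shows that the union of those ranges spans -256 to 65520; it does not claim that every value in between is encodable for a given access size.

```java
// Hypothetical helper, not part of HotSpot: derives the widest range of
// immediate offsets across the AArch64 load/store forms assumed above.
class ImmOffsetRange {
    // Unscaled form: signed 9-bit byte offset.
    static long minUnscaled() { return -(1L << 8); }
    static long maxUnscaled() { return (1L << 8) - 1; }

    // Scaled form: unsigned 12-bit immediate multiplied by the access size,
    // where log2Size is 0..4 for 1-, 2-, 4-, 8- and 16-byte accesses.
    static long maxScaled(int log2Size) {
        return ((1L << 12) - 1) << log2Size;
    }

    // Superset check: true for any offset inside the overall range,
    // regardless of which access size would actually accept it.
    static boolean inSupersetRange(long offset) {
        return offset >= minUnscaled() && offset <= maxScaled(4);
    }

    // Size-specific check: the offset must fit the unscaled form, or be a
    // non-negative multiple of the access size within the scaled range.
    static boolean encodableFor(long offset, int log2Size) {
        boolean unscaled = offset >= minUnscaled() && offset <= maxUnscaled();
        long mask = (1L << log2Size) - 1;
        boolean scaled = offset >= 0
                && (offset & mask) == 0
                && offset <= maxScaled(log2Size);
        return unscaled || scaled;
    }

    public static void main(String[] args) {
        for (int s = 0; s <= 4; s++) {
            System.out.println((1 << s) + "-byte access: max scaled offset " + maxScaled(s));
        }
        System.out.println("superset range: [" + minUnscaled() + ", " + maxScaled(4) + "]");
    }
}
```

Under these assumptions an offset such as 1030 lies inside the superset range but is not directly encodable for an 8-byte access, which is the kind of gap between a range predicate and a size-aware check that the question touches on.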
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691150336 From aph at openjdk.org Thu Jul 25 09:56:32 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 Jul 2024 09:56:32 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com> Message-ID: <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com> On Thu, 25 Jul 2024 09:37:42 GMT, Andrew Dinn <adinn at openjdk.org> wrote: >> In the cases like: >> >> UNSAFE.putLong(address + off1 + 1030, lseed); >> UNSAFE.putLong(address + 1023, lseed); >> UNSAFE.putLong(address + off2 + 1001, lseed); >> >> >> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: >> >> ldr R10, [R15, #120] # int ! Field: address >> ldr R11, [R16, #136] # int ! Field: off1 >> ldr R12, [R16, #144] # int ! Field: off2 >> add R11, R11, R10 >> mov R11, R11 # long -> ptr >> add R12, R12, R10 >> mov R10, R10 # long -> ptr >> add R11, R11, #1030 # ptr >> str R17, [R11] # int >> add R10, R10, #1023 # ptr >> str R17, [R10] # int >> mov R10, R12 # long -> ptr >> add R10, R10, #1001 # ptr >> str R17, [R10] # int >> >> >> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. 
On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: >> >> ldr x10, [x15,#120] >> ldp x11, x12, [x16,#136] >> add x11, x11, x10 >> add x12, x12, x10 >> add x11, x11, #0x406 >> str x17, [x11] >> add x10, x10, #0x3ff >> str x17, [x10] >> mov x10, x12 <--- extra register copy >> add x10, x10, #0x3e9 >> str x17, [x10] >> >> >> There is still one extra register copy, which we're trying to remove in this patch. >> >> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. >> >> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so >> >> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 > > src/hotspot/cpu/aarch64/aarch64.ad line 4235: > >> 4233: operand immLOffset() >> 4234: %{ >> 4235: predicate(n->get_long() >= -256 && n->get_long() <= 65520); > > Why is this using hard wired constants rather than using Address::offset_ok_for_immed? > > Also, why is the constant value 65520? I think `Address::offset_ok_for_immed` is too restrictive: we want a predicate that is the superset of all possible address offsets. 
jshell> ((1<<12)-1) <<4 $3 ==> 65520 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691171119 From chagedorn at openjdk.org Thu Jul 25 10:48:33 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 Jul 2024 10:48:33 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 [v2] In-Reply-To: <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com> Message-ID: <3htt-59G3IbpUxCqnYSaj-gkhcoswta47VVQqj-0wzQ=.f46427af-470d-4094-bb36-744e299ded7d@github.com> On Thu, 25 Jul 2024 09:13:07 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. >> >> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. 
>> >> While testing, I hit the verification code from: >> >> V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) >> V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) >> V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) >> V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) >> V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) >> V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) >> V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) >> >> >> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. >> >> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. >> >> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/block.cpp > > Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com> Still good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20311#pullrequestreview-2198961385 From adinn at openjdk.org Thu Jul 25 10:50:31 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 25 Jul 2024 10:50:31 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com> <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com> Message-ID: <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com> On Thu, 25 Jul 2024 09:53:45 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 4235: >> >>> 4233: operand immLOffset() >>> 4234: %{ >>> 4235: predicate(n->get_long() >= -256 && n->get_long() <= 65520); >> >> Why is this using hard wired constants rather than using Address::offset_ok_for_immed? >> >> Also, why is the constant value 65520? > > I think `Address::offset_ok_for_immed` is too restrictive: we want a predicate that is the superset of all possible address offsets. > > > jshell> ((1<<12)-1) <<4 > $3 ==> 65520 Yes, I realise that this is 16 less than 65536. However, there are two things I don't follow. In the original code immLoffset was only used to define indOffLN i.e. a long offset used with a narrow pointer. The use of Address::offset_ok_for_immed(n->get_long(), 0) in the predicate limited narrow pointer offsets to -256 <= offset <= (2^12 - 1). With this change the top end of the range is now (2^12 - 1) << 4. I am wondering why that is appropriate? The change allows immLOffset to be used in the definition of indOffX2P. 
I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4). The only difference I can see is that the alternative predicate used here will accept a positive offset that is not 16 byte aligned. Is that the intention of the redefinition? Again, why is that appropriate? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691242965 From thartmann at openjdk.org Thu Jul 25 10:56:34 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 Jul 2024 10:56:34 GMT Subject: RFR: 8336999: Verification for resource area allocated data structures in C2 [v2] In-Reply-To: <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com> Message-ID: <eeqOET7dIVI01B15yLxmKiRSI7YfZZnf4BmZTTumv5U=.f88064de-b17b-4e51-9fbd-38fd68831f97@github.com> On Thu, 25 Jul 2024 09:13:07 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. >> >> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. 
I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. >> >> While testing, I hit the verification code from: >> >> V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) >> V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) >> V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) >> V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) >> V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) >> V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) >> V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) >> >> >> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. >> >> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. >> >> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/block.cpp > > Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com> Thank you! 
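The check described in the quoted text can be pictured with a small standalone model. This is only a sketch under stated assumptions, not HotSpot code: `ToyResourceArea`, `ToyResourceMark`, `ToyReallocMark` and `ToyBlockArray` are hypothetical names, and the model assumes the verification compares the resource-area nesting level at construction with the level at the time of a grow request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a thread's resource area: it only tracks how
// many ResourceMark-like scopes are currently open.
struct ToyResourceArea {
  int nesting = 0;
};

// Hypothetical scope object: opens a nested scope on construction and
// closes it on destruction (real marks would also release memory here).
class ToyResourceMark {
  ToyResourceArea& _area;
 public:
  explicit ToyResourceMark(ToyResourceArea& area) : _area(area) { ++_area.nesting; }
  ~ToyResourceMark() { --_area.nesting; }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Remembers the nesting level at which a container was created.
class ToyReallocMark {
  int _nesting;
 public:
  explicit ToyReallocMark(const ToyResourceArea& area) : _nesting(area.nesting) {}
  // True when a grow request happens under the same scope as the original allocation.
  bool check(const ToyResourceArea& area) const { return area.nesting == _nesting; }
};

// Growable array that verifies on every grow request, even when the
// request would not actually reallocate.
class ToyBlockArray {
  ToyResourceArea& _area;
  ToyReallocMark _mark;
  std::vector<int> _data;
 public:
  explicit ToyBlockArray(ToyResourceArea& area) : _area(area), _mark(area) {}
  // Returns false where a debug VM would report an allocation bug.
  bool grow(std::size_t n) {
    if (!_mark.check(_area)) return false;
    if (n > _data.size()) _data.resize(n);
    return true;
  }
};

// Growing in the scope that created the array passes the check.
inline bool demo_same_scope() {
  ToyResourceArea area;
  ToyBlockArray blocks(area);
  return blocks.grow(8);
}

// Growing inside a nested scope is flagged.
inline bool demo_nested_scope() {
  ToyResourceArea area;
  ToyBlockArray blocks(area);
  ToyResourceMark rm(area);
  return blocks.grow(8);
}

// Pre-growing outside the nested scope does not help: the check runs even
// though no reallocation is needed, which models the reported false positive.
inline bool demo_pregrown_nested() {
  ToyResourceArea area;
  ToyBlockArray blocks(area);
  blocks.grow(16);
  ToyResourceMark rm(area);
  return blocks.grow(8);
}
```

In this model `demo_pregrown_nested()` fails the check although nothing is reallocated, which is the situation the description calls a false positive and resolves by moving `_blocks` to a separate arena.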
------------- PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2250043897 From kevinw at openjdk.org Thu Jul 25 10:57:36 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 25 Jul 2024 10:57:36 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <8_dPuH2noHgNOFKzsBke96yBSdGoTwhBl0-pyXaoDhA=.e638cdb0-2ea1-42ca-bd8b-88eaf2b719ac@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com> <8_dPuH2noHgNOFKzsBke96yBSdGoTwhBl0-pyXaoDhA=.e638cdb0-2ea1-42ca-bd8b-88eaf2b719ac@github.com> Message-ID: <vgVYMZVc-wpKVEGoCZgsYBvA7fwJ-0TUIWT2g5BAYj4=.3d52cdbb-57a6-4dc2-823e-3c829142a646@github.com> On Wed, 24 Jul 2024 18:36:44 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > ...made an update to cover that invocation. Thanks for having only one DEFAULT_PERFMAP_FILENAME definition. It could be wrapped with #ifdef LINUX like it was before. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250043656 From kevinw at openjdk.org Thu Jul 25 10:57:37 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 25 Jul 2024 10:57:37 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v12] In-Reply-To: <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com> Message-ID: <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com> On Wed, 24 Jul 2024 18:40:16 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory.
(STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Adding default perfmap filename when invoked outside of diagnostic command Do we need to update all the initialisations to set _filename members to type "FILE" ? e.g. HeapDumpDCmd: there is still _filename("filename","Name of the dump file", "STRING",true), ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250044890 From kevinw at openjdk.org Thu Jul 25 11:40:35 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 25 Jul 2024 11:40:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v9] In-Reply-To: <tkO2QN5Nzk3njsiyCgolhdy7fzZ26PfDHe44LK3vUf8=.33b8e759-8725-4333-93ae-9d2a14c523b5@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com> <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com> <tkO2QN5Nzk3njsiyCgolhdy7fzZ26PfDHe44LK3vUf8=.33b8e759-8725-4333-93ae-9d2a14c523b5@github.com> Message-ID: <m4FyXjoRzxHMR62x8z4u8dylCkZUjnc1GrXU-3KwDcU=.8874c13a-aaea-44f0-81f2-bc6221831bd1@github.com> On Wed, 24 Jul 2024 17:54:30 GMT, Sonia Zaldana Calles
<szaldana at openjdk.org> wrote: >> src/hotspot/share/services/diagnosticArgument.hpp line 65: >> >>> 63: class FileArgument { >>> 64: private: >>> 65: char _name[1024]; >> >> Probably JVM_MAXPATHLEN (which might also be 1024). > > Hi, I avoided JVM_MAXPATHLEN because of this comment https://github.com/openjdk/jdk/pull/20198#discussion_r1685297940 It seems strange to me to NOT use MAXPATHLEN (or JVM_MAXPATHLEN), in this one particular place, based on if somebody rebuilds the JDK on a system where it is defined to be very very long, then there would be some unnecessarily large allocations. There are approx 140 other uses. If JVM_MAXPATHLEN is 4k, we are saying that those other usages reserve the 4k, but this particular path should max out at 1024 bytes? Given common cloud paths and even in our test systems, paths are commonly nearly 400 bytes, so 1024 is not that much spare capacity. I don't want to contradict @tstuefe too much, and it's not make or break for this change, but I would think just go with the standard max path len used everywhere else. If there's a problem with memory bloat, then hardcoding one of the usages isn't really going to help much. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1691299887 From kevinw at openjdk.org Thu Jul 25 13:13:34 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 25 Jul 2024 13:13:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v12] In-Reply-To: <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com> <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com> Message-ID: <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com> On Thu, 25 Jul 2024 10:54:43 GMT, Kevin Walls <kevinw at openjdk.org> wrote: > Do we need to update all the initialisations to set _filename members to type "FILE" ? I checked, no we don't NEED to change them. We can, it works either way. It does affect the help output. e.g. Arguments: filepath : The file path to the output file (FILE, no default value) ...which would be good as it's a way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING. 
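For readers unfamiliar with the `%p` convention, the substitution a FILE-typed argument implies can be sketched roughly as below. This is an illustration only: `expand_pid` and `expand_pid_to_string` are hypothetical helpers, not the functions used by the PR, and the 1024-byte buffer simply mirrors the `char _name[1024]` size quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: copies `pattern` into `buf`, replacing every "%p"
// with the decimal value of `pid`. Returns false when the expanded name,
// including the terminating NUL, does not fit into `buflen` bytes.
inline bool expand_pid(const char* pattern, long pid, char* buf, std::size_t buflen) {
  std::size_t out = 0;
  for (const char* p = pattern; *p != '\0'; ) {
    if (p[0] == '%' && p[1] == 'p') {
      // Write the PID into the remaining space; snprintf reports the
      // length it needed, so truncation can be detected.
      int n = std::snprintf(buf + out, buflen - out, "%ld", pid);
      if (n < 0 || static_cast<std::size_t>(n) >= buflen - out) return false;
      out += static_cast<std::size_t>(n);
      p += 2;
    } else {
      // Ordinary character: keep one byte in reserve for the terminator.
      if (out + 1 >= buflen) return false;
      buf[out++] = *p++;
    }
  }
  if (out >= buflen) return false;
  buf[out] = '\0';
  return true;
}

// Convenience wrapper using a fixed-size buffer (size chosen for illustration).
// Returns an empty string when the expansion does not fit.
inline std::string expand_pid_to_string(const char* pattern, long pid) {
  char buf[1024];
  return expand_pid(pattern, pid, buf, sizeof(buf)) ? std::string(buf) : std::string();
}
```

With such a helper, a name like `heap-%p.hprof` given to a process with PID 1234 would become `heap-1234.hprof`, while a name without the pattern is passed through unchanged.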
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250289811 From mbaesken at openjdk.org Thu Jul 25 13:39:33 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 25 Jul 2024 13:39:33 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v3] In-Reply-To: <t1g-4dP38_LQWzBPFLqZlsHaDKKrLBc_4LzYSuH_Sc8=.8f4f92c1-2e49-40e0-88b1-ca1c37c1ec70@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com> <t1g-4dP38_LQWzBPFLqZlsHaDKKrLBc_4LzYSuH_Sc8=.8f4f92c1-2e49-40e0-88b1-ca1c37c1ec70@github.com> Message-ID: <MJy7wKbRFXgZUci6hFRE-3RoJ3z1MXhzcUSRU9rRzH8=.da00e8d6-68a7-4565-bf6a-42d00fe3bb9f@github.com> On Wed, 24 Jul 2024 18:59:34 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Here's an untested change that I think will fix the problem. https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:smallregmap?expand=1 Seems this works well, at least :tier1 tests on some platforms (x86_64, ppc64le) look okay to me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2250344053 From aph at openjdk.org Thu Jul 25 13:41:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 Jul 2024 13:41:33 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com> <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com> <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com> Message-ID: <LrbAv5XoGQuXuZiReg9oJ6hpqq_Ip0wO1VN5bwNWZSA=.43a0e39d-89c1-4ca5-9ad4-fb3208fb3fb6@github.com> On Thu, 25 Jul 2024 10:47:41 GMT, Andrew Dinn <adinn at openjdk.org> wrote: > The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4). After this change, `immLOffset` is a more general-purpose type than `immLoffset16`. `immLOffset` matches all possible address offsets, along with some impossible ones. For example, it matches all of the misaligned `Unsafe` accesses at any offset, regardless of operand size. In the (rare) event that an operand size and offset don't fit a single instruction, we'll split the instruction when we emit it. After this patch there will be a few rare cases where we have a regression in code size, but it's worth it for the simplicity and the size of the matcher logic, which would otherwise explode. I don't expect any significant regression in execution time. 
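The relationship between the two kinds of predicate in this exchange can be checked with a small standalone model. It is a sketch, not the HotSpot source: `offset_ok_for_immed_model` is an assumed approximation of what `Address::offset_ok_for_immed(offset, shift)` accepts (negative or misaligned offsets in a signed 9-bit field, otherwise an unsigned 12-bit field scaled by the access size), and `imm_l_offset_model` mirrors the hard-wired range from the `immLOffset` predicate under review.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of the per-access-size check: negative or misaligned
// offsets must fit a signed 9-bit field, aligned non-negative offsets are
// scaled by the access size and must fit an unsigned 12-bit field.
constexpr bool offset_ok_for_immed_model(int64_t offset, unsigned shift) {
  const int64_t mask = (int64_t{1} << shift) - 1;
  if (offset < 0 || (offset & mask) != 0) {
    return offset >= -256 && offset <= 255;  // signed 9-bit range
  }
  return (offset >> shift) <= 4095;          // unsigned 12-bit range, scaled
}

// The hard-wired range from the predicate quoted in the review comment.
constexpr bool imm_l_offset_model(int64_t offset) {
  return offset >= -256 && offset <= 65520;
}

// Largest scaled offset for an access of (1 << shift) bytes.
constexpr int64_t max_scaled_offset(unsigned shift) {
  return ((int64_t{1} << 12) - 1) << shift;
}

// Same arithmetic as the jshell line in the thread: ((1<<12)-1) << 4.
static_assert(max_scaled_offset(4) == 65520, "upper bound for 16-byte accesses");

// Checks, over a window wider than the accepted range, that every offset
// accepted for some access size (shift 0..4) is also accepted by the
// general-purpose range.
inline bool superset_holds() {
  for (unsigned shift = 0; shift <= 4; shift++) {
    for (int64_t off = -1024; off <= 70000; off++) {
      if (offset_ok_for_immed_model(off, shift) && !imm_l_offset_model(off)) {
        return false;
      }
    }
  }
  return true;
}
```

Under this model an offset such as 1030 is matched by the general range but is not encodable for a 16-byte access, which corresponds to the rare case where the instruction is split when it is emitted.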
This patch is not the last word on the matter; later patches may well further reduce the number of integer offset types in a similar way. I don't think that many of the offsetL/I/X/P types do anything useful, and we'd probably profit from removing them, but that's another patch for another day. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691462027 From mbaesken at openjdk.org Thu Jul 25 13:42:48 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 25 Jul 2024 13:42:48 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> Message-ID: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> > When running with ubsan - enabled binaries, some tests trigger the following report : > > src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' > #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 > #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 > #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 > #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*)
src/hotspot/share/oops/stackChunkOop.cpp:63 > #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 > #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 > #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 > #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 > > Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: add patch of Kim Barrett ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20296/files - new: https://git.openjdk.org/jdk/pull/20296/files/390a2176..b6f5dcfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=02-03 Stats: 133 lines in 11 files changed: 50 ins; 66 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296 PR: https://git.openjdk.org/jdk/pull/20296 From aph at openjdk.org Thu Jul 25 13:59:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 Jul 2024 13:59:35 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6] In-Reply-To: <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> 
<5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com> <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com> <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com> <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com> Message-ID: <RDhgVrQ6zzzaCBuvMVR7tKA2Qe1SwF2pktr5xcI_duE=.9ae8f5e8-f888-47b5-b979-e56692b278f6@github.com> On Wed, 24 Jul 2024 19:09:06 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >>> > Also also, Klass::is_subtype_of() is used for C1 runtime. >>> >>> Can you elaborate, please? >> >> Sorry, that was rather vague. In C1-compiled code, the Java method `Class::isInstance(Object)`calls `Klass::is_subtype_of()`. >> >> In general, I find it difficult to decide how much work, if any, should be done to improve C1 performance. Clearly, if C1 exists only to help with startup time in a tiered compilation system, the answer is "not much". > > Thanks, now I see that `Class::isInstance(Object)` is backed by `Runtime1::is_instance_of()` which uses `oopDesc::is_a()` to do the job. > > If it turns out to be performance critical, the intrinsic implementation should be rewritten to exercise existing subtype checking support in C1. As it is implemented now, it's already quite inefficient. I did write an intrinsic for that, but it made this patch even larger. I have a small patch for C1, for some other time. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1691491950 From mbaesken at openjdk.org Thu Jul 25 14:04:33 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 25 Jul 2024 14:04:33 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> Message-ID: <znGOpYSNwo0aqgQcaw57RYRCqVixDaVVTBXnt-pIWQ8=.7626c298-e1d7-42d1-8f2e-ac74fcfc5e4a@github.com> On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, 
OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add patch of Kim Barrett Good additional news - the jdk :tier1 tests on linux x86_64 with ubsan - enabled binaries are after this fix almost clean, only some shenandoah related tests still fail because of https://bugs.openjdk.org/browse/JDK-8332697 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' -------------------------------------------------- TEST: java/foreign/stackwalk/TestAsyncStackWalk.java#shenandoah TEST RESULT: Failed. Unexpected exit from test [exit code: 1] -------------------------------------------------- TEST: java/foreign/stackwalk/TestStackWalk.java#shenandoah TEST RESULT: Failed. 
Unexpected exit from test [exit code: 1] -------------------------------------------------- Test results: passed: 2,420; failed: 2 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2250400507 From szaldana at openjdk.org Thu Jul 25 14:48:50 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 25 Jul 2024 14:48:50 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v13] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <Ic3egsnhXJQ-4k-m99sIQWIoa0ypGVZ5uIw8OTHhYG8=.0f2f8fbf-421d-45d0-bc46-73922dfde987@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory.
(STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). > > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Wrapping in linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/d43d90d1..33976d70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=11-12 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Thu Jul 25 14:48:50 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 25 Jul 2024 14:48:50 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v12] In-Reply-To: <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com> <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com>
<Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com> Message-ID: <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com> On Thu, 25 Jul 2024 13:11:00 GMT, Kevin Walls <kevinw at openjdk.org> wrote: > good as it's a way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING. Hi Kevin, I feel this could be more explicit by updating the manpage to explain the %p substitution rather than updating the type to FILE seeing how users are asked to specify a filename rather than an existing file. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250523394 From kevinw at openjdk.org Thu Jul 25 14:53:35 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 25 Jul 2024 14:53:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v12] In-Reply-To: <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com> <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com> <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com> <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com> Message-ID: <TMigVvxThtPHPTrUpUk8RhJnCeK_-NomWVDbOCO7oi4=.7278475c-f9b2-4031-a3f5-69b296f5732a@github.com> On Thu, 25 Jul 2024 14:46:05 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: > > good as it's a way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING. 
> > Hi Kevin, > > I feel this could be more explicit by updating the manpage to explain the %p substitution rather than updating the type to FILE seeing how users are asked to specify a filename rather than an existing file. What do you think? Hi, I was thinking both 8-) I can do the man page task as that is still closed... ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250551321 From adinn at openjdk.org Thu Jul 25 15:15:33 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 25 Jul 2024 15:15:33 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <-SR_kRJlM8NfFkcmDOQ6txfC4qVYrvi5w48jQ6y9aZg=.f6f6e942-e247-4e63-b823-1d7a3922039a@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. 
On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. > > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 Marked as reviewed by adinn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20157#pullrequestreview-2199613265 From adinn at openjdk.org Thu Jul 25 15:15:34 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 25 Jul 2024 15:15:34 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <LrbAv5XoGQuXuZiReg9oJ6hpqq_Ip0wO1VN5bwNWZSA=.43a0e39d-89c1-4ca5-9ad4-fb3208fb3fb6@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com> <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com> <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com> <LrbAv5XoGQuXuZiReg9oJ6hpqq_Ip0wO1VN5bwNWZSA=.43a0e39d-89c1-4ca5-9ad4-fb3208fb3fb6@github.com> Message-ID: <0tiCxWaLEvUkFXN74lGdcytTydBLaSwOrc_Xr4rk-YM=.d9585a6a-42a6-4609-a51c-c257fbfca2b5@github.com> On Thu, 25 Jul 2024 13:38:47 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Yes, I realise 
that this is 16 less than 65536. However, there are two things I don't follow. >> >> In the original code immLoffset was only used to define indOffLN i.e. a long offset used with a narrow pointer. The use of Address::offset_ok_for_immed(n->get_long(), 0) in the predicate limited narrow pointer offsets to -256 <= offset <= (2^12 - 1). With this change the top end of the range is now (2^12 - 1) << 4. I am wondering why that is appropriate? >> >> The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4). The only difference I can see is that the alternative predicate used here will accept a positive offset that is not 16 byte aligned. Is that the intention of the redefinition? Again, why is that appropriate? > >> The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4). > > After this change, `immLOffset` is a more general-purpose type than `immLoffset16`. `immLOffset` matches all possible address offsets, along with some impossible ones. For example, it matches all of the misaligned `Unsafe` accesses at any offset, regardless of operand size. In the (rare) event that an operand size and offset don't fit a single instruction, we'll split the instruction when we emit it. > > After this patch there will be a few rare cases where we have a regression in code size, but it's worth it for the simplicity and the size of the matcher logic, which would otherwise explode. I don't expect any significant regression in execution time. > > This patch is not the last word on the matter; later patches may well further reduce the number of integer offset types in a similar way. 
I don't think that many of the offsetL/I/X/P types do anything useful, and we'd probably profit from removing them, but that's another patch for another day. Ok, I see. The use of immLoffset as currently defined is actually correct for narrow oops and, indeed, for all other address base types. It allows for all possible offsets that might fit into a load's immediate slot. Whether we can legitimately encode the operand offset as an immediate or need instead to use an auxiliary add does not actually depend on the type of the address base but on the size of the datum fetched by the indirect load that consumes the operand. So, an indirect operand with offset 4098 would be too big to encode in an ldrb, fine to encode in an ldrh and invalid for encoding in an ldrw or ldrx because it is not suitably aligned. That does imply we should get rid of the other (now redundant) immLoffset<n> operands. However, we can do that in a follow-up patch because it is not what this fix is addressing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691636646 From szaldana at openjdk.org Thu Jul 25 15:31:05 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 25 Jul 2024 15:31:05 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v14] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID.
> > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Adding FILE descriptor for help output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/33976d70..71d3d140 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=12-13 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Thu Jul 25 15:31:05 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 25 Jul 2024 15:31:05 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v12] In-Reply-To: <TMigVvxThtPHPTrUpUk8RhJnCeK_-NomWVDbOCO7oi4=.7278475c-f9b2-4031-a3f5-69b296f5732a@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com> <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com> <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com> <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com> <TMigVvxThtPHPTrUpUk8RhJnCeK_-NomWVDbOCO7oi4=.7278475c-f9b2-4031-a3f5-69b296f5732a@github.com> Message-ID: <QnzdzdE6YOb5ErBR9ePCYd0Q5x6xTuNMMgy7vMJ-Uhw=.5ef64e7a-b31c-4c07-b94c-d190cf880f0c@github.com> On Thu, 25 Jul 2024 14:51:22 GMT, Kevin Walls <kevinw at openjdk.org> wrote: > > > good as it's a way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING. 
> > > > > > Hi Kevin, > > I feel this could be more explicit by updating the manpage to explain the %p substitution rather than updating the type to FILE seeing how users are asked to specify a filename rather than an existing file. What do you think? > > Hi, I was thinking both 8-) I can do the man page task as that is still closed... Makes sense. I updated the relevant arguments to `FILE`. > I can do the man page task as that is still closed... That'd be great and thank you for your patience with this review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250681706 From aph at openjdk.org Thu Jul 25 15:42:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 Jul 2024 15:42:04 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v7] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <KaJs7zZ3iFiDt3n7xWByN9WMex4VndXAPgpy3lMtSRY=.ee768444-5097-4968-8964-99329011ec73@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. 
> This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas.
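To make the remark about software popcount in the description above concrete, here is a minimal sketch of a portable bit-counting routine of the kind a C++ runtime can fall back on when the build target has no popcount instruction. This is not the HotSpot implementation; the names `software_popcount` and `packed_index` are hypothetical, and the assumption that a 64-bit occupancy bitmap is used to rank a slot is an illustration rather than something the message states.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (SWAR technique): counts set bits without
// relying on a hardware popcount instruction.
inline unsigned software_popcount(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                            // 8-bit sums
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);       // add all bytes
}

// Hypothetical helper: given an occupancy bitmap, return how many occupied
// slots precede `slot`, i.e. the position such a slot would have in a
// densely packed array. Assumes slot < 64.
inline unsigned packed_index(uint64_t bitmap, unsigned slot) {
  assert(slot < 64);
  const uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
  return software_popcount(below);
}
```

The routine is a handful of shifts, masks and one multiply, which is consistent with the observation that the software fallback "is really not bad" even where a single-instruction popcount would be available.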
Andrew Haley has updated the pull request incrementally with four additional commits since the last revision: - Cleanup - temp - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work - Minor cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/48e80a13..011a3880 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=05-06 Stats: 51 lines in 7 files changed: 21 ins; 18 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Thu Jul 25 16:05:49 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 Jul 2024 16:05:49 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. 
> This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 40 commits: - Merge branch 'clean' into JDK-8331658-work - Cleanup - temp - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work - Review comments - Review comments - Review comments - Review comments - Review feedback - Review feedback - ... and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc ------------- Changes: https://git.openjdk.org/jdk/pull/19989/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=07 Stats: 991 lines in 19 files changed: 754 ins; 116 del; 121 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Thu Jul 25 16:25:38 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 Jul 2024 16:25:38 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8] In-Reply-To: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> Message-ID: <xS--usGQMNszO_NP1guEfRkZTSvBwWmJ8A3wadP0fBU=.2bc1982b-8f8a-44e1-a34c-1ffdee026db4@github.com> On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. 
>> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs.
>> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits: > > - Merge branch 'clean' into JDK-8331658-work > - Cleanup > - temp > - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work > - Review comments > - Review comments > - Review comments > - Review comments > - Review feedback > - Review feedback > - ... and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc OK! I think that's everything. Are we ready for a second pair of eyes now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2250880303 From wkemper at openjdk.org Thu Jul 25 17:10:05 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 25 Jul 2024 17:10:05 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v3] In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> Message-ID: <wLt5dTWUk5MaiDzfW0G3Lwrk_mUtffeCUxIGJPIka4c=.7ec28766-8e09-4ee9-93d5-9c06642e6906@github.com> > We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. 
> > ## Testing > * hotspot_gc_shenandoah > * dacapo > * diluvian > * extremem > * hyperalloc > * specjbb2015 > * specjvm2008 William Kemper has updated the pull request incrementally with one additional commit since the last revision: Simplify final mark now that incremental update mode is removed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20316/files - new: https://git.openjdk.org/jdk/pull/20316/files/ec2d6b64..79ceade3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=01-02 Stats: 13 lines in 1 file changed: 0 ins; 9 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20316.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316 PR: https://git.openjdk.org/jdk/pull/20316 From shade at openjdk.org Thu Jul 25 17:28:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 25 Jul 2024 17:28:31 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v3] In-Reply-To: <wLt5dTWUk5MaiDzfW0G3Lwrk_mUtffeCUxIGJPIka4c=.7ec28766-8e09-4ee9-93d5-9c06642e6906@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <wLt5dTWUk5MaiDzfW0G3Lwrk_mUtffeCUxIGJPIka4c=.7ec28766-8e09-4ee9-93d5-9c06642e6906@github.com> Message-ID: <Xw04WrROfcoqCvKoIFtVmdckxzQIJS95qrJ4Zri9iUk=.b5bb6bb4-f440-4831-a311-b33867239f2a@github.com> On Thu, 25 Jul 2024 17:10:05 GMT, William Kemper <wkemper at openjdk.org> wrote: >> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. 
>> >> ## Testing >> * hotspot_gc_shenandoah >> * dacapo >> * diluvian >> * extremem >> * hyperalloc >> * specjbb2015 >> * specjvm2008 > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Simplify final mark now that incremental update mode is removed Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2199950416 From wkemper at openjdk.org Thu Jul 25 21:39:06 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 25 Jul 2024 21:39:06 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v4] In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> Message-ID: <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com> > We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. > > ## Testing > * hotspot_gc_shenandoah > * dacapo > * diluvian > * extremem > * hyperalloc > * specjbb2015 > * specjvm2008 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge remote-tracking branch 'openjdk/master' into remove-incremental-update-mode - Simplify final mark now that incremental update mode is removed - Remove unintentional new line - Remove last vestiges of incremental update mode - Missed test, remove actual IU barrier flag - Remove missed iu_barrier usages for C1 - Update test (all barriers can be enabled now for all modes) - WIP: Remove incremental update mode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20316/files - new: https://git.openjdk.org/jdk/pull/20316/files/79ceade3..287b6187 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=02-03 Stats: 10966 lines in 271 files changed: 7634 ins; 1955 del; 1377 mod Patch: https://git.openjdk.org/jdk/pull/20316.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316 PR: https://git.openjdk.org/jdk/pull/20316 From lmesnik at openjdk.org Thu Jul 25 21:49:36 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 25 Jul 2024 21:49:36 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v14] In-Reply-To: <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> Message-ID: <eVSJPPx1YvOofxUamBOBMAPyc7wugyokec1gQ1erbkQ=.f3dc9d08-3dc8-47d8-845b-a461b705f4c7@github.com> On Thu, 25 Jul 2024 15:31:05 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Adding FILE descriptor for help output Thank you for improving argument handling in jcmd. Please fix the small indentation problem. Also, I would recommend getting approval from an svc team reviewer. 
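Editorial sketch: the `%p` substitution described in the quoted PR text can be illustrated with a few lines of standalone C++. This is not the HotSpot implementation; `expand_pid` is a hypothetical helper, it takes the PID as a parameter instead of querying the OS, and it makes no attempt to handle `%t` or any escaping rules, which the thread does not describe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a HotSpot function): returns a copy of `pattern`
// in which every occurrence of "%p" is replaced by the decimal PID.
static std::string expand_pid(const std::string& pattern, long pid) {
  const std::string pid_str = std::to_string(pid);
  std::string out;
  out.reserve(pattern.size() + pid_str.size());
  for (std::size_t i = 0; i < pattern.size(); i++) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += pid_str;
      i++;  // skip the 'p' that belongs to the "%p" token
    } else {
      out += pattern[i];
    }
  }
  return out;
}
```

With this sketch, `expand_pid("heap_%p.hprof", 1234)` yields `heap_1234.hprof`, and a name without `%p` is returned unchanged.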
src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132: > 130: } else if (strcmp(type, "FILE") == 0) { > 131: DCmdArgument<FileArgument>* argument = > 132: new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory); Please update indentation. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2200510671 PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1692165047 From vlivanov at openjdk.org Thu Jul 25 23:33:38 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 25 Jul 2024 23:33:38 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8] In-Reply-To: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> Message-ID: <2tItgZaRCa5BQrmelOWEsn6FVlHlEvY4is2L1n3HxhE=.eb2519cc-d69e-45e6-8ca3-b5b02565bb76@github.com> On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. 
>> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 40 commits: > > - Merge branch 'clean' into JDK-8331658-work > - Cleanup > - temp > - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work > - Review comments > - Review comments > - Review comments > - Review comments > - Review feedback > - Review feedback > - ... and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc Thanks! The patch looks good, except there was one failure observed during testing with the latest patch [1]. It does look related to the latest changes you did in [54050a5](https://github.com/openjdk/jdk/pull/19989/commits/54050a5c2c0aa1d6a9e36d0240c66345765845e3) about `bitmap == SECONDARY_SUPERS_BITMAP_FULL` check. [1] # Internal Error (.../src/hotspot/share/oops/array.hpp:126), pid=1225147, tid=1273512 # assert(i >= 0 && i< _length) failed: oob: 0 <= 63 < 63 V [libjvm.so+0x7f7eab] oopDesc::is_a(Klass*) const+0x21b (array.hpp:126) V [libjvm.so+0xe9805e] java_lang_Throwable::fill_in_stack_trace(Handle, methodHandle const&, JavaThread*)+0x158e (javaClasses.cpp:2677) V [libjvm.so+0xe982cb] java_lang_Throwable::fill_in_stack_trace(Handle, methodHandle const&)+0x6b (javaClasses.cpp:2719) V [libjvm.so+0xfe045c] JVM_FillInStackTrace+0x9c (jvm.cpp:515) .... RBX=0x00000000b446bcf0 is a pointer to class: javasoft.sqe.tests.lang.clss029.clss02902.e67 {0x00000000b446bcf0} ... - name: 'javasoft/sqe/tests/lang/clss029/clss02902/e67' - super: 'javasoft/sqe/tests/lang/clss029/clss02902/e66' - sub: 'javasoft/sqe/tests/lang/clss029/clss02902/e68' ... - secondary supers: Array<T>(0x00007faff68b3058) - hash_slot: 39 - secondary bitmap: 0xffffffffffffffff R12=0x00000000b446ba80 is a pointer to class: javasoft.sqe.tests.lang.clss029.clss02902.e66 {0x00000000b446ba80} ... - name: 'javasoft/sqe/tests/lang/clss029/clss02902/e66' - super: 'javasoft/sqe/tests/lang/clss029/clss02902/e65' - sub: 'javasoft/sqe/tests/lang/clss029/clss02902/e67' ... 
- secondary supers: Array<T>(0x00007faff68b2c90) - hash_slot: 63 - secondary bitmap: 0xfffffffffffefffd Do we miss ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2251567973 From vlivanov at openjdk.org Thu Jul 25 23:51:38 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 25 Jul 2024 23:51:38 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8] In-Reply-To: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> Message-ID: <8X8AZw0dKF8wuWXY1KtHXSY0OItqLX-SiAJG6zRYwfY=.8e4b95c2-95ca-4847-aa12-3d8fd7b10f17@github.com> On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. 
>> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits: > > - Merge branch 'clean' into JDK-8331658-work > - Cleanup > - temp > - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work > - Review comments > - Review comments > - Review comments > - Review comments > - Review feedback > - Review feedback > - ... 
and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc src/hotspot/share/oops/klass.inline.hpp line 82: > 80: // subtype check: true if is_subclass_of, or if k is interface and receiver implements it > 81: inline bool Klass::is_subtype_of(Klass* k) const { > 82: guarantee(secondary_supers() != nullptr, "must be"); Minor point: considering libjvm contains hundreds of copies, does it make sense to turn it into an assert instead? For example, on AArch64 the check costs 36 bytes [1] in product build. [1] libjvm.dylib[0x1306d4] <+28>: ldr x9, [x8, #0x28] libjvm.dylib[0x1306d8] <+32>: cbz x9, 0x1307dc ; <+292> [inlined] Klass::is_subtype_of(Klass*) const at klass.inline.hpp:82:3 ... libjvm.dylib[0x1307dc] <+292>: adrp x0, 3200 libjvm.dylib[0x1307e0] <+296>: add x0, x0, #0x663 ; "open/src/hotspot/share/oops/klass.inline.hpp" libjvm.dylib[0x1307e4] <+300>: adrp x2, 3200 libjvm.dylib[0x1307e8] <+304>: add x2, x2, #0x690 ; "guarantee(secondary_supers() != nullptr) failed" libjvm.dylib[0x1307ec] <+308>: adrp x3, 3200 libjvm.dylib[0x1307f0] <+312>: add x3, x3, #0x6c0 ; "must be" libjvm.dylib[0x1307f4] <+316>: mov w1, #0x52 libjvm.dylib[0x1307f8] <+320>: bl 0x54f870 ; report_vm_error at debug.cpp:181 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1692288943 From wkemper at openjdk.org Fri Jul 26 00:02:38 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 26 Jul 2024 00:02:38 GMT Subject: Integrated: 8336685: Shenandoah: Remove experimental incremental update mode In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> Message-ID: <dkk2Zl3MgpS5RuSIAhg5A1NkDc6PtROb0X6R0nwRzAA=.606a1b08-b70f-4498-b696-89a8c80c09e7@github.com> On Wed, 24 Jul 2024 18:08:46 GMT, William Kemper <wkemper at openjdk.org> wrote: > We've reason to believe that this mode is very rarely 
used and its maintenance has become a burden for future development. > > ## Testing > * hotspot_gc_shenandoah > * dacapo > * diluvian > * extremem > * hyperalloc > * specjbb2015 > * specjvm2008 This pull request has now been integrated. Changeset: 0584af23 Author: William Kemper <wkemper at openjdk.org> URL: https://git.openjdk.org/jdk/commit/0584af23255b6b8f49190eaf2618f3bcc299adfe Stats: 1708 lines in 69 files changed: 4 ins; 1668 del; 36 mod 8336685: Shenandoah: Remove experimental incremental update mode Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/20316 From ysr at openjdk.org Fri Jul 26 01:09:39 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 26 Jul 2024 01:09:39 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v4] In-Reply-To: <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com> Message-ID: <B9KUTPsaCmQ0ewO73Mdh6lHXUpu_lAmCNQ8FC6_fzkU=.e2c55cef-c263-4b41-a7d7-529d7f7740e3@github.com> On Thu, 25 Jul 2024 21:39:06 GMT, William Kemper <wkemper at openjdk.org> wrote: >> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. >> >> ## Testing >> * hotspot_gc_shenandoah >> * dacapo >> * diluvian >> * extremem >> * hyperalloc >> * specjbb2015 >> * specjvm2008 > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'openjdk/master' into remove-incremental-update-mode > - Simplify final mark now that incremental update mode is removed > - Remove unintentional new line > - Remove last vestiges of incremental update mode > - Missed test, remove actual IU barrier flag > - Remove missed iu_barrier usages for C1 > - Update test (all barriers can be enabled now for all modes) > - WIP: Remove incremental update mode I'd left my review comments instead of flushing them yesterday. Not sure if they are still relevant, but here they go... fwiw. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 339: > 337: product(bool, ShenandoahIUBarrier, false, DIAGNOSTIC, \ > 338: "Turn on/off I-U barriers barriers in Shenandoah") \ > 339: \ Not your change, but these doc comments should really say `"Enable blah-blah ..."` rather than `"Turn on/off blah-blah..."`. ------------- PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2198135931 PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690703426 From ysr at openjdk.org Fri Jul 26 01:09:40 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 26 Jul 2024 01:09:40 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v2] In-Reply-To: <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com> Message-ID: <Fp-wdKTTyKEIGRH8CpV38B8gThKzmvNmbrLcN9Yc4Rg=.1bd4121b-9a9a-47d8-850e-1e9bb4dbbfdb@github.com> On Wed, 24 Jul 2024 19:31:04 GMT, William Kemper <wkemper at openjdk.org> wrote: >> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. 
>> >> ## Testing >> * hotspot_gc_shenandoah >> * dacapo >> * diluvian >> * extremem >> * hyperalloc >> * specjbb2015 >> * specjvm2008 > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Remove unintentional new line src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 122: > 120: > 121: ShenandoahMarkRefsClosure<GENERATION> mark_cl(q, rp); > 122: ShenandoahSATBAndRemarkThreadsClosure tc(satb_mq_set, nullptr); Because this is the only c'tor usage of this closure, maybe get rid of the second argument altogether, and clean up its `do_thread()` method further above at lines 84-89? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690701759 From dholmes at openjdk.org Fri Jul 26 04:08:59 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 26 Jul 2024 04:08:59 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string Message-ID: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. Testing: - new gtest exercises the truncation code with the different possibilities for bad truncation - tiers 1-3 sanity testing Thanks. 
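Editorial sketch: the problem described in this message, a fixed-size buffer cutting a multi-byte UTF-8 character in half, can be illustrated with a small standalone function. This is not the code from the PR. `truncate_to_valid_utf8` and `sequence_length` are hypothetical names, the sketch covers only standard UTF-8, and it assumes the bytes before the cut point were well-formed.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes announced by a UTF-8 lead byte, or 0 if `b` cannot start
// a sequence. Hypothetical helper for this sketch only.
static std::size_t sequence_length(unsigned char b) {
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// `buf` holds `len` bytes followed by a NUL terminator. If the final
// character was cut off part-way, shorten the string so that it ends on a
// complete character. Returns the new length.
static std::size_t truncate_to_valid_utf8(unsigned char* buf, std::size_t len) {
  // Step back over trailing continuation bytes (bit pattern 10xxxxxx).
  std::size_t lead = len;
  while (lead > 0 && (buf[lead - 1] & 0xC0) == 0x80) {
    lead--;
  }
  if (lead == 0) {
    // Empty string, or continuation bytes only: nothing valid to keep.
    buf[0] = '\0';
    return 0;
  }
  lead--;  // index of the byte that starts the final sequence
  const std::size_t have = len - lead;                 // bytes present
  const std::size_t want = sequence_length(buf[lead]); // bytes announced
  if (have == want) {
    return len;  // final character is complete
  }
  // Too few bytes: drop the partial character. Too many: keep the complete
  // character and drop the stray continuation bytes after it.
  const std::size_t keep = (have > want) ? lead + want : lead;
  buf[keep] = '\0';
  return keep;
}
```

The quoted review comments later in the thread mention an `0xED` check several bytes back from the cut point. That is presumably because HotSpot strings use modified UTF-8, where a supplementary character is stored as two three-byte surrogate sequences, a case this sketch deliberately does not attempt.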
------------- Commit messages: - Fixed typo - 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string Changes: https://git.openjdk.org/jdk/pull/20345/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325002 Stats: 180 lines in 4 files changed: 177 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345 PR: https://git.openjdk.org/jdk/pull/20345 From dholmes at openjdk.org Fri Jul 26 04:29:36 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 26 Jul 2024 04:29:36 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> Message-ID: <iMT1EV6Ol7de4iryXsfwqufpXOFMoBckgURpg0XRQa8=.6e31894c-f976-4e1b-8295-157303885927@github.com> On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. > > We find this in the test output > > [STDOUT] > /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory > > The container where the test is executed does not contain the ubsan package; we might skip the test in this case. I think others who have more investment in this area need to weigh in. I don't know the implications for our infra folk if we need to ensure ubsan is installed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2251949827 From stuefe at openjdk.org Fri Jul 26 05:39:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Jul 2024 05:39:33 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Message-ID: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote: > Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. > > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. This is a neat technique, but it won't work for very short strings (e.g. consisting of just one or two multi-byte sequences, the latter being invalid). Reason is that you need a minimal buffer length to do the check safely. What you could do alternatively is to allocate the `msg` buffer with 5 leading bytes that you don't use, just zero-initialize. So the string start would be at msg+5. But that way, you can safely overstep the array with e.g. index - 5 without corruption. src/hotspot/share/utilities/exceptions.cpp line 275: > 273: // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8 > 274: // sequence is valid. 
> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { You should test for length >= 5 since it is the farthest you could access in `UTF8::truncate_to_legal_utf8` later: for (int index = length - 2; index > 0; index--) { ... assert(buffer[index - 3] == 0xED, "malformed sequence"); src/hotspot/share/utilities/exceptions.cpp line 276: > 274: // sequence is valid. > 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { > 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated"); Would this always be true? For a formatting error, too? Maybe just to be sure, instead of asserting, set the last byte to zero. src/hotspot/share/utilities/utf8.cpp line 407: > 405: // To avoid that the caller can choose to check for validity first. > 406: // The incoming buffer is still expected to be NUL-terminated. > 407: void UTF8::truncate_to_legal_utf8(unsigned char* buffer, int length) { Let's make the buffer length `size_t` and avoid awkward casting. ------------- PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2200961895 PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526390 PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526795 PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692524657 From stuefe at openjdk.org Fri Jul 26 05:44:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Jul 2024 05:44:31 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Message-ID: <WBv-Q4gapmUUyRSFfJrCY6A24u9ovvovQiCZ7N9ZadU=.9374cf3f-94c6-47bf-aeaa-037132083a6b@github.com> On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote: > Exceptions::fthrow uses a 1024 byte buffer to format the 
incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. > > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. src/hotspot/share/utilities/exceptions.cpp line 277: > 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { > 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated"); > 277: UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size); Ah, I misread your patch and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it. 
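Editorial sketch: the question raised here, whether a failing return value always means truncation, can be made concrete with the standard C `vsnprintf` contract, under which a negative return signals an error and a return greater than or equal to the buffer size signals truncation. The sketch below relies only on that standard behaviour. `format_checked` and `FormatResult` are hypothetical names, and HotSpot's own formatting wrappers may report these cases differently.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

enum class FormatResult { ok, truncated, error };

// Hypothetical helper: format into `buf` and report which of the three
// outcomes occurred, instead of treating every failure as truncation.
static FormatResult format_checked(char* buf, std::size_t size,
                                   const char* fmt, ...) {
  assert(size > 0);
  va_list ap;
  va_start(ap, fmt);
  const int ret = std::vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  if (ret < 0) {
    // Error: buffer contents are not guaranteed, so terminate defensively
    // rather than assert that a terminator is already there.
    buf[size - 1] = '\0';
    return FormatResult::error;
  }
  if (static_cast<std::size_t>(ret) >= size) {
    // Output did not fit: only the first size - 1 characters were stored.
    buf[size - 1] = '\0';
    return FormatResult::truncated;
  }
  return FormatResult::ok;
}
```

Only the `truncated` outcome tells the caller that the tail of the buffer may hold a cut-off multi-byte character. The `error` outcome says nothing reliable about the buffer contents, which is the distinction the comment above asks about.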
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692538448 From vlivanov at openjdk.org Fri Jul 26 05:59:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 26 Jul 2024 05:59:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8] In-Reply-To: <RDhgVrQ6zzzaCBuvMVR7tKA2Qe1SwF2pktr5xcI_duE=.9ae8f5e8-f888-47b5-b979-e56692b278f6@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com> <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com> <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com> <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com> <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com> <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com> <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com> <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com> <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com> <RDhgVrQ6zzzaCBuvMVR7tKA2Qe1SwF2pktr5xcI_duE=.9ae8f5e8-f888-47b5-b979-e56692b278f6@github.com> Message-ID: <oTP5cg-k2QeE1yGQgR3Cueo4EQZYZOr5QofOEulYM4s=.48445657-22aa-4631-b6d1-e3040f3b329f@github.com> On Thu, 25 Jul 2024 13:56:34 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Thanks, now I see that `Class::isInstance(Object)` is backed by `Runtime1::is_instance_of()` which uses `oopDesc::is_a()` to do the job. >> >> If it turns out to be performance critical, the intrinsic implementation should be rewritten to exercise existing subtype checking support in C1. 
As it is implemented now, it's already quite inefficient. > > I did write an intrinsic for that, but it made this patch even larger. I have a small patch for C1, for some other time. FYI I filed a low-priority RFE against C1 to track it ([JDK-8337251](https://bugs.openjdk.org/browse/JDK-8337251)). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1692548251 From stuefe at openjdk.org Fri Jul 26 06:41:36 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Jul 2024 06:41:36 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> Message-ID: <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com> On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: > Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) > > Testing: > test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java > test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java Hi Ashu, generally okay, see remarks. It seems awkward to have a size_t vector with a defined length, and then having to specify the length as input argument. I'd consider either use the good old style of void foo(const size_t array[NUM], ...); (using array with a defined size, but careful since in foo sizeof(array) is still just a pointer) or just write a small wrapper class holding a size_t vector and taking care of the copying. src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 58: > 56: } > 57: > 58: void ArenaStatCounter::init() { Proposal: `reset()` ? 
src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 115: > 113: for (int tag = 0; tag < Arena::tag_count(); tag++) { > 114: total += _tags_size[tag]; > 115: } Do it with x-macro? src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 118: > 116: if (total != _current) { > 117: log_info(compilation, alloc)("WARNING!!! Total does not match current"); > 118: } Why do we calculate total? Just for this test? I would then put this into an ASSERT section, and make the info log an assert. However, I wonder if this is really needed. The logic updating both _current and _tags_size is pretty straightforward, I don't see how there could be a mismatch. src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 204: > 202: size_t _total; > 203: // usage per arena tag when total peaked > 204: size_t _tags_size_at_peak[Arena::tag_count()]; Can you please make sure Arena::tag_count() evaluates to constexpr? When in doubt, just use the enum value instead. src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 226: > 224: > 225: void set_total(size_t n) { _total = n; } > 226: void set_tags_size_at_peak(size_t* tags_size_at_peak, int nelements) { const size_t* src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 228: > 226: void set_tags_size_at_peak(size_t* tags_size_at_peak, int nelements) { > 227: assert(nelements*sizeof(size_t) <= sizeof(_tags_size_at_peak), "overflow check"); > 228: memcpy(_tags_size_at_peak, tags_size_at_peak, nelements*sizeof(size_t)); style, we do blanks between * (n * x, not n*x) src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 242: > 240: for (int tag = 0; tag < Arena::tag_count(); tag++) { > 241: st->print_cr(" " LEGEND_KEY_FMT ": %s", Arena::tag_name[tag], Arena::tag_desc[tag]); > 242: } use x macro? 
src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 365: > 363: > 364: void add(const FullMethodName& fmn, CompilerType comptype, > 365: size_t total, size_t* tags_size_at_peak, int nelements, const size_t* src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 471: > 469: _the_table->add(fmn, ct, > 470: arena_stat->peak(), // total > 471: (size_t *)arena_stat->tags_size_at_peak(), Cast should not be needed src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 46: > 44: size_t _peak; > 45: // Current bytes used by arenas per tag > 46: size_t _tags_size[Arena::tag_count()]; Proposal: `_current_by_tag`, referring to _current src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 53: > 51: > 52: // Peak composition: > 53: size_t _tags_size_at_peak[Arena::tag_count()]; `_peak_by_tag` ? src/hotspot/share/memory/arena.cpp line 48: > 46: > 47: #define ARENA_TAG_STRING_(str) #str > 48: #define ARENA_TAG_STRING(name, str, desc) ARENA_TAG_STRING_(str), Can you use STR/XSTR in macros.hpp? src/hotspot/share/memory/arena.hpp line 86: > 84: }; > 85: > 86: #define DO_ARENA_TAG(template) \ Please don't name this `template`; it confuses my IDE. We usually call it DO or XX or something like that. src/hotspot/share/memory/arena.hpp line 97: > 95: > 96: #define ARENA_TAG_ENUM_(name) tag_##name > 97: #define ARENA_TAG_ENUM(name, str, desc) ARENA_TAG_ENUM_(name), Here, and in other places: Please try to cut down the number of temp. macros. You can just as well do an enum class Tag { #define XX(name, whatever, whatever2) tag_##name, DO_ARENA_TAG(XX) #undef XX num_tags }; here. 
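Editorial sketch: the X-macro suggestions in this review can be shown as a self-contained example. The tag names and descriptions below are invented for illustration and are not the real arena tags. `DO_EXAMPLE_TAG`, `tag_names`, `tag_descs`, `tag_count` and `sum_by_tag` are all hypothetical names. The `sum_by_tag` function also shows one way to pass a fixed-size array without a separate length argument, by taking a reference to the array so that its size stays part of the type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Single list of tags; every table below is generated from it.
// The entries are illustrative only.
#define DO_EXAMPLE_TAG(XX)              \
  XX(ra,    "Register allocation")      \
  XX(node,  "Ideal graph nodes")        \
  XX(other, "Everything else")

// Enum generated with a local, immediately undefined XX macro, so no
// long-lived helper macros are needed.
enum class Tag : int {
#define XX(name, desc) tag_##name,
  DO_EXAMPLE_TAG(XX)
#undef XX
  num_tags
};

constexpr int tag_count = static_cast<int>(Tag::num_tags);

static const char* const tag_names[] = {
#define XX(name, desc) #name,
  DO_EXAMPLE_TAG(XX)
#undef XX
};

static const char* const tag_descs[] = {
#define XX(name, desc) desc,
  DO_EXAMPLE_TAG(XX)
#undef XX
};

static_assert(static_cast<int>(sizeof(tag_names) / sizeof(tag_names[0])) == tag_count,
              "name table must have one entry per tag");

// Taking the array by reference keeps its length in the type, so a caller
// cannot pass an array with the wrong number of elements.
static std::size_t sum_by_tag(const std::size_t (&sizes)[tag_count]) {
  std::size_t total = 0;
  for (int i = 0; i < tag_count; i++) {
    total += sizes[i];
  }
  return total;
}
```

Because `tag_count` is a `constexpr` value derived from the enum, it can be used directly as an array bound, which addresses the request that the tag count evaluate to a constant expression.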
------------- PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2201007416 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692556908 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692554736 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692556339 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692574321 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692574925 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692577085 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692577477 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692578328 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692579046 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692557726 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692557957 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692559750 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692561100 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692564129 From mbaesken at openjdk.org Fri Jul 26 07:40:31 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 26 Jul 2024 07:40:31 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured In-Reply-To: <iMT1EV6Ol7de4iryXsfwqufpXOFMoBckgURpg0XRQa8=.6e31894c-f976-4e1b-8295-157303885927@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> <iMT1EV6Ol7de4iryXsfwqufpXOFMoBckgURpg0XRQa8=.6e31894c-f976-4e1b-8295-157303885927@github.com> Message-ID: <LPSf0cysKCYU5hiQqDe7fJ0QT_4gXMJ9FVu-cfKCXNc=.c873813f-fed8-418d-9349-55ff41e282eb@github.com> On Fri, 26 Jul 2024 04:27:04 GMT, David Holmes <dholmes at openjdk.org> wrote: > I think others who have more investment in this area need to 
weigh in. I don't know the implications for our infra folk if we need to ensure ubsan is installed. I think this would be in the standard container config / BUILDFILE we use for the tests. So if all works well, no implications. On the other hand, we could also just run the ubsan - based tests with an own exclude/problem list and exclude the docker test that currently cannot work. That needs a separate list but no other src changes like this PR or the idea with adjusted docker container config. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2252155558 From mli at openjdk.org Fri Jul 26 07:56:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Jul 2024 07:56:08 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v5] In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> Message-ID: <7Do3NsCKTNc9hpLgInx3V8mAvLEQEmdmP0n5VXy4uck=.e45d67c2-6d72-454e-a836-4cb5886e6066@github.com> > Hi, > Can you help to review the patch? > > I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. > > Thanks. > > ## Test > benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32) > > I've tried several implementations, respectively with vector group > * m2+m1+scalar > * m2+scalar > * m1+scalar > * pure scalar > The best one is combination of m2+m1, it have best performance in all source size. 
>
> ### K230
>
> this implementation (m2+m1)
>
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m2+m1 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>
> vector with only m2
> ...

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into baes64-encode-integrated - move label - refine code - use pure scalar version when rvv is not supported - clean code - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19973/files - new: https://git.openjdk.org/jdk/pull/19973/files/8645a6a1..736f5f8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=03-04 Stats: 8439 lines in 328 files changed: 5796 ins; 1367 del; 1276 mod Patch: https://git.openjdk.org/jdk/pull/19973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973 PR: https://git.openjdk.org/jdk/pull/19973 From mli at openjdk.org Fri Jul 26 08:10:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Jul 2024 08:10:01 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6] In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> Message-ID: <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com> > Hi, > Can you help to review the patch? > > I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. > > Thanks. 
>
> ## Test
> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
>
> I've tried several implementations, respectively with vector group
> * m2+m1+scalar
> * m2+scalar
> * m1+scalar
> * pure scalar
> The best one is the combination of m2+m1; it has the best performance across all source sizes.
>
> ### K230
>
> this implementation (m2+m1)
>
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m2+m1 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>
> vector with only m2
> ...

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

 - merge master
 - Merge branch 'master' into baes64-encode-integrated
 - move label
 - refine code
 - use pure scalar version when rvv is not supported
 - clean code
 - Initial commit

-------------

Changes: https://git.openjdk.org/jdk/pull/19973/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=05
  Stats: 238 lines in 3 files changed: 238 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From djelinski at openjdk.org  Fri Jul 26 08:18:41 2024
From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=)
Date: Fri, 26 Jul 2024 08:18:41 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>

On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. src/hotspot/share/utilities/utf8.cpp line 440: > 438: // Could be first or fourth byte. If fourth > 439: // then 2 bytes before will have second byte pattern (0b1010xxxx) > 440: if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) { Suggestion: if ((index - 3) >= 0 && ((buffer[index - 2] & 0xF0) == 0xA0)) { src/hotspot/share/utilities/utf8.cpp line 442: > 440: if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) { > 441: // it was fourth byte so truncate 3 bytes earlier > 442: assert(buffer[index - 3] == 0xED, "malformed sequence"); This needs to be an if, not an assert: ec-a0-80 is a [legitimate 3-byte UTF-8](https://www.compart.com/en/unicode/U+C800) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692684932 PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692684622 From fgao at openjdk.org Fri Jul 26 09:39:42 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 26 Jul 2024 09:39:42 GMT Subject: RFR: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <A7LqCA84i3ml2kFafMJr2_ENuyn9yW-KjBViIryuKBU=.8efd29b0-3636-4ef7-aa2c-dc92228cefc5@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> <A7LqCA84i3ml2kFafMJr2_ENuyn9yW-KjBViIryuKBU=.8efd29b0-3636-4ef7-aa2c-dc92228cefc5@github.com> Message-ID: <xwzJoMkq5YYrecjs4BtCmnDdL4ngEWUTICPeEohZu-g=.6cf68070-36f5-43c4-924c-38514746c919@github.com> On Mon, 15 Jul 2024 11:00:39 GMT, Andrew Haley <aph at openjdk.org> wrote: >> In the cases like: >> >> UNSAFE.putLong(address + off1 + 1030, lseed); >> UNSAFE.putLong(address + 1023, lseed); >> UNSAFE.putLong(address + off2 + 1001, lseed); >> >> >> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node 
converting long to pointer in C2. Then we get optoassembly code like: >> >> ldr R10, [R15, #120] # int ! Field: address >> ldr R11, [R16, #136] # int ! Field: off1 >> ldr R12, [R16, #144] # int ! Field: off2 >> add R11, R11, R10 >> mov R11, R11 # long -> ptr >> add R12, R12, R10 >> mov R10, R10 # long -> ptr >> add R11, R11, #1030 # ptr >> str R17, [R11] # int >> add R10, R10, #1023 # ptr >> str R17, [R10] # int >> mov R10, R12 # long -> ptr >> add R10, R10, #1001 # ptr >> str R17, [R10] # int >> >> >> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: >> >> ldr x10, [x15,#120] >> ldp x11, x12, [x16,#136] >> add x11, x11, x10 >> add x12, x12, x10 >> add x11, x11, #0x406 >> str x17, [x11] >> add x10, x10, #0x3ff >> str x17, [x10] >> mov x10, x12 <--- extra register copy >> add x10, x10, #0x3e9 >> str x17, [x10] >> >> >> There is still one extra register copy, which we're trying to remove in this patch. >> >> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. >> >> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so >> >> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 > > This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together. @theRealAph @adinn Thanks for your reviews! I'll integrate it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2252351336 From fgao at openjdk.org Fri Jul 26 09:39:43 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 26 Jul 2024 09:39:43 GMT Subject: Integrated: 8336245: AArch64: remove extra register copy when converting from long to pointer In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com> Message-ID: <CzXwApza6QlyJkIQX0p4Ddk351Zbsu7c_GhLS331iJ8=.618d3ade-a46b-4102-9cbb-2a5744de01ec@github.com> On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote: > In the cases like: > > UNSAFE.putLong(address + off1 + 1030, lseed); > UNSAFE.putLong(address + 1023, lseed); > UNSAFE.putLong(address + off2 + 1001, lseed); > > > Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: > > ldr R10, [R15, #120] # int ! Field: address > ldr R11, [R16, #136] # int ! Field: off1 > ldr R12, [R16, #144] # int ! Field: off2 > add R11, R11, R10 > mov R11, R11 # long -> ptr > add R12, R12, R10 > mov R10, R10 # long -> ptr > add R11, R11, #1030 # ptr > str R17, [R11] # int > add R10, R10, #1023 # ptr > str R17, [R10] # int > mov R10, R12 # long -> ptr > add R10, R10, #1001 # ptr > str R17, [R10] # int > > > In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. 
On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: > > ldr x10, [x15,#120] > ldp x11, x12, [x16,#136] > add x11, x11, x10 > add x12, x12, x10 > add x11, x11, #0x406 > str x17, [x11] > add x10, x10, #0x3ff > str x17, [x10] > mov x10, x12 <--- extra register copy > add x10, x10, #0x3e9 > str x17, [x10] > > > There is still one extra register copy, which we're trying to remove in this patch. > > This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. > > Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so > > [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906 This pull request has now been integrated. Changeset: d10afa26 Author: Fei Gao <fgao at openjdk.org> URL: https://git.openjdk.org/jdk/commit/d10afa26e5c59475e49b353ed34e8e85d0615d92 Stats: 320 lines in 5 files changed: 297 ins; 3 del; 20 mod 8336245: AArch64: remove extra register copy when converting from long to pointer Co-authored-by: Andrew Haley <aph at openjdk.org> Reviewed-by: aph, adinn ------------- PR: https://git.openjdk.org/jdk/pull/20157 From sspitsyn at openjdk.org Fri Jul 26 10:42:33 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 26 Jul 2024 10:42:33 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2] In-Reply-To: <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com> Message-ID: <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com> On Thu, 25 Jul 2024 01:53:13 GMT, Alex Menkov <amenkov at openjdk.org> wrote: >> 
Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23. >> >> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > remove test Looks good. Really nice simplification. Do I understand it right that all annotations are visible now, or we just do not parse/process invisible ones? If all annotations are visible then can we get rid of the suffix `_visible'? ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2201523609 From kevinw at openjdk.org Fri Jul 26 11:41:36 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 26 Jul 2024 11:41:36 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v14] In-Reply-To: <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> Message-ID: <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com> On Thu, 25 Jul 2024 15:31:05 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>>
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
>>
>>
>> filename (Optional) Name of the file to which the flight recording data is
>> written when the recording is stopped. If no filename is given, a
>> filename is generated from the PID and the current date and is
>> placed in the directory where the process was started. The
>> filename may also be a directory in which case, the filename is
>> generated from the PID and the current date in the specified
>> directory. (STRING, no default value)
>>
>> Note: If a filename is given, '%p' in the filename will be
>> replaced by the PID, and '%t' will be replaced by the time in
>> 'yyyy_MM_dd_HH_mm_ss' format.
>>
>>
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
>>
>> Testing:
>>
>> - [x] Added test case passes.
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames.
>>
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
>>
>> Cheers,
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>
>   Adding FILE descriptor for help output

One more thing that's troubling me. (Apologies it's now and not last week.)

I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-) Harder to remember and to read and understand. Maybe we can avoid the two accessors, it really is just a char*.

These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in one place until now?
Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant. What if we didn't add FileArgument, and kept using <char*> for _filename args/opts: Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value. Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is! (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors... Another thing that seems to part of the framework that was never much adopted.) ------------- PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2201623426 From jsjolen at openjdk.org Fri Jul 26 12:46:47 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 26 Jul 2024 12:46:47 GMT Subject: RFR: 8335701: Make GrowableArray templated by an Index [v3] In-Reply-To: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> Message-ID: <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com> > Hi, > > Today the GrowableArray has a set index type of `int`, this PR makes it so that you can set your own index type through a template parameter. > > This opens up for a few new design choices: > > - Do you know that you have a very small array? Use an `uint8_t` for len and cap, each. > - Do you have a very large one? Use an `uint64_t`. > > The code has opted for `int` being default, as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care. > > One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting. 
>
>
>
> // Old
> mid = ((max + min) / 2);
> // New
> mid = min + ((max - min) / 2);
>
> Some semi-rigorous thinking:
> min \in [0, len)
> max \in [0, len)
> min <= max
> (max - min) / 2 \in [0, len/2)
> Maximizing min and max => len + 0
> Maximizing max, minimizing min => len/2
> Minimizing max, maximizing min => max = min => min
>
>
> // Proof that they're identical when m, h, l \in N
> (1) m = l + (h - l) / 2 <=>
> 2m = 2l + h - l = h + l
>
> (2) m = (h + l) / 2 <=>
> 2m = h + l
> (1) = (2)
> QED

Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision:

 - Fix
 - Apparently this(!)
 - This?
 - Use COMMA

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20031/files
  - new: https://git.openjdk.org/jdk/pull/20031/files/b5a87422..937f6eb6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=01-02

  Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20031.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20031/head:pull/20031

PR: https://git.openjdk.org/jdk/pull/20031

From stuefe at openjdk.org  Fri Jul 26 12:46:48 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2]
In-Reply-To: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
Message-ID: <iz73YZU43_x7KZg48wTHbIEyRcudWUeyX3FTnNd3u8E=.57951eb8-5915-41bb-b6e0-9d7432761d76@github.com>

On Thu, 4 Jul 2024 13:35:36 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>>
>> Today the GrowableArray has a set index type of `int`, this PR makes it so that you can set your own index type through a template parameter.
>>
>> This opens up for a few new design choices:
>>
>> - Do you know that you have a very small array? Use an `uint8_t` for len and cap, each.
>> - Do you have a very large one? Use an `uint64_t`.
>>
>> The code has opted for `int` being default, as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care.
>>
>> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting.
>>
>>
>>
>> // Old
>> mid = ((max + min) / 2);
>> // New
>> mid = min + ((max - min) / 2);
>>
>> Some semi-rigorous thinking:
>> min \in [0, len)
>> max \in [0, len)
>> min <= max
>> (max - min) / 2 \in [0, len/2)
>> Maximizing min and max => len + 0
>> Maximizing max, minimizing min => len/2
>> Minimizing max, maximizing min => max = min => min
>>
>>
>> // Proof that they're identical when m, h, l \in N
>> (1) m = l + (h - l) / 2 <=>
>> 2m = 2l + h - l = h + l
>>
>> (2) m = (h + l) / 2 <=>
>> 2m = h + l
>> (1) = (2)
>> QED
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Attempt at fixing GA VMStruct

If this is for src/hotspot/share/nmt/arrayWithFreeList.hpp, would it not be a lot simpler to just implement it there, and give it another backing store?

In particular because after doing all this work it still won't even support the feature I was hoping for, mainly the ability to put an indexed free list atop of existing memory.
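
As a side note on the mid-point change quoted in this thread, the difference between the two forms can be checked at compile time. The following is a minimal sketch, not the GrowableArray code: `midpoint_naive` and `midpoint_safe` are hypothetical names, and the wrap-around case assumes a 32-bit unsigned index, where overflow is well defined and therefore safe to demonstrate.

```cpp
#include <cstdint>

// Old form: the intermediate sum max + min can exceed the range of Index.
template <typename Index>
constexpr Index midpoint_naive(Index min, Index max) {
  return static_cast<Index>((max + min) / 2);
}

// New form: for 0 <= min <= max, max - min never exceeds max, and
// min + (max - min) / 2 never exceeds max, so nothing leaves the range.
template <typename Index>
constexpr Index midpoint_safe(Index min, Index max) {
  return static_cast<Index>(min + (max - min) / 2);
}

// Both forms agree whenever the sum fits, including odd differences.
static_assert(midpoint_naive<int>(2, 9) == 5, "floor((2 + 9) / 2)");
static_assert(midpoint_safe<int>(2, 9) == 5, "2 + floor(7 / 2)");

// With large 32-bit unsigned indices the naive sum wraps around, the safe form does not.
static_assert(midpoint_safe<std::uint32_t>(3000000000u, 4000000000u) == 3500000000u,
              "safe form stays between min and max");
static_assert(midpoint_naive<std::uint32_t>(3000000000u, 4000000000u) != 3500000000u,
              "naive form wrapped around");
```

For signed index types the naive form would be undefined behaviour on overflow rather than a wrap, which is one more reason to prefer the second form once the index type is a template parameter.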
------------- PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2209091083 From jsjolen at openjdk.org Fri Jul 26 12:46:48 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 26 Jul 2024 12:46:48 GMT Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2] In-Reply-To: <iz73YZU43_x7KZg48wTHbIEyRcudWUeyX3FTnNd3u8E=.57951eb8-5915-41bb-b6e0-9d7432761d76@github.com> References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com> <iz73YZU43_x7KZg48wTHbIEyRcudWUeyX3FTnNd3u8E=.57951eb8-5915-41bb-b6e0-9d7432761d76@github.com> Message-ID: <U765IkNQyrp-UH4PUVtyT4D6GsDeqRtJK9qtGIIyO3E=.14aafa51-445c-485e-b90c-fd9bfbef5ff4@github.com> On Thu, 4 Jul 2024 14:07:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > If this is for src/hotspot/share/nmt/arrayWithFreeList.hpp, would it not be a lot simpler to just implement it there, and give it another backing store? > > In particular because after doing all this work it still won't even support the feature I was hoping for, mainly the ability to put an indexed free list atop of existing memory. I did that first and it sure is simpler, but I'm not sure whether it's a good idea to have to support such a backing storage. See `resizable_array` in here: https://github.com/openjdk/jdk/pull/20002 Still, it does do what you asked for, kind of, see: `GrowableArrayFromArray`. I can adapt AWFL to be able to use either `GAFA` or `GA`. It's also not the only reason to do this. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2209118236

From jsjolen at openjdk.org  Fri Jul 26 12:46:48 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2]
In-Reply-To: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
Message-ID: <MnaoMlToY92Ay91ANlCwVRF1mSyy4_tZnGtF8lbWqFE=.1d381810-1795-4b33-a91d-cb2f1bab66c7@github.com>

On Thu, 4 Jul 2024 13:35:36 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>>
>> Today the GrowableArray has a set index type of `int`, this PR makes it so that you can set your own index type through a template parameter.
>>
>> This opens up for a few new design choices:
>>
>> - Do you know that you have a very small array? Use an `uint8_t` for len and cap, each.
>> - Do you have a very large one? Use an `uint64_t`.
>>
>> The code has opted for `int` being default, as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care.
>>
>> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting.
>>
>>
>>
>> // Old
>> mid = ((max + min) / 2);
>> // New
>> mid = min + ((max - min) / 2);
>>
>> Some semi-rigorous thinking:
>> min \in [0, len)
>> max \in [0, len)
>> min <= max
>> (max - min) / 2 \in [0, len/2)
>> Maximizing min and max => len + 0
>> Maximizing max, minimizing min => len/2
>> Minimizing max, maximizing min => max = min => min
>>
>>
>> // Proof that they're identical when m, h, l \in N
>> (1) m = l + (h - l) / 2 <=>
>> 2m = 2l + h - l = h + l
>>
>> (2) m = (h + l) / 2 <=>
>> 2m = h + l
>> (1) = (2)
>> QED
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Attempt at fixing GA VMStruct

Compiler issue in linux-x86 seems unrelated to this change.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2252687347

From duke at openjdk.org  Fri Jul 26 12:46:48 2024
From: duke at openjdk.org (Glavo)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v3]
In-Reply-To: <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com>
Message-ID: <gLnsMoXLgkS5HK75auE3qmpkhkC_OJxci8AqodyteU0=.5b713ab9-5f89-4d60-aa38-e37e4dd8665b@github.com>

On Fri, 26 Jul 2024 12:44:31 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>>
>> Today the GrowableArray has a set index type of `int`, this PR makes it so that you can set your own index type through a template parameter.
>>
>> This opens up for a few new design choices:
>>
>> - Do you know that you have a very small array? Use an `uint8_t` for len and cap, each.
>> - Do you have a very large one? Use an `uint64_t`.
>> >> The code has opted for `int` being default, as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care. >> >> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting. >> >> >> >> // Old >> mid = ((max + min) / 2); >> // New >> mid = min + ((max - min) / 2); >> >> Some semi-rigorous thinking: >> min \in [0, len) >> max \in [0, len) >> min <= max >> (max - min) / 2 \in [0, len/2) >> Maximizing min and max => len + 0 >> Maximizing max, minimizing min => len/2 >> Minimizing max, maximizing min => max = min => min >> >> >> // Proof that they're identical when m, h, l \in N >> (1) m = l + (h - l) / 2 <=> >> 2m = 2l + h - l = h + l >> >> (2) m = (h + l) / 2 <=> >> 2m = h + l >> (1) = (2) >> QED > > Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision: > > - Fix > - Apparently this(!) > - This? > - Use COMMA src/hotspot/share/classfile/classFileParser.hpp line 46: > 44: class ConstMethod; > 45: class FieldInfo; > 46: template<typename E, typename Index> Suggestion: template<typename E, typename Index = int> Is it possible to reduce the changes by providing default parameters?
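
For readers following the mid-point change in the quoted PR description, here is a minimal sketch of the two formulas side by side. The helper names are hypothetical and are not taken from the PR; the sketch assumes an unsigned index type and `min <= max`, and it models the old form as if the sum were computed in the index type itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): mid-point of [min, max] for an
// unsigned index type, assuming min <= max. The difference max - min always
// fits in Index, so no intermediate value can wrap.
template <typename Index>
constexpr Index midpoint_no_overflow(Index min, Index max) {
  return static_cast<Index>(min + (max - min) / 2);
}

// The older form, for comparison. The sum is converted back to Index before
// halving, which reduces it modulo 2^N and so mimics arithmetic carried out
// in Index itself.
template <typename Index>
constexpr Index midpoint_naive(Index min, Index max) {
  return static_cast<Index>(static_cast<Index>(min + max) / 2);
}
```

With `uint8_t` as the index type, the two agree for small operands and diverge once `min + max` exceeds 255, which is the situation a narrow index type makes easy to reach.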
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20031#discussion_r1666447498 From jsjolen at openjdk.org Fri Jul 26 12:46:48 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 26 Jul 2024 12:46:48 GMT Subject: RFR: 8335701: Make GrowableArray templated by an Index [v3] In-Reply-To: <gLnsMoXLgkS5HK75auE3qmpkhkC_OJxci8AqodyteU0=.5b713ab9-5f89-4d60-aa38-e37e4dd8665b@github.com> References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com> <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com> <gLnsMoXLgkS5HK75auE3qmpkhkC_OJxci8AqodyteU0=.5b713ab9-5f89-4d60-aa38-e37e4dd8665b@github.com> Message-ID: <4H6ngLZ5pODKJSClj8mfQx2_B58hNIyA6yW878uF0zk=.fbe42340-3850-4a6a-a529-2b70a4a37b6c@github.com> On Fri, 5 Jul 2024 07:38:12 GMT, Glavo <duke at openjdk.org> wrote: >> Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision: >> >> - Fix >> - Apparently this(!) >> - This? >> - Use COMMA > > src/hotspot/share/classfile/classFileParser.hpp line 46: > >> 44: class ConstMethod; >> 45: class FieldInfo; >> 46: template<typename E, typename Index> > > Suggestion: > > template<typename E, typename Index = int> > > > Is it possible to reduce the changes by providing default parameters? Unfortunately, no. Forward declarations may not redefine the default template argument, even if it is the same as in the definition.
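
As a standalone illustration of the language rule mentioned above: a default template argument may be given in only one declaration of a template within a translation unit. The sketch below uses a hypothetical `SketchArray` class, not `GrowableArray`, and puts the default on the first declaration only; repeating `= int` on the definition would be rejected by the compiler. Whether such an arrangement is practical for HotSpot's header layout is a separate question that this sketch does not address.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Forward declaration: the default for Index is stated here, exactly once.
template <typename E, typename Index = int>
class SketchArray;

// Definition: must NOT repeat "= int"; the default from the earlier
// declaration still applies.
template <typename E, typename Index>
class SketchArray {
 public:
  using index_type = Index;
  Index length() const { return _len; }

 private:
  Index _len = 0;
};

static_assert(std::is_same<SketchArray<char>::index_type, int>::value,
              "default index type is int");
static_assert(std::is_same<SketchArray<char, uint8_t>::index_type, uint8_t>::value,
              "explicit index type is honoured");
```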
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20031#discussion_r1666460897 From aph at openjdk.org Fri Jul 26 15:13:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 26 Jul 2024 15:13:06 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. 
All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix test failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/248f44dc..e9581019 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=07-08 Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From wkemper at openjdk.org Fri Jul 26 15:32:38 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 26 Jul 2024 15:32:38 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v2] In-Reply-To: <Fp-wdKTTyKEIGRH8CpV38B8gThKzmvNmbrLcN9Yc4Rg=.1bd4121b-9a9a-47d8-850e-1e9bb4dbbfdb@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com> <Fp-wdKTTyKEIGRH8CpV38B8gThKzmvNmbrLcN9Yc4Rg=.1bd4121b-9a9a-47d8-850e-1e9bb4dbbfdb@github.com> Message-ID: <inDl-fffiHKi375I22yY_978HoveyzuZf1tx3RHC-Ks=.2ddea4a6-be29-4b81-9291-8f335583a3df@github.com> On Thu, 25 Jul 2024 02:24:43 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unintentional new line > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 122: > >> 120: >> 121: ShenandoahMarkRefsClosure<GENERATION> mark_cl(q, rp); >> 122: ShenandoahSATBAndRemarkThreadsClosure tc(satb_mq_set, nullptr); > > Because this is the only c'tor usage of this closure, may be get rid of the second argument altogether, and clean up its `do_thread()` method further above at lines 84-89 ? 
Right, @shipilev also pointed this out. I've since cleaned it up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1693258254 From wkemper at openjdk.org Fri Jul 26 15:32:41 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 26 Jul 2024 15:32:41 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode [v4] In-Reply-To: <B9KUTPsaCmQ0ewO73Mdh6lHXUpu_lAmCNQ8FC6_fzkU=.e2c55cef-c263-4b41-a7d7-529d7f7740e3@github.com> References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com> <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com> <B9KUTPsaCmQ0ewO73Mdh6lHXUpu_lAmCNQ8FC6_fzkU=.e2c55cef-c263-4b41-a7d7-529d7f7740e3@github.com> Message-ID: <KiP-D8PU6B1hSk7-XAr2h5eGiAzkRwDzS9MoJADv2Js=.47749f28-2094-4d86-88a4-88acc35153a5@github.com> On Thu, 25 Jul 2024 02:27:20 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Merge remote-tracking branch 'openjdk/master' into remove-incremental-update-mode >> - Simplify final mark now that incremental update mode is removed >> - Remove unintentional new line >> - Remove last vestiges of incremental update mode >> - Missed test, remove actual IU barrier flag >> - Remove missed iu_barrier usages for C1 >> - Update test (all barriers can be enabled now for all modes) >> - WIP: Remove incremental update mode > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 339: > >> 337: product(bool, ShenandoahIUBarrier, false, DIAGNOSTIC, \ >> 338: "Turn on/off I-U barriers barriers in Shenandoah") \ >> 339: \ > > Not your change, but these doc comments should really say `"Enable blah-blah ..."` rather than `"Turn on/off blah-blah..."`. Yes, next time we're in this file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1693259588 From aph at openjdk.org Fri Jul 26 15:40:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 26 Jul 2024 15:40:35 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8] In-Reply-To: <2tItgZaRCa5BQrmelOWEsn6FVlHlEvY4is2L1n3HxhE=.eb2519cc-d69e-45e6-8ca3-b5b02565bb76@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com> <2tItgZaRCa5BQrmelOWEsn6FVlHlEvY4is2L1n3HxhE=.eb2519cc-d69e-45e6-8ca3-b5b02565bb76@github.com> Message-ID: <mJPwuVTF3rE7s9bvenMcur1eOkelC2mMzQWgJq_Dwtk=.6f590424-d389-4721-a1aa-08774216746d@github.com> On Thu, 25 Jul 2024 23:31:21 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > Thanks! The patch looks good, except there was one failure observed during testing with the latest patch [1]. 
It does look related to the latest changes you did in [54050a5](https://github.com/openjdk/jdk/pull/19989/commits/54050a5c2c0aa1d6a9e36d0240c66345765845e3) about `bitmap == SECONDARY_SUPERS_BITMAP_FULL` check. Wow! Whoever wrote that test case deserves a bouquet of roses. It's a super-edge case. If the hash slot of an interface is 63, and the secondaries array length of the Klass we're probing is 63, then the initial probe is at offset 63, one past the array bounds. This bug happens because of an "optimization" created during the first round of reviews. If the secondaries array length is >= 62 (_not_ >= 63), we set the secondaries bitmap to `SECONDARY_SUPERS_BITMAP_FULL`. So, the initial probe sees a full bitmap, popcount returns 63, and we look at secondary_supers[63]. _Bang_. We never noticed this problem before because there's no bounds checking in the hand-coded assembly language implementations. The root cause of this bug is that we're not maintaining the invariant `popcount(bitmap) == secondary_supers()->length()`. There are a couple of ways to fix this. We could check `secondary_supers()->length()` before doing any probe. I'm very reluctant to add a memory load to the super-hot path for this edge case, though. It's better to take some pain in the case of an almost-full secondaries array, because those are very rare, and will never occur in most Java programs. So, I've corrected the bitmap at the point the hash table is constructed.
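
The edge case described above can be reproduced in a simplified model. The sketch below is an assumption-laden illustration, not HotSpot code: it assumes a 64-slot table whose packed array holds one entry per set bit, and that the first probe index for a slot is the number of set bits at or below that slot, minus one. All names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr int kSlots = 64;                       // assumed table size
constexpr uint64_t kAllSlotsSet = ~uint64_t{0};  // stand-in for a "full" bitmap

// Number of set bits in a 64-bit word (portable, no compiler builtins).
inline int count_set_bits(uint64_t x) {
  return static_cast<int>(std::bitset<64>(x).count());
}

// Index of the first probe for hash slot `slot`: the count of occupied
// slots in [0, slot], minus one.
inline int first_probe_index(uint64_t bitmap, int slot) {
  // Shift so that bit `slot` becomes the top bit, discarding higher slots.
  const uint64_t shifted = bitmap << (kSlots - 1 - slot);
  return count_set_bits(shifted) - 1;
}

// True when the first probe lands inside a packed array of `length` entries.
inline bool first_probe_in_bounds(uint64_t bitmap, int slot, int length) {
  const int index = first_probe_index(bitmap, slot);
  return index >= 0 && index < length;
}
```

In this model, a bitmap with every bit set paired with a 63-entry array sends the probe for slot 63 to index 63, one past the end. A bitmap whose population count matches the array length keeps the probe in bounds, which is the invariant the message refers to.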
------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253019028 From amenkov at openjdk.org Fri Jul 26 17:59:41 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 26 Jul 2024 17:59:41 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2] In-Reply-To: <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com> <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com> Message-ID: <r58fJ8zGBJ158Ucsgpo3dRY3cirAx_5PsdlPozgXwjE=.afe2557e-6603-478b-932e-be72604ecc2c@github.com> On Fri, 26 Jul 2024 10:39:28 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > Looks good. Really nice simplification. Do I understand it right that all annotations are visible now, or we just do not parse/process invisible ones? If all annotations are visible then can we get rid of the suffix `_visible'? We skip (do not process) invisible annotations ------------- PR Comment: https://git.openjdk.org/jdk/pull/20315#issuecomment-2253225273 From amenkov at openjdk.org Fri Jul 26 17:59:42 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 26 Jul 2024 17:59:42 GMT Subject: Integrated: 8330427: Obsolete -XX:+PreserveAllAnnotations In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> Message-ID: <uzdS6siIly7pmOkHflwuUg2NBbii8FluRUYMhnH8rAM=.75bfaae5-3986-4d56-b790-f2d6182fddc2@github.com> On Wed, 24 Jul 2024 18:01:15 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23. 
> > Testing: tier1,tier2,tier3,tier4,hs-tier5-svc This pull request has now been integrated. Changeset: abc4ca5a Author: Alex Menkov <amenkov at openjdk.org> URL: https://git.openjdk.org/jdk/commit/abc4ca5a8c440f8f3f36a9b35036772c5b5ee7ea Stats: 378 lines in 7 files changed: 3 ins; 339 del; 36 mod 8330427: Obsolete -XX:+PreserveAllAnnotations Reviewed-by: dholmes, sspitsyn, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20315 From asmehra at openjdk.org Fri Jul 26 18:23:33 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Fri, 26 Jul 2024 18:23:33 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com> Message-ID: <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com> On Fri, 26 Jul 2024 06:08:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) >> >> Testing: >> test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java >> test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java > > src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 118: > >> 116: if (total != _current) { >> 117: log_info(compilation, alloc)("WARNING!!! Total does not match current"); >> 118: } > > Why do we calculate total? Just for this test? I would then put this into an ASSERT section, and make the info log an assert. > > However, I wonder if this is really needed. The logic updating both _current and _tags_size is pretty straightforward, I don't see how there could be a mismatch. This code should not have been there. I forgot to remove it. 
There is no use of `total` here. > src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 204: > >> 202: size_t _total; >> 203: // usage per arena tag when total peaked >> 204: size_t _tags_size_at_peak[Arena::tag_count()]; > > Can you please make sure Arena::tag_count() evaluates to constexpr? When in doubt, just use the enum value instead. Arena::tag_count() is declared as a constexpr. I wanted to avoid writing `static_cast<int>(Arena::Tag::tag_count)` every time I need tag_count, so I wrapped it in Arena::tag_count() and declared it with constexpr. Is that not sufficient to make it a constexpr? > src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 242: > >> 240: for (int tag = 0; tag < Arena::tag_count(); tag++) { >> 241: st->print_cr(" " LEGEND_KEY_FMT ": %s", Arena::tag_name[tag], Arena::tag_desc[tag]); >> 242: } > > use x macro? What do you mean by x macro? Do you have an example that shows the use of x macro? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693443814 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693443227 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693445269 From vlivanov at openjdk.org Fri Jul 26 18:43:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 26 Jul 2024 18:43:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9] In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> Message-ID: <FJz0qOtL2DHVrLC_zwUBm7eMG--I601KNycK4uD8SN4=.0f444c69-eb62-45cc-a92d-6551ce42bf05@github.com> On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses 
>> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline.
>> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failure Oh, it comes as a surprise to me... I was under impression that the first thing hand-coded assembly variants do is check for `bitmap != SECONDARY_SUPERS_BITMAP_FULL`. At least, it was my recollection from working on [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). (And the initial version of the patch with the check in `Klass::lookup_secondary_supers_table()` and `_bitmap(SECONDARY_SUPERS_BITMAP_FULL)` only reassured me that's still the case.) > The root cause of this bug is that we're not maintaining the invariant popcount(bitmap) == secondary_supers()->length(). The invariant holds only when `bitmap != SECONDARY_SUPERS_BITMAP_FULL`. It does help that even in case of non-hashed `secondary_supers` list `secondary_supers()->length() >= popcount(bitmap)`, but initial probing becomes much less efficient (a random probe followed by a full linear pass over secondary supers list). Alternatively, all table lookups can be adjusted to start with `bitmap != SECONDARY_SUPERS_BITMAP_FULL` checks before probing the table. It does add a branch on the fast path (and slightly increases inlined snippet code size), but the branch is highly predictable and works on a value we need anyway. So, I would be surprised to see any performance effects from it. 
IMO it's easier to reason and more flexible: `SECONDARY_SUPERS_BITMAP_FULL == bitmap` simply disables all table lookups and unconditionally falls back to linear search. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253281836 From aph at openjdk.org Fri Jul 26 21:36:34 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 26 Jul 2024 21:36:34 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9] In-Reply-To: <FJz0qOtL2DHVrLC_zwUBm7eMG--I601KNycK4uD8SN4=.0f444c69-eb62-45cc-a92d-6551ce42bf05@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> <FJz0qOtL2DHVrLC_zwUBm7eMG--I601KNycK4uD8SN4=.0f444c69-eb62-45cc-a92d-6551ce42bf05@github.com> Message-ID: <LIwHtqflDxdG8s68Pj3OMLNYpRPh19xrJfJreXtOxQc=.00a895b3-1e0f-4db8-a8f8-1d3a531b5a41@github.com> On Fri, 26 Jul 2024 18:39:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > Oh, it comes as a surprise to me... I was under impression that the first thing hand-coded assembly variants do is check for `bitmap != SECONDARY_SUPERS_BITMAP_FULL`. At least, it was my recollection from working on [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). (And the initial version of the patch with the check in `Klass::lookup_secondary_supers_table()` and `_bitmap(SECONDARY_SUPERS_BITMAP_FULL)` only reassured me that's still the case.) > > > The root cause of this bug is that we're not maintaining the invariant popcount(bitmap) == secondary_supers()->length(). > > The invariant holds only when `bitmap != SECONDARY_SUPERS_BITMAP_FULL`. Yes, exactly so. That's what I intended to mean. 
> It does help that even in case of non-hashed `secondary_supers` list `secondary_supers()->length() >= popcount(bitmap)`, but initial probing becomes much less efficient (a random probe followed by a full linear pass over secondary supers list). > > Alternatively, all table lookups can be adjusted to start with `bitmap != SECONDARY_SUPERS_BITMAP_FULL` checks before probing the table. It does add a branch on the fast path (and slightly increases inlined snippet code size), but the branch is highly predictable and works on a value we need anyway. True enough. I've been trying to move as much as I can out of the inlined code, though. > So, I would be surprised to see any performance effects from it. IMO it's easier to reason and more flexible: `SECONDARY_SUPERS_BITMAP_FULL == bitmap` simply disables all table lookups and unconditionally falls back to linear search. I take your point. But that seems like it's a bit of a sledgehammer to crack a walnut, don't you think? Given that it's only necessary to fix a rare edge case. But I'm not going to be precious about this choice, I'll test for a full bitmap first if you prefer. It's only a couple of instructions, but they are in the fast path that is successful in almost 99% of cases. 
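
The alternative discussed in this exchange, testing for a full bitmap before any table probe and falling back to a linear search, might look roughly like the sketch below. It is a simplified model under stated assumptions, not the HotSpot implementation: entries are plain `int` keys, the packed array is ordered by hash slot, and a failed first probe falls straight back to a linear scan rather than probing further. All names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t kSketchBitmapFull = ~uint64_t{0};

struct SketchSupers {
  uint64_t bitmap;          // one bit per occupied hash slot
  std::vector<int> supers;  // packed entries, ordered by slot
};

// Plain scan over every entry; always safe, never fast.
inline bool linear_search(const SketchSupers& t, int key) {
  for (int s : t.supers) {
    if (s == key) return true;
  }
  return false;
}

inline bool lookup_with_full_guard(const SketchSupers& t, int key, int slot) {
  // Guard: a full bitmap carries no usable hashing information in this
  // model, so skip the table probe entirely.
  if (t.bitmap == kSketchBitmapFull) {
    return linear_search(t, key);
  }
  // A clear bit means the key cannot be present.
  if (((t.bitmap >> slot) & 1) == 0) {
    return false;
  }
  // First probe: number of occupied slots in [0, slot], minus one.
  const uint64_t shifted = t.bitmap << (63 - slot);
  const std::size_t index = std::bitset<64>(shifted).count() - 1;
  if (index < t.supers.size() && t.supers[index] == key) {
    return true;
  }
  // Simplification: a real table would continue probing here.
  return linear_search(t, key);
}
```

The guard costs one compare and branch on a value the lookup has already loaded, which is the trade-off both messages weigh against fixing up the bitmap at construction time.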
------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253537338 From dholmes at openjdk.org Fri Jul 26 21:37:43 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 26 Jul 2024 21:37:43 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> Message-ID: <vz8nu18FeuJADlmZjGknJXHdBzCkuBxR6-w-18bWboI=.27e0969a-e5bf-43b8-9dc0-b018f7034fe8@github.com> On Fri, 26 Jul 2024 05:23:28 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. >> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > src/hotspot/share/utilities/exceptions.cpp line 276: > >> 274: // sequence is valid. >> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { >> 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated"); > > Would this always be true? For a formatting error, too? > Maybe just to be sure, instead of asserting set the last byte to zero. `vsnprintf` is supposed to guarantee it, and `os::vsnprintf` does IIRC, so this is just a sanity check.
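
The truncation test in the quoted condition can be tried in isolation with the standard library. This is a minimal sketch that assumes C99-conforming `vsnprintf` behaviour (the return value is the length the complete output would have had, or a negative value on error); the wrapper name is hypothetical, and HotSpot's own `os::vsnprintf` is not used here.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical wrapper: formats into buf and reports whether the output was
// cut short (or formatting failed). Requires size > 0.
inline bool format_truncated(char* buf, std::size_t size, const char* fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  const int ret = std::vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  // A negative result signals an error; a result >= size means the output
  // did not fit and was truncated to size - 1 characters plus the NUL.
  return ret < 0 || static_cast<std::size_t>(ret) >= size;
}
```

On a conforming implementation the buffer is NUL-terminated in both the fitting and the truncated case, which is the property the assert in the patch double-checks.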
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693624938 From dholmes at openjdk.org Fri Jul 26 21:46:41 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 26 Jul 2024 21:46:41 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> Message-ID: <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com> On Fri, 26 Jul 2024 05:36:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. >> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > This is a neat technique, but it won't work for very short strings (e.g. consisting of just one or two multi-byte sequences, the latter being invalid). Reason is that you need a minimal buffer length to do the check safely. > > What you could do alternatively is to allocate the `msg` buffer with 5 leading bytes that you don't use, just zero-initialize. So the string start would be at msg+5. But that way, you can safely overstep the array with e.g. index - 5 without corruption. Thanks for looking at this @tstuefe and @djelinski > src/hotspot/share/utilities/exceptions.cpp line 275: > >> 273: // we may also have a truncated UTF-8 sequence. 
In such cases we need to fix the buffer so the UTF-8 >> 274: // sequence is valid. >> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { > > You should test for length >= 5 since it is the farthest you could access in `UTF8::truncate_to_legal_utf8` later: > > > for (int index = length - 2; index > 0; index--) { > ... > assert(buffer[index - 3] == 0xED, "malformed sequence"); I will update `truncate_to_legal_utf8` to ensure we check for small buffers - though of course we would never expect to pass such a buffer to it in the first place. > src/hotspot/share/utilities/exceptions.cpp line 277: > >> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { >> 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated"); >> 277: UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size); > > Ah, I misread your patch and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. > > But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it. The `strlen` check is to skip the empty buffer you can get on Windows if vsnprintf returns -1 due to overflow of INT_MAX. We are assuming/requiring that we start with a valid UTF8 sequence and the worst that will happen is that vsnprintf will truncate it. If we actually got -1 for a conversion error (no way to tell the difference in the two cases) then we would unnecessarily truncate, but we do not expect any such conversion errors - in part because we type check the format specifiers and args and so should never get a mismatch.
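
To make the general idea concrete, here is a deliberately simpler stand-in for the truncation step. It handles only standard UTF-8 (sequences of one to four bytes) and does not cover the six-byte surrogate-pair encoding of the JVM's modified UTF-8 that `UTF8::truncate_to_legal_utf8` has to deal with. Like the discussion above, it assumes the input was valid before it was cut, so the only possible damage is a partial final sequence. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: trims a NUL-terminated buffer in place so that it
// does not end in a partial standard UTF-8 sequence. Returns the new length.
inline std::size_t trim_partial_utf8(char* buf) {
  std::size_t len = std::strlen(buf);
  std::size_t i = len;
  // Step back over trailing continuation bytes (10xxxxxx).
  while (i > 0 && (static_cast<unsigned char>(buf[i - 1]) & 0xC0) == 0x80) {
    --i;
  }
  if (i == 0) {
    // Empty, or nothing but continuation bytes: no valid prefix remains.
    buf[0] = '\0';
    return 0;
  }
  // buf[i - 1] is the lead byte of the final sequence.
  const unsigned char lead = static_cast<unsigned char>(buf[i - 1]);
  std::size_t expected = 1;                       // ASCII
  if ((lead & 0xE0) == 0xC0) expected = 2;        // 110xxxxx
  else if ((lead & 0xF0) == 0xE0) expected = 3;   // 1110xxxx
  else if ((lead & 0xF8) == 0xF0) expected = 4;   // 11110xxx
  const std::size_t have = len - (i - 1);
  if (have < expected) {
    // The final sequence is incomplete: cut the string at its lead byte.
    len = i - 1;
    buf[len] = '\0';
  }
  return len;
}
```

Because the scan only ever looks at bytes inside the string, this variant needs no minimum buffer length, at the cost of not recognising the surrogate-pair case.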
------------- PR Comment: https://git.openjdk.org/jdk/pull/20345#issuecomment-2253549010 PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693627660 PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693627154 From dholmes at openjdk.org Fri Jul 26 21:46:42 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 26 Jul 2024 21:46:42 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> Message-ID: <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com> On Fri, 26 Jul 2024 21:42:32 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/utf8.cpp line 440: >> >>> 438: // Could be first or fourth byte. If fourth >>> 439: // then 2 bytes before will have second byte pattern (0b1010xxxx) >>> 440: if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) { >> >> Suggestion: >> >> if ((index - 3) >= 0 && ((buffer[index - 2] & 0xF0) == 0xA0)) { > > I don't understand the rationale for the suggestion sorry. I am looking specifically for the second byte of six pattern. 
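
For anyone comparing the two masks in the quoted suggestion, the following sketch shows what each expression accepts. It makes no claim about which form the patch should use; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True only when the top four bits are exactly 1010.
constexpr bool matches_1010_exact(uint8_t b) {
  return (b & 0xF0) == 0xA0;
}

// True whenever bits 7 and 5 are both set, whatever bits 6 and 4 hold.
constexpr bool matches_bits_7_and_5(uint8_t b) {
  return (b & 0xA0) == 0xA0;
}
```

Both predicates accept `0xA5`, but only the second accepts bytes such as `0xB5` (`1011 0101`) or `0xE2` (`1110 0010`).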
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693630123 From dholmes at openjdk.org Fri Jul 26 21:46:42 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 26 Jul 2024 21:46:42 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> Message-ID: <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> On Fri, 26 Jul 2024 08:16:14 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. >> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > src/hotspot/share/utilities/utf8.cpp line 440: > >> 438: // Could be first or fourth byte. If fourth >> 439: // then 2 bytes before will have second byte pattern (0b1010xxxx) >> 440: if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) { > > Suggestion: > > if ((index - 3) >= 0 && ((buffer[index - 2] & 0xF0) == 0xA0)) { I don't understand the rationale for the suggestion, sorry.
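The two masks in the quoted suggestion behave differently, which can be checked directly. The sketch below is only an illustration; `loose_match` and `strict_match` are names made up here. Masking with 0xA0 tests just bits 7 and 5, so it accepts any byte of the form 0b1x1xxxxx, while masking with 0xF0 tests the whole high nibble and accepts only 0b1010xxxx, the pattern named in the code comment.

```cpp
#include <cstdint>

// Test as written in the patch under review: only bits 7 and 5 are examined.
constexpr bool loose_match(uint8_t b)  { return (b & 0xA0) == 0xA0; }

// Test from the suggestion: the whole high nibble must be 0b1010.
constexpr bool strict_match(uint8_t b) { return (b & 0xF0) == 0xA0; }

// 0b1010xxxx is accepted by both tests.
static_assert(loose_match(0xA5) && strict_match(0xA5), "0b1010xxxx");
// 0b1011xxxx, 0b1110xxxx and 0b1111xxxx slip through the loose test only.
static_assert(loose_match(0xB5) && !strict_match(0xB5), "0b1011xxxx");
static_assert(loose_match(0xE5) && !strict_match(0xE5), "0b1110xxxx");
static_assert(loose_match(0xF5) && !strict_match(0xF5), "0b1111xxxx");
// 0b1000xxxx is rejected by both.
static_assert(!loose_match(0x85) && !strict_match(0x85), "0b1000xxxx");
```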
> src/hotspot/share/utilities/utf8.cpp line 442: > >> 440: if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) { >> 441: // it was fourth byte so truncate 3 bytes earlier >> 442: assert(buffer[index - 3] == 0xED, "malformed sequence"); > > This needs to be an if, not an assert: ec-a0-80 is a [legitimate 3-byte UTF-8](https://www.compart.com/en/unicode/U+C800) Will need to re-examine this part. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693629740 PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693630625 From vlivanov at openjdk.org Fri Jul 26 22:01:38 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 26 Jul 2024 22:01:38 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9] In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> Message-ID: <6elom8uMJFqHyYuOgOk56W_YuIczI5vTlDcYKXUKr2Q=.51a5c0e5-6342-4e57-af8e-0fe9b1f3648e@github.com> On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. 
>> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. >> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failure Yes, I'm in favor of avoiding probing the table when `SECONDARY_SUPERS_BITMAP_FULL == bitmap`. It doesn't look right when the code treats `secondary_supers` as a table irrespective of whether it was hashed or not. IMO it unnecessarily complicates things and may continue to be a source of bugs. Also, you can rearrange fast path checks: probe the home slot bit first, then check for `SECONDARY_SUPERS_BITMAP_FULL != bitmap` before accessing `secondary_supers`. It won't help with inlined code size increase, but negative lookups will stay mostly unaffected by the additional check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253563566 From vlivanov at openjdk.org Sat Jul 27 00:03:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 27 Jul 2024 00:03:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9] In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> Message-ID: <a8tSEj-R9U4twOMRH_3nlAKLyFT9_OiA2uvO26gaCUs=.a1097ab0-ad3d-4e73-8d3e-2f7a4c4bfc05@github.com> On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. 
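The ordering of checks suggested above can be sketched with a toy model. Everything here is an assumption made for illustration: `ToySupers`, `toy_lookup`, `popcount64` and `TOY_BITMAP_FULL` are invented names, integer keys stand in for `Klass*`, and hash collisions are left out entirely, which the real table has to handle. The only point is the order: test the home slot bit, then test for the "full" bitmap, and only then touch the array.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for the "bitmap is full, array is not hashed" marker.
constexpr uint64_t TOY_BITMAP_FULL = ~uint64_t(0);

// Portable software popcount: clears the lowest set bit until none remain.
inline unsigned popcount64(uint64_t x) {
  unsigned n = 0;
  while (x != 0) { x &= x - 1; ++n; }
  return n;
}

struct ToySupers {
  uint64_t bitmap;           // bit i set: some entry has home slot i
  std::vector<int> entries;  // packed in slot order when hashed
};

// slot must be in [0, 63]. Collisions are not modelled.
inline bool toy_lookup(const ToySupers& s, int key, unsigned slot) {
  // 1. Probe the home slot bit first: a clear bit is a definite miss and
  //    costs nothing beyond the bitmap load.
  if (((s.bitmap >> slot) & 1) == 0) return false;
  // 2. Only now ask whether the array is hashed at all. A full bitmap means
  //    it is not, so fall back to a linear search instead of probing.
  if (s.bitmap == TOY_BITMAP_FULL) {
    for (int e : s.entries) {
      if (e == key) return true;
    }
    return false;
  }
  // 3. Hashed case: the packed index is the number of occupied slots below
  //    the home slot.
  unsigned index = popcount64(s.bitmap & ((uint64_t(1) << slot) - 1));
  return s.entries[index] == key;
}
```

In this model a negative lookup whose home slot bit is clear returns in step 1 without ever reaching the extra comparison in step 2.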
The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. >> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. 
>> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failure BTW it makes sense to assert the invariant. Here's a patch (accompanied by a minor cleanup): diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp index b050784dfc5..5413e29defb 100644 --- a/src/hotspot/share/oops/instanceKlass.cpp +++ b/src/hotspot/share/oops/instanceKlass.cpp @@ -647,7 +647,7 @@ void InstanceKlass::deallocate_contents(ClassLoaderData* loader_data) { !secondary_supers()->is_shared()) { MetadataFactory::free_array<Klass*>(loader_data, secondary_supers()); } - set_secondary_supers(nullptr); + set_secondary_supers(nullptr, SECONDARY_SUPERS_BITMAP_EMPTY); deallocate_interfaces(loader_data, super(), local_interfaces(), transitive_interfaces()); set_transitive_interfaces(nullptr); diff --git a/src/hotspot/share/oops/klass.cpp b/src/hotspot/share/oops/klass.cpp index 26ec25d1c80..b1012810be4 100644 --- a/src/hotspot/share/oops/klass.cpp +++ b/src/hotspot/share/oops/klass.cpp @@ -319,16 +319,16 @@ bool Klass::can_be_primary_super_slow() const { return true; } -void Klass::set_secondary_supers(Array<Klass*>* secondaries) { - assert(!UseSecondarySupersTable || secondaries == nullptr, ""); - set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY); -} - void Klass::set_secondary_supers(Array<Klass*>* secondaries, uintx bitmap) { #ifdef ASSERT if (secondaries != nullptr) { uintx real_bitmap = compute_secondary_supers_bitmap(secondaries); assert(bitmap == real_bitmap, "must be"); + if (bitmap != SECONDARY_SUPERS_BITMAP_FULL) { + assert(((uint)secondaries->length() == population_count(bitmap)), "required"); + } + } else { + assert(bitmap == 
SECONDARY_SUPERS_BITMAP_EMPTY, ""); } #endif _secondary_supers_bitmap = bitmap; diff --git a/src/hotspot/share/oops/klass.hpp b/src/hotspot/share/oops/klass.hpp index 2f733e11eef..a9e73e7bcbd 100644 --- a/src/hotspot/share/oops/klass.hpp +++ b/src/hotspot/share/oops/klass.hpp @@ -236,7 +236,6 @@ class Klass : public Metadata { void set_secondary_super_cache(Klass* k) { _secondary_super_cache = k; } Array<Klass*>* secondary_supers() const { return _secondary_supers; } - void set_secondary_supers(Array<Klass*>* k); void set_secondary_supers(Array<Klass*>* k, uintx bitmap); uint8_t hash_slot() const { return _hash_slot; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253660006 From sspitsyn at openjdk.org Sat Jul 27 01:31:48 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 27 Jul 2024 01:31:48 GMT Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2] In-Reply-To: <r58fJ8zGBJ158Ucsgpo3dRY3cirAx_5PsdlPozgXwjE=.afe2557e-6603-478b-932e-be72604ecc2c@github.com> References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com> <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com> <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com> <r58fJ8zGBJ158Ucsgpo3dRY3cirAx_5PsdlPozgXwjE=.afe2557e-6603-478b-932e-be72604ecc2c@github.com> Message-ID: <46dVI9rDgSSWdNtPFQF-J-QYMFnG-zKUSe2NlOkxUl0=.c923333c-0450-44a0-b1cb-89f5b776806c@github.com> On Fri, 26 Jul 2024 17:56:27 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > We skip (do not process) invisible annotations Okay, thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20315#issuecomment-2253694697 From stuefe at openjdk.org Sat Jul 27 05:41:36 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 27 Jul 2024 05:41:36 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com> <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com> Message-ID: <3FZJyHPjSnJUN-wuslNgoJDIQu6toFSyhagzBPQ_ZV4=.ccd8162b-bb4a-4594-954e-57f43310f219@github.com> On Fri, 26 Jul 2024 18:18:45 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: >> src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 204: >> >>> 202: size_t _total; >>> 203: // usage per arena tag when total peaked >>> 204: size_t _tags_size_at_peak[Arena::tag_count()]; >> >> Can you please make sure Arena::tag_count() evaluates to constexpr? When in doubt, just use the enum value instead. > > Arena::tag_count() is declared as a constexpr. I wanted to avoid writing `static_cast<int>(Arena::Tag::tag_count)` every time I need tag_count, so I wrapped it in Arena::tag_count() and declared it with constexpr. Is that not sufficient to make it a constexpr? Okay then, that is fine. 
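A small sketch of why the wrapper is sufficient: a `static constexpr` member function that returns an enumerator is usable wherever a constant expression is required, including as an array bound. `ToyArena`, its tag names and `ToyTotals` are hypothetical stand-ins, not the real `Arena` declarations.

```cpp
#include <cstddef>

// Hypothetical stand-in for the class under discussion.
class ToyArena {
 public:
  enum class Tag { tag_other, tag_ra, tag_node, tag_count };

  // Wraps the cast once so callers do not have to repeat it.
  static constexpr int tag_count() { return static_cast<int>(Tag::tag_count); }
};

struct ToyTotals {
  size_t _total;
  // A constexpr function call is a valid array bound.
  size_t _tags_size_at_peak[ToyArena::tag_count()];
};

// The bound is known at compile time, so it can be checked there too.
static_assert(sizeof(ToyTotals::_tags_size_at_peak) / sizeof(size_t) == 3,
              "array bound must equal the number of tags");
```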
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693848291 From stuefe at openjdk.org Sat Jul 27 05:46:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 27 Jul 2024 05:46:34 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com> <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com> Message-ID: <x1uZAfQc-7Dvbhv5cy7wm7CGyTGWmc8oOCs23DrnXVI=.85639be2-68c2-4c49-9ac2-12cef799f77c@github.com> On Fri, 26 Jul 2024 18:21:05 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: >> src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 242: >> >>> 240: for (int tag = 0; tag < Arena::tag_count(); tag++) { >>> 241: st->print_cr(" " LEGEND_KEY_FMT ": %s", Arena::tag_name[tag], Arena::tag_desc[tag]); >>> 242: } >> >> use x macro? > > What do you mean by x macro? Do you have an example that shows the use of x macro? You use them already in your patch. E.g. #define XX(name, whatever, desc) st->print_cr(" " LEGEND_KEY_FMT ": " #name #desc DO_ARENA_TAG(XX) #undef XX Admittedly, it is not a lot less code than the for loop. Up to you. 
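To make the comparison concrete, here is a self-contained sketch of both variants. `DO_TOY_TAG`, the tag list and the two `legend_*` functions are hypothetical; the real patch prints through `st->print_cr` and uses its own `DO_ARENA_TAG` list. Both functions build the same legend, one with a loop over generated arrays and one by expanding the X macro directly.

```cpp
#include <string>

// Hypothetical tag list: identifier, short name, description.
#define DO_TOY_TAG(XX)                      \
  XX(ra,    "RA",    "Register allocation") \
  XX(node,  "Node",  "Node arena")          \
  XX(other, "Other", "Everything else")

// Variant 1: generate name and description arrays, then loop over them.
#define XX(name, str, desc) str,
const char* const toy_tag_name[] = { DO_TOY_TAG(XX) };
#undef XX
#define XX(name, str, desc) desc,
const char* const toy_tag_desc[] = { DO_TOY_TAG(XX) };
#undef XX
constexpr int toy_tag_count = sizeof(toy_tag_name) / sizeof(toy_tag_name[0]);

std::string legend_with_loop() {
  std::string out;
  for (int tag = 0; tag < toy_tag_count; tag++) {
    out += std::string("  ") + toy_tag_name[tag] + ": " + toy_tag_desc[tag] + "\n";
  }
  return out;
}

// Variant 2: expand the X macro in place; each entry becomes one statement
// whose string is assembled by literal concatenation at compile time.
std::string legend_with_x_macro() {
  std::string out;
#define XX(name, str, desc) out += "  " str ": " desc "\n";
  DO_TOY_TAG(XX)
#undef XX
  return out;
}
```

As noted above, the X macro form is not much shorter; its main difference is that it needs no separate arrays and no tag count.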
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693851006 From stuefe at openjdk.org Sat Jul 27 06:13:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 27 Jul 2024 06:13:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v14] In-Reply-To: <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com> Message-ID: <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com> On Fri, 26 Jul 2024 11:39:02 GMT, Kevin Walls <kevinw at openjdk.org> wrote: > One more thing that's troubling me. (Apologies it's now and not last week.) > > I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-) Harder to remember and to read and understand. Maybe we can avoid the two accessors, it really is just a char*. > > These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in once place until now? > > Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant. > > What if we didn't add FileArgument, and kept using <char*> for _filename args/opts: > > Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value. > > Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is! 
> > (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors... Another thing that seems to part of the framework that was never much adopted.) IMHO for a functional addition we should follow the established pattern. Reworking the framework is certainly useful, but I would like it if we could get this done first (I intend to use it in other DCmds). And if we simplify this coding, we should think first about how to do this and what to solve. Things that come to mind: - overuse of template - The argument-type-by-template-division and the runtime "type" string argument (the third argument to DCmdArgument) seem redundant - the fact that we keep command metadata (which should be constant) together with command invocation data (values for arguments that are scoped to a single command invocation) in a single global structure, and then rewrite the latter every time we invoke a command. That is a strange concept and makes cleaning up temporary memory non-trivial - the fact that each new command takes a ton of boilerplate coding (Just see the many many repetitions in diagnosticCommand.cpp) - the fact that we use runtime-polymorphy, which is completely fine, but then all metadata information are "static". So in order to e.g. know how many arguments a command takes, you need to know the command class, since you cannot just call e.g. `num_arguments()` on a `DCmdWithParser*`. I think the whole framework could be done without templates, just using plain old virtual functions instead. This is not code where a vtable lookup really hurts. Just my random thoughts. Maybe there is more, but my point is that if we agree this can be improved, it would be better in a separate RFE, and not mixed into functional RFEs. @lmesnik > Also I would recommend to get approval from svc team reviewer. Who could this be, @plummercj ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2253809524 From cjplummer at openjdk.org Sat Jul 27 07:05:38 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 27 Jul 2024 07:05:38 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v14] In-Reply-To: <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com> <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com> Message-ID: <V0dcMbHYRhqAtqnMM49cqRjFGzX8VLP627FER7IWXjA=.7c8e21ff-7f83-4b3c-9d53-dae883e58a53@github.com> On Sat, 27 Jul 2024 06:11:00 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > > Also I would recommend to get approval from svc team reviewer. > > Who could this be, @plummercj ? @kevinjwalls is on the svc team and has been involved in this review, and @dholmes-ora, @lmesnik, and @AlanBateman all count as svc reviewers. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2253858030 From vlivanov at openjdk.org Sat Jul 27 07:12:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 27 Jul 2024 07:12:33 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9] In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com> Message-ID: <Smmitd15ELI7ZZgx_6FqgbOv8zKs-Ye8AAgiJntOM7g=.4f7fd243-d220-4401-a6c2-a308e04b6bb5@github.com> On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. 
There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. >> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failure src/hotspot/share/oops/klass.cpp line 285: > 283: // which doesn't zero out the memory before calling the constructor. > 284: Klass::Klass(KlassKind kind) : _kind(kind), > 285: _secondary_supers_bitmap(SECONDARY_SUPERS_BITMAP_EMPTY), Looks like it is redundant since metaspace allocation already initializes memory with zeroes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1693907175 From djelinski at openjdk.org Sat Jul 27 07:28:36 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Sat, 27 Jul 2024 07:28:36 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com> Message-ID: <3bljylhUd2wp8G_TSbeTaca4F4i4HxJUdfJRUKtLbrw=.f435237b-83f0-4e41-ba67-6375cd0f6a25@github.com> On Fri, 26 Jul 2024 21:42:58 GMT, David Holmes <dholmes at openjdk.org> wrote: >> I don't understand the rationale for the suggestion sorry. > > I am looking specifically for the second byte of six pattern. 
The current code matches the pattern 0b1x1xxxxx, and you want to match 0b1010xxxx ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693912937 From djelinski at openjdk.org Sat Jul 27 07:28:37 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Sat, 27 Jul 2024 07:28:37 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> Message-ID: <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com> On Fri, 26 Jul 2024 21:43:49 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/utf8.cpp line 442: >> >>> 440: if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) { >>> 441: // it was fourth byte so truncate 3 bytes earlier >>> 442: assert(buffer[index - 3] == 0xED, "malformed sequence"); >> >> This needs to be an if, not an assert: ec-a0-80 is a [legitimate 3-byte UTF-8](https://www.compart.com/en/unicode/U+C800) > > Will need to re-examine this part. Keep in mind that 0xed a few lines above could have matched the first byte and not the fourth one. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693913232 From djelinski at openjdk.org Sat Jul 27 08:14:31 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Sat, 27 Jul 2024 08:14:31 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Message-ID: <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com> On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote: > Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. > > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. src/hotspot/share/utilities/utf8.cpp line 398: > 396: // byte sequence. > 397: static bool is_starting_byte(unsigned char b) { > 398: return b >= 0xC0 && b <= 0xEF;; Do you plan to use this method only for modified UTF-8 or for standard Utf-8 as well? Standard UTF-8 also uses F0-F7 as starting bytes. Also, remove the double semicolon. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693919203 From dholmes at openjdk.org Sat Jul 27 12:18:34 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 27 Jul 2024 12:18:34 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com> Message-ID: <qOpeJqOJUzSaqPZQtnHc_Xh8KXmgPLtaUCMy4DrwKn4=.9cfd92b6-fbf2-4600-a310-2ac32e9b228d@github.com> On Sat, 27 Jul 2024 08:11:53 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. >> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > src/hotspot/share/utilities/utf8.cpp line 398: > >> 396: // byte sequence. >> 397: static bool is_starting_byte(unsigned char b) { >> 398: return b >= 0xC0 && b <= 0xEF;; > > Do you plan to use this method only for modified UTF-8 or for standard Utf-8 as well? Standard UTF-8 also uses F0-F7 as starting bytes. > > Also, remove the double semicolon. AFAIK the VM only deals with modified UTF-8.
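For reference, the difference between the two encodings at the level of starting bytes can be written down as a small sketch. The function names are made up for this note. The modified form has no four-byte sequences, so 0xC0..0xEF covers every multi-byte starting byte; the standard form adds the 0b11110xxx pattern, that is 0xF0..0xF7 as mentioned in the review comment.

```cpp
// Modified UTF-8: multi-byte sequences are two or three bytes long, so a
// starting byte is 0b110xxxxx or 0b1110xxxx.
constexpr bool is_starting_byte_modified(unsigned char b) {
  return b >= 0xC0 && b <= 0xEF;
}

// Standard UTF-8 bit patterns additionally allow 0b11110xxx, which starts a
// four-byte sequence.
constexpr bool is_starting_byte_standard(unsigned char b) {
  return b >= 0xC0 && b <= 0xF7;
}

static_assert(is_starting_byte_modified(0xED) && is_starting_byte_standard(0xED),
              "three-byte starting byte in both forms");
static_assert(!is_starting_byte_modified(0xF0) && is_starting_byte_standard(0xF0),
              "four-byte starting byte exists only in the standard form");
static_assert(!is_starting_byte_modified(0x80) && !is_starting_byte_standard(0x80),
              "continuation bytes never start a sequence");
```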
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693948995 From dholmes at openjdk.org Sat Jul 27 12:18:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 27 Jul 2024 12:18:35 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com> Message-ID: <5prDfKO5gMQOw5SCMMpNnWFpy4rust1LIhEeR4wdNek=.dafc8f68-6019-4b53-8405-ba8451b26336@github.com> On Sat, 27 Jul 2024 07:25:30 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote: >> Will need to re-examine this part. > > Keep in mind that 0xed a few lines above could have matched the first byte and not the fourth one. So ... the first three bytes of a six byte sequence can be indistinguishable from a three byte sequence! How does that work???
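One way to see it: in modified UTF-8 a supplementary character is stored as a surrogate pair, and each surrogate is written as an ordinary three-byte sequence, so the six-byte form is two three-byte sequences back to back. The sketch below is an illustration only; `encode3`, `decode3` and `encode_supplementary` are names invented here and not functions from utf8.cpp. Since surrogates lie in 0xD800..0xDFFF, each half starts with 0xED, with a second byte of 0xA0..0xAF for the high half and 0xB0..0xBF for the low half.

```cpp
#include <cstdint>

// Encodes a 16-bit value in 0x0800..0xFFFF as a three-byte sequence.
inline void encode3(uint32_t u, unsigned char out[3]) {
  out[0] = static_cast<unsigned char>(0xE0 | (u >> 12));
  out[1] = static_cast<unsigned char>(0x80 | ((u >> 6) & 0x3F));
  out[2] = static_cast<unsigned char>(0x80 | (u & 0x3F));
}

// Decodes a three-byte sequence back to its 16-bit value (no validation).
inline uint32_t decode3(const unsigned char in[3]) {
  return ((in[0] & 0x0Fu) << 12) | ((in[1] & 0x3Fu) << 6) | (in[2] & 0x3Fu);
}

// Encodes a supplementary code point (U+10000..U+10FFFF) the modified UTF-8
// way: split it into a surrogate pair and encode each half with encode3.
inline void encode_supplementary(uint32_t cp, unsigned char out[6]) {
  uint32_t v = cp - 0x10000;
  uint32_t high = 0xD800 + (v >> 10);    // high surrogate, 0xD800..0xDBFF
  uint32_t low  = 0xDC00 + (v & 0x3FF);  // low surrogate, 0xDC00..0xDFFF
  encode3(high, out);
  encode3(low, out + 3);
}
```

For U+1F600 this gives the bytes ED A0 BD ED B8 80, and decoding each half on its own yields 0xD83D and 0xDE00. The sequence EC A0 80 cited earlier in the review decodes to 0xC800 by the same rule, which is why the 0xED first byte matters when deciding whether three bytes belong to a pair.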
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693949216 From dholmes at openjdk.org Sat Jul 27 12:22:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 27 Jul 2024 12:22:30 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <vz8nu18FeuJADlmZjGknJXHdBzCkuBxR6-w-18bWboI=.27e0969a-e5bf-43b8-9dc0-b018f7034fe8@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> <vz8nu18FeuJADlmZjGknJXHdBzCkuBxR6-w-18bWboI=.27e0969a-e5bf-43b8-9dc0-b018f7034fe8@github.com> Message-ID: <Qk1D6SlZr4nMtAPOi2ct8DRkIRZD2vqJI9nzXcxA2DM=.fd981aac-84e3-445b-952a-bd1f0c81e042@github.com> On Fri, 26 Jul 2024 21:35:08 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/exceptions.cpp line 276: >> >>> 274: // sequence is valid. >>> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { >>> 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated"); >> >> Would this always be true? For a formatting error, too? >> Maybe just to be sure, instead of asserting set the last byte to zero. > > vsnprintf is supposed to guarantee it, and os::vsnprint does IIRC, so this is just a sanity check. 
Yep, os::vsnprintf guarantees NUL-termination. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693949707 From dholmes at openjdk.org Sat Jul 27 12:22:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 27 Jul 2024 12:22:31 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> Message-ID: <DxKmQKHRLgtYvDrRlj3raK94RWxL7yr25eIJ8vv_bSk=.2cdbbea7-8cd9-4dd0-af32-5edf25b610ba@github.com> On Fri, 26 Jul 2024 05:19:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. >> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > src/hotspot/share/utilities/utf8.cpp line 407: > >> 405: // To avoid that the caller can choose to check for validity first. >> 406: // The incoming buffer is still expected to be NUL-terminated. >> 407: void UTF8::truncate_to_legal_utf8(unsigned char* buffer, int length) { > > Lets make buffer length size_t and avoid awkward casting No, this code uses `int` for length everywhere. Feel free to file an RFE to change it all.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693949553 From dholmes at openjdk.org Sat Jul 27 12:33:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 27 Jul 2024 12:33:30 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <3bljylhUd2wp8G_TSbeTaca4F4i4HxJUdfJRUKtLbrw=.f435237b-83f0-4e41-ba67-6375cd0f6a25@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com> <3bljylhUd2wp8G_TSbeTaca4F4i4HxJUdfJRUKtLbrw=.f435237b-83f0-4e41-ba67-6375cd0f6a25@github.com> Message-ID: <7hvfenQ7f8IClATzJEu_HsTcpaVPMfQj2-Vguh0uvgc=.7e2c029b-aa42-44cb-827c-9c07892e46e5@github.com> On Sat, 27 Jul 2024 07:23:55 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote: >> I am looking specifically for the second byte of six pattern. > > The current code matches the pattern 0b1x1xxxxx, and you want to match 0b1010xxxx Doh! Thanks for that.
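
To make the difference between the two bit patterns concrete, here is a short sketch. The first mask is only one possible expression that yields the 0b1x1xxxxx pattern; the actual expression in the PR is not quoted in this thread, so both function names and the loose mask itself are assumptions for illustration.

```cpp
#include <cassert>

// Matches 0b1x1xxxxx: only bits 7 and 5 are tested, so bits 6 and 4 are
// unconstrained. This accepts far more than the intended byte.
inline bool matches_loose_pattern(unsigned char b) {
  return (b & 0xA0) == 0xA0;
}

// Matches 0b1010xxxx exactly: the top four bits must be 1010 (0xA0-0xAF).
inline bool is_second_byte_of_six(unsigned char b) {
  return (b & 0xF0) == 0xA0;
}

// Bytes that the loose test accepts but the exact test rejects.
inline void check_patterns() {
  assert(matches_loose_pattern(0xED) && !is_second_byte_of_six(0xED));  // a lead byte
  assert(matches_loose_pattern(0xB8) && !is_second_byte_of_six(0xB8));  // 1011xxxx
  assert(matches_loose_pattern(0xA0) && is_second_byte_of_six(0xA0));   // intended match
}
```

The exact mask compares all four high bits, so lead bytes such as 0xED and bytes of the form 1011xxxx no longer pass.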
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693951018 From dholmes at openjdk.org Sat Jul 27 12:37:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 27 Jul 2024 12:37:32 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string In-Reply-To: <qOpeJqOJUzSaqPZQtnHc_Xh8KXmgPLtaUCMy4DrwKn4=.9cfd92b6-fbf2-4600-a310-2ac32e9b228d@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com> <qOpeJqOJUzSaqPZQtnHc_Xh8KXmgPLtaUCMy4DrwKn4=.9cfd92b6-fbf2-4600-a310-2ac32e9b228d@github.com> Message-ID: <2ayYKgAZcb7IUO4P3YMfmWfj_fsjyUBGxj9pvuwS0bg=.3692889f-4345-456f-ab39-faf88e8d340c@github.com> On Sat, 27 Jul 2024 12:13:55 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/utf8.cpp line 398: >> >>> 396: // byte sequence. >>> 397: static bool is_starting_byte(unsigned char b) { >>> 398: return b >= 0xC0 && b <= 0xEF;; >> >> Do you plan to use this method only for modified UTF-8 or for standard Utf-8 as well? Standard UTF-8 also uses F0-F7 as starting bytes. >> >> Also, remove the double semicolon. > > AFAIK the VM only deals with modified UTF-8. 
;; fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693951565 From fyang at openjdk.org Mon Jul 29 05:02:34 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 29 Jul 2024 05:02:34 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6] In-Reply-To: <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com> Message-ID: <D-LqktQwsa-Mg1zpbQvnI-lKEfL8pdbhKgejdG14OmI=.8b0cd0b7-7527-4281-993a-9f8f7a571bf1@github.com> On Fri, 26 Jul 2024 08:10:01 GMT, Hamlin Li <mli at openjdk.org> wrote: >> Hi, >> Can you help to review the patch? >> >> I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. >> >> Thanks. >> >> ## Test >> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32) >> >> I've tried several implementations, respectively with vector group >> * m2+m1+scalar >> * m2+scalar >> * m1+scalar >> * pure scalar >> The best one is combination of m2+m1, it have best performance in all source size. 
>>
>> ### K230
>>
>> this implementation (m2+m1)
>>
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>>
>> vector with only m2
>> ...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - merge master > - Merge branch 'master' into baes64-encode-integrated > - move label > - refine code > - use pure scalar version when rvv is not supported > - clean code > - Initial commit Hi, will take a look. BTW: Have you resolved the performance issue of base64 decode instrinsic? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2254948125 From thartmann at openjdk.org Mon Jul 29 05:33:51 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jul 2024 05:33:51 GMT Subject: Integrated: 8336999: Verification for resource area allocated data structures in C2 In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com> Message-ID: <7dRxg7c8RpnnQ3_Y13IYbc_Qj0_yQyPNjDIFFyir5wU=.f991c976-154b-4b45-8009-2023cd42717c@github.com> On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption. > > This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. 
In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime. > > While testing, I hit the verification code from: > > V [libjvm.so+0x5c1ceb] ReallocMark::check(Arena*)+0x7b (allocation.cpp:244) > V [libjvm.so+0x6df2da] Block_Array::grow(unsigned int)+0x1a (block.cpp:43) > V [libjvm.so+0xb88679] PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159 (block.hpp:72) > V [libjvm.so+0xb88b6b] PhaseCFG::build_dominator_tree()+0xab (domgraph.cpp:74) > V [libjvm.so+0xd75791] PhaseCFG::do_global_code_motion()+0x11 (gcm.cpp:1635) > V [libjvm.so+0x9f4fd4] Compile::Code_Gen()+0x2a4 (compile.cpp:2949) > V [libjvm.so+0x9f5f16] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6 (compile.cpp:991) > > > It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing. > > This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`. > > We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 657c0bdd Author: Tobias Hartmann <thartmann at openjdk.org> URL: https://git.openjdk.org/jdk/commit/657c0bddf90b537ac653817571532705a6e3643a Stats: 56 lines in 14 files changed: 32 ins; 8 del; 16 mod 8336999: Verification for resource area allocated data structures in C2 Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20311 From kevinw at openjdk.org Mon Jul 29 09:41:39 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 29 Jul 2024 09:41:39 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v14] In-Reply-To: <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com> <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com> Message-ID: <IUN4djOhB-ZaEz_AwNwcrQyyrVxdxqc-Th5nclLnPew=.1cc99b12-abc7-4556-b3c3-16219d7dec44@github.com> On Sat, 27 Jul 2024 06:11:00 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> One more thing that's troubling me. (Apologies it's now and not last week.) >> >> I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-) >> Harder to remember and to read and understand. Maybe we can avoid the two accessors, it really is just a char*. >> >> These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in once place until now? 
>> >> Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant. >> >> What if we didn't add FileArgument, and kept using <char*> for _filename args/opts: >> >> Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value. >> >> Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is! >> >> >> (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors... Another thing that seems to part of the framework that was never much adopted.) > >> One more thing that's troubling me. (Apologies it's now and not last week.) >> >> I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-) Harder to remember and to read and understand. Maybe we can avoid the two accessors, it really is just a char*. >> >> These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in once place until now? >> >> Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant. >> >> What if we didn't add FileArgument, and kept using <char*> for _filename args/opts: >> >> Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value. >> >> Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is! >> >> (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors... 
Another thing that seems to part of the framework that was never much adopted.) > > IMHO for a functional addition we should follow the established pattern. Reworking the framework is certainly useful, but I would like it if we could get this done first (I intend to use it in other DCmds). > > And if we simplify this coding, we should think first about how to do this and what to solve. Things that come to mind: > > - overuse of template > - The argument-type-by-template-division and the runtime "type" string argument (the third argument to DCmdArgument) seem redundant > - the fact that we keep command metadata (which should be constant) together with command invocation data (values for arguments that are scoped to a single command invocation) in a single global structure, and then rewrite the latter every time we invoke a command. That is a strange concept and makes cleaning up temporary memory non-trivial > - the fact that each new command takes a ton of boilerplate coding (Just see the many many repetitions in diagnosticCommand.cpp) > - the fact that we use runtime-polymorphy, which is completely fine, but then all metadata information are "static". So in order to e.g. know how many arguments a command takes, you need to know the command class, since you cannot just call e.... Thanks Thomas @tstuefe - We're agreeing that some of this framework is overly complex, and that we aren't going to simplify the framework in this change. But the more we adopt the obscure parts of the framework, the the harder it will be to move away from it, so that's the reason for suggesting not creating the FileArgument class. Use the simpler parts of this machine, with some special cases where necessary, like a char* argument which happens to be used for a FILEname (an input filename which gets %p substitution). The logic I don't follow is: Using this complex mechanism because it exists, when it only has one? actual usage. 
This seems to contradict the earlier max path len notes where it's suggested not to use a pattern established by about 140 other usages. Apologies Sonia for dragging this out, still really pleased to get this change happening. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2255464478 From jwtang at openjdk.org Mon Jul 29 09:42:09 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 29 Jul 2024 09:42:09 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option Message-ID: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. ------------- Commit messages: - 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option Changes: https://git.openjdk.org/jdk/pull/20373/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337331 Stats: 135 lines in 3 files changed: 135 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From jwtang at openjdk.org Mon Jul 29 09:49:11 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 29 Jul 2024 09:49:11 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> > I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. 
Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/d768df02..00ec5887 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From dholmes at openjdk.org Mon Jul 29 09:54:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 29 Jul 2024 09:54:10 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v2] In-Reply-To: <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com> Message-ID: <6j41RLigngpx3YFjVu1Fx4btFX4_j_05PmGX4Yr6P4E=.8ec67827-1ec7-4d95-8209-f737604f432d@github.com> On Fri, 26 Jul 2024 21:39:16 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/exceptions.cpp line 277: >> >>> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { >>> 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated"); >>> 277: UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size); >> >> Ah, I misread your patch 
and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. >> >> But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it. > > The `strlen` check is to skip the empty buffer you can get on Windows if vsnprintf returns -1 due to overflow of INT_MAX. > > We are assuming/requiring that we start with a valid UTF8 sequence and the worst that will happen is that vsnprintf will truncate it. > > If we actually got -1 for a conversion error (no way to tell the difference in the two cases) then we would unnecessarily truncate, but we do not expect any such conversion errors - in part because we type check the format specifiers and args and so should never get a mismatch. Thanks for the off-list discussion @djelinski , I now understand what you mean here. Code updated and commented. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1694935121 From dholmes at openjdk.org Mon Jul 29 09:54:09 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 29 Jul 2024 09:54:09 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v2] In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Message-ID: <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com> > Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. 
Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. > > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. David Holmes has updated the pull request incrementally with three additional commits since the last revision: - Fix logic for 4th byte of 6. - Fix logic error and typo - Ensure the buffer is > 5 bytes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20345/files - new: https://git.openjdk.org/jdk/pull/20345/files/c1a47375..02d636a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=00-01 Stats: 15 lines in 1 file changed: 6 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345 PR: https://git.openjdk.org/jdk/pull/20345 From dholmes at openjdk.org Mon Jul 29 09:56:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 29 Jul 2024 09:56:33 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v2] In-Reply-To: <5prDfKO5gMQOw5SCMMpNnWFpy4rust1LIhEeR4wdNek=.dafc8f68-6019-4b53-8405-ba8451b26336@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com> <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com> <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com> <5prDfKO5gMQOw5SCMMpNnWFpy4rust1LIhEeR4wdNek=.dafc8f68-6019-4b53-8405-ba8451b26336@github.com> Message-ID: <Ntb7qR15ci_xQZC9N4PWH99wvvkV_Uy_KienpbKm7jU=.2cf22b3c-74c1-4fe6-afef-c10d409f39ca@github.com> On Sat, 27 Jul 2024 12:15:50 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Keep in mind that 0xed a few lines 
above could have matched the first byte and not the fourth one. > > So ... the first three bytes of a six byte sequence can be indistinguishable from a three byte sequence! How does that work??? github seems to have swallowed my comment so at the risk of repeating it ... Thanks for the off-list discussion @djelinski , I now understand what you meant. The code and comments are updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1694938900 From dholmes at openjdk.org Mon Jul 29 10:12:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 29 Jul 2024 10:12:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> Message-ID: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> On Mon, 29 Jul 2024 09:49:11 GMT, Jiawei Tang <jwtang at openjdk.org> wrote: >> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. > > Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option Can we not just preload the problematic class so that it won't happen during the transition? test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2: > 1: /* > 2: * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved. 
Please only use a single year here. test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 36: > 34: * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full TestPinCaseWithTrace > 35: * @run main/othervm/timeout=100 -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace > 36: * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace Unclear why we need the three variants. Also where does the timeout value come from? How long does the test take to run? test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 62: > 60: public static void main(String[] args) throws Exception{ > 61: ExecutorService scheduler = Executors.newFixedThreadPool(1); > 62: Thread.Builder builder = TestPinCaseWithTrace.virtualThreadBuilder(scheduler); Can you not just create a Virtual Thread directly rather than defining a single-threaded executor?? test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/libPinJNI.c line 28: > 26: JNIEXPORT jint JNICALL > 27: Java_TestPinCaseWithTrace_nativeFuncPin(JNIEnv* env, jclass klass, jint x) { > 28: jmethodID nativeBaz = (*env)->GetStaticMethodID(env, klass, "native2Java", "(I)I"); Suggestion: just use `m` rather than `nativeBaz`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2204496515 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694947729 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694949670 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694952022 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694954328 From alanb at openjdk.org Mon Jul 29 10:12:34 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 29 Jul 2024 10:12:34 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> Message-ID: <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com> On Mon, 29 Jul 2024 09:49:11 GMT, Jiawei Tang <jwtang at openjdk.org> wrote: >> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. > > Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2: > 1: /* > 2: * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved. I assume you didn't mean to include a date range on a new test. 
test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 66: > 64: System.out.println("call native: " + nativeFuncPin(1)); > 65: }); > 66: } Does this really need to use a custom scheduler? If not, the running the test with -Djdk.virtualThreadScheduler.maxPoolSize=1 would be simpler. If you really need a custom scheduler, the test can use jdk.test.lib.thread.VThreadScheduler. Also to create a pining scenario it can use jdk.test.lib.thread.VThreadPinner. You'll see examples of both in other tests. test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 70: > 68: static int native2Java(int b) { > 69: try { > 70: Thread.sleep(500); // try yield, will pin, javaagent+tracePinnedThreads will crash here (because of the class `PinnedThreadPrinter`) As noted in the JBS issue, -Djdk.tracePinnedThreads has been very problematic and has been removed in the loom repo as part of the object monitor changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694938805 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694941417 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694942324 From aph at openjdk.org Mon Jul 29 10:35:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 29 Jul 2024 10:35:07 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v10] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. 
It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad.
Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/e9581019..329f487a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=08-09 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From alanb at openjdk.org Mon Jul 29 10:42:30 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 29 Jul 2024 10:42:30 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> Message-ID: <BBJW5dJJzbgBxIu0KJaAu-ArQ_9YWfya1rFY04oH7zA=.02bdd406-a580-48d1-b6c0-abfd9a91d998@github.com> On Mon, 29 Jul 2024 10:09:36 GMT, David Holmes <dholmes at openjdk.org> wrote: > Can we not just preload the problematic class so that it 
won't happen during the transition? It's potentially dozens of classes as it's everything to support the StackWalker API, stream pipelines, and printing code. This diagnostic option is effectively incompatible with the agents that enable the CFLH event. It has other issues and is really a left over from early development. It has been removed in the loom repo, in favour of better JFR events. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2255588034 From jwtang at openjdk.org Mon Jul 29 11:21:31 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 29 Jul 2024 11:21:31 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> Message-ID: <U3MujQK4Xvw97Zp57AuU4c0qA42igxV8RRX8oCqeaeI=.ea333626-7059-42b5-a802-e0c5dd550328@github.com> On Mon, 29 Jul 2024 10:01:56 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: >> >> 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 36: > >> 34: * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full TestPinCaseWithTrace >> 35: * @run main/othervm/timeout=100 -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace >> 36: * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace > > Unclear why we need the three variants. > > Also where does the timeout value come from? How long does the test take to run? I will remove the first two variants. The task will not end because of dead lock in vm. But if the issue is fixed, it can finish in 1s. Considering the differences in platforms and jdk mode(release/debug), I extended the time limit. I'm not sure if I should set this timeout value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695042090 From jwtang at openjdk.org Mon Jul 29 11:30:08 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 29 Jul 2024 11:30:08 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v3] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> > I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. 
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: changes according to reviewers' advice ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/00ec5887..723b1ec6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=01-02 Stats: 33 lines in 2 files changed: 1 ins; 26 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From jwtang at openjdk.org Mon Jul 29 11:30:09 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 29 Jul 2024 11:30:09 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com> Message-ID: <YHMhVuIxVmPr6ulleAU64ZncRv1NGdbky8BIGo0tMT8=.77228095-43d5-445a-ac95-f4dba7f3b814@github.com> On Mon, 29 Jul 2024 09:53:40 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: >> >> 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved. > > I assume you didn't mean to include a date range on a new test. Changed it. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 66: > >> 64: System.out.println("call native: " + nativeFuncPin(1)); >> 65: }); >> 66: } > > Does this really need to use a custom scheduler? If not, running the test with -Djdk.virtualThreadScheduler.maxPoolSize=1 would be simpler. If you really need a custom scheduler, the test can use jdk.test.lib.thread.VThreadScheduler. Also to create a pinning scenario it can use jdk.test.lib.thread.VThreadPinner to avoid needing to add JNI code. You'll see examples of both in other tests. Thanks, now I use `-Djdk.virtualThreadScheduler.maxPoolSize=1` instead. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 70: > >> 68: static int native2Java(int b) { >> 69: try { >> 70: Thread.sleep(500); // try yield, will pin, javaagent+tracePinnedThreads will crash here (because of the class `PinnedThreadPrinter`) > > As noted in the JBS issue, -Djdk.tracePinnedThreads has been very problematic and has been removed in the loom repo as part of the object monitor changes. I have read the code in loom, and this issue can be resolved there by using the JFR event instead. But I hope this can be fixed, since using a javaagent is very common in Java applications. The root cause is that no new class should be used after the vthread is pinned, since an agent can change the class bytecode and needs to use `JvmtiVTMSTransitionDisabler` when transforming a class.
However, this vthread is in a VTMS transition, so it cannot get out of the loop. Using `-Djdk.tracePinnedThreads=full` uses the class `PinnedThreadPrinter`, so we end up in a deadlock. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695049376 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695050299 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695050607 From jwtang at openjdk.org Mon Jul 29 11:30:09 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 29 Jul 2024 11:30:09 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> Message-ID: <HxNigfeSkock1o0ZTmxR02djs0jGr5Hd9wxtB3Nct18=.ae254bc8-fde4-457b-b49c-ee5268e9c68e@github.com> On Mon, 29 Jul 2024 10:00:23 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved. > > Please only use a single year here. Changed it.
> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 62: > >> 60: public static void main(String[] args) throws Exception{ >> 61: ExecutorService scheduler = Executors.newFixedThreadPool(1); >> 62: Thread.Builder builder = TestPinCaseWithTrace.virtualThreadBuilder(scheduler); > > Can you not just create a Virtual Thread directly rather than defining a single-threaded executor?? Now I use `-Djdk.virtualThreadScheduler.maxPoolSize=1` instead. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/libPinJNI.c line 28: > >> 26: JNIEXPORT jint JNICALL >> 27: Java_TestPinCaseWithTrace_nativeFuncPin(JNIEnv* env, jclass klass, jint x) { >> 28: jmethodID nativeBaz = (*env)->GetStaticMethodID(env, klass, "native2Java", "(I)I"); > > Suggestion: just use `m` rather than `nativeBaz`. Change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695050956 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695052060 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695052254 From jwtang at openjdk.org Mon Jul 29 11:33:31 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 29 Jul 2024 11:33:31 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <BBJW5dJJzbgBxIu0KJaAu-ArQ_9YWfya1rFY04oH7zA=.02bdd406-a580-48d1-b6c0-abfd9a91d998@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com> <BBJW5dJJzbgBxIu0KJaAu-ArQ_9YWfya1rFY04oH7zA=.02bdd406-a580-48d1-b6c0-abfd9a91d998@github.com> Message-ID: <sCKSd7Snv3SDNxlbJgaa7Cf8HZMfagRHZFi9dtMxJhI=.930e33ee-3658-429e-b114-b671aace8585@github.com> On Mon, 29 Jul 2024 
10:40:17 GMT, Alan Bateman <alanb at openjdk.org> wrote: > Can we not just preload the problematic class so that it won't happen during the transition? I think that if a new agent is attached to the running process, the VM may still crash? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2255688446 From alanb at openjdk.org Mon Jul 29 12:37:35 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 29 Jul 2024 12:37:35 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v2] In-Reply-To: <YHMhVuIxVmPr6ulleAU64ZncRv1NGdbky8BIGo0tMT8=.77228095-43d5-445a-ac95-f4dba7f3b814@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com> <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com> <YHMhVuIxVmPr6ulleAU64ZncRv1NGdbky8BIGo0tMT8=.77228095-43d5-445a-ac95-f4dba7f3b814@github.com> Message-ID: <K54bM5EjNkz1hhK0_BYCAJzb9rLgv7QzXVfEaAympok=.3c958e93-93ca-4b06-bb70-d1a6d441764a@github.com> On Mon, 29 Jul 2024 11:26:06 GMT, Jiawei Tang <jwtang at openjdk.org> wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 70: >> >>> 68: static int native2Java(int b) { >>> 69: try { >>> 70: Thread.sleep(500); // try yield, will pin, javaagent+tracePinnedThreads will crash here (because of the class `PinnedThreadPrinter`) >> >> As noted in the JBS issue, -Djdk.tracePinnedThreads has been very problematic and has been removed in the loom repo as part of the object monitor changes. > > I have read the code in loom and this issue can be resolved by using JFR event instead. But I hope this could be fixed since using javaagent is very common in java application.
The root cause is that no new class should be use after the vthread is pinned, since a agent can change the class bytecode and need to use `JvmtiVTMSTransitionDisabler` when transforming class. However, this vthread is in VTMS, it cannot jump out the loop. > > Using `-Djdk.tracePinnedThreads=full` will use the class `PinnedThreadPrinter` so we end in a deadlock. I have no objection to fixing JVMTI, I'm just pointing out that -Djdk.tracePinnedThreads has been very problematic and many other reasons so it will be proposed to be removed when we bring the changes to main line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695136030 From mbaesken at openjdk.org Mon Jul 29 12:41:34 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 29 Jul 2024 12:41:34 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> Message-ID: <H5Og5spL1v62N3FGj_p_7-3O_vboWyRJSrMUO5Ygpjs=.0c486cf3-d195-4938-96c3-a841d9172cbc@github.com> On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, 
IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add patch of Kim Barrett Any comments on the change ? Kim / Richard what you think ? 
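For readers less familiar with this class of UBSan report: in C++, calling a non-static member function through a null pointer is undefined behaviour even when the function never touches `this`, which is why a stateless helper type can still trigger it. The following is only a minimal sketch of one common way to avoid the report, namely handing callers a pointer to a shared immutable instance. `TinyRegisterMap`, `instance()` and `slot_offset` are hypothetical names invented for illustration; this is not the patch under review.

```cpp
#include <cassert>

// Hypothetical stand-in for a stateless helper type (not HotSpot's SmallRegisterMap).
struct TinyRegisterMap {
    // Uses no data members, yet still needs a valid object to be called on.
    int location(int reg) const { return reg < 0 ? -1 : reg * 8; }

    // One shared immutable instance gives callers a valid, non-null pointer.
    static const TinyRegisterMap* instance() {
        static const TinyRegisterMap the_instance{};
        return &the_instance;
    }
};

// Generic code that receives the map by pointer, as the templates in the
// stack trace do.
template <typename RegisterMapT>
int slot_offset(const RegisterMapT* map, int reg) {
    assert(map != nullptr && "callers must pass a real object");
    return map->location(reg);
}

// Undefined behaviour, reported by UBSan as "member call on null pointer":
//   slot_offset(static_cast<const TinyRegisterMap*>(nullptr), 3);
// Well-defined alternative:
//   slot_offset(TinyRegisterMap::instance(), 3);
```

The shared instance costs nothing at the call site and keeps the template signatures unchanged, which is presumably why this pattern is attractive when a type exists only to select an overload.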
------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2255822536 From rrich at openjdk.org Mon Jul 29 12:56:33 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 29 Jul 2024 12:56:33 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> Message-ID: <MH8_MNPKmpp11HlTW0L42ND2A_PanP7cpOmNqDSPLzo=.732541b1-bca6-4d04-a323-2dad31c63b96@github.com> On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, 
OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add patch of Kim Barrett The change looks good to me. Thanks, Richard. I think you should add Kim as contributor (see [here](https://wiki.openjdk.org/display/SKARA/Pull+Request+Commands#PullRequestCommands-/contributor)). ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2204849049 PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2255855359 From mbaesken at openjdk.org Mon Jul 29 13:05:32 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 29 Jul 2024 13:05:32 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <MH8_MNPKmpp11HlTW0L42ND2A_PanP7cpOmNqDSPLzo=.732541b1-bca6-4d04-a323-2dad31c63b96@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> <MH8_MNPKmpp11HlTW0L42ND2A_PanP7cpOmNqDSPLzo=.732541b1-bca6-4d04-a323-2dad31c63b96@github.com> Message-ID: <7OzXtBJVimg_9dlx5j8WOSDBU7JDIUbGwIu8Iag6s2A=.309a9594-c0c6-4ab0-9878-e4ac67749ee0@github.com> On Mon, 29 Jul 2024 12:53:53 GMT, Richard Reingruber <rrich at openjdk.org> wrote: > I think you should add Kim as contributor (see [here](https://wiki.openjdk.org/display/SKARA/Pull+Request+Commands#PullRequestCommands-/contributor)). Makes sense, hope we find another second reviewer (guess as contributor, Kim cannot review) . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2255879699 From aph at openjdk.org Mon Jul 29 13:10:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 29 Jul 2024 13:10:36 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v10] In-Reply-To: <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com> Message-ID: <iYkxv36NxYN4Q4gvJTmC4c9kJtl0CnY1VpRZS9utBU8=.6dd575df-d3f7-4151-99dc-119879f4ac23@github.com> On Mon, 29 Jul 2024 10:35:07 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. 
There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Minor I promise that if you say you really want this change I will do it, but there is a cost I want to make clear. Adding the full-bitmap test at the start of the fast-path code increases the execution time in the case of `SecondarySupersLookup.testPositive03` from 5 cycles/op to 5.5 cycles/op on average. It also adds at least 5 bytes (8 bytes for AArch64) to the inline code size, depending on how you do it.
In contrast, my proposed fix makes the invariant `popcount(bitmap) >= secondary_supers.length` truly invariant, and changes the full-bitmap test at the start of the slow path thusly to avoid a performance regression with a nearly-full bitmap:

--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -5212,8 +5212,8 @@ void MacroAssembler::lookup_secondary_supers_table_slow_path(Register r_super_kl
   // The bitmap is full to bursting.
   // Implicit invariant: BITMAP_FULL implies (length > 0)
   assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), "");
-  cmpq(r_bitmap, (int32_t)-1); // sign-extends immediate to 64-bit value
-  jcc(Assembler::equal, L_huge);
+  cmpq(r_array_length, (int32_t)SECONDARY_SUPERS_TABLE_SIZE - 2);
+  jcc(Assembler::greater, L_huge);
@@ -344,11 +370,12 @@ uintx Klass::hash_secondary_supers(Array<Klass*>* secondaries, bool rewrite) {
     return uintx(1) << hash_slot;
   }
--- a/src/hotspot/share/oops/klass.cpp
+++ b/src/hotspot/share/oops/klass.cpp
@@ -344,11 +370,12 @@ uintx Klass::hash_secondary_supers(Array<Klass*>* secondaries, bool rewrite) {
     return uintx(1) << hash_slot;
   }
-  // For performance reasons we don't use a hashed table unless there
-  // are at least two empty slots in it. If there were only one empty
-  // slot it'd take a long time to create the table and the resulting
-  // search would be no faster than linear probing.
-  if (length > SECONDARY_SUPERS_TABLE_SIZE - 2) {
+  // Invariant: _secondary_supers.length >= population_count(_secondary_supers_bitmap)
+
+  // Don't attempt to hash a table that's completely full, because in
+  // the case of an absent interface linear probing would not
+  // terminate.
+  if (length >= SECONDARY_SUPERS_TABLE_SIZE) {
     return SECONDARY_SUPERS_BITMAP_FULL;
   }

So, what I'm suggesting is a bit smaller, a bit faster, and less work for me.
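The bitmap-plus-popcount lookup being discussed can be modelled in a few lines. What follows is a simplified sketch under stated assumptions, not HotSpot's implementation: `PackedTable`, `home_slot`, `popcount64` and the multiplicative hash are all hypothetical, keys are plain integers rather than `Klass*`, and collisions are resolved by ordinary linear probing. It does illustrate two points from the thread: the index into the packed array is the population count of the bitmap bits below the probed slot, and a completely full table must be rejected because probing for an absent key would otherwise never reach an empty slot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr unsigned kTableSize = 64;  // one bitmap bit per slot

// Portable software population count (the classic SWAR reduction).
inline unsigned popcount64(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical hash: the top 6 bits of a 32-bit multiplicative hash.
inline unsigned home_slot(std::uint32_t key) {
    return static_cast<std::uint32_t>(key * 0x9E3779B9u) >> 26;
}

struct PackedTable {
    std::uint64_t bitmap = 0;            // bit s set => slot s is occupied
    std::vector<std::uint32_t> packed;   // occupied slots, in ascending slot order

    explicit PackedTable(const std::vector<std::uint32_t>& keys) {
        // A completely full table has no empty slot to stop a failed probe.
        assert(keys.size() < kTableSize);
        std::uint32_t slots[kTableSize] = {};
        for (std::uint32_t k : keys) {
            unsigned s = home_slot(k);
            while (bitmap & (std::uint64_t{1} << s)) s = (s + 1) % kTableSize;
            slots[s] = k;
            bitmap |= std::uint64_t{1} << s;
        }
        for (unsigned s = 0; s < kTableSize; ++s) {
            if (bitmap & (std::uint64_t{1} << s)) packed.push_back(slots[s]);
        }
    }

    bool contains(std::uint32_t key) const {
        unsigned s = home_slot(key);
        while (bitmap & (std::uint64_t{1} << s)) {
            // Occupied slots below s == index of slot s in the packed array.
            std::uint64_t below = bitmap & ((std::uint64_t{1} << s) - 1);
            if (packed[popcount64(below)] == key) return true;
            s = (s + 1) % kTableSize;
        }
        return false;  // reached an empty slot: the key is absent
    }
};
```

In this model `popcount64(bitmap)` always equals `packed.size()`, which is the kind of relationship between bitmap and array length that the invariant in the diff is meant to pin down.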
On the other hand you say > It doesn't look right when the code treats secondary_supers as a table irrespective of whether it was hashed or not. IMO > it unnecessarily complicates things and may continue to be a source of bugs. I agree about the "It doesn't look right" part, but I'm not sure I agree about the cause of the bug. IMO, that was the failure to make the `popcount(bitmap) >= secondary_supers.length` truly invariant. Your call. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2255892483 From djelinski at openjdk.org Mon Jul 29 13:28:32 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Mon, 29 Jul 2024 13:28:32 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v2] In-Reply-To: <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com> Message-ID: <fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com> On Mon, 29 Jul 2024 09:54:09 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. >> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > David Holmes has updated the pull request incrementally with three additional commits since the last revision: > > - Fix logic for 4th byte of 6.
> - Fix logic error and typo > - Ensure the buffer is > 5 bytes src/hotspot/share/utilities/exceptions.cpp line 275: > 273: // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8 > 274: // sequence is valid. > 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { Do we need to check if `strlen(msg) == max_msg_size - 1`? If strlen is shorter, the bytes between the null terminator and max_msg_size are undefined, which might trigger an assertion while truncating. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1695241148 From aph at openjdk.org Mon Jul 29 13:48:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 29 Jul 2024 13:48:35 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v10] In-Reply-To: <iYkxv36NxYN4Q4gvJTmC4c9kJtl0CnY1VpRZS9utBU8=.6dd575df-d3f7-4151-99dc-119879f4ac23@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com> <iYkxv36NxYN4Q4gvJTmC4c9kJtl0CnY1VpRZS9utBU8=.6dd575df-d3f7-4151-99dc-119879f4ac23@github.com> Message-ID: <YmFTuQlVBScM1KWJiChk8pvoC1cTJs2fZJUym_ltH4E=.cac5810b-d345-49f7-9d84-419fc0440960@github.com> On Mon, 29 Jul 2024 13:07:48 GMT, Andrew Haley <aph at openjdk.org> wrote: > Adding the full-bitmap test at the start of the fast-path code increases the execution time... Oh, and I think it'll slow down the slow path a little bit too, but that's much less important. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2255994481 From mli at openjdk.org Mon Jul 29 14:49:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 29 Jul 2024 14:49:34 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6] In-Reply-To: <D-LqktQwsa-Mg1zpbQvnI-lKEfL8pdbhKgejdG14OmI=.8b0cd0b7-7527-4281-993a-9f8f7a571bf1@github.com> References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com> <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com> <D-LqktQwsa-Mg1zpbQvnI-lKEfL8pdbhKgejdG14OmI=.8b0cd0b7-7527-4281-993a-9f8f7a571bf1@github.com> Message-ID: <Ua01ftmAzg2WJZFBwP2F4sjJRhxhRtllepahph5oOhM=.304a6fca-2c81-44ae-b699-032981218177@github.com> On Mon, 29 Jul 2024 05:00:13 GMT, Fei Yang <fyang at openjdk.org> wrote: > Hi, will take a look. Thanks! > BTW: Have you resolved the performance issue of the base64 decode intrinsic? It only brings a regression in the MIME cases. I did some investigation, but have not found the root cause yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2256144590 From asmehra at openjdk.org Mon Jul 29 14:49:48 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 29 Jul 2024 14:49:48 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2] In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> Message-ID: <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com> > Some minor improvements to CompilationMemoryStatistic.
More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) > > Testing: > test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java > test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: Address review comments by Thomas S. Signed-off-by: Ashutosh Mehra <asmehra at redhat.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20304/files - new: https://git.openjdk.org/jdk/pull/20304/files/1ffbd696..008ac6b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=00-01 Stats: 91 lines in 4 files changed: 31 ins; 30 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/20304.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20304/head:pull/20304 PR: https://git.openjdk.org/jdk/pull/20304 From asmehra at openjdk.org Mon Jul 29 16:05:32 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 29 Jul 2024 16:05:32 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com> Message-ID: <9i2-oCVJ5XhCtD5vwX7sgpjg4kbUp-BsKpakhYgE28Q=.5013ceda-dd76-4417-9c81-a8bb3898483d@github.com> On Wed, 24 Jul 2024 10:45:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Some minor improvements to CompilationMemoryStatistic. 
More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) >> >> Testing: >> test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java >> test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java > > I plan to look at this later this week. @tstuefe I have added a patch to address your review comments. > or just write a small wrapper class holding a size_t vector and taking care of the copying. Using a wrapper class is a good idea. I have added `ArenaTagsCounter` for that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2256315773 From asmehra at openjdk.org Mon Jul 29 16:05:33 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 29 Jul 2024 16:05:33 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2] In-Reply-To: <x1uZAfQc-7Dvbhv5cy7wm7CGyTGWmc8oOCs23DrnXVI=.85639be2-68c2-4c49-9ac2-12cef799f77c@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com> <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com> <x1uZAfQc-7Dvbhv5cy7wm7CGyTGWmc8oOCs23DrnXVI=.85639be2-68c2-4c49-9ac2-12cef799f77c@github.com> Message-ID: <UPINfxkYkLCDl7Pd-3g4wRe-99GF8Uete1tWwqJcXn4=.94c80dca-5867-42d5-b5bd-e06e859a4045@github.com> On Sat, 27 Jul 2024 05:44:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> What do you mean by x macro? Do you have an example that shows the use of x macro? > > You use them already in your patch. > > E.g. > > > #define XX(name, whatever, desc) st->print_cr(" " LEGEND_KEY_FMT ": " #name #desc > DO_ARENA_TAG(XX) > #undef XX > > > Admittedly, it is not a lot less code than the for loop. Up to you. I will keep the loop if you don't have strong preference for the macro usage here. 
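Since the question of what an X macro is came up above, here is a small standalone sketch of the idiom. It is only an illustration: `DEMO_ARENA_TAGS`, `DemoTag` and `demo_legend` are made-up names, and the tag list is invented rather than taken from `DO_ARENA_TAG` in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag list. Each entry is written once and expanded several
// times below with different definitions of XX.
#define DEMO_ARENA_TAGS(XX)          \
  XX(ra,   "Register allocation")    \
  XX(node, "Node arena")             \
  XX(comp, "Compile arena")

// First expansion: one enumerator per tag, plus a trailing count.
enum class DemoTag {
#define XX(name, desc) name,
  DEMO_ARENA_TAGS(XX)
#undef XX
  count
};

// Second expansion: one legend line per tag, using #name to stringify
// the identifier and desc as the human-readable description.
inline std::string demo_legend() {
  std::string out;
#define XX(name, desc) out += std::string(#name) + ": " + desc + "\n";
  DEMO_ARENA_TAGS(XX)
#undef XX
  return out;
}
```

The trade-off matches what was said in the thread: the macro keeps the enum and the legend in sync from a single list, but for a short print routine it is not much shorter than a loop over a table.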
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1695486252 From kbarrett at openjdk.org Mon Jul 29 18:17:31 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 29 Jul 2024 18:17:31 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> Message-ID: <5teiNpibz_Cv-qbIMPkPkoti1tG20tptXcVpaOByWZM=.7f2c67f0-c3f8-4121-b476-2232dc5ab891@github.com> On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, 
OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add patch of Kim Barrett It's annoying that we have all these very similar platform-specific classes of non-trivial size. It seems like it ought to be possible to refactor to reduce the duplication. But it might not be worth the trouble, and certainly not as part of this change. The additions @MBaesken has made to my prototype look good to me. Looks good except missing some copyright updates. I think when incorporating something like my suggested changes, the PR author can be considered to have reviewed them. The goal is to have some number of people look over the code and approve all the pieces (an author and 2 reviewers). At least, that's my recollection of some prior discussions of situations like this. But I agree it can feel a little incestuous having 2 authors who are playing a reviewer role for the other's work, and especially when there's some back and forth on it. ------------- Changes requested by kbarrett (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2205667735 From szaldana at openjdk.org Mon Jul 29 19:01:02 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 29 Jul 2024 19:01:02 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v15] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <oVnMKyMgEthsHECfO9kDZrJ0XEzHZ40cx_x2HHO1ujw=.2ec8a7f1-3e4b-4ad9-a0b9-35008e3abc9f@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format.
> > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). > > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Reverting FileArgument and using char* instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/71d3d140..2cccc9b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=13-14 Stats: 68 lines in 5 files changed: 12 ins; 41 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Mon Jul 29 19:05:07 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 29 Jul 2024 19:05:07 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v16] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <z3eAxxssepMc8ha_D4nbGErIPB6OCb6CfNfi94WxA5c=.aef9fa4d-2898-44bb-a94e-c0e68b5b606a@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID.
> > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Reverting some lingering changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/2cccc9b4..7f22495a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=14-15 Stats: 6 lines in 2 files changed: 2 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Mon Jul 29 19:08:17 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 29 Jul 2024 19:08:17 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v17] In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com> > Hi all, > > This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. > > This PR addresses the following diagnostic commands: > - [x] Compiler.perfmap > - [x] GC.heap_dump > - [x] System.dump_map > - [x] Thread.dump_to_file > - [x] VM.cds > > Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). > > I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands.
For example, > > > filename (Optional) Name of the file to which the flight recording data is > written when the recording is stopped. If no filename is given, a > filename is generated from the PID and the current date and is > placed in the directory where the process was started. The > filename may also be a directory in which case, the filename is > generated from the PID and the current date in the specified > directory. (STRING, no default value) > > Note: If a filename is given, '%p' in the filename will be > replaced by the PID, and '%t' will be replaced by the time in > 'yyyy_MM_dd_HH_mm_ss' format. > > > Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. > > Testing: > > - [x] Added test case passes. > - [x] Modified existing VM.cds tests to also check for `%p` filenames. > > Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: last lingering change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20198/files - new: https://git.openjdk.org/jdk/pull/20198/files/7f22495a..ceb96eb9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198 PR: https://git.openjdk.org/jdk/pull/20198 From szaldana at openjdk.org Mon Jul 29 19:10:34 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 29 Jul 2024 19:10:34 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v14] In-Reply-To: <IUN4djOhB-ZaEz_AwNwcrQyyrVxdxqc-Th5nclLnPew=.1cc99b12-abc7-4556-b3c3-16219d7dec44@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com> <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com> <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com> <IUN4djOhB-ZaEz_AwNwcrQyyrVxdxqc-Th5nclLnPew=.1cc99b12-abc7-4556-b3c3-16219d7dec44@github.com> Message-ID: <vhdHYIvzAjRezuTlEt9NNWFNfEUoYS22Ab-ICYSzh2Q=.fe7b9958-034b-498e-87a5-878fd21339d7@github.com> On Mon, 29 Jul 2024 09:39:07 GMT, Kevin Walls <kevinw at openjdk.org> wrote: >>> One more thing that's troubling me. (Apologies it's now and not last week.) >>> >>> I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-) Harder to remember and to read and understand. 
Maybe we can avoid the two accessors, it really is just a char*. >>> >>> These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in one place until now? >>> >>> Adding FileArgument as another of these might be the wrong direction, as these classes are almost redundant. >>> >>> What if we didn't add FileArgument, and kept using <char*> for _filename args/opts: >>> >>> Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value. >>> >>> Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is! >>> >>> (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors... Another thing that seems to be part of the framework that was never much adopted.) >> >> IMHO for a functional addition we should follow the established pattern. Reworking the framework is certainly useful, but I would like it if we could get this done first (I intend to use it in other DCmds). >> >> And if we simplify this coding, we should think first about how to do this and what to solve. Things that come to mind: >> >> - overuse of templates >> - The argument-type-by-template-division and the runtime "type" string argument (the third argument to DCmdArgument) seem redundant >> - the fact that we keep command metadata (which should be constant) together with command invocation data (values for arguments that are scoped to a single command invocation) in a single global structure, and then rewrite the latter every time we invoke a command.
That is a strange concept and makes cleaning up temporary memory non-trivial >> - the fact that each new command takes a ton of boilerplate coding (Just see the many many repetitions in diagnosticCommand.cpp) >> - the fact that we use runtime-polymorphy, which is completely fine, but then all metadata information are "static". So in order to e.g. know how many arguments a command takes, you need to know the command c... > > Thanks Thomas @tstuefe - > > We're agreeing that some of this framework is overly complex, and that we aren't going to simplify the framework in this change. > > But the more we adopt the obscure parts of the framework, the harder it will be to move away from it, so that's the reason for suggesting not creating the FileArgument class. Use the simpler parts of this machine, with some special cases where necessary, like a char* argument which happens to be used for a FILEname (an input filename which gets %p substitution). > > The logic I don't follow is: > Using this complex mechanism because it exists, when it only has one? actual usage. This seems to contradict the earlier max path len notes where it's suggested not to use a pattern established by about 140 other usages. > > Apologies Sonia for dragging this out, still really pleased to get this change happening. Hi @kevinjwalls, I reverted back to `char*` and modified parsing based on the type `FILE`. Really hope this reaches a consensus!
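To make the `%p` behaviour under discussion concrete, here is a minimal standalone sketch of the substitution. It is not the HotSpot implementation: `expand_pid` is a hypothetical helper, and treating `%%` as a literal percent sign is an assumption made for this illustration rather than something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, named for illustration only. It replaces each
// "%p" in the pattern with the given PID. "%%" is assumed to produce a
// single '%'; any other '%' sequence is copied through unchanged.
inline std::string expand_pid(const std::string& pattern, long pid) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size()) {
      const char next = pattern[i + 1];
      if (next == 'p') {
        out += std::to_string(pid);
        ++i;  // skip the 'p'
        continue;
      }
      if (next == '%') {
        out += '%';
        ++i;  // skip the second '%'
        continue;
      }
    }
    out += pattern[i];
  }
  return out;
}
```

A call such as `expand_pid("heap_%p.hprof", 1234)` would yield `heap_1234.hprof`. Doing the expansion once, at the point where a `FILE`-typed `char*` argument is parsed, is the approach the thread converges on, since it avoids a separate argument class.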
------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2256698014 From lmesnik at openjdk.org Mon Jul 29 19:29:35 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 29 Jul 2024 19:29:35 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v17] In-Reply-To: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com> Message-ID: <6gCx1ciA8eMVyM90LMRHr2YcKyTZuPCn8423YIT88aU=.35b32524-4c9d-4974-a67d-2eab1146c258@github.com> On Mon, 29 Jul 2024 19:08:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory.
(STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > last lingering change Looks good also. The main goal of my request was to unify argument parsing. Using type FILE for 'char *' seems enough, with no need to introduce a new filename type for values. Thanks for fixing. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2205814072 From dholmes at openjdk.org Mon Jul 29 22:02:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 29 Jul 2024 22:02:10 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v3] In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Message-ID: <jaHgSfRfceQDzSZUwxoGgSGEPto-ev11jLPUPB29lLU=.0a882ecd-aa7f-4385-8979-4e5cb1a91a16@github.com> > Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into 8325002-fthrow - Fix logic for 4th byte of 6. - Fix logic error and typo - Ensure the buffer is > 5 bytes - Fixed typo - 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20345/files - new: https://git.openjdk.org/jdk/pull/20345/files/02d636a6..794da826 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=01-02 Stats: 13051 lines in 379 files changed: 7576 ins; 4001 del; 1474 mod Patch: https://git.openjdk.org/jdk/pull/20345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345 PR: https://git.openjdk.org/jdk/pull/20345 From sspitsyn at openjdk.org Mon Jul 29 22:39:36 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 29 Jul 2024 22:39:36 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v3] In-Reply-To: <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> Message-ID: <A8wM3bukziE67E6BEq1fo8wM-fXbbvME_k4zoQTGtSY=.e967a17a-ec4d-4d38-83b5-e5578a05d2b6@github.com> On Mon, 29 Jul 2024 11:30:08 GMT, Jiawei Tang <jwtang at openjdk.org> wrote: >> I add the testcase which can reproduce the crash. 
>> I hope to get some advice on whether the code needs changing.
>
> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:
>
>   changes according to reviewers' advice

src/hotspot/share/prims/jvmtiExport.cpp line 970:

> 968:     if (_thread->is_in_any_VTMS_transition()) {
> 969:       return; // no events should be posted if thread is in any VTMS transition
> 970:     }

This is not the right place to fix it. This would be better:

@@ -1091,8 +1091,8 @@ bool JvmtiExport::post_class_file_load_hook(Symbol* h_name,
   if (JvmtiEnv::get_phase() < JVMTI_PHASE_PRIMORDIAL) {
     return false;
   }
-  if (JavaThread::current()->is_in_tmp_VTMS_transition()) {
-    return false; // skip CFLH events in tmp VTMS transition
+  if (thread->is_in_any_VTMS_transition()) {
+    return false; // no events should be posted if thread is in any VTMS transition
   }
   JvmtiClassFileLoadHookPoster poster(h_name, class_loader,

Also, there is a check in the constructor `JvmtiClassFileLoadHookPoster()`:

    if (_thread->is_in_any_VTMS_transition()) {
      return; // no events should be posted if thread is in any VTMS transition
    }

It would be better to replace it with an assert. With the right check in `JvmtiExport::post_class_file_load_hook()` we should never call this constructor and `poster.post()`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1696039799 From sspitsyn at openjdk.org Mon Jul 29 23:01:31 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 29 Jul 2024 23:01:31 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v3] In-Reply-To: <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> Message-ID: <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com> On Mon, 29 Jul 2024 11:30:08 GMT, Jiawei Tang <jwtang at openjdk.org> wrote: >> I add the testcase which can reproduce the crash. I hope to get some advice on whether the code needs changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > changes according to reviewers' advice Could you please do some test renaming/refactoring? We have a number of `.c` JVMTI agents in the testbase. The plan is to convert them to `.cpp` in the future. Can you convert the test to use `.cpp` as well?
Also, I'm suggesting to name test directory/files as below: - TestPinCaseWithCFLH/TestPinCaseWithCFLH.java - TestPinCaseWithCFLH/libTestPinCaseWithCFLH.cpp ------------- PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2257149690 From vlivanov at openjdk.org Mon Jul 29 23:12:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 29 Jul 2024 23:12:36 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v10] In-Reply-To: <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com> Message-ID: <fJXkUbnJ4S1vZ1qR-MS6oDQZ63DmFjqwEnM9zL0U_nI=.38ef928d-b916-45ad-b64f-b0a0bc3affe3@github.com> On Mon, 29 Jul 2024 10:35:07 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. 
>> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. >> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Minor Thanks for the numbers, Andrew. I agree that the fix you propose is simple and conservative which makes it very appealing. First of all, does the bug have to be addressed separately? It affects 23, so we need to backport the fix anyway. Also, I took a closer look at the implementation. 
A couple of observations: * as of now, there are 4 platforms which support secondary supers table lookup (so, all of them have to be adjusted if any platform-specific changes are needed); * there are existing implicit assumptions on `SECONDARY_SUPERS_BITMAP_FULL` (e.g., `secondary_supers_bitmap == SECONDARY_SUPERS_BITMAP_FULL => secondary_supers->length() > 0`). I thought about use cases for `SECONDARY_SUPERS_BITMAP_FULL` as a kill switch for table lookups, but don't see anything important (once secondary supers are hashed unconditionally). Let's do the fix you propose for now. We can reconsider the decision later if any interesting use cases show up. The downside is that there'll be more platform-specific code to touch, but that looks like a fair price to pay. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2257160673 From dholmes at openjdk.org Mon Jul 29 23:18:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 29 Jul 2024 23:18:35 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v2] In-Reply-To: <fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com> <fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com> Message-ID: <lh8krMqdhecXEkx5-8ndf88RVyKTpK32hkrQt9_POP0=.439c434c-0074-48ed-ad97-ce146f76c236@github.com> On Mon, 29 Jul 2024 13:26:12 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote: >> David Holmes has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix logic for 4th byte of 6. 
>> - Fix logic error and typo >> - Ensure the buffer is > 5 bytes > > src/hotspot/share/utilities/exceptions.cpp line 275: > >> 273: // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8 >> 274: // sequence is valid. >> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { > > Do we need to check if `strlen(msg) == max_msg_size - 1`? If strlen is shorter, the bytes between the null terminator and max_msg_size are undefined, which might trigger an assertion while truncating. In fact we know it may be shorter than `max_msg_size - 1` - that is what we get on macOS if the string is huge and exceeds `INT_MAX` causing `vsnprintf` to return -1. I originally had an assert that failed due to that. I need to fix this case as well. <sigh>. Good catch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696067908 From nprasad at openjdk.org Mon Jul 29 23:27:48 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Mon, 29 Jul 2024 23:27:48 GMT Subject: RFR: 8334230: Optimize C2 classes layout [v2] In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> Message-ID: <ROSSy11IiafrTR2QH6Ig_yr9f5iC6lmJ5IIKNlMOJ6k=.f1083873-ff85-4dc3-9266-304db685be05@github.com> > **Notes** > > Rearrange C2 class fields to optimize footprint. > > > **Verification** > > 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. > 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
> > | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding | > | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- | > | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 | > | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 | > | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 | > | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 | > > class ArrayPointer { > const class Node * _pointer; /* 0 8 */ > const class Node * _base; /* 8 8 */ > const jlong _constant_offset; /* 16 8 */ > const class Node * _int_offset; /* 24 8 */ > const class GrowableArray<Node*> * _other_offsets; /* 32 8 */ > const jint _int_offset_shift; /* 40 4 */ > const bool _is_valid; /* 44 1 */ > public: > > > /* size: 48, cachelines: 1, members: 7 */ > /* padding: 3 */ > /* last cacheline: 48 bytes */ > }; > > > > class CallJavaNode : public CallNode { > public: > > /* class CallNode <ancestor>; */ /* 0 128 */ > protected: > > /* --- cacheline 2 boundary (128 bytes) --- */ > class ciMethod * _method; /* 128 8 */ > bool _optimized_virtual; /* 136 1 */ > bool _method_handle_invoke; /* 137 1 */ > bool _override_symbolic_info; /* 138 1 */ > bool _arg_escape; /* 139 1 */ > public: > > protected: > > public: > > > /* size: 144, cachelines: 3, members: 6 */ > /* padding: 4 */ > /* last cacheline: 16 bytes */ > > /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */ > }; > > > > class C2Access : public StackObj { > public: > > /* class StackObj <ancestor>; */ /* 0 0 */ > > /* XXX last struct has 1 byte of padding */ > > int ()(void) * * _vptr.C2Access; /* 0 8 */ > protected: > > DecoratorSet _decorators; /* 8 ... 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: Undo constructor arg order change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19861/files - new: https://git.openjdk.org/jdk/pull/19861/files/f5309ce4..668f8c66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19861.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19861/head:pull/19861 PR: https://git.openjdk.org/jdk/pull/19861 From kbarrett at openjdk.org Mon Jul 29 23:28:33 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 29 Jul 2024 23:28:33 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <5teiNpibz_Cv-qbIMPkPkoti1tG20tptXcVpaOByWZM=.7f2c67f0-c3f8-4121-b476-2232dc5ab891@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> <5teiNpibz_Cv-qbIMPkPkoti1tG20tptXcVpaOByWZM=.7f2c67f0-c3f8-4121-b476-2232dc5ab891@github.com> Message-ID: <EEVGWDmtho4fv8fWW9gkAczQpuPEsOWkIbGviHXAEtA=.5dc72232-d3ce-4b1e-a342-3065a99cbb38@github.com> On Mon, 29 Jul 2024 18:14:59 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > I think when incorporating something like my suggested changes, the PR author can be considered to have reviewed them. The goal is to have some number of people look over the code and approve all the pieces (an author and 2 reviewers). At least, that's my recollection of some prior discussions of situations like this. 
But I agree it can feel a little incestuous having 2 authors who are playing a reviewer role for the other's work, and especially when there's some back and forth on it. I did some asking around about this, and it seems my old info is stale and we should usually have reviewers who are distinct from any of the contributors. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2257175455 From nprasad at openjdk.org Tue Jul 30 00:53:04 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 30 Jul 2024 00:53:04 GMT Subject: RFR: 8334230: Optimize C2 classes layout [v3] In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> Message-ID: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com> > **Notes** > > Rearrange C2 class fields to optimize footprint. > > > **Verification** > > 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. > 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
> > | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding | > | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- | > | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 | > | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 | > | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 | > | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 | > > class ArrayPointer { > const class Node * _pointer; /* 0 8 */ > const class Node * _base; /* 8 8 */ > const jlong _constant_offset; /* 16 8 */ > const class Node * _int_offset; /* 24 8 */ > const class GrowableArray<Node*> * _other_offsets; /* 32 8 */ > const jint _int_offset_shift; /* 40 4 */ > const bool _is_valid; /* 44 1 */ > public: > > > /* size: 48, cachelines: 1, members: 7 */ > /* padding: 3 */ > /* last cacheline: 48 bytes */ > }; > > > > class CallJavaNode : public CallNode { > public: > > /* class CallNode <ancestor>; */ /* 0 128 */ > protected: > > /* --- cacheline 2 boundary (128 bytes) --- */ > class ciMethod * _method; /* 128 8 */ > bool _optimized_virtual; /* 136 1 */ > bool _method_handle_invoke; /* 137 1 */ > bool _override_symbolic_info; /* 138 1 */ > bool _arg_escape; /* 139 1 */ > public: > > protected: > > public: > > > /* size: 144, cachelines: 3, members: 6 */ > /* padding: 4 */ > /* last cacheline: 16 bytes */ > > /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */ > }; > > > > class C2Access : public StackObj { > public: > > /* class StackObj <ancestor>; */ /* 0 0 */ > > /* XXX last struct has 1 byte of padding */ > > int ()(void) * * _vptr.C2Access; /* 0 8 */ > protected: > > DecoratorSet _decorators; /* 8 ... 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: Address constructor order issue for C2OptAccess ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19861/files - new: https://git.openjdk.org/jdk/pull/19861/files/668f8c66..490c381e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19861.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19861/head:pull/19861 PR: https://git.openjdk.org/jdk/pull/19861 From amenkov at openjdk.org Tue Jul 30 02:04:59 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 30 Jul 2024 02:04:59 GMT Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread Message-ID: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> Obsolete UseNotificationThread flag which was deprecated in JDK 23. 
Testing: tier1..tier5 ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/20381/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20381&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331015 Stats: 41 lines in 7 files changed: 1 ins; 31 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20381/head:pull/20381 PR: https://git.openjdk.org/jdk/pull/20381 From dholmes at openjdk.org Tue Jul 30 02:48:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 02:48:39 GMT Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> Message-ID: <Xfw-QCRB3ppSexvfQI5AaZ0Pt1baX5g9M98znhWnc2g=.f2fbf09b-c9e3-4e6f-80f8-c9e3775689d2@github.com> On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete UseNotificationThread flag which was deprecated in JDK 23. > > Testing: tier1..tier5 Looks good! Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2206468050 From kbarrett at openjdk.org Tue Jul 30 03:39:58 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 30 Jul 2024 03:39:58 GMT Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in misc. runtime code Message-ID: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> Please review this (perhaps trivial?) change that removes some uses of literal 0 as a null pointer constant in misc. runtime code. Most are changed to use nullptr. 
Testing: mach5 tier1 ------------- Commit messages: - fix simple runtime Changes: https://git.openjdk.org/jdk/pull/20383/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20383&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337416 Stats: 17 lines in 10 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20383/head:pull/20383 PR: https://git.openjdk.org/jdk/pull/20383 From dholmes at openjdk.org Tue Jul 30 04:12:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 04:12:11 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v4] In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Message-ID: <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com> > Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. > > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fix logic to allow for the buffer being partially filled. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20345/files - new: https://git.openjdk.org/jdk/pull/20345/files/794da826..77079058 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=02-03 Stats: 11 lines in 1 file changed: 8 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345 PR: https://git.openjdk.org/jdk/pull/20345 From kbarrett at openjdk.org Tue Jul 30 04:18:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 30 Jul 2024 04:18:02 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code Message-ID: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> Please review this change that removes some uses of literal 0 as a null pointer constant in prims code. Testing: mach5 tier1 ------------- Commit messages: - fix warnings in prims Changes: https://git.openjdk.org/jdk/pull/20385/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20385&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337418 Stats: 26 lines in 7 files changed: 0 ins; 3 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/20385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20385/head:pull/20385 PR: https://git.openjdk.org/jdk/pull/20385 From dholmes at openjdk.org Tue Jul 30 04:19:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 04:19:32 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v2] In-Reply-To: <lh8krMqdhecXEkx5-8ndf88RVyKTpK32hkrQt9_POP0=.439c434c-0074-48ed-ad97-ce146f76c236@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com> 
<fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com> <lh8krMqdhecXEkx5-8ndf88RVyKTpK32hkrQt9_POP0=.439c434c-0074-48ed-ad97-ce146f76c236@github.com> Message-ID: <XlPDI-Ag825A9SAeOMivR4dv_ClpklfuAPOoceX35CI=.50ee33cc-cb36-4498-abbf-5087703208be@github.com> On Mon, 29 Jul 2024 23:16:14 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/exceptions.cpp line 275: >> >>> 273: // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8 >>> 274: // sequence is valid. >>> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { >> >> Do we need to check if `strlen(msg) == max_msg_size - 1`? If strlen is shorter, the bytes between the null terminator and max_msg_size are undefined, which might trigger an assertion while truncating. > > In fact we know it may be shorter than `max_msg_size - 1` - that is what we get on macOS if the string is huge and exceeds `INT_MAX` causing `vsnprintf` to return -1. I originally had an assert that failed due to that. > > I need to fix this case as well. <sigh>. Good catch. I did some experimentation here and it seems in practice that if the buffer is only partially filled then it will contain valid data as `vsnprintf` would stop filling at a well-defined point. But as it is not a clearly specified area we pass the buffer through anyway, using `strlen(msg)` . Most of the time a partially filled buffer will end with an ASCII character anyway and so the utf8 truncation operation will be a no-op. 
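For readers following the truncation discussion, here is a minimal sketch of the idea of trimming an incomplete trailing sequence using `strlen` of the buffer rather than its capacity. It is not HotSpot's `UTF8::truncate_to_legal_utf8`: the name `truncate_to_valid_utf8` is hypothetical, it handles only standard 1 to 4 byte UTF-8 (not the 6-byte surrogate-pair form mentioned in the commit list), and it assumes the text was valid before it was cut.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, for illustration only.
// Trims an incomplete multi-byte sequence from the end of a NUL-terminated
// buffer and returns the new length. Assumes the content was valid standard
// UTF-8 before being cut, so only the tail can be damaged.
static std::size_t truncate_to_valid_utf8(char* buf) {
  assert(buf != nullptr);
  const std::size_t len = std::strlen(buf);  // works for partially filled buffers
  if (len == 0) return 0;                    // empty buffer: nothing to do

  // Walk back over trailing continuation bytes (10xxxxxx), at most three.
  std::size_t i = len;
  std::size_t cont = 0;
  while (i > 0 && cont < 3 &&
         (static_cast<unsigned char>(buf[i - 1]) & 0xC0) == 0x80) {
    --i;
    ++cont;
  }
  if (i == 0) {            // only continuation bytes: no lead byte to keep
    buf[0] = '\0';
    return 0;
  }

  // buf[i - 1] should be the lead byte of the last sequence.
  const unsigned char lead = static_cast<unsigned char>(buf[i - 1]);
  std::size_t expected = 0;                        // 0 means "not a valid lead"
  if (lead < 0x80)                expected = 1;    // ASCII
  else if ((lead & 0xE0) == 0xC0) expected = 2;
  else if ((lead & 0xF0) == 0xE0) expected = 3;
  else if ((lead & 0xF8) == 0xF0) expected = 4;

  const std::size_t have = cont + 1;               // bytes actually present
  // Incomplete (or invalid) tail: drop it entirely. Otherwise keep exactly
  // the bytes that belong to the last complete sequence.
  const std::size_t new_len =
      (expected == 0 || expected > have) ? i - 1 : i - 1 + expected;
  buf[new_len] = '\0';
  return new_len;
}
```

When the buffer ends in an ASCII character the function returns the original length and changes nothing, which matches the no-op case described above.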
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696265017 From dholmes at openjdk.org Tue Jul 30 04:19:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 04:19:33 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v4] In-Reply-To: <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com> <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com> Message-ID: <UjgJ4jR3nx62K-v1JVXXQYxFxstfns-d_DaNdQgdraI=.51c4881c-b431-4793-b2ca-8081d7760129@github.com> On Fri, 26 Jul 2024 21:39:16 GMT, David Holmes <dholmes at openjdk.org> wrote: >> src/hotspot/share/utilities/exceptions.cpp line 277: >> >>> 275: if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) { >>> 276: assert(msg[max_msg_size - 1] == '\0', "should be null terminated"); >>> 277: UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size); >> >> Ah, I misread your patch and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. >> >> But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it. > > The `strlen` check is to skip the empty buffer you can get on Windows if vsnprintf returns -1 due to overflow of INT_MAX. > > We are assuming/requiring that we start with a valid UTF8 sequence and the worst that will happen is that vsnprintf will truncate it. 
> > If we actually got -1 for a conversion error (no way to tell the difference in the two cases) then we would unnecessarily truncate, but we do not expect any such conversion errors - in part because we type check the format specifiers and args and so should never get a mismatch. Note this has been updated now to pass `strlen(msg)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696264323 From dholmes at openjdk.org Tue Jul 30 04:34:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 04:34:31 GMT Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in misc. runtime code In-Reply-To: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> Message-ID: <Z60HHFyeI-xKnJXCK3Y5VDJzAjtjbPtPK1l6yEmPPIk=.5d4b8acd-4bcd-464c-aef8-4cfd707846f4@github.com> On Tue, 30 Jul 2024 03:34:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this (perhaps trivial?) change that removes some uses of literal > 0 as a null pointer constant in misc. runtime code. Most are changed to use > nullptr. > > Testing: mach5 tier1 This looks fine, and I think trivial. I think there is an existing bug but probably better to file a separate JBS issue for that. Thanks src/hotspot/share/oops/constantPool.cpp line 2068: > 2066: } > 2067: printf("Cpool size: %d\n", size); > 2068: fflush(nullptr); This looks like a bug. I think someone used 0 aka fd0 when they needed stdout for fflush. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20383#pullrequestreview-2206608763 PR Review Comment: https://git.openjdk.org/jdk/pull/20383#discussion_r1696275148 From stuefe at openjdk.org Tue Jul 30 04:55:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 30 Jul 2024 04:55:33 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v17] In-Reply-To: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com> Message-ID: <YOZuklnaueS0NGWEomZaOfh-ic1ALDoP_395eGL4ITM=.1762685e-a1da-44cf-ad18-c61e01b5f48a@github.com> On Mon, 29 Jul 2024 19:08:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it?s done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. 
The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > last lingering change Okay. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2206633057 From stuefe at openjdk.org Tue Jul 30 05:16:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 30 Jul 2024 05:16:34 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v4] In-Reply-To: <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com> Message-ID: <XpUTvbOxM7zHYsI-5yqbIfRo-e-_mev-dNBf1nNnY7s=.ed0f4c8f-9775-4aeb-bd64-761522a9d514@github.com> On Tue, 30 Jul 2024 04:12:11 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. 
>> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix logic to allow for the buffer being partially filled. ok ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2206655823 From stuefe at openjdk.org Tue Jul 30 05:22:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 30 Jul 2024 05:22:32 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2] In-Reply-To: <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com> Message-ID: <lsdnO__d3kqEFpSJJVZOz7JSRaSQXjxT6xwC0kc1MxI=.ec76bf46-635c-411d-9d0c-918d286f0f0b@github.com> On Mon, 29 Jul 2024 14:49:48 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: >> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) >> >> Testing: >> test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java >> test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments by Thomas S. > > Signed-off-by: Ashutosh Mehra <asmehra at redhat.com> Minor naming nit, otherwise good. 
src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 40: > 38: > 39: // Helper class to wrap the array of arena tags for easier processing > 40: class ArenaTagsCounter { Sorry for being a stickler for precise names, but I would like plural for counters here - it is not a single counter, its a series/vector/array of counters. Any of these work for me: ArenaCountersByTag - ArenaCountersByTagVector - ArenaTagCounterVector - ArenaTagCounters ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2206660184 PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1696322176 From jwaters at openjdk.org Tue Jul 30 05:26:31 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 30 Jul 2024 05:26:31 GMT Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in misc. runtime code In-Reply-To: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> Message-ID: <EeNLijHM045qSCsVlfkuAvzvyEknaPkdEa5XZnMszHw=.5120ec93-af97-4219-848d-db99043c2e1a@github.com> On Tue, 30 Jul 2024 03:34:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this (perhaps trivial?) change that removes some uses of literal > 0 as a null pointer constant in misc. runtime code. Most are changed to use > nullptr. > > Testing: mach5 tier1 Looks Good! ------------- Marked as reviewed by jwaters (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20383#pullrequestreview-2206668559 From dholmes at openjdk.org Tue Jul 30 05:32:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 05:32:32 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> Message-ID: <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com> On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change that removes some uses of literal 0 as a null > pointer constant in prims code. > > Testing: mach5 tier1 Couple of queries on this one. Thanks src/hotspot/share/prims/jni.cpp line 1151: > 1149: \ > 1150: EntryProbe; \ > 1151: ResultType ret{}; \ This looks bogus. ResultType is just a macro variable and could be a primitive type. ?? Does the local need initializing? src/hotspot/share/prims/methodHandles.cpp line 439: > 437: default: > 438: fatal("unexpected intrinsic id: %d %s", vmIntrinsics::as_int(iid), vmIntrinsics::name_at(iid)); > 439: return 0; Do we no longer need these returns after `fatal` to keep compilers happy? 
------------- PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2206671959 PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696328696 PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696329565 From dholmes at openjdk.org Tue Jul 30 05:41:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 05:41:08 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v5] In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> Message-ID: <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com> > Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. > > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. 
David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fix off-by-one error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20345/files - new: https://git.openjdk.org/jdk/pull/20345/files/77079058..d7b65bbb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345 PR: https://git.openjdk.org/jdk/pull/20345 From gcao at openjdk.org Tue Jul 30 05:47:57 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jul 2024 05:47:57 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 Message-ID: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> Hi, please help review this patch that fix the client VM build failed for riscv. Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). 
### Testing - [x] linux-riscv client VM fastdebug native build ------------- Commit messages: - 8337421: RISC-V: client VM build failure after JDK-8335191 Changes: https://git.openjdk.org/jdk/pull/20386/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337421 Stats: 200 lines in 2 files changed: 100 ins; 99 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386 PR: https://git.openjdk.org/jdk/pull/20386 From gcao at openjdk.org Tue Jul 30 05:47:57 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jul 2024 05:47:57 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> Message-ID: <tNosSLnTmI6wOQsQboQ53uNpOO8BaE3c6RvGJJWOOtg=.b1fa62f7-8c6f-4fe2-a3e2-ccaf274982b3@github.com> On Tue, 30 Jul 2024 05:41:45 GMT, Gui Cao <gcao at openjdk.org> wrote: > Hi, please help review this patch that fix the client VM build failed for riscv. > > Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) > > The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). > > ### Testing > - [x] linux-riscv client VM fastdebug native build @Hamlin-Li : May I ask if this makes sense to you? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2257512053 From dholmes at openjdk.org Tue Jul 30 06:35:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 06:35:32 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v4] In-Reply-To: <XpUTvbOxM7zHYsI-5yqbIfRo-e-_mev-dNBf1nNnY7s=.ed0f4c8f-9775-4aeb-bd64-761522a9d514@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com> <XpUTvbOxM7zHYsI-5yqbIfRo-e-_mev-dNBf1nNnY7s=.ed0f4c8f-9775-4aeb-bd64-761522a9d514@github.com> Message-ID: <K1v8QyK5R_ntO7mI74NnRLZ150UmuX2Ji3r3dOMDLg4=.f58db593-372a-4561-8a2b-c0aec888b462@github.com> On Tue, 30 Jul 2024 05:14:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix logic to allow for the buffer being partially filled. > > ok Thanks for the review @tstuefe ------------- PR Comment: https://git.openjdk.org/jdk/pull/20345#issuecomment-2257572292 From jwtang at openjdk.org Tue Jul 30 06:46:20 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Tue, 30 Jul 2024 06:46:20 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v4] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: <Og5U6MWsTWn6yVFHLPi4Fovp1Nke8Lk41qCwReD0BIU=.5d3848df-da28-48f4-8801-3ad184e8762f@github.com> > I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. 
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: refactor testcase and change the location of fix codes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/723b1ec6..1b0de486 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=02-03 Stats: 230 lines in 5 files changed: 117 ins; 111 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From alanb at openjdk.org Tue Jul 30 06:49:31 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 30 Jul 2024 06:49:31 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v3] In-Reply-To: <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com> Message-ID: <NUYBzJVMKqywg4-jWaehrYyh76pE84JYyD4n_iYnL0k=.c6682c01-9e20-4ebf-996e-7a715a53d0d7@github.com> On Mon, 29 Jul 2024 22:58:09 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > Can you convert the test to use `.cpp` instead of `.c` as well? or maybe it could use VThreadPinner which allows calling through a native frame for tests like this. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2257591543

From kbarrett at openjdk.org Tue Jul 30 06:56:33 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 06:56:33 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code
In-Reply-To: <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
Message-ID: <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>

On Tue, 30 Jul 2024 05:27:37 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this change that removes some uses of literal 0 as a null
>> pointer constant in prims code.
>>
>> Testing: mach5 tier1
>
> src/hotspot/share/prims/jni.cpp line 1151:
>
>> 1149: \
>> 1150:   EntryProbe; \
>> 1151:   ResultType ret{}; \
>
> This looks bogus. ResultType is just a macro variable and could be a primitive type. ?? Does the local need initializing?

This is value-initialization syntax. Value-initialization of a primitive type is zero-initialization.

However, I think we don't need the local variable at all. Here and in the other 5(?) similar places, rather than

    ResultType ret{};
    ...
    ret = jvalue.get_##ResultType();
    return ret;

I think we could just have

    ...
    return jvalue.get_##ResultType();

> src/hotspot/share/prims/methodHandles.cpp line 439:
>
>> 437:     default:
>> 438:       fatal("unexpected intrinsic id: %d %s", vmIntrinsics::as_int(iid), vmIntrinsics::name_at(iid));
>> 439:       return 0;
>
> Do we no longer need these returns after `fatal` to keep compilers happy?

Now that we have, and are using, `[[noreturn]]` on all platforms, we no longer need that dead code.
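
The two language points in the reply above (value-initialization with `T v{}`, and omitting the dead `return` after a call to a `[[noreturn]]` function) can be sketched in standalone C++. This is only an illustration under stated assumptions, not HotSpot code: `fatal_sketch`, `value_initialized`, `size_for_kind` and `demo` are hypothetical names, and `std::abort()` merely stands in for whatever HotSpot's `fatal` does.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for a fatal-error routine: it never returns to its caller.
[[noreturn]] inline void fatal_sketch(const char* /*msg*/) {
  std::abort();
}

// `T ret{}` is value-initialization; for arithmetic and pointer types
// that means zero-initialization.
template <typename T>
T value_initialized() {
  T ret{};
  return ret;
}

// Because fatal_sketch is [[noreturn]], no dummy `return 0;` is needed
// after it in the default branch.
inline int size_for_kind(int kind) {
  switch (kind) {
    case 0: return 1;
    case 1: return 2;
    default:
      fatal_sketch("unexpected kind");
  }
}

// Small usage check for the value-initialization helper.
inline void demo() {
  assert(value_initialized<int>() == 0);
  assert(value_initialized<double>() == 0.0);
  assert(value_initialized<void*>() == nullptr);
}
```

Calling `demo()` exercises the zero-initialization cases; `size_for_kind` only reaches `fatal_sketch` for values other than 0 and 1.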
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696408217 PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696408335 From jwtang at openjdk.org Tue Jul 30 06:56:34 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Tue, 30 Jul 2024 06:56:34 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v3] In-Reply-To: <A8wM3bukziE67E6BEq1fo8wM-fXbbvME_k4zoQTGtSY=.e967a17a-ec4d-4d38-83b5-e5578a05d2b6@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> <A8wM3bukziE67E6BEq1fo8wM-fXbbvME_k4zoQTGtSY=.e967a17a-ec4d-4d38-83b5-e5578a05d2b6@github.com> Message-ID: <B6mclOMVdt2B3T53Qsh1j3IQCslA5tdle7KPf-0bDF0=.ed0f9930-f107-41ca-b588-cf1ff31d1ae7@github.com> On Mon, 29 Jul 2024 22:34:46 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: >> >> changes according to reviewers' advice > > src/hotspot/share/prims/jvmtiExport.cpp line 970: > >> 968: if (_thread->is_in_any_VTMS_transition()) { >> 969: return; // no events should be posted if thread is in any VTMS transition >> 970: } > > This is not right place to fix it. 
>
> This would be better:
>
> @@ -1091,8 +1091,8 @@ bool JvmtiExport::post_class_file_load_hook(Symbol* h_name,
>    if (JvmtiEnv::get_phase() < JVMTI_PHASE_PRIMORDIAL) {
>      return false;
>    }
> -  if (JavaThread::current()->is_in_tmp_VTMS_transition()) {
> -    return false; // skip CFLH events in tmp VTMS transition
> +  if (thread->is_in_any_VTMS_transition()) {
> +    return; // no events should be posted if thread is in any VTMS transition
>    }
>
>    JvmtiClassFileLoadHookPoster poster(h_name, class_loader,
>
>
> Also, there is a check in the constructor `JvmtiClassFileLoadHookPoster()`:
>
>     if (_thread->is_in_any_VTMS_transition()) {
>       return; // no events should be posted if thread is in any VTMS transition
>     }
>
> It is better to replace it with assert. With the right check in the `JvmtiExport::post_class_file_load_hook()` we should never call this constructor and `poster.post()` when in a VTMS transition.

Change it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1696407776

From mli at openjdk.org Tue Jul 30 07:03:31 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 30 Jul 2024 07:03:31 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
Message-ID: <selusQPJMBBgbqUG7i0dxDxZkXJfHwyZJ5-LBMP3Q2c=.dc77ee0b-5abc-4b03-9b92-55e8c4d3a940@github.com>

On Tue, 30 Jul 2024 05:41:45 GMT, Gui Cao <gcao at openjdk.org> wrote:

> Hi, please help review this patch that fix the client VM build failed for riscv.
>
> Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
>
> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode.
In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize().
>
> ### Testing
> - [x] linux-riscv client VM fastdebug native build

Thanks for catching. Looks good to me.

Just one minor comment, which is quite subjective, you're on the call.

Suggested changes:

    void VM_Version::initialize() {
      common_initialize();
    #ifdef COMPILER2
      c2_initialize();
    #endif // COMPILER2
    }

    void VM_Version::common_initialize() {
      ...
    }

    #ifdef COMPILER2
    void VM_Version::c2_initialize() {
      ...
    }
    #endif // COMPILER2

-------------

Marked as reviewed by mli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20386#pullrequestreview-2206812795

From kbarrett at openjdk.org Tue Jul 30 07:18:33 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 07:18:33 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code
In-Reply-To: <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
 <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
Message-ID: <AYeZPI3ANHsd29eP2-PHll2yUn8KT1HL4S_2KaFUon0=.3dda4769-20ef-4653-aaeb-eec3f568925f@github.com>

On Tue, 30 Jul 2024 06:54:04 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> src/hotspot/share/prims/jni.cpp line 1151:
>>
>>> 1149: \
>>> 1150:   EntryProbe; \
>>> 1151:   ResultType ret{}; \
>>
>> This looks bogus. ResultType is just a macro variable and could be a primitive type. ?? Does the local need initializing?
>
> This is value-initialization syntax.
Value-initialization of a primitive type is zero-initialization. > > However, I think we don't need the local variable at all. Here and in the other 5(?) similar places, rather than > > ResultType ret{}; > ... > ret = jvalue.get_##ResultType(); > return ret; > > I think we could just have > > ... > return jvalue.get_##ResultType(); Looks like eliminating the variable doesn't work. It gets used in a `DT_RETURN_MARK_FOR` form, which needs the address of the return value. That address is obtained using a reference. Taking a reference to an uninitialized variable is (I think) okay, so long as one doesn't attempt to use the uninitialized value. But then the assignment could be problematic if it's uninitialized and the assignment operator is non-trivial. I expect the compiler will optimize away a trivial zero initialization if it's not needed. So ensuring it is value-initialized seems like the cleanest thing to do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696441217 From kbarrett at openjdk.org Tue Jul 30 07:27:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 30 Jul 2024 07:27:37 GMT Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in misc. runtime code In-Reply-To: <Z60HHFyeI-xKnJXCK3Y5VDJzAjtjbPtPK1l6yEmPPIk=.5d4b8acd-4bcd-464c-aef8-4cfd707846f4@github.com> References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> <Z60HHFyeI-xKnJXCK3Y5VDJzAjtjbPtPK1l6yEmPPIk=.5d4b8acd-4bcd-464c-aef8-4cfd707846f4@github.com> Message-ID: <8inmHHwwUefDxv-O0Ltki71aI177Wz7yb_mAkt5tEr8=.c66c77f7-8290-4d17-98fa-498d2ef06180@github.com> On Tue, 30 Jul 2024 04:32:09 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this (perhaps trivial?) change that removes some uses of literal >> 0 as a null pointer constant in misc. runtime code. Most are changed to use >> nullptr. >> >> Testing: mach5 tier1 > > This looks fine, and I think trivial. 
> > I think there is an existing bug but probably better to file a separate JBS issue for that. > > Thanks Thanks for reviews @dholmes-ora and @TheShermanTanker . I'll file a followup bug for the pre-existing fflush argument that @dholmes-ora pointed out. > src/hotspot/share/oops/constantPool.cpp line 2068: > >> 2066: } >> 2067: printf("Cpool size: %d\n", size); >> 2068: fflush(nullptr); > > This looks like a bug. I think someone used 0 aka fd0 when they needed stdout for fflush. I assumed there was some reason for flushing all here, but you are right, this is probably a bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20383#issuecomment-2257653298 PR Review Comment: https://git.openjdk.org/jdk/pull/20383#discussion_r1696447458 From kbarrett at openjdk.org Tue Jul 30 07:27:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 30 Jul 2024 07:27:37 GMT Subject: Integrated: 8337416: Fix -Wzero-as-null-pointer-constant warnings in misc. runtime code In-Reply-To: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com> Message-ID: <3ojt0S-OE_7u3dFaUtQ7zGyTuTu9AR_wLCfm6rUJNJQ=.10175182-f63e-441d-8a18-8630ff7ade52@github.com> On Tue, 30 Jul 2024 03:34:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this (perhaps trivial?) change that removes some uses of literal > 0 as a null pointer constant in misc. runtime code. Most are changed to use > nullptr. > > Testing: mach5 tier1 This pull request has now been integrated. Changeset: bc7c255b Author: Kim Barrett <kbarrett at openjdk.org> URL: https://git.openjdk.org/jdk/commit/bc7c255b156bf3bb3fd8c3f622b8127ab27e7c7a Stats: 17 lines in 10 files changed: 0 ins; 0 del; 17 mod 8337416: Fix -Wzero-as-null-pointer-constant warnings in misc. 
runtime code Reviewed-by: dholmes, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/20383 From clanger at openjdk.org Tue Jul 30 07:48:36 2024 From: clanger at openjdk.org (Christoph Langer) Date: Tue, 30 Jul 2024 07:48:36 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> Message-ID: <7cxoF1PDzUFxZzMM29aVQKU87VH50Xpp42nyEk8oFvg=.c7c4e1ed-6bb4-499f-854e-ce16fcaac091@github.com> On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in 
BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add patch of Kim Barrett LGTM ------------- Marked as reviewed by clanger (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2206911734 From gcao at openjdk.org Tue Jul 30 07:52:03 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jul 2024 07:52:03 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v2] In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> Message-ID: <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com> > Hi, please help review this patch that fix the client VM build failed for riscv. > > Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) > > The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. 
In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). > > ### Testing > - [x] linux-riscv client VM fastdebug native build Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Fix for review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20386/files - new: https://git.openjdk.org/jdk/pull/20386/files/6f8b6883..d9055e6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=00-01 Stats: 14 lines in 2 files changed: 9 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386 PR: https://git.openjdk.org/jdk/pull/20386 From gcao at openjdk.org Tue Jul 30 07:52:04 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jul 2024 07:52:04 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v2] In-Reply-To: <selusQPJMBBgbqUG7i0dxDxZkXJfHwyZJ5-LBMP3Q2c=.dc77ee0b-5abc-4b03-9b92-55e8c4d3a940@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> <selusQPJMBBgbqUG7i0dxDxZkXJfHwyZJ5-LBMP3Q2c=.dc77ee0b-5abc-4b03-9b92-55e8c4d3a940@github.com> Message-ID: <yNPc7JZLQ1UCsIP3MkCnb3XX8SNZeAGDJduWbMr_ua0=.baa032d8-f434-4273-98a1-8a7c7f0bcd9a@github.com> On Tue, 30 Jul 2024 07:00:27 GMT, Hamlin Li <mli at openjdk.org> wrote: > Thanks for catching. Looks good to me. > > Just one minor comment, which is quite subjective, you're on the call. 
>
> Suggested changes:
>
> ```
> void VM_Version::initialize() {
>   common_initialize();
> #ifdef COMPILER2
>   c2_initialize();
> #endif // COMPILER2
> }
>
> void VM_Version::common_initialize() {
>   ...
> }
>
> #ifdef COMPILER2
> void VM_Version::c2_initialize() {
>   ...
> }
> #endif // COMPILER2
> ```

Thanks for the review. Fixed

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2257695188

From duke at openjdk.org Tue Jul 30 07:55:38 2024
From: duke at openjdk.org (Yuri Gaevsky)
Date: Tue, 30 Jul 2024 07:55:38 GMT
Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic
In-Reply-To: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com>
References: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com>
Message-ID: <_0CrA8Qa71vtP2DRk3o4yb9F80-czEU-D7lEb7stkHk=.a45226ab-0f6c-4f25-acd4-657fcc29ca93@github.com>

On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

> Hello All,
>
> Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported.
>
> Thank you,
> -Yuri Gaevsky
>
> **Correctness checks:**
> hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.

.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2257705299 From dholmes at openjdk.org Tue Jul 30 07:57:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 07:57:31 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> Message-ID: <QctZn5PDD3uluZJ-W_CG3Ffo0f02PsY7Zlx5neUOICQ=.7ca80211-e028-4553-87f5-f27f17d903ea@github.com> On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change that removes some uses of literal 0 as a null > pointer constant in prims code. > > Testing: mach5 tier1 Okay - looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2206930451 From dholmes at openjdk.org Tue Jul 30 07:57:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 07:57:32 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <AYeZPI3ANHsd29eP2-PHll2yUn8KT1HL4S_2KaFUon0=.3dda4769-20ef-4653-aaeb-eec3f568925f@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com> <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com> <AYeZPI3ANHsd29eP2-PHll2yUn8KT1HL4S_2KaFUon0=.3dda4769-20ef-4653-aaeb-eec3f568925f@github.com> Message-ID: <2HT3saxNUjevXOwHYDEDT2dIsjjzI6OS8ps6z9oF_nY=.c50ca0b5-5eed-4ce3-b124-fe5c9995fa46@github.com> On Tue, 30 Jul 2024 07:16:21 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> This is value-initialization syntax. 
Value-initialization of a primitive type is zero-initialization. >> >> However, I think we don't need the local variable at all. Here and in the other 5(?) similar places, rather than >> >> ResultType ret{}; >> ... >> ret = jvalue.get_##ResultType(); >> return ret; >> >> I think we could just have >> >> ... >> return jvalue.get_##ResultType(); > > Looks like eliminating the variable doesn't work. It gets used in a `DT_RETURN_MARK_FOR` form, which > needs the address of the return value. That address is obtained using a reference. Taking a reference > to an uninitialized variable is (I think) okay, so long as one doesn't attempt to use the uninitialized value. > But then the assignment could be problematic if it's uninitialized and the assignment operator is non-trivial. > I expect the compiler will optimize away a trivial zero initialization if it's not needed. So ensuring it is > value-initialized seems like the cleanest thing to do. One day I will remember what this syntax is and does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696496369 From fyang at openjdk.org Tue Jul 30 08:13:33 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 30 Jul 2024 08:13:33 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v2] In-Reply-To: <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com> Message-ID: <gqAdNg8JIOb7Tk9ZO-jKPYa8dGW9DTDU57FIyv02Rqg=.27da6419-ca78-47d6-a658-df0fec754e83@github.com> On Tue, 30 Jul 2024 07:52:03 GMT, Gui Cao <gcao at openjdk.org> wrote: >> Hi, please help review this patch that fix the client VM build failed for riscv. 
>> >> Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) >> >> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). >> >> ### Testing >> - [x] linux-riscv client VM fastdebug native build > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix for review comments src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6007: > 6005: generate_compare_long_strings(); > 6006: > 6007: generate_string_indexof_stubs(); I think we can put this two under macro `COMPILER2` too. Then we can further remove check for macro `COMPILER2_OR_JVMCI` in this function. I don't think these stubs are ever used by JVMCI which is only partially implemented on this platform for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20386#discussion_r1696524759 From gcao at openjdk.org Tue Jul 30 08:16:44 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jul 2024 08:16:44 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v3] In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> Message-ID: <HMZWo-m5vmc27XgW5E5SE6Gx_rr8nBnoglxOJ8J64Uw=.803cd228-4977-46c7-a267-3b2a9ffce03b@github.com> > Hi, please help review this patch that fix the client VM build failed for riscv. 
> > Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) > > The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). > > ### Testing > - [x] linux-riscv client VM fastdebug native build Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Remove check for macro COMPILER2_OR_JVMCI in generate_compiler_stubs function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20386/files - new: https://git.openjdk.org/jdk/pull/20386/files/d9055e6f..9d3b6d29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386 PR: https://git.openjdk.org/jdk/pull/20386 From gcao at openjdk.org Tue Jul 30 08:16:44 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jul 2024 08:16:44 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v2] In-Reply-To: <gqAdNg8JIOb7Tk9ZO-jKPYa8dGW9DTDU57FIyv02Rqg=.27da6419-ca78-47d6-a658-df0fec754e83@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com> <gqAdNg8JIOb7Tk9ZO-jKPYa8dGW9DTDU57FIyv02Rqg=.27da6419-ca78-47d6-a658-df0fec754e83@github.com> Message-ID: 
<wH2MgatEWTnw2loLBXJEdPtKcrXo6cSNG_n45pAsJi8=.b95827f2-7eca-413f-8fe0-6d858bffc2dc@github.com> On Tue, 30 Jul 2024 08:10:24 GMT, Fei Yang <fyang at openjdk.org> wrote: > I think we can put this two under macro `COMPILER2` too. Then we can further remove check for macro `COMPILER2_OR_JVMCI` in this function. I don't think these stubs are ever used by JVMCI which is only partially implemented on this platform for now. Thanks for your review. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20386#discussion_r1696530195 From lucy at openjdk.org Tue Jul 30 08:25:39 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 30 Jul 2024 08:25:39 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v17] In-Reply-To: <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com> References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com> Message-ID: <N83svaDOiUQdrjynAb0K834OxualQ_3FSJkKxL_0B3c=.5e5ffc29-2088-4345-aba2-23cdd9ae9817@github.com> On Mon, 1 Jul 2024 14:14:50 GMT, Amit Kumar <amitkumar at openjdk.org> wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ±
0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > > Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com> Changes look good. Sorry for the poor response time. ------------- Marked as reviewed by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19544#pullrequestreview-2207003971 From shade at openjdk.org Tue Jul 30 08:28:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 30 Jul 2024 08:28:34 GMT Subject: RFR: 8334230: Optimize C2 classes layout [v3] In-Reply-To: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com> Message-ID: <DU3m0oaK_eqlDJhlUjT9EW97-HApsm0FAO6UYOVwviM=.0c490ea0-96e6-4408-b149-8caea3447ecc@github.com> On Tue, 30 Jul 2024 00:53:04 GMT, Neethu Prasad <nprasad at openjdk.org> wrote: >> **Notes** >> >> Rearrange C2 class fields to optimize footprint. >> >> >> **Verification** >> >> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. >> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
>> >> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding | >> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- | >> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 | >> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 | >> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 | >> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 | >> >> class ArrayPointer { >> const class Node * _pointer; /* 0 8 */ >> const class Node * _base; /* 8 8 */ >> const jlong _constant_offset; /* 16 8 */ >> const class Node * _int_offset; /* 24 8 */ >> const class GrowableArray<Node*> * _other_offsets; /* 32 8 */ >> const jint _int_offset_shift; /* 40 4 */ >> const bool _is_valid; /* 44 1 */ >> public: >> >> >> /* size: 48, cachelines: 1, members: 7 */ >> /* padding: 3 */ >> /* last cacheline: 48 bytes */ >> }; >> >> >> >> class CallJavaNode : public CallNode { >> public: >> >> /* class CallNode <ancestor>; */ /* 0 128 */ >> protected: >> >> /* --- cacheline 2 boundary (128 bytes) --- */ >> class ciMethod * _method; /* 128 8 */ >> bool _optimized_virtual; /* 136 1 */ >> bool _method_handle_invoke; /* 137 1 */ >> bool _override_symbolic_info; /* 138 1 */ >> bool _arg_escape; /* 139 1 */ >> public: >> >> protected: >> >> public: >> >> >> /* size: 144, cachelines: 3, members: 6 */ >> /* padding: 4 */ >> /* last cacheline: 16 bytes */ >> >> /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */ >> }; >> >> >> >> class C2Access : public StackObj { >> public: >> >> /* class StackObj <ancestor>; */ /* 0 0 */ >> >> /* XXX last struct has 1 byte of padding */ >> > ... 
> > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address constructor order issue for C2OptAccess Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2207011005 From djelinski at openjdk.org Tue Jul 30 08:42:37 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Tue, 30 Jul 2024 08:42:37 GMT Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v5] In-Reply-To: <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com> References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com> Message-ID: <ecGNOoxKho_Go2gZcszTdimzEVHA2NkzQ6XlX97xmoA=.63c2ca29-e65f-4ffb-b178-c87567650241@github.com> On Tue, 30 Jul 2024 05:41:08 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence. >> >> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. >> >> Testing: >> - new gtest exercises the truncation code with the different possibilities for bad truncation >> - tiers 1-3 sanity testing >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix off-by-one error src/hotspot/share/utilities/exceptions.cpp line 278: > 276: if (ret == -1 || ret >= max_msg_size) { > 277: int len = (int) strlen(msg); > 278: if (len > 0) { `truncate` asserts that len>5, you might need to adjust that. 
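For readers following the truncation discussion above: a minimal sketch of cutting a byte string at a valid UTF-8 boundary. It assumes standard (not modified) UTF-8 and well-formed input, and `utf8_safe_length` is a hypothetical helper written for illustration only; it is not the code in exceptions.cpp and does not model the `truncate` assert mentioned in the review.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only, not the HotSpot implementation).
// Returns the largest length <= max_len at which `s` can be cut without
// splitting a multi-byte UTF-8 sequence. Assumes `s` is well-formed UTF-8.
static size_t utf8_safe_length(const std::string& s, size_t max_len) {
  if (s.size() <= max_len) {
    return s.size();  // nothing to truncate
  }
  size_t cut = max_len;
  // If the first dropped byte is a continuation byte (10xxxxxx), the cut
  // would land inside a character. Step back until the first dropped byte
  // starts a character, so that character is dropped as a whole.
  while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80) {
    --cut;
  }
  return cut;
}
```

With the 5-byte string "ab" followed by U+20AC (bytes E2 82 AC), a limit of 4 or 3 falls inside the euro sign, so the sketch backs up to length 2 and drops the whole character.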
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696567086 From gcao at openjdk.org Tue Jul 30 08:48:11 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jul 2024 08:48:11 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v4] In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> Message-ID: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com> > Hi, please help review this patch that fix the client VM build failed for riscv. > > Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) > > The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). 
> > ### Testing > - [x] linux-riscv client VM fastdebug native build Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Polishing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20386/files - new: https://git.openjdk.org/jdk/pull/20386/files/9d3b6d29..edf16e07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=02-03 Stats: 8 lines in 1 file changed: 4 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386 PR: https://git.openjdk.org/jdk/pull/20386 From fyang at openjdk.org Tue Jul 30 08:48:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 30 Jul 2024 08:48:11 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v4] In-Reply-To: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com> Message-ID: <vuPlLM3zCQEWYUVmgtZ95YJJrGwg7wh4gk9l7ABSED0=.bf84d10e-c2ad-4c2c-9920-98e7439a5633@github.com> On Tue, 30 Jul 2024 08:44:36 GMT, Gui Cao <gcao at openjdk.org> wrote: >> Hi, please help review this patch that fix the client VM build failed for riscv. >> >> Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) >> >> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). 
>> >> ### Testing >> - [x] linux-riscv client VM fastdebug native build > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20386#pullrequestreview-2207054048 From fyang at openjdk.org Tue Jul 30 08:48:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 30 Jul 2024 08:48:11 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v3] In-Reply-To: <HMZWo-m5vmc27XgW5E5SE6Gx_rr8nBnoglxOJ8J64Uw=.803cd228-4977-46c7-a267-3b2a9ffce03b@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> <HMZWo-m5vmc27XgW5E5SE6Gx_rr8nBnoglxOJ8J64Uw=.803cd228-4977-46c7-a267-3b2a9ffce03b@github.com> Message-ID: <KoSCi1gr2QcrBOgsvYD14IHQijNV-iS4dYqiXNQghrg=.b0b49097-28f7-448c-a16c-d0ce816b616e@github.com> On Tue, 30 Jul 2024 08:16:44 GMT, Gui Cao <gcao at openjdk.org> wrote: >> Hi, please help review this patch that fix the client VM build failed for riscv. >> >> Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) >> >> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). 
>> >> ### Testing >> - [x] linux-riscv client VM fastdebug native build > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Remove check for macro COMPILER2_OR_JVMCI in generate_compiler_stubs function src/hotspot/cpu/riscv/vm_version_riscv.cpp line 333: > 331: // NOTE: Make sure codes dependent on UseRVV are put after MaxVectorSize initialize, > 332: // as there are extra checks inside it which could disable UseRVV > 333: // in some situations. Please also move this code comment to immediately after initialization of MaxVectorSize. Otherwise looks good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20386#discussion_r1696567820 From clanger at openjdk.org Tue Jul 30 09:33:34 2024 From: clanger at openjdk.org (Christoph Langer) Date: Tue, 30 Jul 2024 09:33:34 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> Message-ID: <6WiWP5JSoQ6KQbe5FAg84BGAcRp0XtojWan0nyGaXjo=.562bc939-4751-47c8-a739-3b76cb67b710@github.com> On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. > > We find this in the test output > > [STDOUT] > /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory > > The container where the test is executed does not contain the ubsan package; we might skip the test in this case. I would prefer to add the required ubsan libraries to the container used for testing when testing an ubsan enabled build. Can we achieve this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2257905835 From amitkumar at openjdk.org Tue Jul 30 09:35:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 30 Jul 2024 09:35:38 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v17] In-Reply-To: <G2IfSoXv1DKf69H_Gr5O_L-FTkQQgYGBS15UCNMoVt0=.acf2acd9-337c-4d45-8321-1c1be4e3316e@github.com> References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com> <G2IfSoXv1DKf69H_Gr5O_L-FTkQQgYGBS15UCNMoVt0=.acf2acd9-337c-4d45-8321-1c1be4e3316e@github.com> Message-ID: <IVpAB244ihXt0tdqTHQMTSknGgfLjFZvptzpVmTG1Wg=.eca9e293-728e-4e7f-89b7-5cab6a6d40ef@github.com> On Thu, 4 Jul 2024 15:26:22 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/cpu/s390/macroAssembler_s390.cpp >> >> Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com> > > Looks good. thank you @theRealAph @TheRealMDoerr @RealLucy for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19544#issuecomment-2257905725 From amitkumar at openjdk.org Tue Jul 30 09:35:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 30 Jul 2024 09:35:38 GMT Subject: Integrated: 8331126: [s390x] secondary_super_cache does not scale well In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com> Message-ID: <yVlMdHoceR8O1gGSRhgNq9IVsiz51AHlqiFvvRh2c50=.d89a9bea-7c55-4823-8435-efd21dfa7683@github.com> On Tue, 4 Jun 2024 15:19:51 GMT, Amit Kumar <amitkumar at openjdk.org> wrote: > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ±
0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ± 1.628 ns... This pull request has now been integrated.
Changeset: 7ac53118 Author: Amit Kumar <amitkumar at openjdk.org> URL: https://git.openjdk.org/jdk/commit/7ac531181c25815577ba2f6f426e1da270e4f589 Stats: 429 lines in 6 files changed: 426 ins; 0 del; 3 mod 8331126: [s390x] secondary_super_cache does not scale well Reviewed-by: lucy, aph, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/19544 From kevinw at openjdk.org Tue Jul 30 10:14:37 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 30 Jul 2024 10:14:37 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v17] In-Reply-To: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com> References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com> Message-ID: <54ySFb85fkY1XfU-2IWvCwIWKijd_F8xS-vWm_wO7KY=.b9713fd9-9b3b-49d2-93fc-57054de1e190@github.com> On Mon, 29 Jul 2024 19:08:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped.
If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > last lingering change Thanks Sonia, and thanks Thomas! I did just see a problem with DumpPerfMapAtExit that I didn't notice before. When -XX:+DumpPerfMapAtExit causes a call to CodeCache::write_perf_map, there's now no %p substitution so /tmp/perf-%p.map gets created. We all hate duplication but CodeCache::write_perf_map has two very different callers.
It could do something like this (feel free to adjust/correct/do something else):
src/hotspot/share/code/codeCache.cpp
#ifdef LINUX
void CodeCache::write_perf_map(const char* filename, outputStream* st) {
  MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
+ if (filename == nullptr) {
+   st->print_cr("Warning: Not writing perf map as null filename provided.");
+   return;
+ }
+ char fname[JVM_MAXPATHLEN];
+ if (strstr(filename, "%p") != nullptr) {
+   // Unnecessary if filename contains %%p but will be a rare waste of time:
+   if (!Arguments::copy_expand_pid(filename, strlen(filename), fname, JVM_MAXPATHLEN)) {
+     st->print_cr("Warning: Not writing perf map as substitution failed.");
+     return;
+   }
+   filename = fname;
+ }
+ fileStream fs(filename, "w");
JVM_MAXPATHLEN will have a lot of slack space there as if it contains %p it really should be the default filename, so you could go with a lower value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2257986965 From shade at openjdk.org Tue Jul 30 10:22:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 30 Jul 2024 10:22:32 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> Message-ID: <oftVH8FweQHlfgduKuMopunOkTBQ30f8s0j5dB0AnQo=.6ab59e3a-906c-41d9-93c1-d614209531e9@github.com> On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change that removes some uses of literal 0 as a null > pointer constant in prims code. > > Testing: mach5 tier1 All right, this looks fine. (I am somewhat allergic to `{}` syntax, but it is what it is.) ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2207285678 From kevinw at openjdk.org Tue Jul 30 11:07:30 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 30 Jul 2024 11:07:30 GMT Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> Message-ID: <BY2HT1UNrrQ4e7TAJruG2X9Caqwgb5bQyT0JOuuakpQ=.b342d0c2-0386-47e8-9026-d1e8b5ad9a7a@github.com> On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete UseNotificationThread flag which was deprecated in JDK 23. > > Testing: tier1..tier5 Looks good ------------- Marked as reviewed by kevinw (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2207376511 From sspitsyn at openjdk.org Tue Jul 30 11:23:32 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 30 Jul 2024 11:23:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v4] In-Reply-To: <Og5U6MWsTWn6yVFHLPi4Fovp1Nke8Lk41qCwReD0BIU=.5d3848df-da28-48f4-8801-3ad184e8762f@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <Og5U6MWsTWn6yVFHLPi4Fovp1Nke8Lk41qCwReD0BIU=.5d3848df-da28-48f4-8801-3ad184e8762f@github.com> Message-ID: <-4ohGO-ytRMr_I-4SRpWX6QDeZQCIhVho9mTQadK3MQ=.dff1bb8d-907c-4844-9e1b-801d69984d49@github.com> On Tue, 30 Jul 2024 06:46:20 GMT, Jiawei Tang <jwtang at openjdk.org> wrote: >> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > refactor testcase and change the location of fix codes Changes requested by sspitsyn (Reviewer). 
src/hotspot/share/prims/jvmtiExport.cpp line 1098: > 1096: if (JavaThread::current()->is_in_any_VTMS_transition()) { > 1097: return false; // no events should be posted if thread is in any VTMS transition > 1098: } Sorry, I was not clear the 3 lines above 1093-1095 had to be replaced with new lines 1096-1098. The check for `is_in_any_VTMS_transition()` includes the checks `is_in_tmp_VTMS_transition()`. ------------- PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2207409392 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1696791636 From sspitsyn at openjdk.org Tue Jul 30 11:27:32 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 30 Jul 2024 11:27:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v3] In-Reply-To: <NUYBzJVMKqywg4-jWaehrYyh76pE84JYyD4n_iYnL0k=.c6682c01-9e20-4ebf-996e-7a715a53d0d7@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com> <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com> <NUYBzJVMKqywg4-jWaehrYyh76pE84JYyD4n_iYnL0k=.c6682c01-9e20-4ebf-996e-7a715a53d0d7@github.com> Message-ID: <bZ-8wYbIe6rPMB0cv05KHVs3uajJdi_Lrqzg8A6bZVc=.daf9fada-9fe5-45c9-9572-a5660c2c4520@github.com> On Tue, 30 Jul 2024 06:46:38 GMT, Alan Bateman <alanb at openjdk.org> wrote: > > Can you convert the test to use .cpp instead of .c as well? > or maybe it could use VThreadPinner which allows calling through a native frame for tests like this. This is a good suggestion, I was also thinking about it. 
An example can be found in the test: `test/hotspot/jtreg/serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java` ------------- PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2258119426 From jwaters at openjdk.org Tue Jul 30 11:54:33 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 30 Jul 2024 11:54:33 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> Message-ID: <b3UnImUMMfRpNDfiV2Td8ysgmt53Zxas8DTLFnt2ieM=.3da375d1-b0d7-49a8-b947-72fa35bee6ee@github.com> On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change that removes some uses of literal 0 as a null > pointer constant in prims code. > > Testing: mach5 tier1 Looks Good! ------------- Marked as reviewed by jwaters (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2207470079 From jwaters at openjdk.org Tue Jul 30 11:54:34 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 30 Jul 2024 11:54:34 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com> <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com> Message-ID: <TqktOSyEh_Ox9Ex39pPlxyn_LBy3msYU05uizfb4tHc=.8d065351-2145-46d2-bc18-203cf3be6865@github.com> On Tue, 30 Jul 2024 06:54:10 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> src/hotspot/share/prims/methodHandles.cpp line 439: >> >>> 437: default: >>> 438: fatal("unexpected intrinsic id: %d %s", vmIntrinsics::as_int(iid), vmIntrinsics::name_at(iid)); >>> 439: return 0; >> >> Do we no longer need these returns after `fatal` to keep compilers happy? > > Now that we have, and are using, `[[noreturn]]` on all platforms, we no longer need that dead code. 
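As a small illustration of the `[[noreturn]]` point quoted above, here is a sketch under stated assumptions: `fatal_error` and `size_of_kind` are hypothetical names standing in for HotSpot's `fatal` and a typical switch-based lookup, not code from methodHandles.cpp. Because the callee is declared `[[noreturn]]`, the compiler knows control cannot fall off the end of the function, so no dummy `return` is required after the call.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a fatal-error routine; never returns.
[[noreturn]] static void fatal_error(const char* msg) {
  std::fprintf(stderr, "fatal: %s\n", msg);
  std::abort();
}

// Hypothetical lookup with a non-void return type.
static int size_of_kind(int kind) {
  switch (kind) {
    case 0: return 1;
    case 1: return 4;
    default:
      // No trailing `return 0;` needed: fatal_error is [[noreturn]],
      // so every path either returns a value or does not return at all.
      fatal_error("unexpected kind");
  }
}
```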
I'll admit, I do prefer having a return to end all possible control flows in a non void method, but oh well

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696829614

From dholmes at openjdk.org Tue Jul 30 12:34:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 12:34:33 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v5]
In-Reply-To: <ecGNOoxKho_Go2gZcszTdimzEVHA2NkzQ6XlX97xmoA=.63c2ca29-e65f-4ffb-b178-c87567650241@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
 <ecGNOoxKho_Go2gZcszTdimzEVHA2NkzQ6XlX97xmoA=.63c2ca29-e65f-4ffb-b178-c87567650241@github.com>
Message-ID: <5r9kuyj3yAUYArP73qpEE9Mkb0kNlboRq2m3CWNduIg=.d893b616-6c3a-4b2d-9b1b-ab9ece37e3ea@github.com>

On Tue, 30 Jul 2024 08:39:51 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix off-by-one error
>
> src/hotspot/share/utilities/exceptions.cpp line 278:
>
>> 276:   if (ret == -1 || ret >= max_msg_size) {
>> 277:     int len = (int) strlen(msg);
>> 278:     if (len > 0) {
>
> `truncate` asserts that len>5, you might need to adjust that.

We know we only got here because the message was either huge (-1) or > 1K. We only get a zero length if we got -1 and are on Windows. Any length < max_msg_size means we got -1 and are on macOS and we have truncated prior to the conversion that caused the INT_MAX overflow. In theory it could be a single "%s" format but in practice we don't call fthrow that way.
Also if we got the -1 then there are actually very few circumstances that can get us to this point because it needs to be an exception message that can relate to huge strings (which at the moment is trying to look up a class with an illegally long name - something that will soon be handled on the Java side before we get to the VM.) So if we somehow were to trigger the len>5 assert, that is fine as it indicates something unusual/unexpected that we want to catch.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696883436

From djelinski at openjdk.org Tue Jul 30 12:38:33 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 30 Jul 2024 12:38:33 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v5]
In-Reply-To: <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
Message-ID: <0DGLCSfGuXAZI6APR2EiKoRxAlxZo6OIRpDtHHTJ-is=.20ad55c6-830a-475c-bbcf-a0a9c84e771e@github.com>

On Tue, 30 Jul 2024 05:41:08 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>>
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>>
>> Testing:
>> - new gtest exercises the truncation code with the different possibilities for bad truncation
>> - tiers 1-3 sanity testing
>>
>> Thanks.
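The truncation problem described in the quoted text can be sketched as follows. This is an illustrative helper, not the code from the pull request: the name `valid_utf8_prefix_length` is hypothetical, it handles standard UTF-8 only (HotSpot's modified UTF-8 has further cases that are not covered here), and it assumes the string was valid before it was cut.

```cpp
#include <cstddef>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool is_continuation(unsigned char b) {
  return (b & 0xC0) == 0x80;
}

// Number of continuation bytes announced by a lead byte.
// ASCII bytes and anything unrecognised count as 0.
static size_t continuation_count(unsigned char lead) {
  if ((lead & 0xE0) == 0xC0) return 1;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 2;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 3;  // 11110xxx
  return 0;
}

// Hypothetical helper: given the first len bytes of a string that was valid
// UTF-8 before being cut, return the longest length <= len that does not
// end in the middle of a multi-byte sequence.
static size_t valid_utf8_prefix_length(const char* buf, size_t len) {
  size_t pos = len;
  // Step back over at most three trailing continuation bytes.
  while (pos > 0 && len - pos < 3 &&
         is_continuation(static_cast<unsigned char>(buf[pos - 1]))) {
    pos--;
  }
  if (pos == 0) {
    return 0;  // empty, or nothing but continuation bytes
  }
  size_t have = len - pos;  // continuation bytes actually present
  size_t need = continuation_count(static_cast<unsigned char>(buf[pos - 1]));
  // Complete sequence: keep it. Incomplete: drop the lead byte as well.
  return (have >= need) ? pos + need : pos - 1;
}
```

A caller would format into the fixed buffer, call the helper with the truncated length, and write the terminating NUL at the returned index.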
> > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix off-by-one error LGTM ------------- Marked as reviewed by djelinski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2207567151 From mbaesken at openjdk.org Tue Jul 30 12:43:35 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 30 Jul 2024 12:43:35 GMT Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' [v4] In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com> Message-ID: <4ZJEarDQH0c_N4bwAtFQw3lG_WCwsa-c2QmHAmFD0J0=.ece94411-8818-4a29-8934-21ca8f60db1c@github.com> On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> When running with ubsan - enabled binaries, some tests trigger the following report : >> >> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' >> #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 >> #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 >> #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 >> #3 0x7fc1df8611df 
in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 >> #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 >> #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 >> #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 >> #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 >> >> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add patch of Kim Barrett Thanks for the reviews ! 
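
For readers unfamiliar with this class of ubsan report, here is a small sketch of the pattern. It is not the integrated patch: `TinyMap`, `slot_offset` and `offset_for` are hypothetical names, and handing out a shared instance is only one possible way to avoid the null pointer. Calling a non-static member function through a null pointer is undefined behaviour even when the function body never touches `*this`.

```cpp
// Hypothetical stateless helper class, loosely modelled on the shape of
// the problem in the report above.
class TinyMap {
 public:
  // Undefined behaviour if invoked through a null pointer, even though
  // the body does not read any member.
  int slot_offset(int reg) const { return reg * 8; }

  // One way to avoid passing nullptr around: hand out a real object.
  static const TinyMap* instance() {
    static const TinyMap the_instance{};
    return &the_instance;
  }
};

// Code that previously received a casted nullptr can take the shared
// instance instead, and the member call becomes well defined.
static int offset_for(const TinyMap* map, int reg) {
  return map->slot_offset(reg);
}
```

Usage would then look like `offset_for(TinyMap::instance(), 3)` rather than passing a null `TinyMap*`.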
------------- PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2258251969 From mbaesken at openjdk.org Tue Jul 30 12:43:36 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 30 Jul 2024 12:43:36 GMT Subject: Integrated: 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com> Message-ID: <W-pWj7nt20S7Ovrl0hnUWnP_SKbzwDt2wx0cemi4p9I=.dcd9cc26-f079-48a9-aea9-0c57893597c4@github.com> On Tue, 23 Jul 2024 09:49:38 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > When running with ubsan - enabled binaries, some tests trigger the following report : > > src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' > #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91 > #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106 > #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157 > #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63 > #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85 > #5 0x7fc1df854080 in bool 
TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319 > #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233 > #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199 > > Seems in case of (at least) class SmallRegisterMap we miss handling nullptr . This pull request has now been integrated. Changeset: 81628328 Author: Matthias Baesken <mbaesken at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8162832832ac6e8c17f942e718e309a3489e0da6 Stats: 131 lines in 10 files changed: 50 ins; 64 del; 17 mod 8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap' Co-authored-by: Kim Barrett <kbarrett at openjdk.org> Reviewed-by: rrich, clanger ------------- PR: https://git.openjdk.org/jdk/pull/20296 From mbaesken at openjdk.org Tue Jul 30 13:03:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 30 Jul 2024 13:03:07 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v2] In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> Message-ID: <QlWywf3yso2GdzP_yf85VTKSE_n45c9PowGDpNbV_9c=.6e180952-944f-4de7-a0f3-4121a444cb31@github.com> > Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. 
> > We find this in the test output > > [STDOUT] > /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory > > The container where the test is executed does not contain the ubsan package; we might skip the test in this case. Matthias Baesken has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8333144 - JDK-8333144 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19907/files - new: https://git.openjdk.org/jdk/pull/19907/files/35163ff7..ba4f63be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=00-01 Stats: 29403 lines in 1043 files changed: 19226 ins; 5688 del; 4489 mod Patch: https://git.openjdk.org/jdk/pull/19907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19907/head:pull/19907 PR: https://git.openjdk.org/jdk/pull/19907 From mbaesken at openjdk.org Tue Jul 30 13:26:45 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 30 Jul 2024 13:26:45 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v3] In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> Message-ID: <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com> > Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. 
> > We find this in the test output > > [STDOUT] > /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory > > The container where the test is executed does not contain the ubsan package; we might skip the test in this case. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: install libubsan1 into test container ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19907/files - new: https://git.openjdk.org/jdk/pull/19907/files/ba4f63be..4a792430 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=01-02 Stats: 12 lines in 2 files changed: 1 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19907/head:pull/19907 PR: https://git.openjdk.org/jdk/pull/19907 From mbaesken at openjdk.org Tue Jul 30 13:30:31 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 30 Jul 2024 13:30:31 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured In-Reply-To: <6WiWP5JSoQ6KQbe5FAg84BGAcRp0XtojWan0nyGaXjo=.562bc939-4751-47c8-a739-3b76cb67b710@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> <6WiWP5JSoQ6KQbe5FAg84BGAcRp0XtojWan0nyGaXjo=.562bc939-4751-47c8-a739-3b76cb67b710@github.com> Message-ID: <5sIltLs4ES9qaTNsz1m1O2W1yz-ZfPWuUAGZM877XNA=.c118065b-6cf3-44ab-871c-937e373d3dc7@github.com> On Tue, 30 Jul 2024 09:31:15 GMT, Christoph Langer <clanger at openjdk.org> wrote: > I would prefer to add the required ubsan libraries to the container used for testing when testing an ubsan enabled build. Can we achieve this? I added libubsan1 to the container (tested it and works nicely, should do no harm if we test a non-ubsan build). Should we go this way ? 
If so I could remove the WhiteBox related changes (or keep it for other usages). I also tried to add a WhiteBox based check to 'DockerTestUtils.java' but this seems not to work. But as I said it is probably not necessary.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2258351989

From asmehra at openjdk.org Tue Jul 30 13:35:35 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 30 Jul 2024 13:35:35 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2]
In-Reply-To: <lsdnO__d3kqEFpSJJVZOz7JSRaSQXjxT6xwC0kc1MxI=.ec76bf46-635c-411d-9d0c-918d286f0f0b@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com>
 <lsdnO__d3kqEFpSJJVZOz7JSRaSQXjxT6xwC0kc1MxI=.ec76bf46-635c-411d-9d0c-918d286f0f0b@github.com>
Message-ID: <IeStexqMLw1WnPuf5RpzLaFFRraSGUV39IRN-Zr1N1k=.41eab3b5-5640-481a-8b53-7a7072489da2@github.com>

On Tue, 30 Jul 2024 05:18:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Address review comments by Thomas S.
>>
>>   Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>
> src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 40:
>
>> 38:
>> 39: // Helper class to wrap the array of arena tags for easier processing
>> 40: class ArenaTagsCounter {
>
> Sorry for being a stickler for precise names, but I would like plural for counters here - it is not a single counter, it's a series/vector/array of counters.
> Any of these work for me: ArenaCountersByTag - ArenaCountersByTagVector - ArenaTagCounterVector - ArenaTagCounters

I am pretty bad at naming things, so I welcome these suggestions. I will go with ArenaCountersByTag.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1696973723 From nprasad at openjdk.org Tue Jul 30 13:52:33 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 30 Jul 2024 13:52:33 GMT Subject: RFR: 8334230: Optimize C2 classes layout [v3] In-Reply-To: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com> Message-ID: <cqc5eZCkK8915jLHjCDOiSGUJA74tTBZz7PYMg1czFc=.65685c5a-951b-4f55-b9a6-6228522f6eaf@github.com> On Tue, 30 Jul 2024 00:53:04 GMT, Neethu Prasad <nprasad at openjdk.org> wrote: >> **Notes** >> >> Rearrange C2 class fields to optimize footprint. >> >> >> **Verification** >> >> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. >> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
>>
>> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
>> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
>> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 |
>> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 |
>> | C2Access | 56 -> 48 | 1 -> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 |
>> | VectorSet | 32 -> 24 | 1 -> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 |
>>
>> class ArrayPointer {
>>         const class Node * _pointer;                        /*  0  8 */
>>         const class Node * _base;                           /*  8  8 */
>>         const jlong _constant_offset;                       /* 16  8 */
>>         const class Node * _int_offset;                     /* 24  8 */
>>         const class GrowableArray<Node*> * _other_offsets;  /* 32  8 */
>>         const jint _int_offset_shift;                       /* 40  4 */
>>         const bool _is_valid;                               /* 44  1 */
>> public:
>>
>>         /* size: 48, cachelines: 1, members: 7 */
>>         /* padding: 3 */
>>         /* last cacheline: 48 bytes */
>> };
>>
>> class CallJavaNode : public CallNode {
>> public:
>>
>>         /* class CallNode <ancestor>; */                    /*   0 128 */
>> protected:
>>
>>         /* --- cacheline 2 boundary (128 bytes) --- */
>>         class ciMethod * _method;                           /* 128  8 */
>>         bool _optimized_virtual;                            /* 136  1 */
>>         bool _method_handle_invoke;                         /* 137  1 */
>>         bool _override_symbolic_info;                       /* 138  1 */
>>         bool _arg_escape;                                   /* 139  1 */
>> public:
>>
>> protected:
>>
>> public:
>>
>>         /* size: 144, cachelines: 3, members: 6 */
>>         /* padding: 4 */
>>         /* last cacheline: 16 bytes */
>>
>>         /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
>> };
>>
>> class C2Access : public StackObj {
>> public:
>>
>>         /* class StackObj <ancestor>; */                    /*  0  0 */
>>
>>         /* XXX last struct has 1 byte of padding */
>>
> ...
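
The padding effect that the quoted pahole output reports can be reproduced with a toy example. This sketch is not taken from the patch: the structs `Scattered` and `Grouped` are hypothetical, and the sizes in the comments assume a typical LP64 ABI (8-byte pointers, 4-byte `int`), so treat them as indicative rather than guaranteed.

```cpp
#include <cstddef>

// Small fields interleaved with pointer-sized ones force padding holes.
struct Scattered {
  bool        is_valid;  // 1 byte, typically followed by 7 bytes of padding
  const void* base;      // 8 bytes
  int         shift;     // 4 bytes, typically followed by 4 bytes of padding
  const void* offsets;   // 8 bytes
};                       // typically 32 bytes on LP64

// Same members, widest alignment first: only tail padding remains.
struct Grouped {
  const void* base;      // 8 bytes
  const void* offsets;   // 8 bytes
  int         shift;     // 4 bytes
  bool        is_valid;  // 1 byte, typically 3 bytes of tail padding
};                       // typically 24 bytes on LP64

// Holds on common ABIs; the exact sizes are implementation-defined.
static_assert(sizeof(Grouped) <= sizeof(Scattered),
              "grouping members by alignment should not grow the struct");
```

One caveat when applying this to real classes: C++ initializes members in declaration order, so reordering fields can require reordering a constructor's initializer list as well, which may be what the "constructor order" follow-up commit in this thread refers to.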
> > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address constructor order issue for C2OptAccess Thanks for the review & approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19861#issuecomment-2258399973 From duke at openjdk.org Tue Jul 30 13:52:33 2024 From: duke at openjdk.org (duke) Date: Tue, 30 Jul 2024 13:52:33 GMT Subject: RFR: 8334230: Optimize C2 classes layout [v3] In-Reply-To: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com> Message-ID: <uWFDikC4hzr1vv0BkTHm4YfYGcuYm086t5CHNsU6HB4=.2475c8c2-a712-4716-b3e2-c78eabf8433e@github.com> On Tue, 30 Jul 2024 00:53:04 GMT, Neethu Prasad <nprasad at openjdk.org> wrote: >> **Notes** >> >> Rearrange C2 class fields to optimize footprint. >> >> >> **Verification** >> >> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. >> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
>>
>> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
>> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
>> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 |
>> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 |
>> | C2Access | 56 -> 48 | 1 -> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 |
>> | VectorSet | 32 -> 24 | 1 -> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 |
>>
>> class ArrayPointer {
>>         const class Node * _pointer;                        /*  0  8 */
>>         const class Node * _base;                           /*  8  8 */
>>         const jlong _constant_offset;                       /* 16  8 */
>>         const class Node * _int_offset;                     /* 24  8 */
>>         const class GrowableArray<Node*> * _other_offsets;  /* 32  8 */
>>         const jint _int_offset_shift;                       /* 40  4 */
>>         const bool _is_valid;                               /* 44  1 */
>> public:
>>
>>         /* size: 48, cachelines: 1, members: 7 */
>>         /* padding: 3 */
>>         /* last cacheline: 48 bytes */
>> };
>>
>> class CallJavaNode : public CallNode {
>> public:
>>
>>         /* class CallNode <ancestor>; */                    /*   0 128 */
>> protected:
>>
>>         /* --- cacheline 2 boundary (128 bytes) --- */
>>         class ciMethod * _method;                           /* 128  8 */
>>         bool _optimized_virtual;                            /* 136  1 */
>>         bool _method_handle_invoke;                         /* 137  1 */
>>         bool _override_symbolic_info;                       /* 138  1 */
>>         bool _arg_escape;                                   /* 139  1 */
>> public:
>>
>> protected:
>>
>> public:
>>
>>         /* size: 144, cachelines: 3, members: 6 */
>>         /* padding: 4 */
>>         /* last cacheline: 16 bytes */
>>
>>         /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
>> };
>>
>> class C2Access : public StackObj {
>> public:
>>
>>         /* class StackObj <ancestor>; */                    /*  0  0 */
>>
>>         /* XXX last struct has 1 byte of padding */
>>
> ...
> > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address constructor order issue for C2OptAccess @neethu-prasad Your change (at version 490c381ee37ec38774fd08b1239d28ad11ad7aa6) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19861#issuecomment-2258401588 From nprasad at openjdk.org Tue Jul 30 14:10:37 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 30 Jul 2024 14:10:37 GMT Subject: Integrated: 8334230: Optimize C2 classes layout In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com> Message-ID: <i47cPhAlUn1TTJhuzpzLGJ6m1nmvgLv1tNDlgq4X7jY=.f47d4aa7-4599-4a8a-8947-f656aaa2c0b4@github.com> On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote: > **Notes** > > Rearrange C2 class fields to optimize footprint. > > > **Verification** > > 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests. > 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes. 
>
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1 -> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet | 32 -> 24 | 1 -> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 |
>
> class ArrayPointer {
>         const class Node * _pointer;                        /*  0  8 */
>         const class Node * _base;                           /*  8  8 */
>         const jlong _constant_offset;                       /* 16  8 */
>         const class Node * _int_offset;                     /* 24  8 */
>         const class GrowableArray<Node*> * _other_offsets;  /* 32  8 */
>         const jint _int_offset_shift;                       /* 40  4 */
>         const bool _is_valid;                               /* 44  1 */
> public:
>
>         /* size: 48, cachelines: 1, members: 7 */
>         /* padding: 3 */
>         /* last cacheline: 48 bytes */
> };
>
> class CallJavaNode : public CallNode {
> public:
>
>         /* class CallNode <ancestor>; */                    /*   0 128 */
> protected:
>
>         /* --- cacheline 2 boundary (128 bytes) --- */
>         class ciMethod * _method;                           /* 128  8 */
>         bool _optimized_virtual;                            /* 136  1 */
>         bool _method_handle_invoke;                         /* 137  1 */
>         bool _override_symbolic_info;                       /* 138  1 */
>         bool _arg_escape;                                   /* 139  1 */
> public:
>
> protected:
>
> public:
>
>         /* size: 144, cachelines: 3, members: 6 */
>         /* padding: 4 */
>         /* last cacheline: 16 bytes */
>
>         /* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
>
> class C2Access : public StackObj {
> public:
>
>         /* class StackObj <ancestor>; */                    /*  0  0 */
>
>         /* XXX last struct has 1 byte of padding */
>
>         int ()(void) * * _vptr.C2Access;                    /*  0  8 */
> protected:
>
>         DecoratorSet _decorators;                           /*  8 ...

This pull request has now been integrated.
Changeset: 1cb27f7e
Author: Neethu Prasad <nprasad at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/1cb27f7e2355ccb911bb274fc004e5bc57fd5dc9
Stats: 17 lines in 4 files changed: 8 ins; 8 del; 1 mod

8334230: Optimize C2 classes layout

Reviewed-by: shade, kvn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/19861

From szaldana at openjdk.org Tue Jul 30 14:33:10 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 30 Jul 2024 14:33:10 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v18]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <kEEMF9wHZB2AKvO2w3Mg_o249GBDeZ8PlWMQFIejh7k=.ca4cd356-7c4f-4344-bb40-b336c717dae4@github.com>

> Hi all,
>
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID.
>
> This PR addresses the following diagnostic commands:
> - [x] Compiler.perfmap
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
>
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`).
>
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
>
>
> filename (Optional) Name of the file to which the flight recording data is
>          written when the recording is stopped. If no filename is given, a
>          filename is generated from the PID and the current date and is
>          placed in the directory where the process was started. The
>          filename may also be a directory in which case, the filename is
>          generated from the PID and the current date in the specified
>          directory.
(STRING, no default value)
>
>          Note: If a filename is given, '%p' in the filename will be
>          replaced by the PID, and '%t' will be replaced by the time in
>          'yyyy_MM_dd_HH_mm_ss' format.
>
>
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
>
> Testing:
>
> - [x] Added test case passes.
> - [x] Modified existing VM.cds tests to also check for `%p` filenames.
>
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
>
> Cheers,
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Fixing invocation outside of jcmd

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/ceb96eb9..564349e3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=16-17

  Stats: 12 lines in 2 files changed: 11 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From kevinw at openjdk.org Tue Jul 30 14:52:35 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Tue, 30 Jul 2024 14:52:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v18]
In-Reply-To: <kEEMF9wHZB2AKvO2w3Mg_o249GBDeZ8PlWMQFIejh7k=.ca4cd356-7c4f-4344-bb40-b336c717dae4@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <kEEMF9wHZB2AKvO2w3Mg_o249GBDeZ8PlWMQFIejh7k=.ca4cd356-7c4f-4344-bb40-b336c717dae4@github.com>
Message-ID: <vYX77VW0Aj5Uj7sLL1tTsCHkNgGO9fZ0wEkD5gsyR7s=.99bcd306-43da-43b1-a56b-2fba622a9b3f@github.com>

On Tue, 30 Jul 2024 14:33:10 GMT, Sonia Zaldana Calles <szaldana at
openjdk.org> wrote:

>> Hi all,
>>
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID.
>>
>> This PR addresses the following diagnostic commands:
>> - [x] Compiler.perfmap
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>>
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`).
>>
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
>>
>>
>> filename (Optional) Name of the file to which the flight recording data is
>>          written when the recording is stopped. If no filename is given, a
>>          filename is generated from the PID and the current date and is
>>          placed in the directory where the process was started. The
>>          filename may also be a directory in which case, the filename is
>>          generated from the PID and the current date in the specified
>>          directory. (STRING, no default value)
>>
>>          Note: If a filename is given, '%p' in the filename will be
>>          replaced by the PID, and '%t' will be replaced by the time in
>>          'yyyy_MM_dd_HH_mm_ss' format.
>>
>>
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
>>
>> Testing:
>>
>> - [x] Added test case passes.
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames.
>>
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
>>
>> Cheers,
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixing invocation outside of jcmd

OK great, thanks for that last update, nothing more to say!
8-) I will do the man page update and get it out for review soon. ------------- Marked as reviewed by kevinw (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2207924716 PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2258543294 From asmehra at openjdk.org Tue Jul 30 15:02:51 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 30 Jul 2024 15:02:51 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v3] In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> Message-ID: <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com> > Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) > > Testing: > test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java > test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: Rename ArenaTagsCounter to ArenaCountersByTag Signed-off-by: Ashutosh Mehra <asmehra at redhat.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20304/files - new: https://git.openjdk.org/jdk/pull/20304/files/008ac6b9..70762110 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=01-02 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20304.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20304/head:pull/20304 PR: https://git.openjdk.org/jdk/pull/20304 From asmehra at openjdk.org Tue Jul 30 15:02:51 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 30 Jul 2024 15:02:51 GMT Subject: RFR: 8337031: Improvements to 
CompilationMemoryStatistic In-Reply-To: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com> Message-ID: <PGwdTyL6oCyFp513QBNKaVsBE9dKstmCKacfcxYf3jg=.0c9730d7-37a5-4912-b57a-0a69696fd18b@github.com> On Wed, 24 Jul 2024 10:45:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) >> >> Testing: >> test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java >> test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java > > I plan to look at this later this week. @tstuefe renamed ArenaTagsCounter to ArenaCountersByTag ------------- PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2258563618 From kvn at openjdk.org Tue Jul 30 17:10:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 30 Jul 2024 17:10:33 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v3] In-Reply-To: <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com> Message-ID: <_XRsVlxQxbH7kHV8gIwxeK6S2Cy3oT0frrcIzOgQESY=.202dcbf2-3258-417a-933e-eab10e3d7fbc@github.com> On Tue, 30 Jul 2024 15:02:51 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: >> Some minor improvements to CompilationMemoryStatistic. 
More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>>
>> Testing:
>> test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>> test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
>
> Rename ArenaTagsCounter to ArenaCountersByTag
>
> Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>

Looks reasonable. I submitted it to our testing to make sure the tests pass.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2208245618

From szaldana at openjdk.org Tue Jul 30 18:43:41 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 30 Jul 2024 18:43:41 GMT
Subject: Integrated: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <7Dk-ApfeC6fpoIAMknLszeWlrg0G-h728jC6iaHn2S4=.e2253687-64ca-48cf-bf2e-9f1a680049c4@github.com>

On Tue, 16 Jul 2024 16:25:50 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> Hi all,
>
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and replace it with the PID.
>
> This PR addresses the following diagnostic commands:
> - [x] Compiler.perfmap
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
>
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`).
>
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands.
For example,
>
>
> filename (Optional) Name of the file to which the flight recording data is
> written when the recording is stopped. If no filename is given, a
> filename is generated from the PID and the current date and is
> placed in the directory where the process was started. The
> filename may also be a directory in which case, the filename is
> generated from the PID and the current date in the specified
> directory. (STRING, no default value)
>
> Note: If a filename is given, '%p' in the filename will be
> replaced by the PID, and '%t' will be replaced by the time in
> 'yyyy_MM_dd_HH_mm_ss' format.
>
>
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
>
> Testing:
>
> - [x] Added test case passes.
> - [x] Modified existing VM.cds tests to also check for `%p` filenames.
>
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any).
>
> Cheers,
> Sonia

This pull request has now been integrated.
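
[Editorial illustration, not part of the original message.] The `%p` substitution described in this thread can be sketched roughly as follows. This is not the code from the PR: `expand_pid` is a hypothetical helper name, the PID is passed in explicitly rather than queried from the VM, and `%t` handling is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption for illustration, not HotSpot code):
// replace every "%p" in `pattern` with the decimal form of `pid`.
// Any other use of '%' is copied through unchanged.
std::string expand_pid(const std::string& pattern, long pid) {
  const std::string pid_text = std::to_string(pid);
  std::string out;
  out.reserve(pattern.size() + pid_text.size());
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += pid_text;
      ++i;  // skip the 'p' that followed '%'
    } else {
      out += pattern[i];
    }
  }
  return out;
}

// Small self-check showing the intended behaviour.
inline void expand_pid_self_check() {
  assert(expand_pid("heap_%p.hprof", 1234) == "heap_1234.hprof");
  assert(expand_pid("plain.txt", 1234) == "plain.txt");
}
```

With such a helper, a file name like `heap_%p.hprof` would resolve to a per-process name, which is the point of the feature when the same jcmd invocation is scripted against several JVMs.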
Changeset: f5c9e8f1 Author: Sonia Zaldana Calles <szaldana at openjdk.org> URL: https://git.openjdk.org/jdk/commit/f5c9e8f1229f3aa00743119a2dee3e15d57048d8 Stats: 173 lines in 11 files changed: 134 ins; 16 del; 23 mod 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID Reviewed-by: kevinw, stuefe, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/20198 From kvn at openjdk.org Tue Jul 30 21:08:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 30 Jul 2024 21:08:34 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v3] In-Reply-To: <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com> Message-ID: <TCzh2VR4vb__B5WC8W5yGFITgdOjkquLymnj5pzbSac=.fec19a8b-e77a-4846-b916-ba5c51fd9801@github.com> On Tue, 30 Jul 2024 15:02:51 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: >> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) >> >> Testing: >> test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java >> test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Rename ArenaTagsCounter to ArenaCountersByTag > > Signed-off-by: Ashutosh Mehra <asmehra at redhat.com> My testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2208698602

From dholmes at openjdk.org Tue Jul 30 22:38:36 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 22:38:36 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string [v5]
In-Reply-To: <0DGLCSfGuXAZI6APR2EiKoRxAlxZo6OIRpDtHHTJ-is=.20ad55c6-830a-475c-bbcf-a0a9c84e771e@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com> <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com> <0DGLCSfGuXAZI6APR2EiKoRxAlxZo6OIRpDtHHTJ-is=.20ad55c6-830a-475c-bbcf-a0a9c84e771e@github.com>
Message-ID: <gqsiIGMIc2Z68KOimRr82y0DFUkOxHrJKBy5m4r6D44=.a4469385-ec21-4c12-b51d-05e8fe19acf9@github.com>

On Tue, 30 Jul 2024 12:36:01 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix off-by-one error
>
> LGTM

Thanks for the review and the assistance @djelinski !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20345#issuecomment-2259316733

From dholmes at openjdk.org Tue Jul 30 22:38:37 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 22:38:37 GMT
Subject: Integrated: 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <owL-YVQEFHKjBUu4GxNs1KZFFe6w4R0994d0b37sQQk=.4ed07c8b-4a98-4cce-82a0-6ccae6d0d18a@github.com>

On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation.
However, we should ensure we truncate to a valid UTF8 sequence. > > The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this. > > Testing: > - new gtest exercises the truncation code with the different possibilities for bad truncation > - tiers 1-3 sanity testing > > Thanks. This pull request has now been integrated. Changeset: 5b7bb40d Author: David Holmes <dholmes at openjdk.org> URL: https://git.openjdk.org/jdk/commit/5b7bb40d1f5a8e1261cc252db2a09b5e4f07e5f0 Stats: 194 lines in 4 files changed: 191 ins; 0 del; 3 mod 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string Reviewed-by: djelinski, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/20345 From dholmes at openjdk.org Tue Jul 30 23:07:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 Jul 2024 23:07:56 GMT Subject: RFR: 8337515: JVM_DumpAllStacks is dead code Message-ID: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com> Trivial cleanup of long unused code. Thanks. 
------------- Commit messages: - 8337515: JVM_DumpAllStacks is dead code Changes: https://git.openjdk.org/jdk/pull/20396/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20396&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337515 Stats: 11 lines in 2 files changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20396/head:pull/20396 PR: https://git.openjdk.org/jdk/pull/20396 From kvn at openjdk.org Wed Jul 31 00:32:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 Jul 2024 00:32:34 GMT Subject: RFR: 8337515: JVM_DumpAllStacks is dead code In-Reply-To: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com> References: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com> Message-ID: <xbSCjeDRwb0CWIhmEcQlhhZpHuwWrjosn3OulE4bhe8=.fc77088d-bedc-47c4-98b2-7ac74f368d31@github.com> On Tue, 30 Jul 2024 23:03:43 GMT, David Holmes <dholmes at openjdk.org> wrote: > Trivial cleanup of long unused code. > > Thanks. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20396#pullrequestreview-2208930663 From dholmes at openjdk.org Wed Jul 31 01:01:49 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 31 Jul 2024 01:01:49 GMT Subject: RFR: 8337515: JVM_DumpAllStacks is dead code In-Reply-To: <xbSCjeDRwb0CWIhmEcQlhhZpHuwWrjosn3OulE4bhe8=.fc77088d-bedc-47c4-98b2-7ac74f368d31@github.com> References: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com> <xbSCjeDRwb0CWIhmEcQlhhZpHuwWrjosn3OulE4bhe8=.fc77088d-bedc-47c4-98b2-7ac74f368d31@github.com> Message-ID: <ZJ-yiycoeCDJQdCnLTYlhSE8DrHb5YD-g9XC4pcs1ck=.ddf8bb17-817b-494f-8514-eb399b943038@github.com> On Wed, 31 Jul 2024 00:30:05 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: >> Trivial cleanup of long unused code. >> >> Thanks. > > Good. 
Thanks for the review @vnkozlov ------------- PR Comment: https://git.openjdk.org/jdk/pull/20396#issuecomment-2259436253 From dholmes at openjdk.org Wed Jul 31 01:01:49 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 31 Jul 2024 01:01:49 GMT Subject: Integrated: 8337515: JVM_DumpAllStacks is dead code In-Reply-To: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com> References: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com> Message-ID: <l17smZve1znr7ezNHLNjv3-ScmzFzIA5OATXYBMD3xs=.18ed23c4-c56a-4828-8523-9ad9317b2519@github.com> On Tue, 30 Jul 2024 23:03:43 GMT, David Holmes <dholmes at openjdk.org> wrote: > Trivial cleanup of long unused code. > > Thanks. This pull request has now been integrated. Changeset: 1c6fef8f Author: David Holmes <dholmes at openjdk.org> URL: https://git.openjdk.org/jdk/commit/1c6fef8f1cd12b26de9d31799a6516ce4399313f Stats: 11 lines in 2 files changed: 0 ins; 11 del; 0 mod 8337515: JVM_DumpAllStacks is dead code Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/20396 From asmehra at openjdk.org Wed Jul 31 01:38:44 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 31 Jul 2024 01:38:44 GMT Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com> Message-ID: <rC-wpaU8uWZFb1Eq185T4yfXrH4C_Oea9qHRJMgnnf0=.47602ddd-f226-493f-8b39-57408dfc2daf@github.com> On Wed, 24 Jul 2024 10:45:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Some minor improvements to CompilationMemoryStatistic. 
More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) >> >> Testing: >> test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java >> test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java > > I plan to look at this later this week. thanks @tstuefe @vnkozlov for reviewing and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2259466393 From asmehra at openjdk.org Wed Jul 31 01:38:45 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 31 Jul 2024 01:38:45 GMT Subject: Integrated: 8337031: Improvements to CompilationMemoryStatistic In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com> Message-ID: <V9CEV-z2FNrqekAnMk8JPDsz6RHRRPZB2sXPhITAkio=.f159a6fa-f1b5-4f5f-a16a-e1e6e1119732@github.com> On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote: > Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031) > > Testing: > test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java > test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java This pull request has now been integrated. 
Changeset: e63d0165 Author: Ashutosh Mehra <asmehra at openjdk.org> URL: https://git.openjdk.org/jdk/commit/e63d01654e0b702b3a8c0c4de97a6bb6893fbd1f Stats: 182 lines in 6 files changed: 84 ins; 27 del; 71 mod 8337031: Improvements to CompilationMemoryStatistic Reviewed-by: kvn, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/20304 From kbarrett at openjdk.org Wed Jul 31 01:56:31 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 31 Jul 2024 01:56:31 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <TqktOSyEh_Ox9Ex39pPlxyn_LBy3msYU05uizfb4tHc=.8d065351-2145-46d2-bc18-203cf3be6865@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com> <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com> <TqktOSyEh_Ox9Ex39pPlxyn_LBy3msYU05uizfb4tHc=.8d065351-2145-46d2-bc18-203cf3be6865@github.com> Message-ID: <sQbWWUWpc4CwcM-ufoIrVblSxPAjH1dBTLXDYheqo2U=.a3505ba7-fedc-4699-b60e-c0adc37f34c9@github.com> On Tue, 30 Jul 2024 11:51:37 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> Now that we have, and are using, `[[noreturn]]` on all platforms, we no longer need that dead code. > > I'll admit, I do prefer having a return to end all possible control flows in a non void method, but oh well I would rather it not look like it can return null (or some other manufactured "default") when it actually can't. 
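
[Editorial illustration, not part of the original message.] The point about `[[noreturn]]` can be shown with a small sketch. The names `fail` and `checked_div` are made up for this example, and throwing an exception stands in for whatever fatal-error mechanism the real code uses; the prims code in the PR is not reproduced here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical error helper: it never returns normally, so the compiler
// knows that control flow ends at every call to it.
[[noreturn]] void fail(const char* what) {
  throw std::runtime_error(what);
}

// Because fail() is [[noreturn]], no dummy "return 0;" is needed after it,
// and the function does not look as if it could hand back a manufactured
// default value on the error path.
int checked_div(int a, int b) {
  if (b != 0) {
    return a / b;
  }
  fail("division by zero");
}

// Small self-check: the normal path returns a value, the error path throws.
inline void checked_div_self_check() {
  assert(checked_div(6, 3) == 2);
  bool threw = false;
  try {
    checked_div(1, 0);
  } catch (const std::runtime_error&) {
    threw = true;
  }
  assert(threw);
}
```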
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1697794269 From kbarrett at openjdk.org Wed Jul 31 02:07:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 31 Jul 2024 02:07:42 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <oftVH8FweQHlfgduKuMopunOkTBQ30f8s0j5dB0AnQo=.6ab59e3a-906c-41d9-93c1-d614209531e9@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> <oftVH8FweQHlfgduKuMopunOkTBQ30f8s0j5dB0AnQo=.6ab59e3a-906c-41d9-93c1-d614209531e9@github.com> Message-ID: <4vjtLlNagS_WsEeSPaVUHZJsItLG-WvdHxIFTFnEvgk=.1a631179-543a-414e-ba8e-5c72a2f3c976@github.com> On Tue, 30 Jul 2024 10:19:59 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > All right, this looks fine. (I am somewhat allergic to `{}` syntax, but it is what it is.) The hoops one had to go through to get guaranteed value-initialization before we had brace initialization are really not pretty. See https://www.boost.org/doc/libs/1_85_0/libs/utility/doc/html/utility/utilities/value_init.html and its associated implementation. It might help if we were to commit to using direct brace initialization whenever appropriate, but that hasn't happened. 
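
[Editorial illustration, not part of the original message.] As a sketch of why the `{}` syntax is convenient in generic code: the template below is hypothetical (`default_result` and `Pod` are names invented here, not taken from the PR), but it shows value-initialization producing a zero, a null pointer, or zeroed members without ever writing a literal `0`.

```cpp
#include <cassert>

// A simple aggregate used only for this illustration.
struct Pod {
  int   count;
  void* ptr;
};

// Generic code cannot write "T ret = 0;" or "T ret = nullptr;" for every T.
// "T ret{};" value-initializes instead: zero for arithmetic types, null for
// pointers, and zeroed members for an aggregate such as Pod.
template <typename T>
T default_result() {
  T ret{};
  return ret;
}

// Small self-check covering a scalar, a pointer, and an aggregate.
inline void default_result_self_check() {
  assert(default_result<int>() == 0);
  assert(default_result<const char*>() == nullptr);
  Pod p = default_result<Pod>();
  assert(p.count == 0 && p.ptr == nullptr);
}
```

This avoids `-Wzero-as-null-pointer-constant` in templates where the result type may or may not be a pointer.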
------------- PR Comment: https://git.openjdk.org/jdk/pull/20385#issuecomment-2259500102 From aph at openjdk.org Wed Jul 31 06:31:15 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 31 Jul 2024 06:31:15 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v11] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <ZjZKhf-9TRmDtCdy22FudPupipyCsHsdXEEgE-P-nu8=.b514d4ec-6305-4b96-90cf-393b198a2a5d@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. 
All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
>
> AArch64
> -------
>
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
>
> Generally
> ---------
>
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
>
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
>
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
>
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request incrementally with three additional commits since the last revision:

- Review comments
- Experiment: test bitmap upfront.
- Experiment: test bitmap upfront.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/329f487a..5cca1cc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=09-10 Stats: 26 lines in 4 files changed: 5 ins; 10 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Wed Jul 31 06:45:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 31 Jul 2024 06:45:05 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v12] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <IgSEghEHB3YOa1xbX84bNChiSEr7V9hRXPdHP2Q8RQI=.d9f881f3-3693-4815-bf0b-e2393eced10d@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. 
>
> Having said that, the software popcount is really not bad.
>
> x86
> ---
>
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
>
> AArch64
> -------
>
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
>
> Generally
> ---------
>
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
>
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
>
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
>
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.
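
[Editorial illustration, not part of the original message.] For readers unfamiliar with the technique mentioned above, here is a simplified sketch of a software popcount and of how a popcount over an occupancy bitmap can index a packed table. This is an assumption-laden illustration, not HotSpot's implementation: `popcount64` and `packed_index` are hypothetical names, and collision handling (which a real hashed table needs) is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Portable ("software") population count: the number of 1 bits in x.
// This is the classic SWAR bit-twiddling form; a hardware popcount
// instruction computes the same value in a single step.
inline unsigned popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);       // add the bytes
}

// Given a 64-bit occupancy bitmap and a slot number in [0, 63], return the
// index of that slot's entry in a packed array that stores only occupied
// slots, or -1 if the slot is empty.
inline int packed_index(std::uint64_t bitmap, unsigned slot) {
  const std::uint64_t bit = 1ULL << slot;
  if ((bitmap & bit) == 0) {
    return -1;  // bit not set: a quick negative answer without touching the table
  }
  // Count the occupied slots below this one to find its packed position.
  return static_cast<int>(popcount64(bitmap & (bit - 1)));
}

// Small self-check on a bitmap with slots 0, 1 and 3 occupied.
inline void packed_index_self_check() {
  assert(popcount64(0b1011ULL) == 3);
  assert(packed_index(0b1011ULL, 3) == 2);
  assert(packed_index(0b1011ULL, 2) == -1);
}
```

Testing the bitmap bit first gives a cheap miss, which is presumably the kind of check the "test bitmap upfront" experiment commits refer to.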
Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work - Fix AArch64 - Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/5cca1cc2..2769d9e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=10-11 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From clanger at openjdk.org Wed Jul 31 07:22:41 2024 From: clanger at openjdk.org (Christoph Langer) Date: Wed, 31 Jul 2024 07:22:41 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v3] In-Reply-To: <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com> Message-ID: <mc7jusNH20OOjMvwVhpsetjm0A1li1OjUlccr7qiOxc=.45faf772-7e8b-42ab-a7e7-a7aef5833cac@github.com> On Tue, 30 Jul 2024 13:26:45 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. >> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case. 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > install libubsan1 into test container I think adding libubsan1 to the test container is the best way to go. If it cannot be made conditional on ubsan builds then be it so. Then the Whitebox changes should be removed obviously. test/hotspot/jtreg/containers/docker/DockerBasicTest.java line 34: > 32: * jdk.jartool/sun.tools.jar > 33: * @build HelloDocker > 34: * @run driver DockerBasicTest you remove @build and @run directives from the test - probably not desired. ------------- Changes requested by clanger (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19907#pullrequestreview-2209366794 PR Review Comment: https://git.openjdk.org/jdk/pull/19907#discussion_r1698020822 From sspitsyn at openjdk.org Wed Jul 31 08:21:34 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 31 Jul 2024 08:21:34 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> Message-ID: <Ykmc8f7ocECyFzqaoQLO7uOx4yio6cqTR8-l-KA8nCk=.8dbfdc5a-bc55-4db9-b580-98b171d9600a@github.com> On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change that removes some uses of literal 0 as a null > pointer constant in prims code. > > Testing: mach5 tier1 Looks good. Thank you for fixing this! The `ResultType ret{};` syntax is a little bit unusual but I'm okay with that. :) ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2209490825 From sspitsyn at openjdk.org Wed Jul 31 08:27:31 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 31 Jul 2024 08:27:31 GMT Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> Message-ID: <EjVshUcAGazHU4sYDhPK0JZJIJAHpOChL7qHhistaqM=.bec6c786-4687-4794-bcda-ffefc9a8b832@github.com> On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete UseNotificationThread flag which was deprecated in JDK 23. > > Testing: tier1..tier5 Marked as reviewed by sspitsyn (Reviewer). Looks good. ------------- PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2209502256 PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2209502965 From ayang at openjdk.org Wed Jul 31 11:32:00 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 31 Jul 2024 11:32:00 GMT Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy Message-ID: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com> Trivial removing an unused gc-cause; it was previously used by Parallel only. 
------------- Commit messages: - remove-gc-cause Changes: https://git.openjdk.org/jdk/pull/20403/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20403&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337546 Stats: 13 lines in 4 files changed: 0 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20403.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20403/head:pull/20403 PR: https://git.openjdk.org/jdk/pull/20403 From coleenp at openjdk.org Wed Jul 31 12:26:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jul 2024 12:26:33 GMT Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> Message-ID: <steBn0CiWBFUT9xZOgikAo_W_krsGNUtjmw8uY4qF-Y=.d67294ef-2fd8-43c0-adf4-96c8f43d1d58@github.com> On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete UseNotificationThread flag which was deprecated in JDK 23. > > Testing: tier1..tier5 Thank you for doing this. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2210020383 From tschatzl at openjdk.org Wed Jul 31 13:12:33 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 31 Jul 2024 13:12:33 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v3] In-Reply-To: <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com> Message-ID: <WmFingximOMyc7ZwJfBYrTAd9rzbvgMVFeUnCV4r2tM=.83b7ac15-4e91-40c1-a39c-20762d0f270e@github.com> On Thu, 25 Jul 2024 07:44:45 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Simple obsoleting a Parallel GC product flag. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20299#pullrequestreview-2210127266 From kbarrett at openjdk.org Wed Jul 31 13:14:36 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 31 Jul 2024 13:14:36 GMT Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <QctZn5PDD3uluZJ-W_CG3Ffo0f02PsY7Zlx5neUOICQ=.7ca80211-e028-4553-87f5-f27f17d903ea@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> <QctZn5PDD3uluZJ-W_CG3Ffo0f02PsY7Zlx5neUOICQ=.7ca80211-e028-4553-87f5-f27f17d903ea@github.com> Message-ID: <PJRhnn2ICjJpBJcqrQevZe42VSf5E4VgnpJTnuUCLHg=.814e73d4-977c-4d29-84a6-e64242d74bbe@github.com> On Tue, 30 Jul 2024 07:54:43 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Please review this change that removes some uses of literal 0 as a null >> pointer constant in prims code. >> >> Testing: mach5 tier1 > > Okay - looks good. Thanks. 
Thanks for reviews @dholmes-ora , @shipilev , @TheShermanTanker , and @sspitsyn . ------------- PR Comment: https://git.openjdk.org/jdk/pull/20385#issuecomment-2260494132 From kbarrett at openjdk.org Wed Jul 31 13:18:46 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 31 Jul 2024 13:18:46 GMT Subject: Integrated: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com> Message-ID: <XfzbhHQy-zY2flp7kgZ6beSFVJiVzOJL-JYFkn41ZQ4=.d6448dc0-5f8c-490e-91de-6851dc6228d8@github.com> On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change that removes some uses of literal 0 as a null > pointer constant in prims code. > > Testing: mach5 tier1 This pull request has now been integrated. Changeset: 07dd7250 Author: Kim Barrett <kbarrett at openjdk.org> URL: https://git.openjdk.org/jdk/commit/07dd725025a54035436a76ac4c0a8bb2b12e264a Stats: 26 lines in 7 files changed: 0 ins; 3 del; 23 mod 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code Reviewed-by: dholmes, shade, jwaters, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/20385 From tschatzl at openjdk.org Wed Jul 31 13:29:32 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 31 Jul 2024 13:29:32 GMT Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy In-Reply-To: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com> References: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com> Message-ID: <vWWWgQbg3HbkjWXGT3hq1wj-wR9xWYAn9X2Yndd3Dc8=.1da20e11-5c7b-4989-9a01-7c5d04e8840e@github.com> On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Trivial removing an unused gc-cause; it 
was previously used by Parallel only. Good. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20403#pullrequestreview-2210168192 From kbarrett at openjdk.org Wed Jul 31 13:29:33 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 31 Jul 2024 13:29:33 GMT Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy In-Reply-To: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com> References: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com> Message-ID: <5oVJOB_OmdBtra0NImrqPvDCfneIQJnN7dW6YyOx3zc=.608a67f3-ad10-4946-9372-da4355862e45@github.com> On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Trivial removing an unused gc-cause; it was previously used by Parallel only. Looks good, and trivial. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20403#pullrequestreview-2210171121 From mbaesken at openjdk.org Wed Jul 31 14:07:46 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 31 Jul 2024 14:07:46 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> Message-ID: <WY7JVdGq2cEFGFQ5cypCV_cjVNlp73ZWVkjkr2kpzxA=.a67f045b-8fc4-4426-8ac7-a513942731ed@github.com> > Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. 
> > We find this in the test output > > [STDOUT] > /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory > > The container where the test is executed does not contain the ubsan package; we might skip the test in this case. Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: - remove method from WhiteBox.java - remove WB_isUbsanEnabled, fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19907/files - new: https://git.openjdk.org/jdk/pull/19907/files/4a792430..c4c3fdbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=02-03 Stats: 15 lines in 3 files changed: 2 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19907/head:pull/19907 PR: https://git.openjdk.org/jdk/pull/19907 From mbaesken at openjdk.org Wed Jul 31 14:07:46 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 31 Jul 2024 14:07:46 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v3] In-Reply-To: <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com> Message-ID: <mNj85L8m-PP8I0YzYN3m_Fg25COYN_06eexKsUhSBUQ=.ee5974bc-dee8-4849-8285-29c5d734805a@github.com> On Tue, 30 Jul 2024 13:26:45 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. 
>> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > install libubsan1 into test container I removed the WhiteBox stuff. Maybe David could give the change a try in the Oracle CI if that's possible ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2260608472 From gcao at openjdk.org Wed Jul 31 14:20:39 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 31 Jul 2024 14:20:39 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v4] In-Reply-To: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com> Message-ID: <6Hb5Gtgjtpb9cZyyvCsHABv4X7kc2M-GT7ugPKFxcdw=.f76d7b81-cfff-46ee-b8ba-4b1d183a9fb5@github.com> On Tue, 30 Jul 2024 08:48:11 GMT, Gui Cao <gcao at openjdk.org> wrote: >> Hi, please help review this patch that fix the client VM build failed for riscv. >> >> Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) >> >> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). 
>> >> ### Testing >> - [x] linux-riscv client VM fastdebug native build > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2260636217 From duke at openjdk.org Wed Jul 31 14:20:39 2024 From: duke at openjdk.org (duke) Date: Wed, 31 Jul 2024 14:20:39 GMT Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191 [v4] In-Reply-To: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com> Message-ID: <Z4P_mdHz2QKlHfnjwVdGWQHbxvdUEIC5Y1gwST99E64=.9e07730b-2d52-48d2-8536-2166d9580ccc@github.com> On Tue, 30 Jul 2024 08:48:11 GMT, Gui Cao <gcao at openjdk.org> wrote: >> Hi, please help review this patch that fix the client VM build failed for riscv. >> >> Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) >> >> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). >> >> ### Testing >> - [x] linux-riscv client VM fastdebug native build > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing @zifeihan Your change (at version edf16e07daf5bf644afb7bc0111e8ddb9ff32ffe) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2260637846 From gcao at openjdk.org Wed Jul 31 14:46:37 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 31 Jul 2024 14:46:37 GMT Subject: Integrated: 8337421: RISC-V: client VM build failure after JDK-8335191 In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com> Message-ID: <fvrLYt8b3LHEAPyCO4jM7rznhJIKuTKC58r7hzWiKw8=.5417b931-a3a2-4909-976c-d4d95a6208a5@github.com> On Tue, 30 Jul 2024 05:41:45 GMT, Gui Cao <gcao at openjdk.org> wrote: > Hi, please help review this patch that fix the client VM build failed for riscv. > > Error log for client VM build to see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421) > > The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro. And made related adjustments in VM_Version::c2_initialize(). > > ### Testing > - [x] linux-riscv client VM fastdebug native build This pull request has now been integrated. 
Changeset: 7121d71b Author: Gui Cao <gcao at openjdk.org> URL: https://git.openjdk.org/jdk/commit/7121d71b516b415c7c11e3643731cd32d4057aa6 Stats: 207 lines in 3 files changed: 103 ins; 100 del; 4 mod 8337421: RISC-V: client VM build failure after JDK-8335191 Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/20386 From shade at openjdk.org Wed Jul 31 14:48:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 31 Jul 2024 14:48:36 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com> Message-ID: <NF4pT2sfZgR7LUZ8E9679vvcsjAFmF3kalBlJ55NJLE=.5c53a13f-4d6e-4161-92c6-60bfd620fba1@github.com> On Mon, 22 Jul 2024 08:49:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. 
`@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: > > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers ...and still looking for formal reviews, pretty please :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2260695675 From ayang at openjdk.org Wed Jul 31 16:26:36 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 31 Jul 2024 16:26:36 GMT Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v3] In-Reply-To: <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com> Message-ID: <EDNGq-8NnG8qrhXzN4Q5qxAVx4ru1UY_91xCy85X5F0=.e02c6ddc-1d6e-467c-91c2-fc6b3f0ca443@github.com> On Thu, 25 Jul 2024 07:44:45 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Simple obsoleting a Parallel GC product flag. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20299#issuecomment-2260902728 From ayang at openjdk.org Wed Jul 31 16:26:36 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 31 Jul 2024 16:26:36 GMT Subject: Integrated: 8337027: Parallel: Obsolete BaseFootPrintEstimate In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com> Message-ID: <SWLP_jvgK4lPKKvkSIImHz15Nxjd1yFQuV8yT5C2vmI=.fa6b3390-5d1f-4709-91da-7751dfff5840@github.com> On Tue, 23 Jul 2024 14:11:20 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Simple obsoleting a Parallel GC product flag. This pull request has now been integrated. Changeset: e4c7850c Author: Albert Mingkun Yang <ayang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/e4c7850c177899a5da6f5050cb0647a6e1a75d31 Stats: 34 lines in 7 files changed: 2 ins; 25 del; 7 mod 8337027: Parallel: Obsolete BaseFootPrintEstimate Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/20299 From coleenp at openjdk.org Wed Jul 31 18:40:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jul 2024 18:40:56 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive Message-ID: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com> How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). Tested with tier1 on Oracle supported platforms. ------------- Commit messages: - Fix indent. 
- 8335059: Consider renaming ClassLoaderData::keep_alive Changes: https://git.openjdk.org/jdk/pull/20408/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335059 Stats: 37 lines in 10 files changed: 2 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/20408.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20408/head:pull/20408 PR: https://git.openjdk.org/jdk/pull/20408 From vlivanov at openjdk.org Wed Jul 31 18:52:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 31 Jul 2024 18:52:36 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com> Message-ID: <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com> On Mon, 22 Jul 2024 08:49:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. 
This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one the path that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. 
>> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers src/hotspot/share/runtime/globals.hpp line 1997: > 1995: "Use a terrible hash function in order to generate many collisions.") \ > 1996: \ > 1997: develop(bool, RestrictStable, true, \ What's the use case for the flag? Solely for testing purposes (since it's develop)? Alternatively, you could place the test classes on boot class path and enable test execution with product binaries. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1698961280 From shade at openjdk.org Wed Jul 31 18:58:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 31 Jul 2024 18:58:36 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com> <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com> Message-ID: <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com> On Wed, 31 Jul 2024 18:49:47 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Variant 2: Only final-field like semantics for stable inits >> - Variant 3: Handle everything, including reads by compilers > > src/hotspot/share/runtime/globals.hpp line 1997: > >> 1995: "Use a terrible hash function in order to generate many collisions.") \ >> 1996: \ >> 1997: develop(bool, RestrictStable, true, \ > > What's the use case for the flag? Solely for testing purposes (since it's develop)? > Alternatively, you could place the test classes on boot class path and enable test execution with product binaries. Yes, to access the annotation from test, like `RestrictContended` nearby: https://github.com/openjdk/jdk/blob/97f7c03dd0ff389abefed7ea2a7257bcb42e0754/src/hotspot/share/classfile/classFileParser.cpp#L1960 Not sure if putting test classes on bootclasspath would work well with IR tests that are now running in driver mode. 
I'd prefer to keep the develop flag and keep tests in driver mode and without additional fluff. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1698965131 From stefank at openjdk.org Wed Jul 31 19:04:30 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 31 Jul 2024 19:04:30 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive In-Reply-To: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com> References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com> Message-ID: <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com> On Wed, 31 Jul 2024 18:35:12 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). > Tested with tier1 on Oracle supported platforms. There's a risk that someone incorrectly interprets: - _strongly_reachable 0 to mean that the class loader isn't strongly reachable. In the bug entry I suggested a name `_strong_count` and tried to avoid the word "reachable" because it already have a meaning for the GC. When talking about the "strong" property we both talk about strongly reachable and strong roots. The property for this CLD is that it is a strong root from the GC's perspective. Maybe we can use that instead. What do you think about `_strong_root` and `is_strong_root`? Or maybe even `_root` and `is_root`? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20408#issuecomment-2261207888 From vlivanov at openjdk.org Wed Jul 31 19:28:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 31 Jul 2024 19:28:33 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com> <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com> <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com> Message-ID: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com> On Wed, 31 Jul 2024 18:53:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> src/hotspot/share/runtime/globals.hpp line 1997: >> >>> 1995: "Use a terrible hash function in order to generate many collisions.") \ >>> 1996: \ >>> 1997: develop(bool, RestrictStable, true, \ >> >> What's the use case for the flag? Solely for testing purposes (since it's develop)? >> Alternatively, you could place the test classes on boot class path and enable test execution with product binaries. > > Yes, to access the annotation from test, like `RestrictContended` nearby: > https://github.com/openjdk/jdk/blob/97f7c03dd0ff389abefed7ea2a7257bcb42e0754/src/hotspot/share/classfile/classFileParser.cpp#L1960 > > Not sure if putting test classes on bootclasspath would work well with IR tests that are now running in driver mode. I'd prefer to keep the develop flag and keep tests in driver mode and without additional fluff. `RestrictContended` and `RestrictReservedStack` are product flags. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1698997608 From vlivanov at openjdk.org Wed Jul 31 19:28:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 31 Jul 2024 19:28:34 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com> References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com> <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com> <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com> <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com> <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com> Message-ID: <oXUvUuFN_ItNrmxvXSQM5JsEKzGndG_SmENEQ44epR4=.19762cb1-ae8e-4b41-a626-a2a23012fa3e@github.com> On Wed, 31 Jul 2024 19:22:35 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Yes, to access the annotation from test, like `RestrictContended` nearby: >> https://github.com/openjdk/jdk/blob/97f7c03dd0ff389abefed7ea2a7257bcb42e0754/src/hotspot/share/classfile/classFileParser.cpp#L1960 >> >> Not sure if putting test classes on bootclasspath would work well with IR tests that are now running in driver mode. I'd prefer to keep the develop flag and keep tests in driver mode and without additional fluff. > > `RestrictContended` and `RestrictReservedStack` are product flags. I'm not saying that `RestrictStable` should be made product. It was a deliberate decision to limit it only to trusted classes. There are existing tests for `@Stable` (under `test/hotspot/jtreg/compiler/stable/`) and they don't require any special assistance from the JVM. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1699000542 From coleen.phillimore at oracle.com Wed Jul 31 19:46:43 2024 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 31 Jul 2024 15:46:43 -0400 Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive In-Reply-To: <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com> References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com> <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com> Message-ID: <ddea7a55-e60c-4fd7-b6ea-0c08748ad205@oracle.com> On 7/31/24 3:04 PM, Stefan Karlsson wrote: > On Wed, 31 Jul 2024 18:35:12 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > >> How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). >> Tested with tier1 on Oracle supported platforms. > There's a risk that someone incorrectly interprets: > > - _strongly_reachable 0 > > > to mean that the class loader isn't strongly reachable. That is what _strongly_reachable == 0 means, that the CLD isn't strongly reachable. (?) > > In the bug entry I suggested a name `_strong_count` and tried to avoid the word "reachable" because it already have a meaning for the GC. When talking about the "strong" property we both talk about strongly reachable and strong roots. The property for this CLD is that it is a strong root from the GC's perspective. Maybe we can use that instead. What do you think about > `_strong_root` and `is_strong_root`? Or maybe even `_root` and `is_root`? "strong" has meaning for GC with the addition of "root" or "reachable", but on its own has no meaning for CLDG. "strong" can mean a whole host of things. It doesn't help me know why we're setting this flag for this CLD.
I want the attribute to tell me that GC can't unload this CLD! > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/20408#issuecomment-2261207888 From coleen.phillimore at oracle.com Wed Jul 31 19:49:40 2024 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 31 Jul 2024 15:49:40 -0400 Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive In-Reply-To: <ddea7a55-e60c-4fd7-b6ea-0c08748ad205@oracle.com> References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com> <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com> <ddea7a55-e60c-4fd7-b6ea-0c08748ad205@oracle.com> Message-ID: <e4f4922f-0bc9-4ee5-88ee-728c9c477589@oracle.com> On 7/31/24 3:46 PM, coleen.phillimore at oracle.com wrote: > > > On 7/31/24 3:04 PM, Stefan Karlsson wrote: >> On Wed, 31 Jul 2024 18:35:12 GMT, Coleen Phillimore >> <coleenp at openjdk.org> wrote: >> >>> How does this rename look? Instead of ClassLoaderData::keep_alive() >>> and a _keep_alive refcount, it's been renamed to _strongly_reachable >>> and is_strongly_reachable(). >>> Tested with tier1 on Oracle supported platforms. >> There's a risk that someone incorrectly interprets: >> >> - _strongly_reachable 0 >> >> >> to mean that the class loader isn't strongly reachable. > > That is what _strongly_reachable == 0 means, that the CLD isn't > strongly reachable. (?) > >> >> In the bug entry I suggested a name `_strong_count` and tried to >> avoid the word "reachable" because it already have a meaning for the >> GC. When talking about the "strong" property we both talk about >> strongly reachable and strong roots. The property for this CLD is >> that it is a strong root from the GC's perspective. Maybe we can use >> that instead. What do you think about >> `_strong_root` and `is_strong_root`? Or maybe even `_root` and >> `is_root`?
> > "strong" has meaning for GC with the addition of "root" or > "reachable", but on its own has no meaning for CLDG. "strong" can > mean a whole host of things. It doesn't help me know why we're > setting this flag for this CLD. > > I want the attribute to tell me that GC can't unload this CLD! How about _gc_root and is_gc_root() ? >> >> ------------- >> >> PR Comment: >> https://git.openjdk.org/jdk/pull/20408#issuecomment-2261207888 > From amenkov at openjdk.org Wed Jul 31 20:05:37 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 31 Jul 2024 20:05:37 GMT Subject: Integrated: 8331015: Obsolete -XX:+UseNotificationThread In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com> Message-ID: <hpTcWsFUasOZ7VebYQlsV8-d6lmZD2pvuK759k8-1PE=.899119f7-9d6c-4bba-b918-0e6c24fcb73e@github.com> On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Obsolete UseNotificationThread flag which was deprecated in JDK 23. > > Testing: tier1..tier5 This pull request has now been integrated. 
Changeset: 8af2ef35 Author: Alex Menkov <amenkov at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8af2ef35b6f9399b6d25ff7a4a72ad283df63f03 Stats: 41 lines in 7 files changed: 1 ins; 31 del; 9 mod 8331015: Obsolete -XX:+UseNotificationThread Reviewed-by: dholmes, kevinw, sspitsyn, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20381 From coleenp at openjdk.org Wed Jul 31 20:41:47 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jul 2024 20:41:47 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v2] In-Reply-To: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com> References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com> Message-ID: <GyNKTm70AgDEk0aji4kZnBWp5wdQ4o1sojpOs0S5Y7s=.fc5ed1b4-bae7-4322-a772-2c2cabf1389d@github.com> > How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). > Tested with tier1 on Oracle supported platforms. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Rename to keep_alive_ref_count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20408/files - new: https://git.openjdk.org/jdk/pull/20408/files/fc69e2e1..38f036f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=00-01 Stats: 29 lines in 10 files changed: 0 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/20408.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20408/head:pull/20408 PR: https://git.openjdk.org/jdk/pull/20408 From clanger at openjdk.org Wed Jul 31 20:59:33 2024 From: clanger at openjdk.org (Christoph Langer) Date: Wed, 31 Jul 2024 20:59:33 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: <WY7JVdGq2cEFGFQ5cypCV_cjVNlp73ZWVkjkr2kpzxA=.a67f045b-8fc4-4426-8ac7-a513942731ed@github.com> References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com> <WY7JVdGq2cEFGFQ5cypCV_cjVNlp73ZWVkjkr2kpzxA=.a67f045b-8fc4-4426-8ac7-a513942731ed@github.com> Message-ID: <3OFH8Fhnbd7pkBTFmQ0cH5FjWTmfIV26C2gqQSI-Vlc=.63d3fb39-f116-4b5d-b392-10069a987a7f@github.com> On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: >> Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. >> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case. 
> > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - remove method from WhiteBox.java > - remove WB_isUbsanEnabled, fix test Fine for me. ------------- Marked as reviewed by clanger (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19907#pullrequestreview-2211120542 From azafari at openjdk.org Wed Jul 31 23:20:07 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 31 Jul 2024 23:20:07 GMT Subject: RFR: 8333151: Investigate if the Hotspot Arena chunk pools still make sense Message-ID: <nDdImFYUsNwB-8V2TVDpyUKBNWr3gt7h6tgCiWybfFA=.0cf55e88-69cb-4a4e-ad26-f9d9fa8231e2@github.com> Using `ChunkPool` or not is investigated in this PR based on time and memory consumption. Based on the tests using ChunkPool shows no better speed nor memory footprint. Memory usage is taken from RSS reports of Linux API. ------------- Commit messages: - 8333151: Investigate if the Hotspot Arena chunk pools still make sense - Merge branch '_8333151_chunk_pool_test' of http://github.com/afshin-zafari/jdk into _8333151_chunk_pool_test - add memory footprint measurement - 8333151: Investigate if the Hotspot Arena chunk pools still make sense - rebase master - compare the memory footprint - add memory footprint measurement - 8333151: Investigate if the Hotspot Arena chunk pools still make sense Changes: https://git.openjdk.org/jdk/pull/20411/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20411&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333151 Stats: 119 lines in 3 files changed: 113 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20411.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20411/head:pull/20411 PR: https://git.openjdk.org/jdk/pull/20411 From azafari at openjdk.org Wed Jul 31 23:38:02 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 31 Jul 2024 23:38:02 GMT Subject: RFR: 8333151: Investigate if the Hotspot Arena chunk pools still make sense [v2] 
In-Reply-To: <nDdImFYUsNwB-8V2TVDpyUKBNWr3gt7h6tgCiWybfFA=.0cf55e88-69cb-4a4e-ad26-f9d9fa8231e2@github.com> References: <nDdImFYUsNwB-8V2TVDpyUKBNWr3gt7h6tgCiWybfFA=.0cf55e88-69cb-4a4e-ad26-f9d9fa8231e2@github.com> Message-ID: <uSrCn5pmcLBmOhoFkJ0hH2nEOLABLhMCJOjaucfiJKE=.5691ad8a-8ff3-4036-ac30-b0807786f78a@github.com> > Using `ChunkPool` or not is investigated in this PR based on time and memory consumption. > Based on the tests using ChunkPool shows no better speed nor memory footprint. > Memory usage is taken from RSS reports of Linux API. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: fixes after merge glitches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20411/files - new: https://git.openjdk.org/jdk/pull/20411/files/081cf0a2..dc6be286 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20411&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20411&range=00-01 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20411.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20411/head:pull/20411 PR: https://git.openjdk.org/jdk/pull/20411 From kvn at openjdk.org Wed Jul 31 23:46:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 Jul 2024 23:46:45 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess Message-ID: <MbVHDHN0Sd9JmSr8tzsk_TW9patwxXiBBkUpFhqPOD8=.2f5a1013-6d44-4e88-8f44-31bfcb5ef5bc@github.com> `ExternalAddress` should be used only for data loads. For call (and jump) instructions we should use `RuntimeAddress`, which uses `runtime_call_Relocation`. I found a few places where `ExternalAddress` is used incorrectly and fixed them. 
I also added code to print the "hottest" (most referenced) `ExternalAddress` addresses in the global table, so that they can be moved into the static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). Here is the current output from a debug VM on a MacBook M1 (AArch64):
External addresses table: 6 entries, 324 accesses
0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480
1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16
2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr
3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc
4: 18 0x0000000118384080 : stub: forward exception
on MacOS-x64:
External addresses table: 143 entries, 44405 accesses
0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop
1: 11002 0x0000000104474384 : 'should not reach here'
2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068
3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753
4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554
and on linux-x64:
External addresses table: 143 entries, 77297 accesses
0: 22334 0x00007f35d5b9c000 : ''
1: 19789 0x00007f35d55eea1f : 'should not reach here'
2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?'
3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
A few points about the differences in output: 1. aarch64 does not use `ExternalAddress` or any relocation for messages (strings). 2. 
`stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()`, for which C2 generates a tail call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because of how relocations for constants are handled in C2. 3. The linux-x64 implementation of `dlladdr()`, which I used to print the C++ symbol name, only works for functions: `0x00007f35d5b9c000` points to `CompressedOops::_narrow_oop._base`, referenced from code in [MacroAssembler::verify_heapbase()](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5760), while on aarch64 [verify_heapbase()](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2959) is empty (guarded by `#if 0`). 4. I think `ClassLoader::file_name_for_class_name()+...` on MacOSX-x64 corresponds to strings on linux-x64. Additionally, I moved asserts before locks in `ExternalsRecorder` methods. Tested: tier1-3, xcomp, stress ------------- Commit messages: - 8337396: Cleanup usage of ExternalAddess Changes: https://git.openjdk.org/jdk/pull/20412/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337396 Stats: 104 lines in 5 files changed: 86 ins; 3 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20412.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20412/head:pull/20412 PR: https://git.openjdk.org/jdk/pull/20412
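Editorial illustration (not part of the original message, and not HotSpot code): the "hottest addresses" listing described above amounts to counting how often each address is recorded and printing the most-referenced entries first. A minimal sketch of that idea follows, assuming a hypothetical `AddressCounter` class; the real `ExternalsRecorder` in the PR may be organized quite differently, and symbol lookup is omitted here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a recorder of external addresses.
// It is NOT HotSpot's ExternalsRecorder; names and layout are assumptions.
class AddressCounter {
 public:
  using Entry = std::pair<std::uintptr_t, unsigned>;  // (address, access count)

  // Note one more reference to the given address.
  void record(std::uintptr_t addr) { ++_counts[addr]; }

  // Number of distinct addresses seen so far.
  std::size_t entries() const { return _counts.size(); }

  // Total number of recorded references.
  unsigned accesses() const {
    unsigned total = 0;
    for (const auto& kv : _counts) total += kv.second;
    return total;
  }

  // The n most-referenced addresses, highest count first;
  // ties are broken by address so the order is deterministic.
  std::vector<Entry> hottest(std::size_t n) const {
    std::vector<Entry> v(_counts.begin(), _counts.end());
    std::sort(v.begin(), v.end(), [](const Entry& a, const Entry& b) {
      return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (v.size() > n) v.resize(n);
    return v;
  }

  // Print a table shaped like the output quoted in the message.
  void print_hottest(std::size_t n) const {
    std::printf("External addresses table: %zu entries, %u accesses\n",
                entries(), accesses());
    std::size_t i = 0;
    for (const Entry& e : hottest(n)) {
      std::printf("%zu: %u 0x%016llx\n", i++, e.second,
                  static_cast<unsigned long long>(e.first));
    }
  }

 private:
  std::unordered_map<std::uintptr_t, unsigned> _counts;
};
```

Calling `record()` at each place an address is emitted and `print_hottest(5)` at exit would produce a listing of the same general form as the tables above, without the symbol names.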