From fyang at openjdk.org Thu Aug 1 02:14:39 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 1 Aug 2024 02:14:39 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess
Message-ID: <4Z6aov0by98AXsxBAgKcbNromj5HuPZm0B4tsClZLFM=.285882c0-6170-4b19-abfd-cfbef36a413e@github.com>

On Wed, 31 Jul 2024 23:42:36 GMT, Vladimir Kozlov wrote:

> `ExternalAddress` should be used only for data loads. For call (and jump) instructions we should use `RuntimeAddress`, which uses `runtime_call_Relocation`.
> 
> I found a few places where `ExternalAddress` is used incorrectly and fixed them.
> 
> I also added code to print the "hottest" (most referenced) `ExternalAddress` addresses in the global table, so that they can be moved into the static global tables that will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519).
> 
> Here is the current output from a debug VM on a MacBook M1 (AArch64):
> 
>     External addresses table: 6 entries, 324 accesses
>     0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480
>     1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16
>     2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr
>     3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc
>     4: 18 0x0000000118384080 : stub: forward exception
> 
> on MacOS-x64:
> 
>     External addresses table: 143 entries, 44405 accesses
>     0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop
>     1: 11002 0x0000000104474384 : 'should not reach here'
>     2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068
>     3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753
>     4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554
> 
> and on linux-x64:
> 
>     External addresses table: 143 entries, 77297 accesses
>     0: 22334 0x00007f35d5b9c000 : ''
>     1: 19789 0x00007f35d55eea1f : 'should not reach here'
>     2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?'
>     3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
>     4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
> 
> A few points about the differences in the output:
> 1. aarch64 does not use `ExternalAddress` or any relocation for messages (strings).
> 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()`, for which C2 generates a tail call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because of how relocations for constants are handled in C2.
> 3. The linux-x64 implementation of `dladdr()`, which I used to print C++ symbol names, only...

Hi, it seems to me that the following two were missed?

diff --git a/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp b/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
index 3f1a4423b5e..5e2ef97e4a3 100644
--- a/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
@@ -7045,7 +7045,7 @@ class StubGenerator: public StubCodeGenerator {
     Label thaw_success;
     // rscratch2 contains the size of the frames to thaw, 0 if overflow or no more frames
     __ cbnz(rscratch2, thaw_success);
-    __ lea(rscratch1, ExternalAddress(StubRoutines::throw_StackOverflowError_entry()));
+    __ lea(rscratch1, RuntimeAddress(StubRoutines::throw_StackOverflowError_entry()));
     __ br(rscratch1);
     __ bind(thaw_success);
 
diff --git a/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp b/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp
index f78d7261e40..198835d733f 100644
--- a/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp
+++ b/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp
@@ -3774,7 +3774,7 @@ class StubGenerator: public StubCodeGenerator {
     Label thaw_success;
     // t1 contains the size of the frames to thaw, 0 if overflow or no more frames
     __ bnez(t1, thaw_success);
-    __ la(t0, ExternalAddress(StubRoutines::throw_StackOverflowError_entry()));
+    __ la(t0, RuntimeAddress(StubRoutines::throw_StackOverflowError_entry()));
     __ jr(t0);
     __ bind(thaw_success);
 

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2261814965

From jwtang at openjdk.org Thu Aug 1 03:06:15 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Thu, 1 Aug 2024 03:06:15 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v5]
In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
Message-ID: <9JIJyNsoybEln6CB1Rxcqs8vCJkT7jcPdmR3J5sLjPM=.fdac6552-77a7-4a7f-8baa-39c03107c2bd@github.com>

> I added a test case which reproduces the crash. I hope I could get some advice on whether the code needs changing.
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:

  remove useless codes and refactor the testcase

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20373/files
  - new: https://git.openjdk.org/jdk/pull/20373/files/1b0de486..60411296

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=04
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=03-04

  Stats: 63 lines in 3 files changed: 7 ins; 53 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20373.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373

PR: https://git.openjdk.org/jdk/pull/20373

From jwtang at openjdk.org Thu Aug 1 03:06:16 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Thu, 1 Aug 2024 03:06:16 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v4]
In-Reply-To: <-4ohGO-ytRMr_I-4SRpWX6QDeZQCIhVho9mTQadK3MQ=.dff1bb8d-907c-4844-9e1b-801d69984d49@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <-4ohGO-ytRMr_I-4SRpWX6QDeZQCIhVho9mTQadK3MQ=.dff1bb8d-907c-4844-9e1b-801d69984d49@github.com>

On Tue, 30 Jul 2024 11:20:48 GMT, Serguei Spitsyn wrote:

>> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   refactor testcase and change the location of fix codes
> 
> src/hotspot/share/prims/jvmtiExport.cpp line 1098:
> 
>> 1096: if (JavaThread::current()->is_in_any_VTMS_transition()) {
>> 1097: return false; // no events should be posted if thread is in any VTMS transition
>> 1098: }
> 
> Sorry, I was not clear: the 3 lines above (1093-1095) had to be replaced with the new lines 1096-1098.
> The check for `is_in_any_VTMS_transition()` includes the `is_in_tmp_VTMS_transition()` check.
> So lines 1093-1095 need to be removed now.

Changed it.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1699334445

From kvn at openjdk.org Thu Aug 1 03:19:01 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 03:19:01 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]

Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:

  Add missed ExternalAddress changes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20412/files
  - new: https://git.openjdk.org/jdk/pull/20412/files/d2abe7fb..7fa9e11f

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20412.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20412/head:pull/20412

PR: https://git.openjdk.org/jdk/pull/20412

From kvn at openjdk.org Thu Aug 1 03:19:01 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 03:19:01 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess
In-Reply-To: <4Z6aov0by98AXsxBAgKcbNromj5HuPZm0B4tsClZLFM=.285882c0-6170-4b19-abfd-cfbef36a413e@github.com>
References: <4Z6aov0by98AXsxBAgKcbNromj5HuPZm0B4tsClZLFM=.285882c0-6170-4b19-abfd-cfbef36a413e@github.com>

On Thu, 1 Aug 2024 02:12:23 GMT, Fei Yang wrote:

>> `ExternalAddress` should be used only for data loads. For call (and jump) instructions we should use `RuntimeAddress`, which uses `runtime_call_Relocation`.
>> ...
> 
> Hi, it seems to me that the following two were missed?
> 
> diff --git a/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp b/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
> ...

Thank you, @RealFYang. I added the missing cases you pointed out.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2261880171

From sjayagond at openjdk.org Thu Aug 1 03:51:36 2024
From: sjayagond at openjdk.org (Sidraya Jayagond)
Date: Thu, 1 Aug 2024 03:51:36 GMT
Subject: RFR: 8327652: S390x: Implements SLP support [v7]

On Tue, 26 Mar 2024 15:10:37 GMT, Sidraya Jayagond wrote:

>> This PR adds SIMD support on s390x.
> 
> Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision:
> 
>   PopCountVI supported by z14 onwards.

Keeping it open.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2261915862

From sspitsyn at openjdk.org Thu Aug 1 04:14:57 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Thu, 1 Aug 2024 04:14:57 GMT
Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions

The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`:
 - `SetFieldAccessWatch()`
 - `ClearFieldAccessWatch()`
 - `SetFieldModificationWatch()`
 - `ClearFieldModificationWatch()`

So in `recompute_enabled()` we could see that a vthread is mounted, but by the time the `EnterInterpOnlyModeClosure` handshake runs, the thread could already have been unmounted. This is the root cause of the failures with this assert.

The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function.
Testing:
 - TBD: submit mach5 tiers 1-6

-------------

Commit messages:
 - 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions

Changes: https://git.openjdk.org/jdk/pull/20413/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20413&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336846
  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20413/head:pull/20413

PR: https://git.openjdk.org/jdk/pull/20413

From dholmes at openjdk.org Thu Aug 1 05:16:35 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 1 Aug 2024 05:16:35 GMT
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v2]

On Wed, 31 Jul 2024 20:41:47 GMT, Coleen Phillimore wrote:

>> How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable().
>> Tested with tier1 on Oracle supported platforms.
> 
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename to keep_alive_ref_count

At the moment the changes do not match the PR description at all, and while the renaming adds a little clarity, it doesn't seem to address any concerns one might have had with "keep alive" as terminology.

> I want the attribute to tell me that GC can't unload this CLD!

Maybe `disable_unload_count`?

src/hotspot/share/classfile/classLoaderDataGraph.cpp line 313:

> 311: while (ClassLoaderData* cld = iter.get_next()) {
> 312: // Keep the holder alive.
> 313: cld->keep_alive();

Does the comment need changing or removing?

src/hotspot/share/classfile/classLoaderDataGraph.cpp line 338:

> 336: while (ClassLoaderData* cld = iter.get_next()) {
> 337: // Keep the holder alive.
> 338: cld->keep_alive();

Ditto re the comment.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20408#pullrequestreview-2211633655
PR Review Comment: https://git.openjdk.org/jdk/pull/20408#discussion_r1699427472
PR Review Comment: https://git.openjdk.org/jdk/pull/20408#discussion_r1699427605

From dholmes at openjdk.org Thu Aug 1 05:28:30 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 1 Aug 2024 05:28:30 GMT
Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions

On Thu, 1 Aug 2024 04:09:31 GMT, Serguei Spitsyn wrote:

> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`:
>  - `SetFieldAccessWatch()`
>  - `ClearFieldAccessWatch()`
>  - `SetFieldModificationWatch()`
>  - `ClearFieldModificationWatch()`
> 
> So in `recompute_enabled()` we could see that a vthread is mounted, but by the time the `EnterInterpOnlyModeClosure` handshake runs, the thread could already have been unmounted. This is the root cause of the failures with this assert.
> 
> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function.
> 
> Testing:
>  - TBD: submit mach5 tiers 1-6

src/hotspot/share/prims/jvmtiEventController.cpp line 988:

> 986: (*count_addr)++;
> 987: if (*count_addr == 1) {
> 988: JvmtiVTMSTransitionDisabler disabler;

Could you just declare this outside the if-block rather than repeating it?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20413#discussion_r1699441402

From dholmes at openjdk.org Thu Aug 1 05:37:31 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 1 Aug 2024 05:37:31 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v5]
In-Reply-To: <9JIJyNsoybEln6CB1Rxcqs8vCJkT7jcPdmR3J5sLjPM=.fdac6552-77a7-4a7f-8baa-39c03107c2bd@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <9JIJyNsoybEln6CB1Rxcqs8vCJkT7jcPdmR3J5sLjPM=.fdac6552-77a7-4a7f-8baa-39c03107c2bd@github.com>

On Thu, 1 Aug 2024 03:06:15 GMT, Jiawei Tang wrote:

>> I added a test case which reproduces the crash. I hope I could get some advice on whether the code needs changing.
> 
> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove useless codes and refactor the testcase

So the essence of this change is simply that is_in_tmp becomes is_in_any? That seems reasonable.

Thanks

-------------

PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2211665450

From stuefe at openjdk.org Thu Aug 1 05:53:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 1 Aug 2024 05:53:31 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4]

On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken wrote:

>> Currently, when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
>> 
>> We find this in the test output:
>> 
>>     [STDOUT]
>>     /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
>> 
>> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.
> 
> Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - remove method from WhiteBox.java
>  - remove WB_isUbsanEnabled, fix test

I have seen this too late. Some questions:

- Are we sure the images the tests use will always be Debian or Debian descendants? What about RHEL or Oracle Linux?
- How closely does libubsan have to match the compiler used to build the tested JVM? After all, UBSAN is a compiler feature. Does this work for any version of GCC used to build the JVM? Or do we risk weird test errors if the libubsan in the image is incompatible with the GCC version used to build the testee JVM?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2262093863

From qxing at openjdk.org Thu Aug 1 06:34:31 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Thu, 1 Aug 2024 06:34:31 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2]

On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing wrote:

>> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
>> 
>> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.
> 
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update copyright.

Hi all,

This PR cleans up some declarations that are unused in release mode; the change is trivial and low risk. Could someone please review it? Thanks!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2262157566

From stefank at openjdk.org Thu Aug 1 06:47:38 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 1 Aug 2024 06:47:38 GMT
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v2]

On Wed, 31 Jul 2024 20:41:47 GMT, Coleen Phillimore wrote:

>> How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable().
>> Tested with tier1 on Oracle supported platforms.
> 
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename to keep_alive_ref_count

Coleen and I talked about it and agreed on the keep_alive_ref_count name. I think this looks good. I'll approve the PR when the Windows compilation error has been fixed. Thanks!

src/hotspot/share/classfile/classLoaderData.hpp line 215:

> 213: 
> 214: private:
> 215: bool keep_alive_ref_count() const { return _keep_alive_ref_count; }

keep_alive_ref_count() should return an int. The Windows GHA build complains about this.

src/hotspot/share/classfile/classLoaderData.hpp line 308:

> 306: 
> 307: // Used to refcount a non-strong hidden class's s CLD in order to indicate their aliveness.
> 308: void inc_keep_alive_ref_count();

This comment is slightly misleading. We have two mechanisms to determine a CLD's aliveness: the first is this ref-counting mechanism, the other is GC tracing. One could misinterpret this comment to mean that we only use the ref-count to determine aliveness. Could we tweak this comment to say that the ref-count is used to *force* the aliveness of CLDs?
-------------

PR Review: https://git.openjdk.org/jdk/pull/20408#pullrequestreview-2211769698
PR Review Comment: https://git.openjdk.org/jdk/pull/20408#discussion_r1699518844
PR Review Comment: https://git.openjdk.org/jdk/pull/20408#discussion_r1699523957

From stefank at openjdk.org Thu Aug 1 06:47:39 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 1 Aug 2024 06:47:39 GMT
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v2]

On Thu, 1 Aug 2024 05:07:36 GMT, David Holmes wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Rename to keep_alive_ref_count
> 
> src/hotspot/share/classfile/classLoaderDataGraph.cpp line 313:
> 
>> 311: while (ClassLoaderData* cld = iter.get_next()) {
>> 312: // Keep the holder alive.
>> 313: cld->keep_alive();
> 
> Does the comment need changing or removing?

I agree. There's no need to talk about the holder here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20408#discussion_r1699524549

From eliu at openjdk.org Thu Aug 1 06:53:33 2024
From: eliu at openjdk.org (Eric Liu)
Date: Thu, 1 Aug 2024 06:53:33 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2]

On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing wrote:

>> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
>> 
>> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.
> 
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update copyright.

src/hotspot/share/runtime/registerMap.hpp line 159:

> 157: void print_on(outputStream* st) const;
> 158: void print() const;
> 159: #endif

They can be merged with the one at line 165 into a single `#ifndef PRODUCT` region.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20131#discussion_r1699531407

From ayang at openjdk.org Thu Aug 1 07:31:43 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 1 Aug 2024 07:31:43 GMT
Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy

On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang wrote:

> Trivial removal of an unused gc-cause; it was previously used only by Parallel.

Thanks for the review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20403#issuecomment-2262242210

From ayang at openjdk.org Thu Aug 1 07:31:43 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 1 Aug 2024 07:31:43 GMT
Subject: Integrated: 8337546: Remove unused GCCause::_adaptive_size_policy
Message-ID: <6THEqrfMC8jW6TBFfLMIn8XdDslUFXP9jBtYzc0jOKc=.474e0a7c-827c-4519-948e-db8aecc15722@github.com>

On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang wrote:

> Trivial removal of an unused gc-cause; it was previously used only by Parallel.

This pull request has now been integrated.

Changeset: cf1230a5
Author: Albert Mingkun Yang
URL: https://git.openjdk.org/jdk/commit/cf1230a5f7e5ae4c72ec6243fff1d0b0eb27779a
Stats: 13 lines in 4 files changed: 0 ins; 11 del; 2 mod

8337546: Remove unused GCCause::_adaptive_size_policy

Reviewed-by: tschatzl, kbarrett

-------------

PR: https://git.openjdk.org/jdk/pull/20403

From mbaesken at openjdk.org Thu Aug 1 07:33:33 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 1 Aug 2024 07:33:33 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4]

On Thu, 1 Aug 2024 05:49:59 GMT, Thomas Stuefe wrote:

> Are we sure the images the tests use will always be Debian or Debian descendants? What about RHEL or Oracle Linux?
We use Ubuntu as the default base image for the container tests, see https://github.com/openjdk/jdk/blob/65646b5f81279a7fcef3ea04ef9894cf66f77a5a/test/lib/jdk/test/lib/containers/docker/DockerfileConfig.java#L47

To be on the safer side (the image can potentially be switched with jdk.test.docker.image.name), we could add a check and avoid adding the libubsan1 package if the image is not Ubuntu.

Regarding compatibility, I've seen no issues (and if you compile _without_ ubsan, you would not reference libubsan1 anyway). So this only affects a special configuration.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2262249578

From qxing at openjdk.org Thu Aug 1 08:33:48 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Thu, 1 Aug 2024 08:33:48 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v3]

> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
> 
> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.

Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:

  Merge `#ifndef PRODUCT` regions.
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20131/files
  - new: https://git.openjdk.org/jdk/pull/20131/files/37a14107..1117b89d

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=02
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=01-02

  Stats: 8 lines in 1 file changed: 3 ins; 5 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20131.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20131/head:pull/20131

PR: https://git.openjdk.org/jdk/pull/20131

From qxing at openjdk.org Thu Aug 1 08:33:48 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Thu, 1 Aug 2024 08:33:48 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v2]
Message-ID: <92z-E0SRuWeaw_4BR-bkA1-NTpcI51jh66lDopA3PGA=.426005ff-6bd6-4ee5-807a-403425864d3c@github.com>

On Thu, 1 Aug 2024 06:50:25 GMT, Eric Liu wrote:

>> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update copyright.
> 
> src/hotspot/share/runtime/registerMap.hpp line 159:
> 
>> 157: void print_on(outputStream* st) const;
>> 158: void print() const;
>> 159: #endif
> 
> They can be merged with the one at line 165 into a single `#ifndef PRODUCT` region.

Thanks, updated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20131#discussion_r1699652896

From shade at openjdk.org Thu Aug 1 09:28:47 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 1 Aug 2024 09:28:47 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
References: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com>

On Wed, 31 Jul 2024 19:25:25 GMT, Vladimir Ivanov wrote:

>> `RestrictContended` and `RestrictReservedStack` are product flags.
> 
> I'm not saying that `RestrictStable` should be made product. It was a deliberate decision to limit it only to trusted classes.
> > There are existing tests for `@Stable` (under `test/hotspot/jtreg/compiler/stable/`) and they don't require any special assistance from the JVM. Again, the problem here is that new tests are *IR Tests*, and I am struggling to find a good way to bootclasspath the classes that IR Test Framework itself invokes. The tests you referenced are not IR tests, and so do not have this problem. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1699831864 From sspitsyn at openjdk.org Thu Aug 1 09:30:39 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 1 Aug 2024 09:30:39 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v5] In-Reply-To: <9JIJyNsoybEln6CB1Rxcqs8vCJkT7jcPdmR3J5sLjPM=.fdac6552-77a7-4a7f-8baa-39c03107c2bd@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <9JIJyNsoybEln6CB1Rxcqs8vCJkT7jcPdmR3J5sLjPM=.fdac6552-77a7-4a7f-8baa-39c03107c2bd@github.com> Message-ID: On Thu, 1 Aug 2024 03:06:15 GMT, Jiawei Tang wrote: >> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > remove useless codes and refactor the testcase Thank you for the update! It looks good in general. I've requested a couple of nits to split long lines in the test though. test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java line 34: > 32: * @run driver jdk.test.lib.util.JavaAgentBuilder > 33: * TestPinCaseWithCFLH TestPinCaseWithCFLH.jar > 34: * @run main/othervm/timeout=100 -Djdk.virtualThreadScheduler.maxPoolSize=1 -Djdk.tracePinnedThreads=full --enable-native-access=ALL-UNNAMED -javaagent:TestPinCaseWithCFLH.jar TestPinCaseWithCFLH Nit: Could you, please, rearrange to avoid long lines? 
test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java line 45: > 43: > 44: public static class TestClassFileTransformer implements ClassFileTransformer { > 45: public byte[] transform(ClassLoader loader, String className, Class classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException { Nit: Could you, please, rearrange this to avoid long lines? test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java line 61: > 59: VThreadPinner.runPinned(() -> { > 60: try { > 61: Thread.sleep(500); // try yield, will pin, javaagent + tracePinnedThreads should not lead to crash (because of the class `PinnedThreadPrinter`) Nit: Could you, please, rearrange to avoid long lines? ------------- PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2212212420 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1699830174 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1699831041 PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1699831379 From sspitsyn at openjdk.org Thu Aug 1 09:37:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 1 Aug 2024 09:37:08 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2] In-Reply-To: References: Message-ID: > The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: > - `SetFieldAccessWatch()` > - `ClearFieldAccessWatch()` > - `SetFieldModificationWatch()` > - `ClearFieldModificationWatch()` > so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. > > The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. 
> > Testing: > - TBD: submit mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: rearranged to have one JvmtiVTMSTransitionDisabler instead of two ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20413/files - new: https://git.openjdk.org/jdk/pull/20413/files/6a7ca7bc..bb94448c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20413&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20413&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20413/head:pull/20413 PR: https://git.openjdk.org/jdk/pull/20413 From sspitsyn at openjdk.org Thu Aug 1 09:37:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 1 Aug 2024 09:37:08 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 05:25:53 GMT, David Holmes wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> rearranged to have one JvmtiVTMSTransitionDisabler instead of two > > src/hotspot/share/prims/jvmtiEventController.cpp line 988: > >> 986: (*count_addr)++; >> 987: if (*count_addr == 1) { >> 988: JvmtiVTMSTransitionDisabler disabler; > > Could you just declare this outside the if-block rather than repeating it? Thank you for looking at this and for the suggestion, David. I wanted to optimize a little bit but agree it is not worth it, so updated now. 
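The review outcome (one `JvmtiVTMSTransitionDisabler` declared before the branch instead of one inside each `if`) is the usual RAII scope-guard hoisting. The C++ below is a sketch under stated assumptions: `TransitionDisablerSketch` and `change_field_watch_sketch` are hypothetical stand-ins, and the counter bookkeeping only approximates the shape of `JvmtiEventControllerPrivate::change_field_watch()` suggested by the quoted diff:

```cpp
#include <cassert>

// Hypothetical RAII guard standing in for JvmtiVTMSTransitionDisabler:
// transitions are "disabled" for exactly as long as an instance is alive.
struct TransitionDisablerSketch {
  static int active;  // number of live guards
  TransitionDisablerSketch() { ++active; }
  ~TransitionDisablerSketch() { --active; }
  TransitionDisablerSketch(const TransitionDisablerSketch&) = delete;
  TransitionDisablerSketch& operator=(const TransitionDisablerSketch&) = delete;
};
int TransitionDisablerSketch::active = 0;

// Hypothetical shape of the updated function: a single guard, declared
// before the branch, covers both the increment and the decrement path.
int change_field_watch_sketch(int* count_addr, bool added, int* recomputes) {
  TransitionDisablerSketch disabler;  // released when the function returns
  if (added) {
    (*count_addr)++;
    if (*count_addr == 1) {
      assert(TransitionDisablerSketch::active > 0);
      ++*recomputes;  // stands in for recompute_enabled()
    }
  } else if (*count_addr > 0) {
    (*count_addr)--;
    if (*count_addr == 0) {
      assert(TransitionDisablerSketch::active > 0);
      ++*recomputes;  // stands in for recompute_enabled()
    }
  }
  return *count_addr;
}
```

The trade-off is the one discussed above: the guard is now taken even when the count does not cross 0/1 and no recompute happens, in exchange for a single, obviously balanced scope.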
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20413#discussion_r1699840543 From eliu at openjdk.org Thu Aug 1 10:53:32 2024 From: eliu at openjdk.org (Eric Liu) Date: Thu, 1 Aug 2024 10:53:32 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 08:33:48 GMT, Qizheng Xing wrote: >> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. >> >> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Merge `#ifndef PRODUCT` regions. Marked as reviewed by eliu (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20131#pullrequestreview-2212396047 From adinn at openjdk.org Thu Aug 1 11:42:59 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 1 Aug 2024 11:42:59 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime Message-ID: Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. 
------------- Commit messages: - 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime Changes: https://git.openjdk.org/jdk/pull/20417/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337654 Stats: 2351 lines in 21 files changed: 981 ins; 1355 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20417.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20417/head:pull/20417 PR: https://git.openjdk.org/jdk/pull/20417 From duke at openjdk.org Thu Aug 1 12:11:40 2024 From: duke at openjdk.org (duke) Date: Thu, 1 Aug 2024 12:11:40 GMT Subject: Withdrawn: 8333343: [REDO] AArch64: optimize integer remainder In-Reply-To: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: On Thu, 30 May 2024 05:33:00 GMT, Jin Guojie wrote: > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > (1) The following tests have passed and show a performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this patch(ns/ops) 1885 improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this patch(ns/ops) 1885 improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this patch(ns/ops) 1894 improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this patch(ns/ops) 1891 improvement(%) 18.03% > > (2) jtreg test has passed > > make run-test TEST=tier1 This pull request has been closed without being integrated.
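Both instruction choices in the withdrawn change rest on the identity r = a - (a / b) * b: AArch64 can finish it with one fused multiply-subtract (`msub`) or with a separate `mul` and `sub`. The C++ below only illustrates that identity for the unsigned cases benchmarked above. The instruction names in the comments label each step of the split form and are not a claim about what any particular compiler emits; an optimizing compiler may well fuse these lines back into `msub`:

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit remainder computed as quotient, multiply, subtract.
inline std::uint32_t urem32_split(std::uint32_t a, std::uint32_t b) {
  assert(b != 0);
  std::uint32_t q = a / b;  // udiv
  std::uint32_t p = q * b;  // mul  (separate multiply)
  return a - p;             // sub  (separate subtract)
}

// Same identity for 64-bit operands (the LongDivMod cases).
inline std::uint64_t urem64_split(std::uint64_t a, std::uint64_t b) {
  assert(b != 0);
  std::uint64_t q = a / b;  // udiv
  return a - q * b;         // mul + sub, or a single msub when fused
}
```

As a side note on the quoted numbers, the "improvement(%)" column matches baseline/patched - 1 (for example 2223/1885 - 1 is about 17.93%) rather than (baseline - patched)/baseline, which would give about 15.2%.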
------------- PR: https://git.openjdk.org/jdk/pull/19471 From adinn at openjdk.org Thu Aug 1 12:26:44 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 1 Aug 2024 12:26:44 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v2] In-Reply-To: References: Message-ID: > Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: fix conditonal directives in arm code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20417/files - new: https://git.openjdk.org/jdk/pull/20417/files/b2821099..eb9b2d5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20417.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20417/head:pull/20417 PR: https://git.openjdk.org/jdk/pull/20417 From coleenp at openjdk.org Thu Aug 1 12:27:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 1 Aug 2024 12:27:33 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v2] In-Reply-To: References: Message-ID: <_z0zJjXgwso9alJlEcAyCu8-fhINSLGOay5BQVlyUuY=.1b7c63b5-c533-4c24-a4d2-ba9404e3f214@github.com> On Thu, 1 Aug 2024 06:38:03 GMT, Stefan Karlsson wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename to keep_alive_ref_count > > src/hotspot/share/classfile/classLoaderData.hpp line 215: > >> 213: >> 214: private: >> 215: bool keep_alive_ref_count() const { return _keep_alive_ref_count; } > > keep_alive_ref_count() should return an int. The Windows GHA build complains about this. 
I don't know why the linux compiler I used didn't, but fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20408#discussion_r1700050506 From coleenp at openjdk.org Thu Aug 1 12:35:00 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 1 Aug 2024 12:35:00 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v3] In-Reply-To: References: Message-ID: <6q02aOYDow87b8ohDXO6DVTeyzITIKxCmFbBZJsuEDo=.8da3927e-2992-4962-9af2-918757929ab0@github.com> > How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). > Tested with tier1 on Oracle supported platforms. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Remove comments, update comment, fix compliation error. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20408/files - new: https://git.openjdk.org/jdk/pull/20408/files/38f036f7..4cd8e222 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=01-02 Stats: 5 lines in 2 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20408.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20408/head:pull/20408 PR: https://git.openjdk.org/jdk/pull/20408 From coleenp at openjdk.org Thu Aug 1 12:35:00 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 1 Aug 2024 12:35:00 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v2] In-Reply-To: References: Message-ID: <5o-QOEATiOIpxLewZSF6zTDyQTOUKxPq-kR7h7qP_v4=.9d65ca83-88f8-456b-851c-367f937a6ffb@github.com> On Thu, 1 Aug 2024 06:43:14 GMT, Stefan Karlsson wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename to keep_alive_ref_count > > 
src/hotspot/share/classfile/classLoaderData.hpp line 308: > >> 306: >> 307: // Used to refcount a non-strong hidden class's s CLD in order to indicate their aliveness. >> 308: void inc_keep_alive_ref_count(); > > This comment is slightly misleading. We have two mechanisms to determine a CLDs aliveness. The first is this ref-counting mechanism, the other is the GC tracing. One can misinterpret this comment to mean that we only use the ref-count to determine the aliveness. > > Could we tweak this comment to say something that the ref-count is used to *force* the aliveness of CLDs? How about: // Used to refcount a non-strong hidden class's CLD in order to force its aliveness during // loading, when gc tracing may not find this CLD alive through the holder. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20408#discussion_r1700056669 From stefank at openjdk.org Thu Aug 1 12:39:30 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 1 Aug 2024 12:39:30 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v3] In-Reply-To: <6q02aOYDow87b8ohDXO6DVTeyzITIKxCmFbBZJsuEDo=.8da3927e-2992-4962-9af2-918757929ab0@github.com> References: <6q02aOYDow87b8ohDXO6DVTeyzITIKxCmFbBZJsuEDo=.8da3927e-2992-4962-9af2-918757929ab0@github.com> Message-ID: On Thu, 1 Aug 2024 12:35:00 GMT, Coleen Phillimore wrote: >> How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). >> Tested with tier1 on Oracle supported platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove comments, update comment, fix compliation error. Marked as reviewed by stefank (Reviewer). 
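The `bool` versus `int` issue caught by the Windows build is easy to reproduce in isolation. Returning a counter through a `bool` silently collapses every non-zero count to `true` (1). The sketch below uses a hypothetical `RefCountedHolderSketch` class, not HotSpot's `ClassLoaderData`, to model a keep-alive ref count that forces aliveness while it is positive, in the spirit of the reworded comment:

```cpp
#include <cassert>

// Hypothetical holder modelled on the discussion above; not ClassLoaderData.
class RefCountedHolderSketch {
 public:
  void inc_keep_alive_ref_count() { ++_keep_alive_ref_count; }
  void dec_keep_alive_ref_count() {
    assert(_keep_alive_ref_count > 0 && "unbalanced decrement");
    --_keep_alive_ref_count;
  }
  // Returning int preserves the actual count.
  int keep_alive_ref_count() const { return _keep_alive_ref_count; }
  // A positive count forces aliveness regardless of what GC tracing finds.
  bool is_forced_alive() const { return _keep_alive_ref_count > 0; }

 private:
  int _keep_alive_ref_count = 0;
};

// The earlier signature's behavior, kept for comparison: any non-zero
// count converts to true, so callers can no longer see the value.
inline bool keep_alive_ref_count_as_bool(int count) { return count; }
```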
------------- PR Review: https://git.openjdk.org/jdk/pull/20408#pullrequestreview-2212603430 From aph at openjdk.org Thu Aug 1 12:41:47 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 1 Aug 2024 12:41:47 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v13] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. 
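The description above mentions hashed secondary-supers tables and a software popcount fallback without showing either. The C++ below is a simplified sketch of the general technique: a 64-bit occupancy bitmap, where counting the set bits below a slot indexes a packed array, plus a portable SWAR popcount of the kind a build without hardware POPCNT could use. All names (`SecondarySupersSketch`, `hash_slot`, the overflow list) are hypothetical. HotSpot's real hash function, table layout and collision handling are not reproduced here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable SWAR popcount (no hardware POPCNT required).
inline int popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical 6-bit hash of a class id into one of 64 slots.
inline unsigned hash_slot(std::uint64_t id) {
  return static_cast<unsigned>((id * 0x9E3779B97F4A7C15ULL) >> 58);
}

// Simplification: an entry whose slot is already taken goes to an overflow
// list scanned linearly (the real table resolves collisions differently).
class SecondarySupersSketch {
 public:
  explicit SecondarySupersSketch(const std::vector<std::uint64_t>& ids) {
    std::vector<std::uint64_t> by_slot(64, 0);
    for (std::uint64_t id : ids) {
      std::uint64_t bit = std::uint64_t{1} << hash_slot(id);
      if (_bitmap & bit) {
        _overflow.push_back(id);  // collision: keep out of the packed array
      } else {
        _bitmap |= bit;
        by_slot[hash_slot(id)] = id;
      }
    }
    // Pack occupied slots in ascending slot order.
    for (unsigned s = 0; s < 64; ++s) {
      if (_bitmap & (std::uint64_t{1} << s)) _packed.push_back(by_slot[s]);
    }
  }

  bool contains(std::uint64_t id) const {
    std::uint64_t bit = std::uint64_t{1} << hash_slot(id);
    if (!(_bitmap & bit)) return false;  // no entry hashes here: fast miss
    // Index = number of occupied slots below this one.
    int index = popcount64(_bitmap & (bit - 1));
    if (_packed[static_cast<std::size_t>(index)] == id) return true;
    for (std::uint64_t other : _overflow) {
      if (other == id) return true;
    }
    return false;
  }

 private:
  std::uint64_t _bitmap = 0;
  std::vector<std::uint64_t> _packed;
  std::vector<std::uint64_t> _overflow;
};
```

The point of the design is that a miss whose bit is clear costs one shift, one AND and one branch. A hit costs one popcount and one load, which is why a reasonably fast software popcount keeps the C++ runtime path cheap.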
> > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Reorganize x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/2769d9e7..24aca9a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=11-12 Stats: 64 lines in 5 files changed: 6 ins; 6 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From hgreule at openjdk.org Thu Aug 1 13:08:32 2024 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 1 Aug 2024 13:08:32 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception [v2] In-Reply-To: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> References: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> Message-ID: On Thu, 4 Jul 2024 06:22:31 GMT, Hannes Greule wrote: >> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. >> >> The exception message is less exact, but I think that's acceptable. 
> > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > address comments @JornVernee if you want you can also review ------------- PR Comment: https://git.openjdk.org/jdk/pull/20015#issuecomment-2262988654 From duke at openjdk.org Thu Aug 1 13:08:33 2024 From: duke at openjdk.org (duke) Date: Thu, 1 Aug 2024 13:08:33 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception [v2] In-Reply-To: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> References: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> Message-ID: On Thu, 4 Jul 2024 06:22:31 GMT, Hannes Greule wrote: >> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. >> >> The exception message is less exact, but I think that's acceptable. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > address comments @SirYwell Your change (at version e329ceb206d198e35186843391f4a8a25732e64e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20015#issuecomment-2262992409 From aph at openjdk.org Thu Aug 1 13:13:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 1 Aug 2024 13:13:08 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v14] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. 
> > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to to a slow path subroutine. > Instead, the slow patch is expanded inline. > > I don't think this is necessarily bad. 
Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - use assert rather than guarantee - Untabify ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/24aca9a2..4c7fad7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=12-13 Stats: 16 lines in 2 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Thu Aug 1 13:35:49 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 1 Aug 2024 13:35:49 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v15] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <2_4xx-GF-rYQD1h2GUG2iapuY38LIuDxNuv0OH-VAqU=.ed84253c-9a2a-4d13-9555-8a077fda5f22@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. 
> > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to to a slow path subroutine. > Instead, the slow patch is expanded inline. > > I don't think this is necessarily bad. 
Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix shared code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/4c7fad7b..7792ca8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Thu Aug 1 13:40:48 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 1 Aug 2024 13:40:48 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v16] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. 
It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to to a slow path subroutine. > Instead, the slow patch is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. 
> > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix shared code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/7792ca8a..eb739933 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From dchuyko at openjdk.org Thu Aug 1 13:42:02 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Thu, 1 Aug 2024 13:42:02 GMT Subject: RFR: 8337657: AArch64: No need for acquire fence in safepoint poll during JNI calls Message-ID: <44fCyKKdCaJQB8k82GuR5LnwERGsRta7cVpasF0kVvc=.6f38152b-22a1-4316-9835-0a1e2a9a78c5@github.com> This is a tiny change to improve JNI calls performance on AArch64. In SharedRuntime::generate_native_wrapper() and TemplateInterpreterGenerator::generate_native_entry() safepoint_poll is made with acquire=true. It comes from the aarch64 implementation of Thread-local handshakes [0], [1]. Presently, it is no longer required [2]. Turning LDAR into regular load has significant performance effect. 
For instance, the NativeCall benchmarks [3] by @simonis on Graviton 2 show the following improvements:

-XX:-UseSystemMemoryBarrier (current default)
  NativeCall.callingEmptyNative                            8.04%
  NativeCall.callingJniCriticalArray                       1.01%
  NativeCall.callingJniCriticalEmpty                       6.73%
  NativeCall.callingStaticEmpty                           10.47%
  NativeCall.methodCallingNativeWithArgs                  10.73%
  NativeCall.methodCallingNativeWithManyArgs               9.41%
  NativeCall.staticMethodCallingStaticNativeIntStub        3.68%
  NativeCall.staticMethodCallingStaticNativeNoTiered       9.86%
  NativeCall.staticMethodCallingStaticNativeWithManyArgs   4.81%

-XX:+UseSystemMemoryBarrier
  NativeCall.callingEmptyNative                           33.70%
  NativeCall.callingJniCriticalArray                       3.64%
  NativeCall.callingJniCriticalEmpty                      34.15%
  NativeCall.callingStaticArray                            3.02%
  NativeCall.callingStaticEmpty                           34.25%
  NativeCall.methodCallingNativeWithArgs                  35.98%
  NativeCall.methodCallingNativeWithManyArgs              34.42%
  NativeCall.staticMethodCallingStaticNativeIntStub       15.66%
  NativeCall.staticMethodCallingStaticNativeNoTiered      35.27%
  NativeCall.staticMethodCallingStaticNativeWithManyArgs  32.99%

Similar improvements are observed on other CPUs. It is especially interesting that the -XX:+UseSystemMemoryBarrier variant now shows improvements in cases where there was previously parity.

Testing: tier1-3 on linux-aarch64.
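To make the "acquire=true" versus plain-load distinction concrete outside HotSpot's assembler, here is a rough C++ analogue. It assumes a hypothetical per-thread poll word and "armed" bit; HotSpot's real poll encoding is different. On AArch64 an acquire load of this kind is typically compiled to `LDAR` (or `LDAPR` on targets with RCpc), and a relaxed load to a plain `LDR`. In single-threaded use both return the same value. The difference is only whether later memory accesses may be reordered before the load, and the argument that JNI transitions no longer need that ordering comes from the discussion linked as [2], not from this sketch:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-thread poll word; not HotSpot's layout or encoding.
struct ThreadPollSketch {
  std::atomic<std::uintptr_t> poll_word{0};
};

// Hypothetical "safepoint/handshake pending" bit.
constexpr std::uintptr_t kArmedBit = 1;

// Previous shape: load-acquire, later accesses cannot move above it.
inline bool poll_acquire(const ThreadPollSketch& t) {
  return (t.poll_word.load(std::memory_order_acquire) & kArmedBit) != 0;
}

// Shape proposed in the PR: a plain load with no ordering constraint.
inline bool poll_plain(const ThreadPollSketch& t) {
  return (t.poll_word.load(std::memory_order_relaxed) & kArmedBit) != 0;
}
```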
[0] https://bugs.openjdk.org/browse/JDK-8189596
[1] https://mail.openjdk.org/pipermail/hotspot-dev/2017-November/029264.html
[2] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-July/078715.html
[3] https://github.com/simonis/Java2Native/tree/main/examples/jmh/java2native

-------------

Commit messages:
 - Copyright year
 - Safepoint poll with acquire=false in JNI entry/wrapper

Changes: https://git.openjdk.org/jdk/pull/20420/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20420&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337657
  Stats: 20 lines in 2 files changed: 0 ins; 15 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/20420.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20420/head:pull/20420

PR: https://git.openjdk.org/jdk/pull/20420

From adinn at openjdk.org Thu Aug 1 14:02:31 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 1 Aug 2024 14:02:31 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote:

>> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`.
>> 
>> I found few places where `ExternalAddess` is used incorrectly and fixed them.
>> 
>> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519).
>> 
>> Here is current output from debug VM on MacBook M1 (Aarch64):
>> 
>> External addresses table: 6 entries, 324 accesses
>> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480
>> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16
>> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr
>> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc
>> 4: 18 0x0000000118384080 : stub: forward exception
>> 
>> 
>> on MacOS-x64:
>> 
>> External addresses table: 143 entries, 44405 accesses
>> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop
>> 1: 11002 0x0000000104474384 : 'should not reach here'
>> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068
>> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753
>> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554
>> 
>> 
>> and on linux-x64:
>> 
>> External addresses table: 143 entries, 77297 accesses
>> 0: 22334 0x00007f35d5b9c000 : ''
>> 1: 19789 0x00007f35d55eea1f : 'should not reach here'
>> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?'
>> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
>> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
>> 
>> 
>> Few points about difference in output:
>> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings).
>> 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled.
>> 3...
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add missed ExternalAddress changes

Hi Vladimir,

If I am following your comment regarding forward exception correctly, it seems that this is being encoded as an ExternalAddress because:

1. the raw address constant for StubRoutines::forward_exception_entry() passed into the TailCallNode gets processed in the AD back end by moving the constant into a register
2. there is no way to mark the constant passed to the TailCallNode as requiring a runtime_call_type reloc
3. so the address move defaults to using external_word_type

I'm wondering if this can be fixed in the back end.

Firstly, can I just clarify: is it only the forward exception target that needs a runtime_call_type reloc? The other uses of TailCallNode and TailJumpNode transfer control from a stub to a return address passed as an argument to the call, so there is no constant supplied as an argument and hence nothing to relocate in those cases.

If so, could we use a match rule to detect this case, e.g.

    instruct TailCalljmpInd(immP jump_target, inline_cache_RegP method_ptr) %{
      match(TailCall jump_target method_ptr);
      . . .

If we provide this rule with an encoding that loads the target using a runtime reloc, then it should be preferred on cost grounds to the loadConP rule, which uses the default external reloc.
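The "hottest" (most referenced) external-address report described in the quoted PR text boils down to counting accesses per address and printing the top entries. Below is a minimal C++ sketch of that idea, assuming a plain hash map keyed by address. The class name `ExternalAddressCounter` and its output layout are hypothetical; they only loosely mirror the quoted tables and are not HotSpot's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical counter of external-address accesses (not HotSpot code).
class ExternalAddressCounter {
 public:
  using Entry = std::pair<std::uintptr_t, std::uint64_t>;  // (address, access count)

  // Record one access to an external address.
  void record(std::uintptr_t addr) {
    ++counts_[addr];
    ++total_;
  }

  std::size_t entries() const { return counts_.size(); }
  std::uint64_t accesses() const { return total_; }

  // Up to n entries, most-referenced first; ties broken by address for determinism.
  std::vector<Entry> hottest(std::size_t n) const {
    std::vector<Entry> v(counts_.begin(), counts_.end());
    std::sort(v.begin(), v.end(), [](const Entry& a, const Entry& b) {
      if (a.second != b.second) return a.second > b.second;
      return a.first < b.first;
    });
    if (v.size() > n) v.resize(n);
    return v;
  }

  // Print a summary line followed by the top-n entries.
  void print(std::size_t n) const {
    std::printf("External addresses table: %zu entries, %llu accesses\n",
                entries(), static_cast<unsigned long long>(total_));
    const std::vector<Entry> top = hottest(n);
    for (std::size_t i = 0; i < top.size(); ++i) {
      std::printf("%zu: %llu 0x%016llx\n", i,
                  static_cast<unsigned long long>(top[i].second),
                  static_cast<unsigned long long>(top[i].first));
    }
  }

 private:
  std::unordered_map<std::uintptr_t, std::uint64_t> counts_;
  std::uint64_t total_ = 0;
};
```

Sorting a snapshot keeps `record()` cheap, which matters if it sits on a code-generation path. Because only a handful of entries dominate the counts in the quoted output, a top-5 cut is enough to find candidates for the static tables.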
------------- PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263153016 From jvernee at openjdk.org Thu Aug 1 14:38:34 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 1 Aug 2024 14:38:34 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception [v2] In-Reply-To: References: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> Message-ID: <6iSYVWCzeYcklxLJeKLvaXaM-0Vis0f-LbUUy3mARvM=.23e3aa59-ec67-48aa-beaf-27689c94b52d@github.com> On Fri, 5 Jul 2024 14:23:50 GMT, Jorn Vernee wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> address comments > > I think this needs a CSR, to document the change in behavior. (See e.g. https://bugs.openjdk.org/browse/JDK-8335554 which is a very similar case) > @JornVernee if you want you can also review https://bugs.openjdk.org/browse/JDK-8337301 @liach already asked me to take a look. The note looks good. There's no way for me to mark it as reviewed it seems ------------- PR Comment: https://git.openjdk.org/jdk/pull/20015#issuecomment-2263238945 From liach at openjdk.org Thu Aug 1 14:44:38 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 1 Aug 2024 14:44:38 GMT Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception [v2] In-Reply-To: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> References: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com> Message-ID: On Thu, 4 Jul 2024 06:22:31 GMT, Hannes Greule wrote: >> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. >> >> The exception message is less exact, but I think that's acceptable. 
> > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > address comments Thanks for this fix and the discussions. We should be good to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20015#issuecomment-2263247935 From hgreule at openjdk.org Thu Aug 1 14:44:39 2024 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 1 Aug 2024 14:44:39 GMT Subject: Integrated: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception In-Reply-To: References: Message-ID: On Wed, 3 Jul 2024 19:43:05 GMT, Hannes Greule wrote: > Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods. > > The exception message is less exact, but I think that's acceptable. This pull request has now been integrated. Changeset: 9fe6e231 Author: Hannes Greule Committer: Chen Liang URL: https://git.openjdk.org/jdk/commit/9fe6e2316aef8fd125a7905cff2a2d9ae5d26109 Stats: 77 lines in 2 files changed: 72 ins; 0 del; 5 mod 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception Reviewed-by: liach ------------- PR: https://git.openjdk.org/jdk/pull/20015 From adinn at openjdk.org Thu Aug 1 15:23:32 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 1 Aug 2024 15:23:32 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote: >> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. >> >> I found few places where `ExternalAddess` is used incorrectly and fixed them. 
>> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). >> >> Here is current output from debug VM on MacBook M1 (Aarch64): >> >> External addresses table: 6 entries, 324 accesses >> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 >> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 >> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr >> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc >> 4: 18 0x0000000118384080 : stub: forward exception >> >> >> on MacOS-x64: >> >> External addresses table: 143 entries, 44405 accesses >> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop >> 1: 11002 0x0000000104474384 : 'should not reach here' >> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 >> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 >> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 >> >> >> and on linux-x64: >> >> External addresses table: 143 entries, 77297 accesses >> 0: 22334 0x00007f35d5b9c000 : '' >> 1: 19789 0x00007f35d55eea1f : 'should not reach here' >> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' >> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen' >> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?' >> >> >> Few points about difference in output: >> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings). >> 2. 
`stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled.
>> 3...
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add missed ExternalAddress changes

I think I found a few more places in aarch64 and x86 where we need to use RuntimeAddress. There may be similar problems in the other arch implementations:

diff --git a/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp b/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp
index 90c7ca6f08a..c6b078c3c7d 100644
--- a/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp
@@ -179,7 +179,7 @@ void InterpreterRuntime::SignatureHandlerGenerator::generate(uint64_t fingerprin
   iterate(fingerprint);
 
   // return result handler
-  __ lea(r0, ExternalAddress(Interpreter::result_handler(method()->result_type())));
+  __ lea(r0, RuntimeAddress(Interpreter::result_handler(method()->result_type())));
   __ ret(lr);
 
   __ flush();
diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
index f90aefc8fd3..73da947c318 100644
--- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
@@ -1879,7 +1879,7 @@ void MacroAssembler::_verify_oop(Register reg, const char* s, const char* file,
   movptr(rscratch1, (uintptr_t)(address)b);
 
   // call indirectly to solve generation ordering problem
-  lea(rscratch2, ExternalAddress(StubRoutines::verify_oop_subroutine_entry_address()));
+  lea(rscratch2, RuntimeAddress(StubRoutines::verify_oop_subroutine_entry_address()));
   ldr(rscratch2, Address(rscratch2));
   blr(rscratch2);
 
@@ -1918,7 +1918,7 @@ void MacroAssembler::_verify_oop_addr(Address addr, const char* s, const char* f
   movptr(rscratch1, (uintptr_t)(address)b);
 
   // call indirectly to solve generation ordering problem
-  lea(rscratch2, ExternalAddress(StubRoutines::verify_oop_subroutine_entry_address()));
+  lea(rscratch2, RuntimeAddress(StubRoutines::verify_oop_subroutine_entry_address()));
   ldr(rscratch2, Address(rscratch2));
   blr(rscratch2);
 
diff --git a/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp b/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
index 89f5fbd281b..215f1b6453b 100644
--- a/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
@@ -1337,7 +1337,7 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   {
     Label L;
     __ ldr(r10, Address(rmethod, Method::native_function_offset()));
-    address unsatisfied = (SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
+    RuntimeAddress unsatisfied(SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
     __ mov(rscratch2, unsatisfied);
     __ ldr(rscratch2, rscratch2);
     __ cmp(r10, rscratch2);
@@ -1432,7 +1432,7 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
     // hand.
     //
     __ mov(c_rarg0, rthread);
-    __ mov(rscratch2, CAST_FROM_FN_PTR(address, JavaThread::check_special_condition_for_native_trans));
+    __ mov(rscratch2, RuntimeAddress(CAST_FROM_FN_PTR(address, JavaThread::check_special_condition_for_native_trans)));
     __ blr(rscratch2);
     __ get_method(rmethod);
     __ reinit_heapbase();
@@ -1461,7 +1461,7 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
 
   {
     Label no_oop;
-    __ adr(t, ExternalAddress(AbstractInterpreter::result_handler(T_OBJECT)));
+    __ adr(t, RuntimeAddress(AbstractInterpreter::result_handler(T_OBJECT)));
     __ cmp(t, result_handler);
     __ br(Assembler::NE, no_oop);
     // Unbox oop result, e.g. JNIHandles::resolve result.
@@ -1482,7 +1482,7 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
 
     __ push_call_clobbered_registers();
     __ mov(c_rarg0, rthread);
-    __ mov(rscratch2, CAST_FROM_FN_PTR(address, SharedRuntime::reguard_yellow_pages));
+    __ mov(rscratch2, RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::reguard_yellow_pages)));
     __ blr(rscratch2);
     __ pop_call_clobbered_registers();
 
@@ -2085,7 +2085,7 @@ void TemplateInterpreterGenerator::trace_bytecode(Template* t) {
 
   assert(Interpreter::trace_code(t->tos_in()) != nullptr,
          "entry must have been generated");
-  __ bl(Interpreter::trace_code(t->tos_in()));
+  __ bl(RuntimeAddress(Interpreter::trace_code(t->tos_in())));
   __ reinit_heapbase();
 }
 
diff --git a/src/hotspot/cpu/x86/interpreterRT_x86_32.cpp b/src/hotspot/cpu/x86/interpreterRT_x86_32.cpp
index a9b96c22427..f9adc8b49a9 100644
--- a/src/hotspot/cpu/x86/interpreterRT_x86_32.cpp
+++ b/src/hotspot/cpu/x86/interpreterRT_x86_32.cpp
@@ -93,7 +93,7 @@ void InterpreterRuntime::SignatureHandlerGenerator::generate( uint64_t fingerpri
   iterate(fingerprint);
   // return result handler
   __ lea(rax,
-         ExternalAddress((address)Interpreter::result_handler(method()->result_type())));
+         RuntimeAddress((address)Interpreter::result_handler(method()->result_type())));
   // return
   __ ret(0);
   __ flush();
diff --git a/src/hotspot/cpu/x86/interpreterRT_x86_64.cpp b/src/hotspot/cpu/x86/interpreterRT_x86_64.cpp
index 7e390564f4c..ec78445dab6 100644
--- a/src/hotspot/cpu/x86/interpreterRT_x86_64.cpp
+++ b/src/hotspot/cpu/x86/interpreterRT_x86_64.cpp
@@ -295,7 +295,7 @@ void InterpreterRuntime::SignatureHandlerGenerator::generate(uint64_t fingerprin
   iterate(fingerprint);
 
   // return result handler
-  __ lea(rax, ExternalAddress(Interpreter::result_handler(method()->result_type())));
+  __ lea(rax, RuntimeAddress(Interpreter::result_handler(method()->result_type())));
   __ ret(0);
 
   __ flush();
diff --git a/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp b/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
index fe2bf67afc9..594216860f3 100644
--- a/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
+++ b/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
@@ -1002,7 +1002,7 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   {
     Label L;
     __ movptr(rax, Address(method, Method::native_function_offset()));
-    ExternalAddress unsatisfied(SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
+    RuntimeAddress unsatisfied(SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
     __ cmpptr(rax, unsatisfied.addr(), rscratch1);
     __ jcc(Assembler::notEqual, L);
     __ call_VM(noreg,
@@ -1075,8 +1075,8 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   {
     Label L;
     Label push_double;
-    ExternalAddress float_handler(AbstractInterpreter::result_handler(T_FLOAT));
-    ExternalAddress double_handler(AbstractInterpreter::result_handler(T_DOUBLE));
+    RuntimeAddress float_handler(AbstractInterpreter::result_handler(T_FLOAT));
+    RuntimeAddress double_handler(AbstractInterpreter::result_handler(T_DOUBLE));
     __ cmpptr(Address(rbp, (frame::interpreter_frame_oop_temp_offset + 1)*wordSize),
               float_handler.addr(), noreg);
     __ jcc(Assembler::equal, push_double);
@@ -1167,7 +1167,7 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
 
   {
     Label no_oop;
-    __ lea(t, ExternalAddress(AbstractInterpreter::result_handler(T_OBJECT)));
+    __ lea(t, RuntimeAddress(AbstractInterpreter::result_handler(T_OBJECT)));
     __ cmpptr(t, Address(rbp, frame::interpreter_frame_result_handler_offset*wordSize));
     __ jcc(Assembler::notEqual, no_oop);
     // retrieve result

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263340905

From kvn at openjdk.org Thu Aug 1 15:29:36 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 15:29:36 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]
In-Reply-To:
References: Message-ID: On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote: >> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. >> >> I found few places where `ExternalAddess` is used incorrectly and fixed them. >> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). >> >> Here is current output from debug VM on MacBook M1 (Aarch64): >> >> External addresses table: 6 entries, 324 accesses >> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 >> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 >> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr >> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc >> 4: 18 0x0000000118384080 : stub: forward exception >> >> >> on MacOS-x64: >> >> External addresses table: 143 entries, 44405 accesses >> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop >> 1: 11002 0x0000000104474384 : 'should not reach here' >> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 >> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 >> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 >> >> >> and on linux-x64: >> >> External addresses table: 143 entries, 77297 accesses >> 0: 22334 0x00007f35d5b9c000 : '' >> 1: 19789 0x00007f35d55eea1f : 'should not reach here' >> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' 
>> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
>> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
>> 
>> 
>> Few points about difference in output:
>> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings).
>> 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled.
>> 3...
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add missed ExternalAddress changes

Thank you, Andrew. Yes, your analysis is correct: we fall back to the default value when constructing Address() from a register. Nice suggestion about a separate match rule for the constant in `TailCall`. I can add an assert that checks the constant is the forward_exception_entry() address, to catch any other cases if they happen.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263354860

From kvn at openjdk.org Thu Aug 1 15:49:31 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 15:49:31 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote:

>> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`.
>> 
>> I found few places where `ExternalAddess` is used incorrectly and fixed them.
>> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). >> >> Here is current output from debug VM on MacBook M1 (Aarch64): >> >> External addresses table: 6 entries, 324 accesses >> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 >> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 >> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr >> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc >> 4: 18 0x0000000118384080 : stub: forward exception >> >> >> on MacOS-x64: >> >> External addresses table: 143 entries, 44405 accesses >> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop >> 1: 11002 0x0000000104474384 : 'should not reach here' >> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 >> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 >> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 >> >> >> and on linux-x64: >> >> External addresses table: 143 entries, 77297 accesses >> 0: 22334 0x00007f35d5b9c000 : '' >> 1: 19789 0x00007f35d55eea1f : 'should not reach here' >> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' >> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen' >> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?' >> >> >> Few points about difference in output: >> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings). >> 2. 
`stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled.
>> 3...
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add missed ExternalAddress changes

I think it is fine to use `ExternalAddress` in a compare instruction or as a return value, even if the address points to a stub or a VM method. We need to use `RuntimeAddress` only when the address is used as the target of a call or jump instruction.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263397282

From kvn at openjdk.org Thu Aug 1 15:58:33 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 15:58:33 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Aug 2024 15:20:50 GMT, Andrew Dinn wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add missed ExternalAddress changes
>
> I think I found a few more places in aarch64 and x86 where we need to use RuntimeAddress.
There may be similar problems in the other arch implementations: > > diff --git a/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp b/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp > index 90c7ca6f08a..c6b078c3c7d 100644 > --- a/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp > @@ -179,7 +179,7 @@ void InterpreterRuntime::SignatureHandlerGenerator::generate(uint64_t fingerprin > iterate(fingerprint); > > // return result handler > - __ lea(r0, ExternalAddress(Interpreter::result_handler(method()->result_type()))); > + __ lea(r0, RuntimeAddress(Interpreter::result_handler(method()->result_type()))); > __ ret(lr); > > __ flush(); > diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > index f90aefc8fd3..73da947c318 100644 > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > @@ -1879,7 +1879,7 @@ void MacroAssembler::_verify_oop(Register reg, const char* s, const char* file, > movptr(rscratch1, (uintptr_t)(address)b); > > // call indirectly to solve generation ordering problem > - lea(rscratch2, ExternalAddress(StubRoutines::verify_oop_subroutine_entry_address())); > + lea(rscratch2, RuntimeAddress(StubRoutines::verify_oop_subroutine_entry_address())); > ldr(rscratch2, Address(rscratch2)); > blr(rscratch2); > > @@ -1918,7 +1918,7 @@ void MacroAssembler::_verify_oop_addr(Address addr, const char* s, const char* f > movptr(rscratch1, (uintptr_t)(address)b); > > // call indirectly to solve generation ordering problem > - lea(rscratch2, ExternalAddress(StubRoutines::verify_oop_subroutine_entry_address())); > + lea(rscratch2, RuntimeAddress(StubRoutines::verify_oop_subroutine_entry_address())); > ldr(rscratch2, Address(rscratch2)); > blr(rscratch2); > > diff --git a/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp 
b/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
> index 89f5fbd281b..215f1b6453b 100644
> --- a/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
> +++ b/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
> @@ -1337,7 +1337,7 @@ address TemplateInterpreterGenerator::generate_native...

@adinn Can you explain the aarch64 code? Why do we use `mov(r, addr); blr(r);` instead of `bl(adr)`? And why use `lea(r1, addr); ldr(r, r1)` instead of `mov(r, addr)`?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263413640

From duke at openjdk.org Thu Aug 1 16:34:33 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Thu, 1 Aug 2024 16:34:33 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64
In-Reply-To:
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID: <9Yfsy3a2pKqkzXycE62Mamvc1RSptxMAEE_KDtEVupQ=.7459217d-fd7d-44e6-861a-987280fe7843@github.com>

On Thu, 16 May 2024 12:40:30 GMT, Andrew Haley wrote:

>> Hi,
>> 
>>> I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good?
>> 
>> Yes, that's right.
>
>> Hi @theRealAph ! You may find the latest version here: [mikabl-arm at b3db421](https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b) . I gave a short explanation in the commit message, feel free to ask for more details if required.
>> 
>> Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks. Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here.
>
> OK.
> One small thing, I think it's possible to rearrange things a bit to use `mlav`, which may help performance. No need for that until the code is correct, though.

@theRealAph, I've updated https://github.com/mikabl-arm/jdk/tree/285826-vmul to implement the Neon part of the intrinsic as a separate stub method. This solves the second issue mentioned in the comment above. Could you take a look at the code in https://github.com/mikabl-arm/jdk/tree/285826-vmul? If you're happy with the direction it's taking, I'll merge the changes into this PR's branch (https://github.com/mikabl-arm/jdk/tree/8322770) to make things a bit easier to keep track of.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2263484488

From cjplummer at openjdk.org Thu Aug 1 16:37:33 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 1 Aug 2024 16:37:33 GMT
Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Aug 2024 09:37:08 GMT, Serguei Spitsyn wrote:

>> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`:
>> - `SetFieldAccessWatch()`
>> - `ClearFieldAccessWatch()`
>> - `SetFieldModificationWatch()`
>> - `ClearFieldModificationWatch()`
>> so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert.
>> 
>> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function.
>> 
>> Testing:
>> - TBD: submit mach5 tiers 1-6
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   rearranged to have one JvmtiVTMSTransitionDisabler instead of two

Looks good.

-------------

Marked as reviewed by cjplummer (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20413#pullrequestreview-2213327863

From kvn at openjdk.org Thu Aug 1 16:39:32 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 16:39:32 GMT
Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Aug 2024 12:26:44 GMT, Andrew Dinn wrote:

>> Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files.
>
> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix conditonal directives in arm code

Did you forget to add the new file `runtime_risc.cpp` to the changeset?

-------------

Changes requested by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20417#pullrequestreview-2213332132

From coleenp at openjdk.org Thu Aug 1 17:00:02 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 1 Aug 2024 17:00:02 GMT
Subject: RFR: 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion
Message-ID:

This field _line_ending isn't used, and I'm not sure how this even works, so I deleted it.

Tested with tier1 on many Oracle-supported platforms.
-------------

Commit messages:
 - 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion

Changes: https://git.openjdk.org/jdk/pull/20427/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20427&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332120
  Stats: 9 lines in 2 files changed: 0 ins; 8 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20427.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20427/head:pull/20427

PR: https://git.openjdk.org/jdk/pull/20427

From kvn at openjdk.org Thu Aug 1 17:43:31 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 17:43:31 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]

On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote:

>> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`.
>>
>> I found few places where `ExternalAddess` is used incorrectly and fixed them.
>>
>> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519).
>>
>> Here is current output from debug VM on MacBook M1 (Aarch64):
>>
>> External addresses table: 6 entries, 324 accesses
>> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480
>> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16
>> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr
>> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc
>> 4: 18 0x0000000118384080 : stub: forward exception
>>
>> on MacOS-x64:
>>
>> External addresses table: 143 entries, 44405 accesses
>> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop
>> 1: 11002 0x0000000104474384 : 'should not reach here'
>> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068
>> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753
>> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554
>>
>> and on linux-x64:
>>
>> External addresses table: 143 entries, 77297 accesses
>> 0: 22334 0x00007f35d5b9c000 : ''
>> 1: 19789 0x00007f35d55eea1f : 'should not reach here'
>> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?'
>> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
>> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
>>
>> Few points about difference in output:
>> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings).
>> 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled.
>> 3...
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missed ExternalAddress changes

Answering my questions:

`bl()` is used for short branches: `branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M)`.

`mov(r, addr)` simply loads the target `addr` as a 64-bit value into the register through which we jump.

`lea(r, addr)` constructs an address in the register based on the `Address` mode and is used when we need to load a value **from** it.

Maybe we should also use `lea()` instead of `mov()` in the following case, since we change the `address` argument's type to `Address`:

    -    address unsatisfied = (SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
    +    RuntimeAddress unsatisfied(SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
         __ mov(rscratch2, unsatisfied);
         __ ldr(rscratch2, rscratch2);
         __ cmp(r10, rscratch2);

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263613466

From kvn at openjdk.org Thu Aug 1 18:06:32 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 18:06:32 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]

On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote:
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missed ExternalAddress changes

Actually, the [lea(r, RuntimeAddress a)](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/assembler_aarch64.cpp#L140) method generates the same instructions as [mov(r, RuntimeAddress a)](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2040).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263656919

From kvn at openjdk.org Thu Aug 1 18:21:31 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 18:21:31 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]

On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote:
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missed ExternalAddress changes

And I got a compilation failure for `templateInterpreterGenerator_aarch64.cpp` because `mov(r, Address a)` is a private member of `MacroAssembler`. I have to use `lea()` with these changes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263684235

From kvn at openjdk.org Thu Aug 1 18:38:47 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 18:38:47 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3]

> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`.
>
> I found few places where `ExternalAddess` is used incorrectly and fixed them.
>
> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519).
>
> Here is current output from debug VM on MacBook M1 (Aarch64):
>
> External addresses table: 6 entries, 324 accesses
> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480
> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16
> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr
> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc
> 4: 18 0x0000000118384080 : stub: forward exception
>
> on MacOS-x64:
>
> External addresses table: 143 entries, 44405 accesses
> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop
> 1: 11002 0x0000000104474384 : 'should not reach here'
> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068
> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753
> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554
>
> and on linux-x64:
>
> External addresses table: 143 entries, 77297 accesses
> 0: 22334 0x00007f35d5b9c000 : ''
> 1: 19789 0x00007f35d55eea1f : 'should not reach here'
> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?'
> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
>
> Few points about difference in output:
> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings).
> 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled.
> 3. linux-x64 implementation of `dlladdr()`, I used to print C++ symbol name, only...
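The description above says that calls and jumps should use `RuntimeAddress`. Earlier in this thread Vladimir Kozlov notes how such calls are emitted on AArch64. `bl()` covers short branches (`branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M)`), while `mov(r, addr)` loads the full 64-bit target into a register to jump through.

The standalone C++ sketch below models the arithmetic behind both approaches, under two assumptions:
- The +/-128 MiB window follows from BL's signed 26-bit immediate, which is counted in 4-byte instructions.
- The MOVZ/MOVK split shown is the generic four-instruction form. It is not necessarily the exact sequence HotSpot's `MacroAssembler` emits.

The helpers `bl_reachable`, `encode_bl`, `movz_movk_chunks` and `materialize` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// AArch64 B/BL carry a signed 26-bit immediate counted in 4-byte units, so a
// direct branch reaches [-2^27, 2^27 - 4] bytes, i.e. about +/-128 MiB.
constexpr int64_t kBlMinOffset = -(int64_t{1} << 27);
constexpr int64_t kBlMaxOffset = (int64_t{1} << 27) - 4;

// Hypothetical helper: can a BL at `pc` reach `target` directly? `limit` lets
// a caller impose a tighter window, such as the 2 M debug branch_range.
bool bl_reachable(uint64_t pc, uint64_t target, int64_t limit = int64_t{1} << 27) {
  const int64_t offset = static_cast<int64_t>(target - pc);
  if (offset % 4 != 0) return false;  // instructions are 4-byte aligned
  if (offset < kBlMinOffset || offset > kBlMaxOffset) return false;
  return offset >= -limit && offset < limit;
}

// Encode BL: bits 31..26 are 0b100101, bits 25..0 hold offset / 4.
uint32_t encode_bl(uint64_t pc, uint64_t target) {
  assert(bl_reachable(pc, target));
  const int64_t offset = static_cast<int64_t>(target - pc);
  const uint32_t imm26 = static_cast<uint32_t>(offset / 4) & 0x03FFFFFFu;
  return 0x94000000u | imm26;
}

// Out of range, the target must be materialized in a register instead.
// Split a 64-bit value into the four 16-bit chunks that MOVZ (shift 0) and
// three MOVKs (shifts 16, 32, 48) would insert.
std::array<uint16_t, 4> movz_movk_chunks(uint64_t value) {
  std::array<uint16_t, 4> chunks{};
  for (int i = 0; i < 4; ++i) {
    chunks[i] = static_cast<uint16_t>(value >> (16 * i));
  }
  return chunks;
}

// Rebuild the register value the way the sequence would: MOVZ writes chunk 0
// and zeroes the rest, and each MOVK replaces one 16-bit field.
uint64_t materialize(const std::array<uint16_t, 4>& chunks) {
  uint64_t reg = chunks[0];  // MOVZ
  for (int i = 1; i < 4; ++i) {
    reg &= ~(uint64_t{0xFFFF} << (16 * i));  // MOVK keeps the other bits
    reg |= uint64_t{chunks[i]} << (16 * i);
  }
  return reg;
}
```

With `M` equal to 1024 * 1024, the product-build `branch_range` of 128 M matches the architectural BL limit. A target beyond it can only be reached indirectly, through a register loaded with the full address.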
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:

  Add more missing cases and update Copyright year

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20412/files
  - new: https://git.openjdk.org/jdk/pull/20412/files/7fa9e11f..9bb7867f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=01-02

  Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/20412.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20412/head:pull/20412

PR: https://git.openjdk.org/jdk/pull/20412

From kvn at openjdk.org Thu Aug 1 18:38:47 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 18:38:47 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v2]
Message-ID: <1wTMzYGP9OrldP0vgWU2-Q1J6ZIoh8vCZEBDzBQGwUc=.446b50a2-92f6-4c7c-8fdf-df5fc5ba178c@github.com>

On Thu, 1 Aug 2024 03:19:01 GMT, Vladimir Kozlov wrote:
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missed ExternalAddress changes

I will work on the `forward_exception_entry()` case next.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2263719295

From coleenp at openjdk.org Thu Aug 1 18:55:06 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 1 Aug 2024 18:55:06 GMT
Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp
Message-ID: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com>

Since `base_offset_in_bytes` and `HeapWordSize` are `int`, there's no loss of data on conversion in making these variables `int`. This seems trivial.

Tested with tier1 on Linux and Windows.

-------------

Commit messages:
 - 8337683: Fix -Wconversion problem with arrayOop.hpp

Changes: https://git.openjdk.org/jdk/pull/20431/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337683
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20431.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20431/head:pull/20431

PR: https://git.openjdk.org/jdk/pull/20431

From amenkov at openjdk.org Thu Aug 1 20:18:33 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Thu, 1 Aug 2024 20:18:33 GMT
Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2]

On Thu, 1 Aug 2024 09:37:08 GMT, Serguei Spitsyn wrote:

> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
> rearranged to have one JvmtiVTMSTransitionDisabler instead of two

Marked as reviewed by amenkov (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20413#pullrequestreview-2213842084

From kvn at openjdk.org Thu Aug 1 21:36:31 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 21:36:31 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3]

On Thu, 1 Aug 2024 18:38:47 GMT, Vladimir Kozlov wrote:
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Add more missing cases and update Copyright year

`forward_exception_entry()` is complicated. Using `immP` causes an assert to be hit in [MachNode::in_RegMask()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/machnode.cpp#L233) because C2 assumes that all TailCalls have two register inputs: [matcher.cpp#L828](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L828).

I don't think I should work on it in these changes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2264048441

From kvn at openjdk.org Thu Aug 1 21:48:32 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 Aug 2024 21:48:32 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3]

On Thu, 1 Aug 2024 18:38:47 GMT, Vladimir Kozlov wrote:
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Add more missing cases and update Copyright year

I created RFE [JDK-8337702](https://bugs.openjdk.org/browse/JDK-8337702) for TailCall(forward_exception_entry).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2264066874

From dholmes at openjdk.org Thu Aug 1 21:50:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 1 Aug 2024 21:50:33 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v3]
Message-ID: <3Ca9Fgs0jxc-E60vlqVEC45Pct727L3CY5eQrOPBB-s=.0ef1959e-ad55-426c-910e-feedc95d673d@github.com>

On Wed, 31 Jul 2024 14:05:06 GMT, Matthias Baesken wrote:

> Maybe David could give the change a try in the Oracle CI if that's possible ?

Testing ...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2264069111

From adinn at openjdk.org Thu Aug 1 21:58:51 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 1 Aug 2024 21:58:51 GMT
Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v3]

> Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files.
Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision:

  add new riscv source

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20417/files
  - new: https://git.openjdk.org/jdk/pull/20417/files/eb9b2d5c..7d7b3288

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=01-02

  Stats: 382 lines in 1 file changed: 382 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20417.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20417/head:pull/20417

PR: https://git.openjdk.org/jdk/pull/20417

From adinn at openjdk.org Thu Aug 1 21:58:51 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 1 Aug 2024 21:58:51 GMT
Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v2]

On Thu, 1 Aug 2024 16:36:28 GMT, Vladimir Kozlov wrote:

> Did you forget to add new file runtime_risc.cpp to changeset?

Yes, apologies for that. It should be there now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20417#issuecomment-2264079670

From adinn at openjdk.org Thu Aug 1 22:02:32 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 1 Aug 2024 22:02:32 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3]

On Thu, 1 Aug 2024 18:38:47 GMT, Vladimir Kozlov wrote:
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Add more missing cases and update Copyright year > I don't think I should work on it in these changes. > . . . > I created RFE [JDK-8337702](https://bugs.openjdk.org/browse/JDK-8337702) for TailCall(forward_exception_entry). Yes good idea. As you say lea is needed when loading the RuntimeAddress into a register. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2264085022 From adinn at openjdk.org Thu Aug 1 22:05:31 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 1 Aug 2024 22:05:31 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3] In-Reply-To: References: Message-ID: <27_bJXLLs9e4OR9mYrLs2w_Z39A8NZ_DBrKXbl48ceo=.50b885e3-4173-49c0-8cd2-3deb6c80cdf8@github.com> On Thu, 1 Aug 2024 18:38:47 GMT, Vladimir Kozlov wrote: >> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. >> >> I found few places where `ExternalAddess` is used incorrectly and fixed them. >> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Add more missing cases and update Copyright year Marked as reviewed by adinn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20412#pullrequestreview-2214099648 From pchilanomate at openjdk.org Thu Aug 1 22:47:31 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 1 Aug 2024 22:47:31 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2] In-Reply-To: References: Message-ID: <8AOxsfURVgBRGlS8WGtZ1wubjuUozfK-LcLkf9BGVoQ=.bb20a5eb-8821-48cd-bb09-0dfa8870f6f3@github.com> On Thu, 1 Aug 2024 09:37:08 GMT, Serguei Spitsyn wrote: >> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: >> - `SetFieldAccessWatch()` >> - `ClearFieldAccessWatch()` >> - `SetFieldModificationWatch()` >> - `ClearFieldModificationWatch()` >> so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. >> >> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. >> >> Testing: >> - TBD: submit mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > rearranged to have one JvmtiVTMSTransitionDisabler instead of two Looks good, but I see we have ranking issues with JvmtiThreadState_lock now. We will have to change JvmtiVTMSTransition_lock to be safepoint-1. 
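Patricio's point about lock ranking can be made concrete with a small model. This is a minimal sketch, not HotSpot's `Mutex` implementation. It assumes the rule that a thread may only acquire a lock whose rank is strictly lower than the rank of every lock it already holds. The `RankedLock` and `ToyThread` names and the numeric value of `kSafepointRank` are hypothetical. Under that assumption, two locks that share the `safepoint` rank cannot be nested, while giving the inner lock rank `safepoint-1` makes the nesting legal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical numeric rank standing in for HotSpot's "safepoint" rank.
constexpr int kSafepointRank = 100;

// Toy lock: only a name and a rank (no actual mutual exclusion).
struct RankedLock {
  std::string name;
  int rank;
};

// Toy per-thread lock-order checker. Locks are released in LIFO order, so the
// most recently acquired lock is always the lowest-ranked lock held.
class ToyThread {
 public:
  // Returns false (and does not acquire) when taking 'lock' would break the
  // "strictly decreasing rank" rule; a rank-checking debug build would
  // report this as a lock-ordering error instead.
  bool try_acquire(const RankedLock& lock) {
    if (!held_.empty() && held_.back()->rank <= lock.rank) {
      return false;
    }
    held_.push_back(&lock);
    return true;
  }

  void release_last() {
    assert(!held_.empty());
    held_.pop_back();
  }

  std::size_t held_count() const { return held_.size(); }

 private:
  std::vector<const RankedLock*> held_;  // in acquisition order
};
```

For example, while holding a lock of rank `kSafepointRank`, `try_acquire` fails for another lock of rank `kSafepointRank` but succeeds for one of rank `kSafepointRank - 1`. Which of the two JVMTI locks is actually taken first is an assumption here, not something the thread states explicitly.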
------------- PR Review: https://git.openjdk.org/jdk/pull/20413#pullrequestreview-2214162122 From kvn at openjdk.org Thu Aug 1 23:05:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Aug 2024 23:05:33 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 18:38:47 GMT, Vladimir Kozlov wrote: >> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. >> >> I found few places where `ExternalAddess` is used incorrectly and fixed them. >> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). >> >> Here is current output from debug VM on MacBook M1 (Aarch64): >> >> External addresses table: 6 entries, 324 accesses >> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 >> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 >> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr >> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc >> 4: 18 0x0000000118384080 : stub: forward exception >> >> >> on MacOS-x64: >> >> External addresses table: 143 entries, 44405 accesses >> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop >> 1: 11002 0x0000000104474384 : 'should not reach here' >> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 >> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 >> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 >> >> >> and on linux-x64: >> >> External addresses table: 143 entries, 77297 accesses >> 0: 
22334 0x00007f35d5b9c000 : '' >> 1: 19789 0x00007f35d55eea1f : 'should not reach here' >> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' >> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen' >> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?' >> >> >> Few points about difference in output: >> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings). >> 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled. >> 3... > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Add more missing cases and update Copyright year Thank you Andrew for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2264164926 From kvn at openjdk.org Thu Aug 1 23:06:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Aug 2024 23:06:31 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 21:58:51 GMT, Andrew Dinn wrote: >> Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > add new riscv source Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20417#pullrequestreview-2214190702 From vlivanov at openjdk.org Thu Aug 1 23:54:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 1 Aug 2024 23:54:36 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 21:58:51 GMT, Andrew Dinn wrote: >> Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > add new riscv source Looks good. src/hotspot/cpu/aarch64/runtime_aarch64.cpp line 219: > 217: // crud. We cannot block on this call, no GC can happen. Call should > 218: // restore return values to their stack-slots with the new SP. > 219: // Thread is in rdi already. It's weird to see `rdi` being mentioned in AArch64-specific code :-) ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20417#pullrequestreview-2214250900 PR Review Comment: https://git.openjdk.org/jdk/pull/20417#discussion_r1701019269 From jwtang at openjdk.org Fri Aug 2 02:45:51 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Fri, 2 Aug 2024 02:45:51 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v6] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: > I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: change test condition for TestPinCaseWithCFLH.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/60411296..64e40a62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From fyang at openjdk.org Fri Aug 2 03:23:33 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 2 Aug 2024 03:23:33 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3] In-Reply-To: References: Message-ID: <3eqNbe_hwvZEt2nOLC6GLZmvrMwgXYKCdIsnlC66z2Y=.e67ee2af-8cba-4ad9-8aba-2846fdcbd948@github.com> On Thu, 1 Aug 2024 18:38:47 GMT, Vladimir Kozlov wrote: >> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. >> >> I found few places where `ExternalAddess` is used incorrectly and fixed them. >> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Add more missing cases and update Copyright year And here is an add-on change to keep the RISC-V part up to date. I manually ran workloads with -XX:+VerifyOops -XX:+TraceBytecodes. @vnkozlov: Could you please help add it? Thanks.
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index e349eab3177..cd7a4ecf228 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -547,7 +547,7 @@ void MacroAssembler::_verify_oop(Register reg, const char* s, const char* file,
   }

   // call indirectly to solve generation ordering problem
-  ExternalAddress target(StubRoutines::verify_oop_subroutine_entry_address());
+  RuntimeAddress target(StubRoutines::verify_oop_subroutine_entry_address());
   relocate(target.rspec(), [&] {
     int32_t offset;
     la(t1, target.target(), offset);
@@ -592,7 +592,7 @@ void MacroAssembler::_verify_oop_addr(Address addr, const char* s, const char* f
   }

   // call indirectly to solve generation ordering problem
-  ExternalAddress target(StubRoutines::verify_oop_subroutine_entry_address());
+  RuntimeAddress target(StubRoutines::verify_oop_subroutine_entry_address());
   relocate(target.rspec(), [&] {
     int32_t offset;
     la(t1, target.target(), offset);
diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
index 769e4dc5ccc..f01945bc6a3 100644
--- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
+++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
@@ -1111,8 +1111,8 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   {
     Label L;
     __ ld(x28, Address(xmethod, Method::native_function_offset()));
-    address unsatisfied = (SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
-    __ mv(t, unsatisfied);
+    ExternalAddress unsatisfied(SharedRuntime::native_method_throw_unsatisfied_link_error_entry());
+    __ la(t, unsatisfied);
     __ load_long_misaligned(t1, Address(t, 0), t0, 2); // 2 bytes aligned, but not 4 or 8

     __ bne(x28, t1, L);
@@ -1815,7 +1815,7 @@ void TemplateInterpreterGenerator::trace_bytecode(Template* t) {
   // the tosca in-state for the given template.

   assert(Interpreter::trace_code(t->tos_in()) != nullptr,
          "entry must have been generated");
-  __ call(Interpreter::trace_code(t->tos_in()));
+  __ rt_call(Interpreter::trace_code(t->tos_in()));
   __ reinit_heapbase();
 }
------------- PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2264437803 From jwtang at openjdk.org Fri Aug 2 03:24:12 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Fri, 2 Aug 2024 03:24:12 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v7] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: > I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: rearrange to avoid long lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/64e40a62..80b139ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=05-06 Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From fyang at openjdk.org Fri Aug 2 04:37:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 2 Aug 2024 04:37:37 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 21:58:51 GMT, Andrew Dinn wrote: >> Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > add new riscv source Hi Andrew, I think we are missing the following add-on change for riscv. I built fastdebug and tested it manually on linux-riscv64.
diff --git a/src/hotspot/cpu/riscv/runtime_riscv.cpp b/src/hotspot/cpu/riscv/runtime_riscv.cpp
index 46fdcc65a2c..c030441857c 100644
--- a/src/hotspot/cpu/riscv/runtime_riscv.cpp
+++ b/src/hotspot/cpu/riscv/runtime_riscv.cpp
@@ -58,7 +58,7 @@ class SimpleRuntimeFrame {
 #define __ masm->

 //------------------------------generate_uncommon_trap_blob--------------------
-void SharedRuntime::generate_uncommon_trap_blob() {
+void OptoRuntime::generate_uncommon_trap_blob() {
   // Allocate space for the code
   ResourceMark rm;
   // Setup code generation tools
@@ -122,7 +122,7 @@ void SharedRuntime::generate_uncommon_trap_blob() {
     __ lwu(t0, Address(x14, Deoptimization::UnrollBlock::unpack_kind_offset()));
     __ mv(t1, Deoptimization::Unpack_uncommon_trap);
     __ beq(t0, t1, L);
-    __ stop("SharedRuntime::generate_uncommon_trap_blob: expected Unpack_uncommon_trap");
+    __ stop("OptoRuntime::generate_uncommon_trap_blob: expected Unpack_uncommon_trap");
     __ bind(L);
   }
 #endif
@@ -377,6 +377,5 @@ void OptoRuntime::generate_exception_blob() {
   // Set exception blob
   _exception_blob = ExceptionBlob::create(&buffer, oop_maps, SimpleRuntimeFrame::framesize >> 1);
 }

-#endif // COMPILER2
-
+#endif // COMPILER2
------------- PR Comment: https://git.openjdk.org/jdk/pull/20417#issuecomment-2264529669 From dholmes at openjdk.org Fri Aug 2 05:52:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 2 Aug 2024 05:52:32 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken wrote: >> Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
>> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case. > > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - remove method from WhiteBox.java > - remove WB_isUbsanEnabled, fix test This had no impact on our tier5 testing where we test container related stuff. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2264610271 From dholmes at openjdk.org Fri Aug 2 05:55:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 2 Aug 2024 05:55:33 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v7] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: On Fri, 2 Aug 2024 03:24:12 GMT, Jiawei Tang wrote: >> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > rearrange to avoid long lines test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java line 31: > 29: * @modules java.base/java.lang:+open > 30: * @build TestPinCaseWithCFLH > 31: * @run driver jdk.test.lib.util.JavaAgentBuilder Indentation is now wrong ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1701309954 From kvn at openjdk.org Fri Aug 2 06:10:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Aug 2024 06:10:03 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v4] In-Reply-To: References: Message-ID: > `ExternalAddess` should be used only for data load. 
For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. > > I found few places where `ExternalAddess` is used incorrectly and fixed them. > > I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). > > Here is current output from debug VM on MacBook M1 (Aarch64): > > External addresses table: 6 entries, 324 accesses > 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 > 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 > 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr > 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc > 4: 18 0x0000000118384080 : stub: forward exception > > > on MacOS-x64: > > External addresses table: 143 entries, 44405 accesses > 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop > 1: 11002 0x0000000104474384 : 'should not reach here' > 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 > 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 > 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 > > > and on linux-x64: > > External addresses table: 143 entries, 77297 accesses > 0: 22334 0x00007f35d5b9c000 : '' > 1: 19789 0x00007f35d55eea1f : 'should not reach here' > 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' > 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen' > 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?' > > > Few points about difference in output: > 1. 
aarch64 does not use `ExternalAddess` or any relocation for messages (strings). > 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled. > 3. linux-x64 implementation of `dlladdr()`, I used to print C++ symbol name, only... Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Added RISCV missing cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20412/files - new: https://git.openjdk.org/jdk/pull/20412/files/9bb7867f..f19d284f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=02-03 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20412.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20412/head:pull/20412 PR: https://git.openjdk.org/jdk/pull/20412 From kvn at openjdk.org Fri Aug 2 06:10:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Aug 2024 06:10:03 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v3] In-Reply-To: <3eqNbe_hwvZEt2nOLC6GLZmvrMwgXYKCdIsnlC66z2Y=.e67ee2af-8cba-4ad9-8aba-2846fdcbd948@github.com> References: <3eqNbe_hwvZEt2nOLC6GLZmvrMwgXYKCdIsnlC66z2Y=.e67ee2af-8cba-4ad9-8aba-2846fdcbd948@github.com> Message-ID: On Fri, 2 Aug 2024 03:20:26 GMT, Fei Yang wrote: > And some addon change to keep RISC-V part up to date. Added. Thank you, Fei Yang. 
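The "hottest" `ExternalAddress` report quoted in this thread (an entry count and a total access count, followed by the most-referenced addresses) could be produced by something like the code below. This is a hypothetical, standalone sketch, not the code from the PR. The `ExternalAddressTable` class, its methods, and the tie-breaking rule (lower address first) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch (not HotSpot code): count references to each external
// address and report the most-referenced ("hottest") ones first.
class ExternalAddressTable {
 public:
  using Hot = std::pair<std::uintptr_t, std::uint64_t>;  // (address, count)

  // Record one reference to 'addr'; 'desc' is kept from the first reference.
  void record(std::uintptr_t addr, const std::string& desc) {
    Entry& e = entries_[addr];
    if (e.count == 0) e.desc = desc;
    ++e.count;
    ++total_;
  }

  std::size_t size() const { return entries_.size(); }
  std::uint64_t total() const { return total_; }

  // Up to 'n' entries sorted by descending count (ties: lower address first).
  std::vector<Hot> hottest(std::size_t n) const {
    std::vector<Hot> v;
    v.reserve(entries_.size());
    for (const auto& kv : entries_) v.emplace_back(kv.first, kv.second.count);
    std::sort(v.begin(), v.end(), [](const Hot& a, const Hot& b) {
      return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (v.size() > n) v.resize(n);
    return v;
  }

  // Print a report shaped like the dumps quoted in this thread.
  void print(std::size_t n) const {
    std::printf("External addresses table: %zu entries, %llu accesses\n",
                size(), static_cast<unsigned long long>(total_));
    std::size_t i = 0;
    for (const Hot& h : hottest(n)) {
      std::printf("%zu: %llu 0x%016llx : %s\n", i++,
                  static_cast<unsigned long long>(h.second),
                  static_cast<unsigned long long>(h.first),
                  entries_.at(h.first).desc.c_str());
    }
  }

 private:
  struct Entry {
    std::string desc;
    std::uint64_t count = 0;
  };
  std::unordered_map<std::uintptr_t, Entry> entries_;
  std::uint64_t total_ = 0;
};
```

Calling `print(5)` emits a header line followed by up to five `index: count address : description` lines. Sorting a copy on demand keeps `record` cheap, which seems reasonable for a debug-only statistic.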
------------- PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2264629468 From mbaesken at openjdk.org Fri Aug 2 06:17:36 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 2 Aug 2024 06:17:36 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken wrote: >> Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. >> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case. > > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - remove method from WhiteBox.java > - remove WB_isUbsanEnabled, fix test Hi David, thanks for testing. Thinking more about it, should I check in `generateDockerFile` whether the `baseImage` name contains ubuntu, so that people who use a system property to set their own image do not run into issues with the added `RUN apt-get install libubsan1`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2264640910 From stuefe at openjdk.org Fri Aug 2 06:27:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 2 Aug 2024 06:27:32 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 07:31:07 GMT, Matthias Baesken wrote: > > Are we sure the images the tests use will always be Debian or Debian descendants? What about RHEL or Oracle Linux?
> > We use Ubuntu for the container test base image as the default, see > > https://github.com/openjdk/jdk/blob/65646b5f81279a7fcef3ea04ef9894cf66f77a5a/test/lib/jdk/test/lib/containers/docker/DockerfileConfig.java#L47 > > To be more on the safe side (potentially the image can be switched with jdk.test.docker.image.name) we could add a check (and avoid adding the libubsan1 package if it is not ubuntu). > > Regarding compatibility, I've seen no issues (and if you compile _without_ ubsan you would not reference the libubsan1 anyway). So it is for some special configuration. Okay, thanks Matthias ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2264652502 From stefank at openjdk.org Fri Aug 2 06:48:31 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 2 Aug 2024 06:48:31 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Thu, 1 Aug 2024 18:49:34 GMT, Coleen Phillimore wrote: > Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. > Tested with tier1 on linux and windows. This doesn't seem to be enough to fix -Wconversion for this file. It just pushes the problem down to line 142.
I ran your patch with -Wconversion -ferror-limit=20000 and searched for arrayOop.hpp and it gives me:
src/hotspot/share/oops/arrayOop.hpp:71:17: error: implicit conversion changes signedness: 'int' to 'unsigned long' [-Werror,-Wsign-conversion]
    size_t hs = length_offset_in_bytes() + sizeof(int);
                ^~~~~~~~~~~~~~~~~~~~~~~~ ~
src/hotspot/share/oops/arrayOop.hpp:91:17: error: implicit conversion changes signedness: 'int' to 'size_t' (aka 'unsigned long') [-Werror,-Wsign-conversion]
    size_t hs = header_size_in_bytes();
           ~~   ^~~~~~~~~~~~~~~~~~~~~~
src/hotspot/share/oops/arrayOop.hpp:142:43: error: implicit conversion changes signedness: 'int' to 'unsigned long' [-Werror,-Wsign-conversion]
      align_down((SIZE_MAX/HeapWordSize - hdr_size_in_words), MinObjAlignment);
                                        ~ ^~~~~~~~~~~~~~~~~
src/hotspot/share/oops/arrayOop.hpp:144:53: error: implicit conversion changes signedness: 'int' to 'unsigned long' [-Werror,-Wsign-conversion]
      HeapWordSize * max_element_words_per_size_t / type2aelembytes(type);
                                                  ~ ^~~~~~~~~~~~~~~~~~~~~
------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20431#pullrequestreview-2214782228 From dholmes at openjdk.org Fri Aug 2 06:52:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 2 Aug 2024 06:52:33 GMT Subject: RFR: 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 16:52:23 GMT, Coleen Phillimore wrote: > This field _line_ending isn't used, and I'm not sure how this even works. So I deleted it. > Tested with tier1 on many Oracle supported platforms. Looks like it is for debugging to ensure it does in fact only have the values 0, 1, 2.
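The `-Wsign-conversion` errors stefank quotes for `arrayOop.hpp` in the review above come from mixing `int` values into `size_t` arithmetic, for example `length_offset_in_bytes() + sizeof(int)` at line 71. One common remedy is to make each conversion explicit and check its precondition. The sketch below only illustrates that idea under this assumption. `to_size` and `header_bytes_example` are hypothetical stand-ins, not HotSpot functions, and this is not necessarily how the PR resolves the warnings.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a HotSpot API): explicit, checked int -> size_t
// conversion, so -Wsign-conversion sees no implicit signedness change.
inline std::size_t to_size(int v) {
  assert(v >= 0 && "negative int cannot be represented as size_t");
  return static_cast<std::size_t>(v);
}

// Hypothetical stand-in for the computation flagged at arrayOop.hpp:71.
// Writing 'length_offset + sizeof(int)' would implicitly convert the int
// operand to size_t; converting it explicitly first keeps the sum unsigned.
inline std::size_t header_bytes_example(int length_offset) {
  return to_size(length_offset) + sizeof(int);
}
```

The `assert` documents the non-negativity assumption that the implicit conversion previously left unstated; in builds with `NDEBUG` it compiles away and only the explicit cast remains.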
------------- PR Review: https://git.openjdk.org/jdk/pull/20427#pullrequestreview-2214790007 From dholmes at openjdk.org Fri Aug 2 07:02:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 2 Aug 2024 07:02:32 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: <3QjJtFZoqR4UK5_isGiWBEL-r0IiBGCFB6_GAskwxAI=.978ab364-5cd4-452a-ab24-164510fe9ba2@github.com> On Fri, 2 Aug 2024 06:15:15 GMT, Matthias Baesken wrote: >> Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove method from WhiteBox.java >> - remove WB_isUbsanEnabled, fix test > > Hi David, thanks for testing. > Thinking more about it, should I test in ` generateDockerFile ` for `baseImage` name containing ubuntu, so that people using a system property to set an own special image do not run into issues with the added `RUN apt-get install libubsan1` ? @MBaesken I know nothing about container setup, sorry. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2264701324 From dholmes at openjdk.org Fri Aug 2 07:05:34 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 2 Aug 2024 07:05:34 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v3] In-Reply-To: <6q02aOYDow87b8ohDXO6DVTeyzITIKxCmFbBZJsuEDo=.8da3927e-2992-4962-9af2-918757929ab0@github.com> References: <6q02aOYDow87b8ohDXO6DVTeyzITIKxCmFbBZJsuEDo=.8da3927e-2992-4962-9af2-918757929ab0@github.com> Message-ID: On Thu, 1 Aug 2024 12:35:00 GMT, Coleen Phillimore wrote: >> How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). >> Tested with tier1 on Oracle supported platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove comments, update comment, fix compliation error. 
If @stefank is happy with this then I am too. Looks good. Thanks for the updates. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20408#pullrequestreview-2214815935 From dholmes at openjdk.org Fri Aug 2 07:08:36 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 2 Aug 2024 07:08:36 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 08:33:48 GMT, Qizheng Xing wrote: >> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. >> >> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Merge `#ifndef PRODUCT` regions. Update seems okay. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20131#pullrequestreview-2214822739 From qxing at openjdk.org Fri Aug 2 07:13:32 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 2 Aug 2024 07:13:32 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 10:51:21 GMT, Eric Liu wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Merge `#ifndef PRODUCT` regions. > > Marked as reviewed by eliu (Committer). @e1iu @dholmes-ora Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2264719566 From duke at openjdk.org Fri Aug 2 07:13:33 2024 From: duke at openjdk.org (duke) Date: Fri, 2 Aug 2024 07:13:33 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 08:33:48 GMT, Qizheng Xing wrote: >> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. >> >> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Merge `#ifndef PRODUCT` regions. @MaxXSoft Your change (at version 1117b89d7a19e7561c92669e8cc14e50f1c47963) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2264720217 From coleenp at openjdk.org Fri Aug 2 11:50:35 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 2 Aug 2024 11:50:35 GMT Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v3] In-Reply-To: <6q02aOYDow87b8ohDXO6DVTeyzITIKxCmFbBZJsuEDo=.8da3927e-2992-4962-9af2-918757929ab0@github.com> References: <6q02aOYDow87b8ohDXO6DVTeyzITIKxCmFbBZJsuEDo=.8da3927e-2992-4962-9af2-918757929ab0@github.com> Message-ID: On Thu, 1 Aug 2024 12:35:00 GMT, Coleen Phillimore wrote: >> How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). >> Tested with tier1 on Oracle supported platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove comments, update comment, fix compilation error. Thanks for the review, David, and for the discussion and review, Stefan.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20408#issuecomment-2265183569 From coleenp at openjdk.org Fri Aug 2 11:50:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 2 Aug 2024 11:50:36 GMT Subject: Integrated: 8335059: Consider renaming ClassLoaderData::keep_alive In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 18:35:12 GMT, Coleen Phillimore wrote: > How does this rename look? Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable(). > Tested with tier1 on Oracle supported platforms. This pull request has now been integrated. Changeset: 328a0533 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/328a0533b2ee6793130dfb68d931e0ebd60c6b5d Stats: 44 lines in 10 files changed: 3 ins; 2 del; 39 mod 8335059: Consider renaming ClassLoaderData::keep_alive Reviewed-by: dholmes, stefank ------------- PR: https://git.openjdk.org/jdk/pull/20408 From coleenp at openjdk.org Fri Aug 2 12:02:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 2 Aug 2024 12:02:38 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Thu, 1 Aug 2024 18:49:34 GMT, Coleen Phillimore wrote: > Since base_offset_in_bytes and HeapWordSize are int, there's no lossy conversion in making these variables int. This seems trivial. > Tested with tier1 on linux and windows.
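The following sketch shows the kind of narrowing `-Wconversion` reports and why keeping the whole computation in `int` avoids it. The functions and values are hypothetical stand-ins; only the fact that `base_offset_in_bytes` and `HeapWordSize` are `int` comes from the PR. Per the GCC documentation, in C++ `-Wconversion` does not warn about signed/unsigned changes unless `-Wsign-conversion` is also given, which matches the observation that only size changes are reported.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins: both are int, as in the PR, but the numbers are
// illustrative only.
static int base_offset_in_bytes() { return 16; }
static const int HeapWordSize = 8;

// All operands are int, so the result is int and storing it in an int
// involves no narrowing; -Wconversion has nothing to report.
int header_words_int() {
  int offset = base_offset_in_bytes();
  return offset / HeapWordSize;
}

// If an intermediate is widened to size_t (64-bit on LP64 platforms), storing
// the result back into an int narrows it. GCC's -Wconversion flags the
// implicit form ("may change value"); the explicit cast keeps it quiet.
int header_words_via_size_t() {
  size_t offset = static_cast<size_t>(base_offset_in_bytes());
  size_t words = offset / static_cast<size_t>(HeapWordSize);
  return static_cast<int>(words);
}
```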
I run with this:
diff --git a/make/autoconf/flags-cflags.m4 b/make/autoconf/flags-cflags.m4
index cf7f4534c89..131e9ece31c 100644
--- a/make/autoconf/flags-cflags.m4
+++ b/make/autoconf/flags-cflags.m4
@@ -186,12 +186,12 @@ AC_DEFUN([FLAGS_SETUP_WARNINGS],
     gcc)
       DISABLE_WARNING_PREFIX="-Wno-"
       BUILD_CC_DISABLE_WARNING_PREFIX="-Wno-"
-      CFLAGS_WARNINGS_ARE_ERRORS="-Werror"
+      CFLAGS_WARNINGS_ARE_ERRORS=""

       # Additional warnings that are not activated by -Wall and -Wextra
       WARNINGS_ENABLE_ADDITIONAL="-Wpointer-arith -Wsign-compare \
           -Wunused-function -Wundef -Wunused-value -Wreturn-type \
-          -Wtrampolines"
+          -Wtrampolines -Wconversion"
       WARNINGS_ENABLE_ADDITIONAL_CXX="-Woverloaded-virtual -Wreorder"
       WARNINGS_ENABLE_ALL_CFLAGS="-Wall -Wextra -Wformat=2 $WARNINGS_ENABLE_ADDITIONAL"
       WARNINGS_ENABLE_ALL_CXXFLAGS="$WARNINGS_ENABLE_ALL_CFLAGS $WARNINGS_ENABLE_ADDITIONAL_CXX"
It doesn't check sign changes (all of the files have a million of those errors), only size changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20431#issuecomment-2265203420 From coleenp at openjdk.org Fri Aug 2 12:04:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 2 Aug 2024 12:04:36 GMT Subject: RFR: 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 06:49:42 GMT, David Holmes wrote: > Looks like it is for debugging to ensure it does in fact only have the values 0, 1, 2. Do you know why it's useful?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20427#issuecomment-2265206834 From aph at openjdk.org Fri Aug 2 13:22:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 2 Aug 2024 13:22:35 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <0c_DhTP3MnvJxeWXzKaQnwuA43ibiCNePlX8alMYr10=.31cfb2ab-0739-4f28-8224-0fe5399bb42d@github.com> On Thu, 16 May 2024 12:40:30 GMT, Andrew Haley wrote: >> Hi, >> >>> I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good? >> >> Yes, that's right. > >> Hi @theRealAph ! You may find the latest version here: [mikabl-arm at b3db421](https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b) . I gave a short explanation in the commit message, feel free to ask for more details if required. >> >> Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks. Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here. > > OK. One small thing, I think it's possible to rearrange things a bit to use `mlav`, which may help performance. No need for that until the code is correct, though. > @theRealAph , I've updated https://github.com/mikabl-arm/jdk/tree/285826-vmul to implement the Neon part of the intrinsic as a separate stub method. This solves the second issue mentioned in the comment above. > > Could you check the code in https://github.com/mikabl-arm/jdk/tree/285826-vmul ? 
if you're happy with the direction it's taking, I'll merge the changes to this PR's branch (https://github.com/mikabl-arm/jdk/tree/8322770) to make things a bit easier to keep track of. It's very hard for me to look at what you propose as a tree. I think you can provide it as a diff between two commits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2265393446 From chagedorn at openjdk.org Fri Aug 2 13:45:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 2 Aug 2024 13:45:39 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: References: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com> Message-ID: On Thu, 1 Aug 2024 09:26:19 GMT, Aleksey Shipilev wrote: >> I'm not saying that `RestrictStable` should be made product. It was a deliberate decision to limit it only to trusted classes. >> >> There are existing tests for `@Stable` (under `test/hotspot/jtreg/compiler/stable/`) and they don't require any special assistance from the JVM. > > Again, the problem here is that new tests are *IR Tests*, and I am struggling to find a good way to bootclasspath the classes that IR Test Framework itself invokes. A develop `RestrictStable` flag is the cleanest approach I could find. The tests you referenced are not IR tests, and so do not have this problem. IIUC, you want to somehow have `-Xbootclasspath/a:path_to_your_ir_test`. I guess that's currently not so easy to do and probably needs some IR framework support. I've had a quick look. 
How about something like this (just prototyped, not fully tested): [ir_framework.patch](https://github.com/user-attachments/files/16471108/ir_framework.log) Then you could use it like this:
TestFramework testFramework = new TestFramework();
testFramework
        .addFlags("...", "...")
        .bootstrapTestClasses()
        .start();
If that works for all your tests, then you could either add the IR framework changes directly to this patch, or we could do it separately in a preceding RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1701869062 From duke at openjdk.org Fri Aug 2 14:21:35 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Fri, 2 Aug 2024 14:21:35 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <0c_DhTP3MnvJxeWXzKaQnwuA43ibiCNePlX8alMYr10=.31cfb2ab-0739-4f28-8224-0fe5399bb42d@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <0c_DhTP3MnvJxeWXzKaQnwuA43ibiCNePlX8alMYr10=.31cfb2ab-0739-4f28-8224-0fe5399bb42d@github.com> Message-ID: <7xsr_nMPbckMWutR8jJecfE1eACBEM6w_OxMO69LJow=.8bdda884-ffe2-4ec0-8ce3-b93c398cbe22@github.com> On Fri, 2 Aug 2024 13:20:24 GMT, Andrew Haley wrote: > It's very hard for me to look at what you propose as a tree. I think you can provide it as a diff between two commits.
Sure, please find all changes from the PR's branch squashed into a single commit here: https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2265500730 From sspitsyn at openjdk.org Fri Aug 2 14:21:34 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 2 Aug 2024 14:21:34 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v7] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: On Fri, 2 Aug 2024 03:24:12 GMT, Jiawei Tang wrote: >> I added a test case which can reproduce the crash. I hope that I could get some advice on whether the code needs changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > rearrange to avoid long lines Looks good. Thank you for the update. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2215697249 From stuefe at openjdk.org Fri Aug 2 14:29:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 2 Aug 2024 14:29:39 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken wrote: >> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. >> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.
> > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - remove method from WhiteBox.java > - remove WB_isUbsanEnabled, fix test Looks reasonable ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19907#pullrequestreview-2215717149 From aph at openjdk.org Fri Aug 2 15:05:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 2 Aug 2024 15:05:33 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov wrote:
> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                      Baseline                This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score      Error      Score      Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ±    0.060      1.247 ±    0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ±    0.028      4.387 ±    0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ±    0.051     26.655 ±    0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ±    1.352   2649.962 ±  216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ±    0.062      1.246 ±    0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ±    0.002      5.344 ±    0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ±    0.048     23.023 ±    0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ±    3.374   2410.504 ±    8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ±    0.005      1.187 ±    0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ±    0.002      5.676 ±    0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ±    0.016     24.378 ±    0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ±    1.336   2419.015 ±    0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ±    0.001      1.037 ±    0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15      5.4...
What is going on with `C2_MacroAssembler::arrays_hashcode_elsize`? It's a change that should not be in this patch. Also, we already have functions to convert from BasicType to size. Please take out every change that either does nothing or duplicates functionality that already exists. See `type2aelembytes`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2265596665 PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2265600762 From duke at openjdk.org Fri Aug 2 15:30:35 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Fri, 2 Aug 2024 15:30:35 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 2 Aug 2024 15:00:38 GMT, Andrew Haley wrote: > What is going on with C2_MacroAssembler::arrays_hashcode_elsize? It's a change that should not be in this patch. It was a part of a cleanup.
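For context on the two-part structure described above (a vectorized main loop plus a scalar tail), here is one common way to vectorize the `Arrays.hashCode` recurrence `h = 31*h + a[i]`, which starts from `h = 1`. This sketch is illustrative only. It is not the AArch64 stub from this PR, and the function names are hypothetical. Four independent "lanes" each accumulate every 4th element, scaled by 31^4 per step. After the main loop, the lanes are combined with weights 31^3, 31^2, 31^1 and 31^0, and a scalar loop handles the tail. Unsigned 32-bit arithmetic models Java's wrapping `int` overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference: the Arrays.hashCode(int[]) recurrence, starting at 1.
uint32_t hash_scalar(const int32_t* a, size_t n) {
  uint32_t h = 1;
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}

// 4-lane sketch of the same computation (a SIMD register would hold the lanes).
uint32_t hash_4lane(const int32_t* a, size_t n) {
  const uint32_t p4 = 31u * 31u * 31u * 31u;  // 31^4 = 923521
  uint32_t lane[4] = {0, 0, 0, 0};
  size_t vec_end = n - n % 4;
  size_t i = 0;
  for (; i < vec_end; i += 4) {
    for (int k = 0; k < 4; k++) {
      lane[k] = lane[k] * p4 + static_cast<uint32_t>(a[i + k]);
    }
  }
  // The initial value 1 is scaled by 31 once per element consumed so far.
  uint32_t h = 1;
  for (size_t j = 0; j < vec_end; j++) h *= 31u;
  // Lane k holds elements at offsets i+k, which still need weight 31^(3-k).
  h += lane[0] * (31u * 31u * 31u) + lane[1] * (31u * 31u) + lane[2] * 31u + lane[3];
  // Scalar tail for the remaining 0..3 elements.
  for (; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}
```

Both functions return identical results for every length. The lane version trades a serial chain of multiply-adds for four independent chains, which is what makes a SIMD implementation profitable.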
`C2_MacroAssembler::arrays_hashcode_elsize` was added by this PR; it's not present in [openjdk:master](https://github.com/openjdk/jdk/tree/master). Just in case, please note that [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) includes two commits: https://github.com/mikabl-arm/jdk/commit/f19203015fb69e50636bdfa597c7aa48176a56cc presented in the PR and https://github.com/mikabl-arm/jdk/commit/3a52c7f89c293b79559201149f3159d5a8c831b6 / `HEAD` that develops further on the current state of the PR. > Please take out every change that either does nothing or duplicates functionality that already exists. See type2aelembytes. May I suggest removing all *cleanup* changes from [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) and merging it into [mikabl-arm:8322770](https://github.com/mikabl-arm/jdk/tree/8322770) (this PR's branch) first? I can address any comments, including `type2aelembytes`, afterwards. This should make it easier to keep track of changes, as there are now two different branches and I feel that this might get confusing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2265647359 From kvn at openjdk.org Fri Aug 2 15:32:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Aug 2024 15:32:35 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 08:33:48 GMT, Qizheng Xing wrote: >> Some of the methods are defined only in debug mode, but their declarations still exist in release mode. >> >> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Merge `#ifndef PRODUCT` regions. The GHA testing is not activated for this branch. Please fix it.
Even though the changes look trivial, they should be tested (VM build) on all supported platforms in GHA. ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20131#pullrequestreview-2215862303 From kvn at openjdk.org Fri Aug 2 15:56:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Aug 2024 15:56:05 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v5] In-Reply-To: References: Message-ID: > `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. > > I found few places where `ExternalAddess` is used incorrectly and fixed them. > > I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). > > Here is current output from debug VM on MacBook M1 (Aarch64): > > External addresses table: 6 entries, 324 accesses > 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 > 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 > 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr > 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc > 4: 18 0x0000000118384080 : stub: forward exception > > > on MacOS-x64: > > External addresses table: 143 entries, 44405 accesses > 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop > 1: 11002 0x0000000104474384 : 'should not reach here' > 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 > 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 > 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 > > > and on linux-x64: > > External
addresses table: 143 entries, 77297 accesses > 0: 22334 0x00007f35d5b9c000 : '' > 1: 19789 0x00007f35d55eea1f : 'should not reach here' > 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' > 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen' > 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?' > > > Few points about difference in output: > 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings). > 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled. > 3. linux-x64 implementation of `dlladdr()`, I used to print C++ symbol name, only... Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Remove PrintNMethodStatistics not needed check in ExternalsRecorder::print_statistics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20412/files - new: https://git.openjdk.org/jdk/pull/20412/files/f19d284f..d734d41c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=03-04 Stats: 4 lines in 1 file changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20412.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20412/head:pull/20412 PR: https://git.openjdk.org/jdk/pull/20412 From adinn at openjdk.org Fri Aug 2 17:52:10 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 2 Aug 2024 17:52:10 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v4] In-Reply-To: References: Message-ID: > Reorganization of generation and management code for C2-specific blob so that 
it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: correct riscv source ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20417/files - new: https://git.openjdk.org/jdk/pull/20417/files/7d7b3288..15fb5fa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20417.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20417/head:pull/20417 PR: https://git.openjdk.org/jdk/pull/20417 From adinn at openjdk.org Fri Aug 2 17:52:11 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 2 Aug 2024 17:52:11 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v3] In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 04:34:37 GMT, Fei Yang wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> add new riscv source > > Hi Andrew, I think we are lacking following addon change for riscv. Built fastdebug and manually tested on linux-riscv64. 
> > > diff --git a/src/hotspot/cpu/riscv/runtime_riscv.cpp b/src/hotspot/cpu/riscv/runtime_riscv.cpp > index 46fdcc65a2c..c030441857c 100644 > --- a/src/hotspot/cpu/riscv/runtime_riscv.cpp > +++ b/src/hotspot/cpu/riscv/runtime_riscv.cpp > @@ -58,7 +58,7 @@ class SimpleRuntimeFrame { > #define __ masm-> > > //------------------------------generate_uncommon_trap_blob-------------------- > -void SharedRuntime::generate_uncommon_trap_blob() { > +void OptoRuntime::generate_uncommon_trap_blob() { > // Allocate space for the code > ResourceMark rm; > // Setup code generation tools > @@ -122,7 +122,7 @@ void SharedRuntime::generate_uncommon_trap_blob() { > __ lwu(t0, Address(x14, Deoptimization::UnrollBlock::unpack_kind_offset())); > __ mv(t1, Deoptimization::Unpack_uncommon_trap); > __ beq(t0, t1, L); > - __ stop("SharedRuntime::generate_uncommon_trap_blob: expected Unpack_uncommon_trap"); > + __ stop("OptoRuntime::generate_uncommon_trap_blob: expected Unpack_uncommon_trap"); > __ bind(L); > } > #endif > @@ -377,6 +377,5 @@ void OptoRuntime::generate_exception_blob() { > // Set exception blob > _exception_blob = ExceptionBlob::create(&buffer, oop_maps, SimpleRuntimeFrame::framesize >> 1); > } > -#endif // COMPILER2 > - > > +#endif // COMPILER2 Hi Fei (@RealFYang). Thanks for checking. I pushed a corrected version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20417#issuecomment-2265870940 From vlivanov at openjdk.org Fri Aug 2 18:23:32 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 2 Aug 2024 18:23:32 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v5] In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 15:56:05 GMT, Vladimir Kozlov wrote: >> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. >> >> I found few places where `ExternalAddess` is used incorrectly and fixed them. 
>> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). >> >> Here is current output from debug VM on MacBook M1 (Aarch64): >> >> External addresses table: 6 entries, 324 accesses >> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 >> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 >> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr >> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc >> 4: 18 0x0000000118384080 : stub: forward exception >> >> >> on MacOS-x64: >> >> External addresses table: 143 entries, 44405 accesses >> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop >> 1: 11002 0x0000000104474384 : 'should not reach here' >> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 >> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 >> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 >> >> >> and on linux-x64: >> >> External addresses table: 143 entries, 77297 accesses >> 0: 22334 0x00007f35d5b9c000 : '' >> 1: 19789 0x00007f35d55eea1f : 'should not reach here' >> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' >> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen' >> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?' >> >> >> Few points about difference in output: >> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings). >> 2. 
`stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled. >> 3... > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove PrintNMethodStatistics not needed check in ExternalsRecorder::print_statistics Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20412#pullrequestreview-2216158126 From kvn at openjdk.org Fri Aug 2 18:57:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Aug 2024 18:57:31 GMT Subject: RFR: 8337396: Cleanup usage of ExternalAddess [v5] In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 15:56:05 GMT, Vladimir Kozlov wrote: >> `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. >> >> I found few places where `ExternalAddess` is used incorrectly and fixed them. >> >> I also added code to print "hottest" (most referenced) `ExternalAddess` addresses in global table to move them into static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519). 
>> >> Here is current output from debug VM on MacBook M1 (Aarch64): >> >> External addresses table: 6 entries, 324 accesses >> 0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480 >> 1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16 >> 2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr >> 3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc >> 4: 18 0x0000000118384080 : stub: forward exception >> >> >> on MacOS-x64: >> >> External addresses table: 143 entries, 44405 accesses >> 0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop >> 1: 11002 0x0000000104474384 : 'should not reach here' >> 2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068 >> 3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753 >> 4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554 >> >> >> and on linux-x64: >> >> External addresses table: 143 entries, 77297 accesses >> 0: 22334 0x00007f35d5b9c000 : '' >> 1: 19789 0x00007f35d55eea1f : 'should not reach here' >> 2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?' >> 3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen' >> 4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?' >> >> >> Few points about difference in output: >> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings). >> 2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because how relocation for constants in C2 are handled. >> 3... 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove PrintNMethodStatistics not needed check in ExternalsRecorder::print_statistics Thank you, Vladimir ------------- PR Comment: https://git.openjdk.org/jdk/pull/20412#issuecomment-2265962526 From stevenschlansker at gmail.com Fri Aug 2 19:11:15 2024 From: stevenschlansker at gmail.com (Steven Schlansker) Date: Fri, 2 Aug 2024 12:11:15 -0700 Subject: Reliability of JVM in face of "recoverable" Errors, e.g. out of code cache space Message-ID: <26618BA9-BCF5-422D-89B5-8BEB20AF7856@gmail.com> Hi hotspot-dev, Please let me know if this is not an appropriate place to raise this kind of question; I'm happy to move to another, more appropriate list. We run the JVM (22.0.1) with many different application contexts loaded into one JVM, like an application server. This places a rather high demand on the code cache. We've observed warning messages about the code cache being full and the compiler being disabled, so we increase our ReservedCodeCacheSize to make some space, and move on with life. This week, we ran into a new type of failure that is much more serious than a warning but non-fatal to the JVM.
Caused by: java.lang.ExceptionInInitializerError:
Caused by: java.lang.NoClassDefFoundError: Could not initialize class java.time.temporal.WeekFields
Caused by: Exception java.lang.VirtualMachineError: Out of space in CodeCache for adapters
    at java.base/java.time.format.DateTimeFormatterBuilder$WeekBasedFieldPrinterParser.printerParser(DateTimeFormatterBuilder.java:5264)
    at java.base/java.time.format.DateTimeFormatterBuilder$WeekBasedFieldPrinterParser.format(DateTimeFormatterBuilder.java:5248)
    at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2529)
    at java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1905)
    at java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1879)
    at java.base/java.time.LocalDate.format(LocalDate.java:1797)
    ...
Once this happens, the affected classes (in this case the java.time infrastructure) are effectively dead for the remainder of the JVM's lifetime. As part of our JVM reliability configuration, we attempt to set -XX:OnError=/bin/gather-debuginfo-then-kill -9 %p to ensure that unexpected errors terminate the JVM, rather than leave it in an uncertain state. However, this particular VirtualMachineError does not seem to trigger this OnError logic. Reading the docs, it seems that this is only triggered for 'irrecoverable' errors, which I guess this does not qualify as, since it triggers a userland exception, not a HotSpot dump. However, trying to imagine how we would recover from such a situation, it's not clear at all what to do. At this point some arbitrary subset of classes is no longer usable, forever. Even logging a date could fail. Arguably, user code shouldn't be thinking about VirtualMachineError as a possibility at all, as what can you even trust to work afterward?
The exception could be thrown in an arbitrary thread: maybe one we control, but maybe a background thread such as a Jetty server thread or a Redis client I/O thread. Where it is thrown is not predictable either, which makes it very hard to add a "catch" clause that terminates the JVM, since nearly any statement could fail. Most threads are careful to have a top-level catch-and-log, so the uncaught exception handler does not seem reliable either. Ideally, I would turn on some VM option like '-XX:VMErrorIsAlwaysFatal' to trigger an hs_err dump, rather than ever seeing this sort of failure in userland. How can a user application recover from such an error? (I think it cannot.) If we cannot recover, how can we reliably configure the JVM to crash completely when such an error happens? I suppose a debugger-like tool could set a breakpoint on throwing VirtualMachineError, or maybe an agent could transform the VME constructor, but neither feels "production-ready". Thank you for any advice! Steven Schlansker From lmesnik at openjdk.org Fri Aug 2 19:32:32 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 2 Aug 2024 19:32:32 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 09:37:08 GMT, Serguei Spitsyn wrote: >> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: >> - `SetFieldAccessWatch()` >> - `ClearFieldAccessWatch()` >> - `SetFieldModificationWatch()` >> - `ClearFieldModificationWatch()` >> so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. >> >> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function.
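The fix described above follows the usual C++ scoped-guard (RAII) pattern: open a scope in which virtual-thread mount/unmount transitions are disabled, and perform both the mount-state check and the handshake that depends on it inside that scope. The sketch below models only that shape, under stated assumptions: `TransitionDisablerSketch`, `g_disable_count`, `transitions_allowed()` and `change_field_watch_sketch()` are hypothetical names, and HotSpot's real `JvmtiVTMSTransitionDisabler` does considerably more than adjust a counter.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for VM-wide "transitions disabled" state.
// Only the scoped enable/disable shape of the fix is modeled here.
static std::atomic<int> g_disable_count{0};

// True when a virtual thread would be allowed to mount or unmount right now.
bool transitions_allowed() { return g_disable_count.load() == 0; }

class TransitionDisablerSketch {
 public:
  TransitionDisablerSketch() { g_disable_count.fetch_add(1); }
  ~TransitionDisablerSketch() {
    assert(g_disable_count.load() > 0);
    g_disable_count.fetch_sub(1);
  }
  TransitionDisablerSketch(const TransitionDisablerSketch&) = delete;
  TransitionDisablerSketch& operator=(const TransitionDisablerSketch&) = delete;
};

// Hypothetical analogue of change_field_watch(): the "is the vthread mounted?"
// check and the handshake relying on the answer run inside one disabler scope,
// so the mount state cannot change between the two steps.
bool change_field_watch_sketch(bool (*check_then_handshake)()) {
  TransitionDisablerSketch disabler;  // transitions blocked from here...
  return check_then_handshake();      // ...across both steps...
}                                     // ...and re-enabled when 'disabler' is destroyed
```

Because the guard is a stack object, transitions are re-enabled on every exit path, including early returns.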
>> >> Testing: >> - TBD: submit mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > rearranged to have one JvmtiVTMSTransitionDisabler instead of two Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20413#pullrequestreview-2216273600 From sspitsyn at openjdk.org Fri Aug 2 23:26:31 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 2 Aug 2024 23:26:31 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2] In-Reply-To: <8AOxsfURVgBRGlS8WGtZ1wubjuUozfK-LcLkf9BGVoQ=.bb20a5eb-8821-48cd-bb09-0dfa8870f6f3@github.com> References: <8AOxsfURVgBRGlS8WGtZ1wubjuUozfK-LcLkf9BGVoQ=.bb20a5eb-8821-48cd-bb09-0dfa8870f6f3@github.com> Message-ID: On Thu, 1 Aug 2024 22:44:37 GMT, Patricio Chilano Mateo wrote: > Looks good, but I see we have ranking issues with JvmtiThreadState_lock now. We will have to change JvmtiVTMSTransition_lock to be safepoint-1. Thank you, Patricio. Will check and test this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20413#issuecomment-2266253029 From sspitsyn at openjdk.org Fri Aug 2 23:26:32 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 2 Aug 2024 23:26:32 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 09:37:08 GMT, Serguei Spitsyn wrote: >> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: >> - `SetFieldAccessWatch()` >> - `ClearFieldAccessWatch()` >> - `SetFieldModificationWatch()` >> - `ClearFieldModificationWatch()` >> so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. 
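Patricio's remark about ranking issues refers to HotSpot's lock-rank checking. Roughly, that check only lets a thread acquire a lock whose rank is lower than the rank of every lock it already holds. The generic model below uses made-up rank values and hypothetical type names rather than HotSpot's `Mutex` API. It shows why a lock ranked `safepoint-1` can be taken while a `safepoint`-ranked lock is held, and why a second lock of equal rank cannot. It makes no claim about which of the two JVMTI locks is acquired first.

```cpp
#include <cassert>
#include <vector>

// Generic model of lock-rank ordering with made-up ranks (not HotSpot's Mutex class).
// Modeled rule: a lock may be acquired only if its rank is strictly lower than the
// rank of every lock the thread already holds, which rules out lock-order inversion.
struct RankedLock {
  const char* name;
  int rank;
};

// Per-thread record of the locks currently held.
class HeldLocks {
 public:
  bool can_acquire(const RankedLock& lock) const {
    for (const RankedLock* held : held_) {
      if (lock.rank >= held->rank) return false;  // equal or higher rank: out of order
    }
    return true;
  }
  void acquire(const RankedLock& lock) {
    assert(can_acquire(lock) && "lock acquired out of rank order");
    held_.push_back(&lock);
  }
  void release(const RankedLock& lock) {
    assert(!held_.empty() && held_.back() == &lock);  // LIFO release in this sketch
    held_.pop_back();
  }

 private:
  std::vector<const RankedLock*> held_;
};
```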
>> >> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. >> >> Testing: >> - TBD: submit mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > rearranged to have one JvmtiVTMSTransitionDisabler instead of two Chris, Alex and Leonid, thank you for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20413#issuecomment-2266253476 From sspitsyn at openjdk.org Fri Aug 2 23:36:05 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 2 Aug 2024 23:36:05 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v3] In-Reply-To: References: Message-ID: <4Fhved_SVvx_PSjHejewmqB3UlRnod--OiIS0hGToXA=.5aded3ae-0bfe-4f37-a7e6-81489ea1aeff@github.com> > The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: > - `SetFieldAccessWatch()` > - `ClearFieldAccessWatch()` > - `SetFieldModificationWatch()` > - `ClearFieldModificationWatch()` > so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. > > The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function.
> > Testing: > - TBD: submit mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: adjusted the JvmtiVTMSTransition_lock rank to safepoint-1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20413/files - new: https://git.openjdk.org/jdk/pull/20413/files/bb94448c..3d166446 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20413&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20413&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20413/head:pull/20413 PR: https://git.openjdk.org/jdk/pull/20413 From fyang at openjdk.org Sat Aug 3 01:24:41 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 3 Aug 2024 01:24:41 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v4] In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 17:52:10 GMT, Andrew Dinn wrote: >> Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > correct riscv source LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20417#pullrequestreview-2216569772 From kvn at openjdk.org Sat Aug 3 06:24:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 Aug 2024 06:24:35 GMT Subject: Integrated: 8337396: Cleanup usage of ExternalAddess In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 23:42:36 GMT, Vladimir Kozlov wrote: > `ExternalAddess` should be used only for data load. For calls (and jump) instructions we should use `RuntimeAddress` which uses `runtime_call_Relocation`. 
> > I found a few places where `ExternalAddess` is used incorrectly and fixed them.
>
> I also added code to print the "hottest" (most referenced) `ExternalAddess` addresses in the global table, so that they can be moved into the static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519).
>
> Here is the current output from a debug VM on a MacBook M1 (Aarch64):
>
>     External addresses table: 6 entries, 324 accesses
>     0: 158 0x00000001082de0f0 : extn: vmClasses::_klasses+480
>     1: 84 0x00000001082ddf20 : extn: vmClasses::_klasses+16
>     2: 40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr
>     3: 24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc
>     4: 18 0x0000000118384080 : stub: forward exception
>
> on MacOS-x64:
>
>     External addresses table: 143 entries, 44405 accesses
>     0: 11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop
>     1: 11002 0x0000000104474384 : 'should not reach here'
>     2: 9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068
>     3: 2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753
>     4: 1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554
>
> and on linux-x64:
>
>     External addresses table: 143 entries, 77297 accesses
>     0: 22334 0x00007f35d5b9c000 : ''
>     1: 19789 0x00007f35d55eea1f : 'should not reach here'
>     2: 18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?'
>     3: 5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
>     4: 3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
>
> A few points about the differences in the output:
> 1. aarch64 does not use `ExternalAddess` or any relocation for messages (strings).
> 2.
`stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()`, for which C2 generates a tail call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because of how relocations for constants in C2 are handled.
> 3. linux-x64 implementation of `dlladdr()`, which I used to print the C++ symbol name, only...

This pull request has now been integrated. Changeset: 34edc735 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/34edc7358f733cdf433d0ff50921bcb5a94c5e35 Stats: 120 lines in 10 files changed: 85 ins; 3 del; 32 mod 8337396: Cleanup usage of ExternalAddess Co-authored-by: Fei Yang Reviewed-by: vlivanov, adinn ------------- PR: https://git.openjdk.org/jdk/pull/20412 From aph at openjdk.org Sat Aug 3 12:39:37 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 3 Aug 2024 12:39:37 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov wrote: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
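A vectorized loop plus a scalar tail works here because the polynomial hash that `Arrays.hashCode` defines (h = 31*h + a[i], starting from h = 1, with 32-bit wrap-around) can be advanced several elements at a time using precomputed powers of 31. The scalar C++ sketch below shows the 4-element step for `int` elements only. It is not the AArch64 Neon code from the PR, and the function names are hypothetical. In the blocked step the products are independent of one another, which is what lets them be computed lane-parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference semantics of Arrays.hashCode(int[]): h = 31*h + a[i], starting at h = 1.
// Unsigned arithmetic reproduces Java's 32-bit wrap-around without signed-overflow UB.
int32_t hash_scalar(const int32_t* a, size_t n) {
  uint32_t h = 1;
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return static_cast<int32_t>(h);
}

// Blocked form: advance 4 elements per step using precomputed powers of 31,
//   h' = 31^4*h + 31^3*a[i] + 31^2*a[i+1] + 31*a[i+2] + a[i+3],
// then finish the remaining n % 4 elements with the scalar recurrence.
int32_t hash_blocked4(const int32_t* a, size_t n) {
  assert(a != nullptr || n == 0);
  const uint32_t p1 = 31u, p2 = p1 * 31u, p3 = p2 * 31u, p4 = p3 * 31u;
  uint32_t h = 1;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {  // the products below are mutually independent
    h = p4 * h + p3 * static_cast<uint32_t>(a[i]) + p2 * static_cast<uint32_t>(a[i + 1]) +
        p1 * static_cast<uint32_t>(a[i + 2]) + static_cast<uint32_t>(a[i + 3]);
  }
  for (; i < n; i++) {  // scalar tail
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return static_cast<int32_t>(h);
}
```

For example, `hash_scalar` over {1, 2, 3} gives ((1*31 + 1)*31 + 2)*31 + 3 = 30817, the same value `Arrays.hashCode(new int[]{1, 2, 3})` returns in Java.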
> > At the time of writing this, I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
> --------------------------------------------------------------------------------------------
> Version                                       Baseline             This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...
> > What is going on with C2_MacroAssembler::arrays_hashcode_elsize? It's a change that should not be in this patch. > > It was a part of a cleanup. `C2_MacroAssembler::arrays_hashcode_elsize` was added by this PR, it's not present in [openjdk:master](https://github.com/openjdk/jdk/tree/master).
JIC, please note that [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) includes two commits: [mikabl-arm at f192030](https://github.com/mikabl-arm/jdk/commit/f19203015fb69e50636bdfa597c7aa48176a56cc), presented in the PR, and [mikabl-arm at 3a52c7f](https://github.com/mikabl-arm/jdk/commit/3a52c7f89c293b79559201149f3159d5a8c831b6) / `HEAD`, which builds further on the current state of the PR. > > > Please take out every change that either does nothing or duplicates functionality that already exists. See type2aelembytes. > > May I suggest removing all _cleanup_ changes from [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) and merging it into [mikabl-arm:8322770](https://github.com/mikabl-arm/jdk/tree/8322770) (this PR's branch) first? I can address any comments including `type2aelembytes` afterwards. This should make it easier to keep track of changes, since there are now two different branches and I feel that this might get confusing. I'm thinking that "clean > > What is going on with C2_MacroAssembler::arrays_hashcode_elsize? It's a change that should not be in this patch. > > It was a part of a cleanup. `C2_MacroAssembler::arrays_hashcode_elsize` was added by this PR, it's not present in [openjdk:master](https://github.com/openjdk/jdk/tree/master). Oh, I see. I was assuming that this was a diff from master. I was in a hurry at the time... > > Please take out every change that either does nothing or duplicates functionality that already exists. See type2aelembytes. > > May I suggest removing all _cleanup_ changes from [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) and merging it into [mikabl-arm:8322770](https://github.com/mikabl-arm/jdk/tree/8322770) (this PR's branch) first? I can address any comments including `type2aelembytes` afterwards.
This should make it easier to keep track of changes, since there are now two different branches and I feel that this might get confusing. It certainly is. If you can tell me the hashes of the code you want me to look at and the master commit it's based on, it'll be much easier to see. Looking past the cleanups, though, `generate_large_arrays_hashcode` looks like it's doing exactly what we need. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2266699361 From kbarrett at openjdk.org Sat Aug 3 23:30:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 3 Aug 2024 23:30:39 GMT Subject: RFR: 8337785: Fix simple -Wzero-as-null-pointer-constant warnings in x86 code Message-ID: Please review this trivial change that replaces some uses of literal 0 as a null pointer constant in x86 code to instead use nullptr. Testing: mach5 tier1 ------------- Commit messages: - fix simple x86 Changes: https://git.openjdk.org/jdk/pull/20453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20453&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337785 Stats: 13 lines in 5 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20453/head:pull/20453 PR: https://git.openjdk.org/jdk/pull/20453 From kbarrett at openjdk.org Sat Aug 3 23:33:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 3 Aug 2024 23:33:02 GMT Subject: RFR: 8337786: Fix simple -Wzero-as-null-pointer-constant warnings in aarch64 code Message-ID: Please review this trivial change that replaces some uses of literal 0 as a null pointer constant in aarch64 code to instead use nullptr.
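For readers unfamiliar with `-Wzero-as-null-pointer-constant` (a C++ warning offered by GCC and Clang), the sketch below shows the pattern it flags. It also shows one concrete reason `nullptr` is preferred over a literal `0`: overload resolution. The helper names are hypothetical and unrelated to the files touched by these changes.

```cpp
// Two overloads: a literal 0 and nullptr resolve to different ones.
enum class Picked { Int, Pointer };
Picked which(int) { return Picked::Int; }
Picked which(const char*) { return Picked::Pointer; }

// The same pointer-returning helper written both ways (hypothetical names).
const char* find_name_old(bool found) {
  static const char kName[] = "entry";
  if (!found) return 0;  // literal 0 as a null pointer: flagged by -Wzero-as-null-pointer-constant
  return kName;
}

const char* find_name_new(bool found) {
  static const char kName[] = "entry";
  if (!found) return nullptr;  // explicit null pointer: not flagged
  return kName;
}
```

Both helpers behave identically at run time; with the warning enabled, only `find_name_old` should produce a diagnostic.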
Testing: mach5 tier1 ------------- Commit messages: - fix aarch64 Changes: https://git.openjdk.org/jdk/pull/20454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20454&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337786 Stats: 11 lines in 6 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20454/head:pull/20454 PR: https://git.openjdk.org/jdk/pull/20454 From kbarrett at openjdk.org Sat Aug 3 23:37:58 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 3 Aug 2024 23:37:58 GMT Subject: RFR: 8337787: Fix -Wzero-as-null-pointer-constant warnings when JVMTI feature is disabled Message-ID: Please review this trivial change that replaces some uses of literal 0 as a null pointer constant when JVMTI is disabled to instead use nullptr. Testing: mach5 tier1 ------------- Commit messages: - fix NOT_JVMTI Changes: https://git.openjdk.org/jdk/pull/20455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20455&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337787 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20455/head:pull/20455 PR: https://git.openjdk.org/jdk/pull/20455 From kbarrett at openjdk.org Sat Aug 3 23:43:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 3 Aug 2024 23:43:59 GMT Subject: RFR: 8337782: Use THROW_NULL instead of THROW_0 in pointer contexts in prims code Message-ID: Please review this trivial change that, in places in prims code where a pointer value is required, replaces uses of THROW_0 and THROW_MSG_0 with THROW_NULL and THROW_MSG_NULL respectively. 
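The THROW_0 / THROW_NULL distinction concerns the dummy value returned after an exception has been made pending. In a pointer-returning function, returning `0` is exactly what `-Wzero-as-null-pointer-constant` flags. The macros below are a heavily simplified, hypothetical model: the `SKETCH_THROW_*` names and the global `g_pending`, which stands in for the current thread's pending exception, are invented for illustration. The real HotSpot macros take a thread context and an exception symbol, and that machinery is not reproduced here.

```cpp
// Hypothetical, heavily simplified model of "throw, then return a dummy value":
// a pending exception is recorded and the function returns a value callers ignore.
struct PendingException {
  const char* exception = nullptr;  // stand-in for the thread's pending exception
};
static PendingException g_pending;

// Parameters are named 'exc'/'result' so they cannot collide with member names.
#define SKETCH_THROW_(exc, result) { g_pending.exception = (exc); return result; }
#define SKETCH_THROW_0(exc)        SKETCH_THROW_(exc, 0)
#define SKETCH_THROW_NULL(exc)     SKETCH_THROW_(exc, nullptr)

// Pointer-returning: SKETCH_THROW_0 would still compile (0 converts to a null
// pointer) but is what the warning flags; SKETCH_THROW_NULL states the intent.
const char* resolve_name(int index) {
  if (index < 0) SKETCH_THROW_NULL("IllegalArgumentException");
  return "resolved";
}

// Integer-returning: returning 0 after the throw remains the natural choice.
int checked_length(int len) {
  if (len < 0) SKETCH_THROW_0("NegativeArraySizeException");
  return len;
}
```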
Testing: mach5 tier1 ------------- Commit messages: - fix THROW_0 in prims Changes: https://git.openjdk.org/jdk/pull/20456/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20456&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337782 Stats: 47 lines in 6 files changed: 0 ins; 0 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/20456.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20456/head:pull/20456 PR: https://git.openjdk.org/jdk/pull/20456 From kbarrett at openjdk.org Sat Aug 3 23:46:58 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 3 Aug 2024 23:46:58 GMT Subject: RFR: 8337783: Use THROW_NULL instead of THROW_0 in pointer contexts in misc runtime code Message-ID: Please review this trivial change that, in misc. runtime code where a pointer value is required, replaces uses of THROW_0 and THROW_MSG_0 with THROW_NULL and THROW_MSG_NULL respectively. Testing: mach5 tier1 ------------- Commit messages: - fix THROW_0 in runtime Changes: https://git.openjdk.org/jdk/pull/20457/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20457&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337783 Stats: 33 lines in 6 files changed: 0 ins; 0 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/20457.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20457/head:pull/20457 PR: https://git.openjdk.org/jdk/pull/20457 From gcao at openjdk.org Sun Aug 4 02:48:13 2024 From: gcao at openjdk.org (Gui Cao) Date: Sun, 4 Aug 2024 02:48:13 GMT Subject: RFR: 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check Message-ID: Hi,
In the MacroAssembler::reserved_stack_check() function:

    RuntimeAddress target(StubRoutines::throw_delayed_StackOverflowError_entry());
    relocate(target.rspec(), [&] {
      int32_t offset;
      movptr(t0, target.target(), offset);
      jr(t0, offset);
    });

can be simplified to:

    la(t0, RuntimeAddress(StubRoutines::throw_delayed_StackOverflowError_entry()));
    jr(t0);

In addition, the code formatting has been modified to
remove the extra spaces before the code. Please take a look and have some reviews. Thanks a lot. ### Testing - [x] Run tier1-3 tests on SOPHON SG2042 (fastdebug) ------------- Commit messages: - JDK-8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check Changes: https://git.openjdk.org/jdk/pull/20458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20458&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337788 Stats: 19 lines in 1 file changed: 0 ins; 4 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20458/head:pull/20458 PR: https://git.openjdk.org/jdk/pull/20458 From jwaters at openjdk.org Sun Aug 4 04:38:30 2024 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 4 Aug 2024 04:38:30 GMT Subject: RFR: 8337785: Fix simple -Wzero-as-null-pointer-constant warnings in x86 code In-Reply-To: References: Message-ID: On Sat, 3 Aug 2024 23:26:25 GMT, Kim Barrett wrote: > Please review this trivial change that replaces some uses of literal 0 as a > null pointer constant in x86 code to instead use nullptr. > > Testing: mach5 tier1 There really are a lot of these, aren't there? :P ------------- Marked as reviewed by jwaters (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20453#pullrequestreview-2217388280 From fyang at openjdk.org Mon Aug 5 01:09:33 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 Aug 2024 01:09:33 GMT Subject: RFR: 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check In-Reply-To: References: Message-ID: On Sun, 4 Aug 2024 02:43:26 GMT, Gui Cao wrote: > Hi, > In the MacroAssembler::reserved_stack_check() function: > > RuntimeAddress target(StubRoutines::throw_delayed_StackOverflowError_entry()); > relocate(target.rspec(), [&] { > int32_t offset; > movptr(t0, target.target(), offset); > jr(t0, offset); > }); > > can be simplified to: > > la(t0, RuntimeAddress(StubRoutines::throw_delayed_StackOverflowError_entry())); > jr(t0); > > > In addition, the code formatting has been modified to remove the extra spaces before the code. > > Please take a look and have some reviews. Thanks a lot. > > ### Testing > - [x] Run tier1-3 tests on SOPHON SG2042 (fastdebug) Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20458#pullrequestreview-2217868792 From jwtang at openjdk.org Mon Aug 5 02:45:10 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Mon, 5 Aug 2024 02:45:10 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v8] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: <6Ep9l29NGGzJUZX7k4uoqt3AafKMzTinzEWQx6vyUhE=.667016aa-1867-42f1-a966-25dfef6f5a84@github.com> > I added a test case that reproduces the crash. I hope I can get some advice on whether the code needs changing.
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: change the format of the testcase file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/80b139ef..f15cbea4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=06-07 Stats: 14 lines in 1 file changed: 6 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From fyang at openjdk.org Mon Aug 5 03:00:42 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 Aug 2024 03:00:42 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6] In-Reply-To: <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com> References: <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com> Message-ID: On Fri, 26 Jul 2024 08:10:01 GMT, Hamlin Li wrote:
>> Hi,
>> Can you help review the patch?
>>
>> I'm also working on a base64 decode intrinsic, but it still has performance regressions in some cases. Decode and encode are totally independent of each other, so I will send out the decode review in a separate PR once I fix those regressions.
>>
>> Thanks.
>>
>> ## Test
>> Benchmarks were run on CanVM-K230 (vlenb == 16) and banana-pi (vlenb == 32).
>>
>> I've tried several implementations, each with a different vector register grouping:
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>>
>> The best one is the m2+m1 combination; it has the best performance for all source sizes.
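To show what is being vectorized, the scalar sketch below encodes standard Base64 (RFC 4648 alphabet, `=` padding): every 3 input bytes become four 6-bit indices into a 64-character table. A vector implementation such as the m2+m1 variant described above applies the same split to many 3-byte groups per iteration and leaves a short tail to scalar code. This is an illustrative reference with a hypothetical function name, not the RISC-V intrinsic itself; where padding is handled in the real code path is not assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar reference for standard Base64 (RFC 4648 alphabet, '=' padding).
std::string base64_encode(const uint8_t* src, size_t len) {
  static const char kTable[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve((len + 2) / 3 * 4);
  size_t i = 0;
  for (; i + 3 <= len; i += 3) {  // full 3-byte groups: the part worth vectorizing
    uint32_t bits = (uint32_t(src[i]) << 16) | (uint32_t(src[i + 1]) << 8) | src[i + 2];
    out.push_back(kTable[(bits >> 18) & 0x3F]);
    out.push_back(kTable[(bits >> 12) & 0x3F]);
    out.push_back(kTable[(bits >> 6) & 0x3F]);
    out.push_back(kTable[bits & 0x3F]);
  }
  size_t rest = len - i;  // 0, 1 or 2 leftover bytes
  if (rest > 0) {
    uint32_t bits = uint32_t(src[i]) << 16;
    if (rest == 2) bits |= uint32_t(src[i + 1]) << 8;
    out.push_back(kTable[(bits >> 18) & 0x3F]);
    out.push_back(kTable[(bits >> 12) & 0x3F]);
    out.push_back(rest == 2 ? kTable[(bits >> 6) & 0x3F] : '=');
    out.push_back('=');
  }
  assert(out.size() == (len + 2) / 3 * 4);
  return out;
}
```

For example, the three bytes of "Man" encode to "TWFu", while "Ma" and "M" encode to "TWE=" and "TQ==" respectively.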
>>
>> ### K230
>>
>> this implementation (m2+m1)
>>
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>>
>> vector with only m2 >> References: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com> Message-ID: On Fri, 2 Aug 2024 13:42:57 GMT, Christian Hagedorn wrote: >> Again, the problem here is that new tests are *IR Tests*, and I am struggling to find a good way to bootclasspath the classes that IR Test Framework itself invokes. A develop `RestrictStable` flag is the cleanest approach I could find. The tests you referenced are not IR tests, and so do not have this problem. > > IIUC, you want to somehow have `-Xbootclasspath/a:path_to_your_ir_test`.
I guess that's currently not so easy to do and probably needs some IR framework support. > > I've had a quick look. How about something like this (just prototyped, not fully tested): [ir_framework.patch](https://github.com/user-attachments/files/16471225/ir_framework.log) > > Then you could use it like this:
>
>     TestFramework testFramework = new TestFramework();
>     testFramework
>             .addFlags("...", "...")
>             .addTestClassesToBootClassPath()
>             .start();
>
> If that works for all your tests, then you could either add the IR framework changes directly to this patch, or we could do it separately in a preceding RFE. Yeah, OK, let's do the IR Framework update separately. I am planning to have this fix backported, so I would like the test infra fixes to be more or less cleanly backportable as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1703710553 From rehn at openjdk.org Mon Aug 5 08:33:31 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 5 Aug 2024 08:33:31 GMT Subject: RFR: 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check In-Reply-To: References: Message-ID: On Sun, 4 Aug 2024 02:43:26 GMT, Gui Cao wrote: > Hi, > In the MacroAssembler::reserved_stack_check() function: > > RuntimeAddress target(StubRoutines::throw_delayed_StackOverflowError_entry()); > relocate(target.rspec(), [&] { > int32_t offset; > movptr(t0, target.target(), offset); > jr(t0, offset); > }); > > can be simplified to: > > la(t0, RuntimeAddress(StubRoutines::throw_delayed_StackOverflowError_entry())); > jr(t0); > > > In addition, the code formatting has been modified to remove the extra spaces before the code. > > Please take a look and have some reviews. Thanks a lot. > > ### Testing > - [x] Run tier1-3 tests on SOPHON SG2042 (fastdebug) Thanks! ------------- Marked as reviewed by rehn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20458#pullrequestreview-2218358487 From sspitsyn at openjdk.org Mon Aug 5 08:40:30 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 5 Aug 2024 08:40:30 GMT Subject: RFR: 8337787: Fix -Wzero-as-null-pointer-constant warnings when JVMTI feature is disabled In-Reply-To: References: Message-ID: On Sat, 3 Aug 2024 23:32:38 GMT, Kim Barrett wrote: > Please review this trivial change that replaces some uses of literal 0 as a > null pointer constant when JVMTI is disabled to instead use nullptr. > > Testing: mach5 tier1 Looks good and trivial. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20455#pullrequestreview-2218371482 From qxing at openjdk.org Mon Aug 5 10:06:34 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 5 Aug 2024 10:06:34 GMT Subject: RFR: 8336163: Remove declarations of some debug-only methods in release build [v3] In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 15:29:54 GMT, Vladimir Kozlov wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Merge `#ifndef PRODUCT` regions. > > The GHA testing is not activated for this branch. Please fix it. > Even though the changes look trivial, they should be tested (VM build) on all supported platforms in GHA. @vnkozlov Got it, I have enabled GHA testing on my fork and run all workflows manually.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2268689075 From jpai at openjdk.org Mon Aug 5 10:37:33 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 5 Aug 2024 10:37:33 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: On Mon, 5 Aug 2024 07:37:11 GMT, Jiawei Tang wrote: >> I added a test case that reproduces the crash. I hope I can get some advice on whether the code needs changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > change the format of the testcase file

test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java line 37:

> 35: * @modules java.base/java.lang:+open
> 36: * @compile TestPinCaseWithCFLH.java
> 37: * @run driver jdk.test.lib.util.JavaAgentBuilder

Hello @jia-wei-tang, does adding an additional `@build jdk.test.lib.thread.VThreadPinner` line before the `@run` line help fix the `NoClassDefFoundError` you note in your comment?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1703911631 From fjiang at openjdk.org Mon Aug 5 11:30:30 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 5 Aug 2024 11:30:30 GMT Subject: RFR: 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check In-Reply-To: References: Message-ID: <4a-Kw9D0gnSKMtOaMORYW5vDAV0NFQXoPgv5iVGGRT0=.fe447ea0-4af7-4ede-9f31-01140e68eedb@github.com> On Sun, 4 Aug 2024 02:43:26 GMT, Gui Cao wrote: > Hi, > In the MacroAssembler::reserved_stack_check() function: > > RuntimeAddress target(StubRoutines::throw_delayed_StackOverflowError_entry()); > relocate(target.rspec(), [&] { > int32_t offset; > movptr(t0, target.target(), offset); > jr(t0, offset); > }); > > can be simplified to: > > la(t0, RuntimeAddress(StubRoutines::throw_delayed_StackOverflowError_entry())); > jr(t0); > > > In addition, the code formatting has been modified to remove the extra spaces before the code. > > Please take a look and have some reviews. Thanks a lot. > > ### Testing > - [x] Run tier1-3 tests on SOPHON SG2042 (fastdebug) looks good ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/20458#pullrequestreview-2218749059 From adinn at openjdk.org Mon Aug 5 13:22:39 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 5 Aug 2024 13:22:39 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v3] In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 23:51:55 GMT, Vladimir Ivanov wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> add new riscv source > > src/hotspot/cpu/aarch64/runtime_aarch64.cpp line 219: > >> 217: // crud. We cannot block on this call, no GC can happen. Call should >> 218: // restore return values to their stack-slots with the new SP. >> 219: // Thread is in rdi already. 
> > It's weird to see `rdi` being mentioned in AArch64-specific code :-) A hangover from the existing (relocated) code going back to when it was cloned from runtime_x86_64.cpp (where the comment still exists even though it appears also to be incorrect!). I'll delete it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20417#discussion_r1704111240 From adinn at openjdk.org Mon Aug 5 13:32:08 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 5 Aug 2024 13:32:08 GMT Subject: RFR: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime [v5] In-Reply-To: References: Message-ID: > Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: clean up misleading comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20417/files - new: https://git.openjdk.org/jdk/pull/20417/files/15fb5fa0..d59dfd13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20417&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20417.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20417/head:pull/20417 PR: https://git.openjdk.org/jdk/pull/20417 From coleenp at openjdk.org Mon Aug 5 13:37:31 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 5 Aug 2024 13:37:31 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Thu, 1 Aug 2024 18:49:34 GMT, Coleen Phillimore wrote: > Since base_offset_in_bytes and HeapWordSize 
are int, there's no loss of conversion in making these variables int. This seems trivial. > Tested with tier1 on linux and windows. If I add in the -Wsign-conversion flags, there's a lot more lines that give an error (everywhere). I can close this as WNF if just fixing the -Wconversion error isn't helpful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20431#issuecomment-2269095835 From chagedorn at openjdk.org Mon Aug 5 14:06:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Aug 2024 14:06:34 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: References: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com> Message-ID: On Mon, 5 Aug 2024 08:09:02 GMT, Aleksey Shipilev wrote: >> IIUC, you want to somehow have `-Xbootclasspath/a:path_to_your_ir_test`. I guess that's currently not so easy to do and probably needs some IR framework support. >> >> I've had a quick look. How about something like that (just prototyped, not fully tested): [ir_framework.patch](https://github.com/user-attachments/files/16471225/ir_framework.log) >> >> Then you could use it like that: >> >> TestFramework testFramework = new TestFramework(); >> testFramework >> .addFlags("...", "...") >> .addTestClassesToBootClassPath() >> .start(); >> >> >> If that works for all your tests, then you could either add the IR framework changes directly to this patch or we do it in preceding RFE separately. > > Yeah, OK, let's do IR Framework update separately. I am planning to have this fix backported, so I would like to have test infra fixes also be more or less cleanly backportable. @chhagedorn, are you taking point on this, or should I? Sounds good, I can take care of it tomorrow. I will ping you in the PR once it is out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1704174548 From mli at openjdk.org Mon Aug 5 14:27:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 5 Aug 2024 14:27:04 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v7] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? > > I'm also working on a base64 decode intrinsic, but there is some performance regression in some cases, and decode and encode are totally independent of each other, so I will send out review of decode in another pr when I fix the performance regression in it. > > Thanks. > > ## Test > benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32) > > I've tried several implementations, respectively with vector group > * m2+m1+scalar > * m2+scalar > * m1+scalar > * pure scalar > The best one is the combination of m2+m1; it has the best performance for all source sizes. > > ### K230 > > this implementation (m2+m1) > > Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic > -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 > Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 > Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 > Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 > Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 > Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 > Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 > Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 > Base64Encode.testBase64Encode | 512 | avgt | 10 |
3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 > Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 > Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 > > > > vector with only m2 > References: <26618BA9-BCF5-422D-89B5-8BEB20AF7856@gmail.com> Message-ID: Hi Steven, On 3/08/2024 5:11 am, Steven Schlansker wrote: > Hi hotspot-dev, > > Please let me know me if this is not an appropriate place to raise this kind of question - > happy to move to another more appropriate list This does seem like an issue with method linking in hotspot and so is appropriate.
I would suggest filing a bug in JBS. > We run the JVM (22.0.1) with many different application contexts loaded into one JVM, like an application server. > This places a rather high demand on the code cache. We've observed warning messages about the > code cache being full and compiler being disabled, so we increase our ReservedCodeCacheSize to make some space, > and move on with life. > > This week, we ran into a new type of failure, that is much more serious than a warning but non-fatal to the JVM. > > Caused by: java.lang.ExceptionInInitializerError: > Caused by: java.lang.NoClassDefFoundError: Could not initialize class java.time.temporal.WeekFields > Caused by: Exception java.lang.VirtualMachineError: Out of space in CodeCache for adapters > at java.base/java.time.format.DateTimeFormatterBuilder$WeekBasedFieldPrinterParser.printerParser(DateTimeFormatterBuilder.java:5264) > at java.base/java.time.format.DateTimeFormatterBuilder$WeekBasedFieldPrinterParser.format(DateTimeFormatterBuilder.java:5248) > at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2529) > at java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1905) > at java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1879) > at java.base/java.time.LocalDate.format(LocalDate.java:1797) > ... > > Once this happens, the affected classes (in this case the java.time infrastructure) is effectively dead for the remainder of JVM lifetime. I don't know if something has changed, either in VM or in library code, that makes this more likely now, but the ability to throw an exception in this circumstance has existed for many, many years. That said, it seems inappropriate. > As part of our JVM reliability configuration, we attempt to set > -XX:OnError=/bin/gather-debuginfo-then-kill -9 %p > to ensure that unexpected errors terminate the JVM, rather than leave it in an uncertain state. 
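Regarding the "Could not initialize class java.time.temporal.WeekFields" failure in the stack trace quoted above: it follows from the JVM's class-initialization rules. Once a static initializer completes abruptly, the class is marked erroneous, and every later attempt to use it fails with NoClassDefFoundError; there is no retry. The minimal sketch below assumes a hypothetical class `Fragile` in place of WeekFields and an IllegalStateException in place of the VirtualMachineError. An Error thrown from a static initializer propagates unwrapped on the first attempt instead of being wrapped in ExceptionInInitializerError, but the class ends up erroneous either way.

```java
// Hypothetical demo: Fragile stands in for a JDK class (such as
// java.time.temporal.WeekFields) whose static initializer failed once.
public class InitFailureDemo {
    static class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = fail();

        private static int fail() {
            // Simulated failure; a real VirtualMachineError would be rethrown as-is.
            throw new IllegalStateException("simulated failure in <clinit>");
        }
    }

    // Touches Fragile and reports which Throwable (if any) came back.
    static String touch() {
        try {
            return "ok " + Fragile.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // ExceptionInInitializerError: first attempt runs <clinit>
        System.out.println(touch()); // NoClassDefFoundError: the class is now in an erroneous state
        System.out.println(touch()); // still NoClassDefFoundError for the rest of the JVM's life
    }
}
```

This is why catching the first error does not help: any code path that later touches the class, in any thread, sees NoClassDefFoundError.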
> However, this particular VirtualMachineError does not seem to be triggering this OnError logic. Reading the docs, it seems > that this is only triggered for 'irrecoverable' errors, which I guess this does not qualify as, since it triggers a userland exception > not a hotspot dump. > > However, trying to imagine how we would recover from such a situation, it's not clear at all what to do. > At this point some arbitrary subset of classes are no longer usable, forever. Even logging a date could fail. > Arguably, user code shouldn't be thinking about VirtualMachineError as a possibility at all, as what can > you even trust to work afterward? Yes this is a problem with exceptions that can happen during static initializers. From a JDK perspective we need to make our classes more robust in this area, and avoid potentially problematic API's if necessary. In this case I think we need to look at method linking and the role of adapters and see if this is truly a fatal condition for them, as it is not recoverable in any general sense by the user and can cripple arbitrary classes as evidenced here. > The exception could be thrown in an arbitrary thread - maybe it's one we control, but maybe it's thrown in a background > thread like a Jetty server or Redis client io thread. Where it is thrown is not predictable either, making it very hard to > add a "catch" clause and terminate the JVM, since nearly any statement could fail. > Most threads are careful to have a top-level catch and log, so the uncaught exception handler does not seem reliable either. > > Ideally, I would turn on some VM option like '-XX:VMErrorIsAlwaysFatal' to trigger a hs dump, rather than ever seeing this > sort of failure in userland. There is the diagnostic -XX:AbortVMOnException=xxx flag, but here the xxx would be `OutOfMemoryError` and that may be too broad to be useful. David ----- > > How can a user application recover from such an error happening? (I think it cannot.) 
If we cannot recover, how can we reliably > configure the JVM to crash completely if such an error happens? I suppose a debugger-like tool could breakpoint > on throwing VirtualMachineError, or maybe an agent could transform the VME constructor, but this doesn't feel "production-ready". > > Thank you for any advice! > Steven Schlansker > From gcao at openjdk.org Tue Aug 6 02:19:39 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 6 Aug 2024 02:19:39 GMT Subject: RFR: 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check In-Reply-To: References: Message-ID: On Sun, 4 Aug 2024 02:43:26 GMT, Gui Cao wrote: > Hi, > In the MacroAssembler::reserved_stack_check() function: > > RuntimeAddress target(StubRoutines::throw_delayed_StackOverflowError_entry()); > relocate(target.rspec(), [&] { > int32_t offset; > movptr(t0, target.target(), offset); > jr(t0, offset); > }); > > can be simplified to: > > la(t0, RuntimeAddress(StubRoutines::throw_delayed_StackOverflowError_entry())); > jr(t0); > > > In addition, the code formatting has been modified to remove the extra spaces before the code. > > Please take a look and have some reviews. Thanks a lot. > > ### Testing > - [x] Run tier1-3 tests on SOPHON SG2042 (fastdebug) Thanks all for the review. 
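As background to the `la(t0, RuntimeAddress(...)); jr(t0);` form quoted above: assuming the target lies within roughly ±2 GiB of the current pc, a pc-relative address on RISC-V is typically materialized as an `auipc` (upper 20 bits) plus a sign-extended 12-bit immediate consumed by `addi` or `jalr`. The sketch below is plain Java arithmetic, not HotSpot code, and the class and method names are hypothetical. It shows why the upper part has to be rounded: the low 12 bits are sign-extended, so offsets whose bit 11 is set borrow from the upper part.

```java
// Hypothetical helper, not HotSpot code: models the split of a pc-relative
// offset between auipc (hi20) and a sign-extended 12-bit immediate (lo12).
public class PcRelSplit {
    // Upper part for auipc. Adding 0x800 rounds up when lo12 will be negative.
    static int hi20(int offset) {
        return (offset + 0x800) >> 12;
    }

    // Low 12 bits, sign-extended to a value in [-2048, 2047].
    static int lo12(int offset) {
        return (offset << 20) >> 20;
    }

    // Recombines the parts the way auipc + addi/jalr would (ignoring the pc itself).
    static int recombine(int offset) {
        return (hi20(offset) << 12) + lo12(offset);
    }

    public static void main(String[] args) {
        // Offsets stay clear of the top of the int range, where offset + 0x800 would overflow.
        int[] offsets = {0, 0x7ff, 0x800, -0x800, -1, 0x12345678, -0x12345678};
        for (int off : offsets) {
            System.out.printf("offset=%d hi20=%d lo12=%d%n", off, hi20(off), lo12(off));
            if (recombine(off) != off) {
                throw new AssertionError("bad split for " + off);
            }
        }
    }
}
```

For example, an offset of 0x800 becomes hi20 = 1 and lo12 = -2048, since 4096 - 2048 = 2048.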
------------- PR Comment: https://git.openjdk.org/jdk/pull/20458#issuecomment-2270235538 From duke at openjdk.org Tue Aug 6 02:19:39 2024 From: duke at openjdk.org (duke) Date: Tue, 6 Aug 2024 02:19:39 GMT Subject: RFR: 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check In-Reply-To: References: Message-ID: <1DV4mglSbS2Y8_XRnwfVOpT5BXM51BSxR6HCDZt65WI=.8e65aa0c-8028-41b5-9e44-d2688bc66972@github.com> On Sun, 4 Aug 2024 02:43:26 GMT, Gui Cao wrote: > Hi, > In the MacroAssembler::reserved_stack_check() function: > > RuntimeAddress target(StubRoutines::throw_delayed_StackOverflowError_entry()); > relocate(target.rspec(), [&] { > int32_t offset; > movptr(t0, target.target(), offset); > jr(t0, offset); > }); > > can be simplified to: > > la(t0, RuntimeAddress(StubRoutines::throw_delayed_StackOverflowError_entry())); > jr(t0); > > > In addition, the code formatting has been modified to remove the extra spaces before the code. > > Please take a look and have some reviews. Thanks a lot. > > ### Testing > - [x] Run tier1-3 tests on SOPHON SG2042 (fastdebug) @zifeihan Your change (at version d08100fadb5634352b000040fb7024a86e884b69) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20458#issuecomment-2270236869 From gcao at openjdk.org Tue Aug 6 02:19:40 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 6 Aug 2024 02:19:40 GMT Subject: Integrated: 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check In-Reply-To: References: Message-ID: On Sun, 4 Aug 2024 02:43:26 GMT, Gui Cao wrote: > Hi, > In the MacroAssembler::reserved_stack_check() function: > > RuntimeAddress target(StubRoutines::throw_delayed_StackOverflowError_entry()); > relocate(target.rspec(), [&] { > int32_t offset; > movptr(t0, target.target(), offset); > jr(t0, offset); > }); > > can be simplified to: > > la(t0, RuntimeAddress(StubRoutines::throw_delayed_StackOverflowError_entry())); > jr(t0); > > > In addition, the code formatting has been modified to remove the extra spaces before the code. > > Please take a look and have some reviews. Thanks a lot. > > ### Testing > - [x] Run tier1-3 tests on SOPHON SG2042 (fastdebug) This pull request has now been integrated. Changeset: 73718fb8 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/73718fb8a3570023e7855137eb008f78b8a1e8ce Stats: 19 lines in 1 file changed: 0 ins; 4 del; 15 mod 8337788: RISC-V: Cleanup code in MacroAssembler::reserved_stack_check Reviewed-by: fyang, rehn, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/20458 From jwtang at openjdk.org Tue Aug 6 02:39:23 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Tue, 6 Aug 2024 02:39:23 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v12] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: > I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. 
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: fix the test condition to avoid NoClassDefFoundError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/e81db430..05888a25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=10-11 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From fyang at openjdk.org Tue Aug 6 03:16:39 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 6 Aug 2024 03:16:39 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup In-Reply-To: References: Message-ID: <7nCKurwSX1t-G0BYRnAMhlcPTjd3LZkZeylP5W2D7Cc=.0f56140c-a8ff-47d0-b724-c25d18a504e3@github.com> On Mon, 5 Aug 2024 18:11:55 GMT, Vladimir Kozlov wrote: > While working on [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519) I noticed few ExternalAddress cases I missed in [JDK-8337396](https://bugs.openjdk.org/browse/JDK-8337396) changes. > > I also added asserts on x86 to catch using ExternalAddress for jumps and calls instructions and caught few additional cases (Windows and arraycopy cases). > > Tested tier1-5,hs-stress,hs-comp Hi Vladimir, Could you please apply the following riscv-specific addon change? This achieves the same purpose but much cleaner. 
diff --git a/src/hotspot/cpu/riscv/jniFastGetField_riscv.cpp b/src/hotspot/cpu/riscv/jniFastGetField_riscv.cpp
index 91e3a707efa..f7d702c6310 100644
--- a/src/hotspot/cpu/riscv/jniFastGetField_riscv.cpp
+++ b/src/hotspot/cpu/riscv/jniFastGetField_riscv.cpp
@@ -173,12 +173,7 @@ address JNI_FastGetField::generate_fast_get_int_field0(BasicType type) {
 
   {
     __ enter();
-    RuntimeAddress target(slow_case_addr);
-    __ relocate(target.rspec(), [&] {
-      int32_t offset;
-      __ la(t0, target.target(), offset);
-      __ jalr(t0, offset);
-    });
+    __ rt_call(slow_case_addr);
     __ leave();
     __ ret();
   }

------------- PR Comment: https://git.openjdk.org/jdk/pull/20470#issuecomment-2270301871 From dholmes at openjdk.org Tue Aug 6 03:35:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 6 Aug 2024 03:35:30 GMT Subject: RFR: 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 16:52:23 GMT, Coleen Phillimore wrote: > This field _line_ending isn't used, and I'm not sure how this even works. So I deleted it. > Tested with tier1 on many Oracle supported platforms. Okay this code is simpler. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20427#pullrequestreview-2220205155 From dholmes at openjdk.org Tue Aug 6 03:53:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 6 Aug 2024 03:53:30 GMT Subject: RFR: 8337782: Use THROW_NULL instead of THROW_0 in pointer contexts in prims code In-Reply-To: References: Message-ID: On Sat, 3 Aug 2024 23:38:27 GMT, Kim Barrett wrote: > Please review this trivial change that, in places in prims code where a > pointer value is required, replaces uses of THROW_0 and THROW_MSG_0 with > THROW_NULL and THROW_MSG_NULL respectively. > > Testing: mach5 tier1 Okay. Thanks for cleaning up. ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20456#pullrequestreview-2220221708 From amitkumar at openjdk.org Tue Aug 6 04:29:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 6 Aug 2024 04:29:33 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v7] In-Reply-To: References: <1SGOMkL6TvnkQDt1WkH3FbPVrbCUOD_cA3e23QK5-jg=.b9b066f9-d50a-4710-a8a5-76c2d9b83236@github.com> Message-ID: <-TvJry6QBBCwCANTG4QW-plFQPEIyfd3zMEI8C18x_8=.25c9d878-74b4-483b-b79c-fcaa720fdbfa@github.com> On Wed, 27 Mar 2024 16:37:48 GMT, Lutz Schmidt wrote: >>> > I think we shouldn't allow `MacroAssembler::string_compress(...)` and `MacroAssembler::string_expand(...)` to use vector registers without specifying this effect. That can be solved by adding a KILL effect for all vector registers which are killed. Alternatively, we could revert to the old implementation before [d5adf1d](https://github.com/openjdk/jdk/commit/d5adf1df921e5ecb8ff4c7e4349a12660069ed28) which doesn't use vector registers. The benefit was not huge if I remember correctly. >>> >>> Agreed. My proposed circumvention is a too dirty hack. >>> >>> I would prefer to add KILL effects to the match rules. I believe the vector implementation had a substantial performance effect. Unfortunately, I can't find any records of performance results from back then. >>> >>> Reverting the commit @TheRealMDoerr mentioned is not possible. It contains many additions that may have been used by unrelated code. The vector code is well encapsulated and could be removed by deleting the >>> >>> ``` >>> if (VM_Version::has_VectorFacility()) { >>> } >>> ``` >>> >>> block. I would not like that, though. >> >> I didn't mean to back out the whole commit. Only the implementation of string_compress and string_expand. The benefit of the vector version certainly depends on what kind of strings are used. (Effect may also be negative in some cases.) 
I think that classical benchmarks didn't show a significant performance impact, but I don't remember exactly, either. I'll leave the s390 maintainers free to decide if they want to adapt the vector version or go for the short and simple implementation. > >> ... I think that classical benchmarks didn't show a significant performance impact, but I don't remember exactly, either. ... > > Yes, you need "long" strings (>= 32 characters) for the vector code to kick in. Once it kicks in, there is a performance improvement. Hard to say which applications might benefit. For too short strings I do not expect a visible performance penalty. It's just a shift and a branch. > > Let the maintainers decide. Hi @RealLucy, @TheRealMDoerr Can we disable the vector part in the `string_{compress, inflate, const_inflate}` and integrate this for now. I am working on static stub part and need change from s390.ad file. Moreover I need these changes for Late barrier extension changes in the save live register class implementation. 
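The trade-off Lutz describes in the string_compress discussion quoted above (the vector path only kicks in for strings of at least 32 characters, and short strings pay just a length check and a branch) can be modeled in plain Java. This is a hypothetical sketch, not the s390 MacroAssembler code. It assumes that compress copies chars to bytes while they fit in Latin-1 (<= 0xFF) and reports how many were copied, and it uses a block phase as a stand-in for the vector loop.

```java
import java.util.Arrays;

// Hypothetical model, not HotSpot code: a block phase standing in for the
// vector loop (entered only for inputs of at least BLOCK chars) followed by
// a scalar phase for short inputs and tails.
public class Latin1Compress {
    static final int BLOCK = 32; // assumed block size, echoing the ">= 32 characters" remark

    // Copies chars into dst while each one fits in Latin-1 (<= 0xFF).
    // Returns how many chars were copied; a value < src.length means it stopped early.
    static int compress(char[] src, byte[] dst) {
        int i = 0;
        // Block phase: check a whole block at once and copy it only if every char is Latin-1.
        while (src.length - i >= BLOCK) {
            int bits = 0;
            for (int j = i; j < i + BLOCK; j++) {
                bits |= src[j];
            }
            if (bits > 0xFF) {
                break; // some char in this block is not Latin-1; let the scalar phase find it
            }
            for (int j = i; j < i + BLOCK; j++) {
                dst[j] = (byte) src[j];
            }
            i += BLOCK;
        }
        // Scalar phase: short inputs never enter the block loop above.
        for (; i < src.length; i++) {
            char c = src[i];
            if (c > 0xFF) {
                break;
            }
            dst[i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] shortStr = "hello".toCharArray();
        System.out.println(compress(shortStr, new byte[shortStr.length])); // 5

        char[] longStr = new char[40];
        Arrays.fill(longStr, 'a');
        longStr[35] = '\u20ac'; // euro sign, outside Latin-1
        System.out.println(compress(longStr, new byte[longStr.length])); // 35
    }
}
```

In this model a 5-char string never touches the block loop, while a 40-char string copies one full block before finishing in the scalar tail.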
------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2270356558 From jpai at openjdk.org Tue Aug 6 04:39:33 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 6 Aug 2024 04:39:33 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> On Mon, 5 Aug 2024 10:34:56 GMT, Jaikiran Pai wrote: >> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: >> >> change the format of the testcase file > > test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java line 37: > >> 35: * @modules java.base/java.lang:+open >> 36: * @compile TestPinCaseWithCFLH.java >> 37: * @run driver jdk.test.lib.util.JavaAgentBuilder > > Hello @jia-wei-tang, does adding an additional `@build jdk.test.lib.thread.VThreadPinner` line before the `@run` line help fix the `NoClassDefFoundError` you note in your comment? Hello @jia-wei-tang, I see that a `@build jdk.test.lib.util.JavaAgentBuilder` was added in the latest update to the PR. Did that help solve the NoClassDefFoundError you were running into when running those tests? I find it surprising that this specific class is required in the `@build` declaration, since that class itself is part of the `@run` directive. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1704897105 From kvn at openjdk.org Tue Aug 6 04:48:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 Aug 2024 04:48:04 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup In-Reply-To: <7nCKurwSX1t-G0BYRnAMhlcPTjd3LZkZeylP5W2D7Cc=.0f56140c-a8ff-47d0-b724-c25d18a504e3@github.com> References: <7nCKurwSX1t-G0BYRnAMhlcPTjd3LZkZeylP5W2D7Cc=.0f56140c-a8ff-47d0-b724-c25d18a504e3@github.com> Message-ID: On Tue, 6 Aug 2024 03:13:37 GMT, Fei Yang wrote: > Hi Vladimir, Could you please apply the following riscv-specific addon change? Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20470#issuecomment-2270371603 From kvn at openjdk.org Tue Aug 6 04:48:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 Aug 2024 04:48:03 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: > While working on [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519) I noticed few ExternalAddress cases I missed in [JDK-8337396](https://bugs.openjdk.org/browse/JDK-8337396) changes. > > I also added asserts on x86 to catch using ExternalAddress for jumps and calls instructions and caught few additional cases (Windows and arraycopy cases). 
> > Tested tier1-5,hs-stress,hs-comp Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: riscv update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20470/files - new: https://git.openjdk.org/jdk/pull/20470/files/0d996c0a..df0d257d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20470&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20470&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20470/head:pull/20470 PR: https://git.openjdk.org/jdk/pull/20470 From kbarrett at openjdk.org Tue Aug 6 05:15:31 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 6 Aug 2024 05:15:31 GMT Subject: RFR: 8337782: Use THROW_NULL instead of THROW_0 in pointer contexts in prims code In-Reply-To: <2ZvVwktD9rP9NetmIXBgcGJzGXnqHzRuqsNsdRV5p9s=.529f1f4e-1466-4f61-88f1-62d91937c51f@github.com> References: <2ZvVwktD9rP9NetmIXBgcGJzGXnqHzRuqsNsdRV5p9s=.529f1f4e-1466-4f61-88f1-62d91937c51f@github.com> Message-ID: On Mon, 5 Aug 2024 14:43:58 GMT, Aleksey Shipilev wrote: >> Please review this trivial change that, in places in prims code where a >> pointer value is required, replaces uses of THROW_0 and THROW_MSG_0 with >> THROW_NULL and THROW_MSG_NULL respectively. >> >> Testing: mach5 tier1 > > Current changes look okay. I have not checked these caught all the instances of `THROW_MSG_0` in those files, did you check that somehow? 
Thanks for reviews @shipilev and @dholmes-ora ------------- PR Comment: https://git.openjdk.org/jdk/pull/20456#issuecomment-2270399408 From kbarrett at openjdk.org Tue Aug 6 05:37:34 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 6 Aug 2024 05:37:34 GMT Subject: Integrated: 8337782: Use THROW_NULL instead of THROW_0 in pointer contexts in prims code In-Reply-To: References: Message-ID: On Sat, 3 Aug 2024 23:38:27 GMT, Kim Barrett wrote: > Please review this trivial change that, in places in prims code where a > pointer value is required, replaces uses of THROW_0 and THROW_MSG_0 with > THROW_NULL and THROW_MSG_NULL respectively. > > Testing: mach5 tier1 This pull request has now been integrated. Changeset: 20575949 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/20575949612a750a428316635715737183a2d58c Stats: 47 lines in 6 files changed: 0 ins; 0 del; 47 mod 8337782: Use THROW_NULL instead of THROW_0 in pointer contexts in prims code Reviewed-by: shade, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/20456 From jwtang at openjdk.org Tue Aug 6 06:13:32 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Tue, 6 Aug 2024 06:13:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> Message-ID: On Tue, 6 Aug 2024 04:36:38 GMT, Jaikiran Pai wrote: > Hello @jia-wei-tang, I see that a `@build jdk.test.lib.util.JavaAgentBuilder` was added in the latest update to the PR. Did that help solve the NoClassDefFoundError you were running into when running those tests? 
I find it surprising that this specific class is required in the `@build` declaration, since that class itself is part of the `@run` directive. It cannot solve the NoClassDefFoundError. Should I try to add `@build jdk.test.lib.thread.VThreadPinner`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1704961370 From jpai at openjdk.org Tue Aug 6 06:17:39 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 6 Aug 2024 06:17:39 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> Message-ID: <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> On Tue, 6 Aug 2024 06:10:38 GMT, Jiawei Tang wrote: > Should I try to add @build jdk.test.lib.thread.VThreadPinner? Yes, please give that a try. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1704964895 From jwtang at openjdk.org Tue Aug 6 06:31:09 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Tue, 6 Aug 2024 06:31:09 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v13] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: <_Bt7S4igf_xPLO_BILhMZtOKY6xu2TuxZzuACGnVwCE=.24672498-aba7-470e-a759-0598dd03a298@github.com> > I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. 
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: fix the test condition to avoid NoClassDefFoundError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/05888a25..a30b11f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From jwtang at openjdk.org Tue Aug 6 07:55:34 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Tue, 6 Aug 2024 07:55:34 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> Message-ID: On Tue, 6 Aug 2024 06:14:45 GMT, Jaikiran Pai wrote: >>> Hello @jia-wei-tang, I see that a `@build jdk.test.lib.util.JavaAgentBuilder` was added in the latest update to the PR. Did that help solve the NoClassDefFoundError you were running into when running those tests? I find it surprising that this specific class is required in the `@build` declaration, since that class itself is part of the `@run` directive. >> >> It cannot solve the NoClassDefFoundError. Should I try to add `@build jdk.test.lib.thread.VThreadPinner`? > >> Should I try to add @build jdk.test.lib.thread.VThreadPinner? > > Yes, please give that a try. 
It cannot solve the NoClassDefFoundError. Besides, this new testcase is not included in linux-x86 / test (hs/tier1 serviceability). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1705078115 From jpai at openjdk.org Tue Aug 6 08:01:32 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 6 Aug 2024 08:01:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> Message-ID: On Tue, 6 Aug 2024 07:53:12 GMT, Jiawei Tang wrote: >>> Should I try to add @build jdk.test.lib.thread.VThreadPinner? >> >> Yes, please give that a try. > > It cannot solve the NoClassDefFoundError. Besides, this new testcase is not included in linux-x86 / test (hs/tier1 serviceability). I see that the `NoClassDefFoundError` failure that you are mentioning is actually being reported even in the GitHub Actions job failures. It does look odd (although might be a pre-known jtreg issue). It will need a bit of investigation to see what's going on. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1705086644 From coleenp at openjdk.org Tue Aug 6 11:37:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 Aug 2024 11:37:38 GMT Subject: RFR: 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 16:52:23 GMT, Coleen Phillimore wrote: > This field _line_ending isn't used, and I'm not sure how this even works. So I deleted it. > Tested with tier1 on many Oracle supported platforms. Thanks Ioi and David. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20427#issuecomment-2271072906 From coleenp at openjdk.org Tue Aug 6 11:37:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 Aug 2024 11:37:38 GMT Subject: Integrated: 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 16:52:23 GMT, Coleen Phillimore wrote: > This field _line_ending isn't used, and I'm not sure how this even works. So I deleted it. > Tested with tier1 on many Oracle supported platforms. This pull request has now been integrated. Changeset: 1348ece6 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/1348ece6df7b460501931533c238e819995a2086 Stats: 9 lines in 2 files changed: 0 ins; 8 del; 1 mod 8332120: Potential compilation failure in istream.cpp:205 - loss of data on conversion Reviewed-by: dholmes, iklam ------------- PR: https://git.openjdk.org/jdk/pull/20427 From adinn at openjdk.org Tue Aug 6 13:25:38 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 6 Aug 2024 13:25:38 GMT Subject: Integrated: 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime In-Reply-To: References: Message-ID: <6uCxoYqKUWOO43q8MWrpd-WirpHwxuh1vuNUA-R7zT4=.58067e34-eda0-4e51-97de-16e91354de05@github.com> On Thu, 1 Aug 2024 11:36:57 GMT, Andrew Dinn wrote: > Reorganization of generation and management code for C2-specific blob so that it is, as far as possible, under the scope of class OptoRuntime with an implementation located in C2-specific source files. This pull request has now been integrated. 
Changeset: ab509f1b Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/ab509f1b98329b1624a3111e226b640ee76f5969 Stats: 2734 lines in 22 files changed: 1363 ins; 1356 del; 15 mod 8337654: Relocate uncommon trap stub from SharedRuntime to OptoRuntime Reviewed-by: kvn, vlivanov, fyang ------------- PR: https://git.openjdk.org/jdk/pull/20417 From gziemski at openjdk.org Tue Aug 6 13:42:04 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 6 Aug 2024 13:42:04 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag, use consistent name for the argument Message-ID: Please review this NMT cleanup change that mainly renames `MEMFLAGS` to `MemType` and cleans up the related argument names. This removes the inconsistencies and confusion currently on display in our code, where we use `flag`, `flags`, `mem_flag`, and `memflags` for the same argument, sometimes even within the same file in related APIs. I made sure to update the copyright years in all touched files. The change is simple (it only renames identifiers), but unfortunately it touches 103 files. I did not rename any `NMTUtil::flag` APIs (e.g. `NMTUtil::flag_to_index()` -> `NMTUtil::type_to_index()`) to keep the amount of code changes smaller. Those APIs should be renamed too, and I filed a follow-up issue for this: [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) There are also further cleanup opportunities here (e.g. renaming internal fields from `_memflags` to `_mem_type`), but these can also be addressed in follow-up issues.
------------- Commit messages: - undo incorrect rename - undo incorrect rename - more flag cleanup - rename MEMFLAGS to MemType and any related arguments that make sense Changes: https://git.openjdk.org/jdk/pull/20472/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20472&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337563 Stats: 850 lines in 103 files changed: 1 ins; 0 del; 849 mod Patch: https://git.openjdk.org/jdk/pull/20472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20472/head:pull/20472 PR: https://git.openjdk.org/jdk/pull/20472 From adinn at openjdk.org Tue Aug 6 13:47:36 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 6 Aug 2024 13:47:36 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 18:15:03 GMT, Vladimir Kozlov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> riscv update > > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 845: > >> 843: Label& L_ok) { >> 844: Label L_fail; >> 845: __ lea(temp_reg, AddressLiteral(code_start, relocInfo::none)); > > No relocation is needed here. This check code is generated for i2c adapters, which we don't relocate; we put them into a reserved code buffer. What about when we try to save and restore an adapter in Leyden? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1705568710 From adinn at openjdk.org Tue Aug 6 13:52:31 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 6 Aug 2024 13:52:31 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 04:48:03 GMT, Vladimir Kozlov wrote: >> While working on [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519) I noticed a few ExternalAddress cases I missed in [JDK-8337396](https://bugs.openjdk.org/browse/JDK-8337396) changes.
>> >> I also added asserts on x86 to catch using ExternalAddress for jump and call instructions and caught a few additional cases (Windows and arraycopy cases). >> >> Tested tier1-5,hs-stress,hs-comp > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > riscv update src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 218: > 216: __ lea(end_from, Address(from, count, sf, 0)); > 217: if (NOLp == nullptr) { > 218: RuntimeAddress no_overlap(no_overlap_target); I think this is actually a jump within the same buffer. Can we use an AddressLiteral here with reloc_none and rely on this being encoded with a PC-relative jump? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1705576489 From tschatzl at openjdk.org Tue Aug 6 15:28:33 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 Aug 2024 15:28:33 GMT Subject: RFR: 8337786: Fix simple -Wzero-as-null-pointer-constant warnings in aarch64 code In-Reply-To: References: Message-ID: On Sat, 3 Aug 2024 23:28:33 GMT, Kim Barrett wrote: > Please review this trivial change that replaces some uses of literal 0 as a > null pointer constant in aarch64 code to instead use nullptr. > > Testing: mach5 tier1 lgtm ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20454#pullrequestreview-2221634125 From kvn at openjdk.org Tue Aug 6 16:45:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 Aug 2024 16:45:31 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 13:44:32 GMT, Andrew Dinn wrote: >> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 845: >> >>> 843: Label& L_ok) { >>> 844: Label L_fail; >>> 845: __ lea(temp_reg, AddressLiteral(code_start, relocInfo::none)); >> >> No relocation is needed here.
This check code is generated for i2c adapters, which we don't relocate; we put them into a reserved code buffer. > > What about when we try to save and restore an adapter in Leyden? Short answer: we should exclude these checks when we cache adapters in Leyden. Long answer: 1. These checks are enabled only in the debug VM and only on x86. 2. These checks have caught almost no issues since JDK 9 (I found only one: [JDK-8023465](https://bugs.openjdk.org/browse/JDK-8023465)). 3. Enabling them creates issues because we don't add an adapter to the hash table until all stubs are generated ([contains_all_checks](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2637)), which creates duplicate adapters for the same method signature. 4. These checks may miss an issue: they pass if the adapter is called from a stub, but that stub may itself have been called from compiled code. I was actually considering removing these checks but decided to keep them for now. But I don't think we need them in cached adapters in Leyden. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1705839108 From kvn at openjdk.org Tue Aug 6 17:00:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 Aug 2024 17:00:35 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 13:48:57 GMT, Andrew Dinn wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> riscv update > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 218: > >> 216: __ lea(end_from, Address(from, count, sf, 0)); >> 217: if (NOLp == nullptr) { >> 218: RuntimeAddress no_overlap(no_overlap_target); > > I think this is actually a jump within the same buffer. Can we use an AddressLiteral here with reloc_none and rely on this being encoded with a PC-relative jump?
`relocInfo::none` is treated as unreachable and forces the use of lea(): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L12915 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1705853432 From kvn at openjdk.org Tue Aug 6 17:00:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 Aug 2024 17:00:36 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 16:55:30 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 218: >> >>> 216: __ lea(end_from, Address(from, count, sf, 0)); >>> 217: if (NOLp == nullptr) { >>> 218: RuntimeAddress no_overlap(no_overlap_target); >> >> I think this is actually a jump within the same buffer. Can we use an AddressLiteral here with reloc_none and rely on this being encoded with a PC-relative jump? > > `relocInfo::none` is treated as unreachable and forces the use of lea(): > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L12915 And `runtime_call_type` is reachable if the target is inside the CodeCache: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L12897 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1705856524 From gziemski at openjdk.org Tue Aug 6 18:11:36 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 6 Aug 2024 18:11:36 GMT Subject: Withdrawn: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 19:07:08 GMT, Gerard Ziemski wrote: > Please review this NMT cleanup change that renames `MEMFLAGS` to `MemType`. > > To keep this change small, I decided not to do any other cleanups as part of this fix. They will be handled in follow-up issues, such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) This pull request has been closed without being integrated.
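Returning to the 8337797 thread: Vladimir Kozlov's explanation of when an `AddressLiteral` counts as "reachable" can be illustrated with a small, self-contained model. This is a hypothetical sketch, not HotSpot's code (the real logic lives in the assembler_x86.cpp lines linked above). The `RelocKind` enum, the `CodeRange` type, and the `fits_rel32()` and `reachable()` functions are invented for illustration, and the sketch assumes an x86-64-style rel32 encoding in which the displacement is measured from the end of the instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical relocation kinds for this sketch; HotSpot's relocInfo has many more.
enum class RelocKind { None, RuntimeCall };

// Hypothetical stand-in for the code cache address range [low, high).
struct CodeRange {
  std::uintptr_t low;
  std::uintptr_t high;
  bool contains(std::uintptr_t addr) const { return addr >= low && addr < high; }
};

// True if `target` can be encoded as a signed 32-bit displacement from
// `next_pc`, the address immediately after the jump or call instruction.
bool fits_rel32(std::uintptr_t next_pc, std::uintptr_t target) {
  const std::int64_t disp =
      static_cast<std::int64_t>(target) - static_cast<std::int64_t>(next_pc);
  return disp >= std::numeric_limits<std::int32_t>::min() &&
         disp <= std::numeric_limits<std::int32_t>::max();
}

// Simplified policy modeled on the rule described in the thread:
//  - no relocation: always "unreachable", so the caller loads the full
//    address into a register (e.g. with lea) instead of using rel32;
//  - runtime call: reachable if the target is inside the code range and
//    the displacement fits in 32 bits.
bool reachable(RelocKind kind, std::uintptr_t next_pc, std::uintptr_t target,
               const CodeRange& code_cache) {
  switch (kind) {
    case RelocKind::None:
      return false;
    case RelocKind::RuntimeCall:
      return code_cache.contains(target) && fits_rel32(next_pc, target);
  }
  return false;  // not reached for valid enum values
}
```

Treating a target without relocation info as unreachable is the conservative choice in this model: loading the full address into a register works regardless of the distance between the instruction and its target.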
------------- PR: https://git.openjdk.org/jdk/pull/20472 From sspitsyn at openjdk.org Tue Aug 6 18:14:31 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 6 Aug 2024 18:14:31 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v2] In-Reply-To: References: <8AOxsfURVgBRGlS8WGtZ1wubjuUozfK-LcLkf9BGVoQ=.bb20a5eb-8821-48cd-bb09-0dfa8870f6f3@github.com> Message-ID: On Mon, 5 Aug 2024 18:02:40 GMT, Patricio Chilano Mateo wrote: > Thanks Serguei. You could also define it relative to JvmtiThreadState_lock with MUTEX_DEFL to make the dependency clear. Good suggestion, thanks. Thank you for the review, Patricio. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20413#issuecomment-2271866061 From aph at openjdk.org Tue Aug 6 19:01:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 6 Aug 2024 19:01:07 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v17] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas.
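Two ideas from this description, a software popcount fallback and a hash table indexed through a popcount of an occupancy bitmap, can be illustrated together. The sketch below is a simplified, hypothetical model of the general technique (the same bitmap-plus-popcount indexing appears in hash array mapped tries), not HotSpot's secondary supers code. `CompactTable`, `slot_of()`, and `popcount64()` are invented names. The sketch assumes a 64-slot table keyed by a 6-bit hash (here simply the low bits of the key) and rejects slot collisions, which a real implementation must resolve (for example, by probing).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable SWAR population count: a typical software fallback when the
// compiler may not assume a hardware popcount instruction.
inline int popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);            // add the bytes
}

// Hypothetical 64-slot table: bit s of `bitmap` is set iff some key hashes to
// slot s, and keys are stored densely in slot order. Collisions are rejected.
struct CompactTable {
  std::uint64_t bitmap = 0;
  std::vector<std::uint32_t> keys;

  // Illustrative 6-bit "hash": the low bits of the key.
  static unsigned slot_of(std::uint32_t key) { return key & 63u; }

  // Rebuilds the table; on a collision returns false and leaves it unchanged.
  bool build(const std::vector<std::uint32_t>& input) {
    std::uint64_t new_bitmap = 0;
    std::uint32_t by_slot[64] = {};
    for (std::uint32_t k : input) {
      const std::uint64_t bit = std::uint64_t{1} << slot_of(k);
      if (new_bitmap & bit) return false;  // collision: not handled in this sketch
      new_bitmap |= bit;
      by_slot[slot_of(k)] = k;
    }
    keys.clear();
    for (unsigned s = 0; s < 64; ++s) {
      if (new_bitmap & (std::uint64_t{1} << s)) keys.push_back(by_slot[s]);
    }
    bitmap = new_bitmap;
    return true;
  }

  bool contains(std::uint32_t key) const {
    const std::uint64_t bit = std::uint64_t{1} << slot_of(key);
    if ((bitmap & bit) == 0) return false;             // fast miss from the bitmap alone
    const int index = popcount64(bitmap & (bit - 1));  // occupied slots below ours
    return keys[index] == key;
  }
};
```

In this model `keys.size() == popcount64(bitmap)` holds by construction, which is what keeps the computed index in bounds; most misses are answered by a single bit test without touching the key array.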
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/eb739933..51c68a09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=15-16 Stats: 11 lines in 2 files changed: 3 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From kbarrett at openjdk.org Tue Aug 6 20:18:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 6 Aug 2024 20:18:35 GMT Subject: RFR: 8337786: Fix simple -Wzero-as-null-pointer-constant warnings in aarch64 code In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 14:48:55 GMT, Aleksey Shipilev wrote: >> Please review this trivial change that replaces some uses of literal 0 as a >> null pointer constant in aarch64 code to instead use nullptr. >> >> Testing: mach5 tier1 > > Looks fine. Thanks for reviews @shipilev and @tschatzl ------------- PR Comment: https://git.openjdk.org/jdk/pull/20454#issuecomment-2272070131 From kbarrett at openjdk.org Tue Aug 6 20:18:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 6 Aug 2024 20:18:35 GMT Subject: Integrated: 8337786: Fix simple -Wzero-as-null-pointer-constant warnings in aarch64 code In-Reply-To: References: Message-ID: On Sat, 3 Aug 2024 23:28:33 GMT, Kim Barrett wrote: > Please review this trivial change that replaces some uses of literal 0 as a > null pointer constant in aarch64 code to instead use nullptr. > > Testing: mach5 tier1 This pull request has now been integrated. 
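For readers unfamiliar with the warning named in Kim Barrett's change: GCC's and Clang's `-Wzero-as-null-pointer-constant` flags uses of a literal `0` where a null pointer is meant. Besides avoiding the warning, `nullptr` also behaves more predictably in overload resolution. The small, generic C++ example below (not taken from the aarch64 sources; `which` and `new_style` are illustrative names) shows the difference.

```cpp
#include <cassert>

// Two overloads that a literal 0 can silently confuse.
int which(int)         { return 1; }  // integer overload
int which(const char*) { return 2; }  // pointer overload

// const char* old_style = 0;     // compiles, but -Wzero-as-null-pointer-constant warns
const char* new_style = nullptr;  // the spelling such a cleanup switches to

// which(0) picks which(int): 0 is an int before it is a null pointer constant.
// which(nullptr) can only convert to a pointer, so it picks which(const char*).
```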
Changeset: 22a34213 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/22a3421379162bb302fb8e5ccc315e53d95b6245 Stats: 11 lines in 6 files changed: 0 ins; 0 del; 11 mod 8337786: Fix simple -Wzero-as-null-pointer-constant warnings in aarch64 code Reviewed-by: shade, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20454 From sspitsyn at openjdk.org Tue Aug 6 21:06:45 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 6 Aug 2024 21:06:45 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v4] In-Reply-To: References: Message-ID: > The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: > - `SetFieldAccessWatch()` > - `ClearFieldAccessWatch()` > - `SetFieldModificationWatch()` > - `ClearFieldModificationWatch()` > so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. > > The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. 
> > Testing: > - TBD: submit mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: use MUTEX_DEFL instead of MUTEX_DEFN to define JvmtiVTMSTransition_lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20413/files - new: https://git.openjdk.org/jdk/pull/20413/files/3d166446..29f807de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20413&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20413&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20413/head:pull/20413 PR: https://git.openjdk.org/jdk/pull/20413 From aph at openjdk.org Tue Aug 6 23:41:57 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 6 Aug 2024 23:41:57 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache Message-ID: The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) out-of-bounds array access. While this bug is very unlikely ever to cause a failure, it should be fixed. 
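In the review of this change, Vladimir Ivanov suggests asserting the invariant `_secondary_supers.length >= population_count(_secondary_supers_bitmap)` wherever those fields are set. The following hypothetical sketch shows that kind of debug check on a simplified stand-in type; the struct, field, and function names are illustrative rather than HotSpot's, and `std::bitset<64>::count()` provides a portable population count.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in: secondary supertypes stored as an array plus a
// 64-bit bitmap recording which hash slots are occupied.
struct SupersTable {
  std::vector<const void*> secondary_supers;
  std::uint64_t bitmap = 0;
};

// An index computed as the popcount of the bits below an occupied slot is at
// most popcount(bitmap) - 1, so it stays in bounds whenever the array holds
// at least popcount(bitmap) entries.
bool invariant_holds(const SupersTable& t) {
  return t.secondary_supers.size() >= std::bitset<64>(t.bitmap).count();
}

// Debug check at the point where both fields are updated together.
void set_secondary_supers(SupersTable& t, std::vector<const void*> supers,
                          std::uint64_t bitmap) {
  t.secondary_supers = std::move(supers);
  t.bitmap = bitmap;
  assert(invariant_holds(t) && "array shorter than bitmap population count");
}
```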
------------- Commit messages: - JDK-8337958: Out-of-bounds array access in secondary_super_cache Changes: https://git.openjdk.org/jdk/pull/20483/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20483&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337958 Stats: 11 lines in 3 files changed: 1 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20483.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20483/head:pull/20483 PR: https://git.openjdk.org/jdk/pull/20483 From dholmes at openjdk.org Wed Aug 7 01:15:40 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 7 Aug 2024 01:15:40 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v4] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 21:06:45 GMT, Serguei Spitsyn wrote: >> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: >> - `SetFieldAccessWatch()` >> - `ClearFieldAccessWatch()` >> - `SetFieldModificationWatch()` >> - `ClearFieldModificationWatch()` >> so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. >> >> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. >> >> Testing: >> - TBD: submit mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > use MUTEX_DEFL instead of MUTEX_DEFN to define JvmtiVTMSTransition_lock Looks good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20413#pullrequestreview-2222462585 From vlivanov at openjdk.org Wed Aug 7 01:58:42 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 7 Aug 2024 01:58:42 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 23:35:55 GMT, Andrew Haley wrote: > The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) out-of-bounds array access. While this bug is very unlikely ever to cause a failure, it should be fixed. The fix looks good. I submitted it for testing. src/hotspot/share/oops/klass.cpp line 347: > 345: } > 346: > 347: // Invariant: _secondary_supers.length >= population_count(_secondary_supers_bitmap) It makes sense to assert the invariant in `Klass::set_secondary_supers()` (and, probably, `Klass::restore_unshareable_info()` for a shared klass loaded from CDS archive). ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20483#pullrequestreview-2222493458 PR Review Comment: https://git.openjdk.org/jdk/pull/20483#discussion_r1706284693 From sspitsyn at openjdk.org Wed Aug 7 07:51:33 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 7 Aug 2024 07:51:33 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v4] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 21:06:45 GMT, Serguei Spitsyn wrote: >> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: >> - `SetFieldAccessWatch()` >> - `ClearFieldAccessWatch()` >> - `SetFieldModificationWatch()` >> - `ClearFieldModificationWatch()` >> so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. 
This is a root cause of failures with this assert. >> >> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. >> >> Testing: >> - TBD: submit mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > use MUTEX_DEFL instead of MUTEX_DEFN to define JvmtiVTMSTransition_lock Thank you for the review, David. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20413#issuecomment-2272840007 From jwtang at openjdk.org Wed Aug 7 08:01:15 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Wed, 7 Aug 2024 08:01:15 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v14] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: > I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: fix the test condition to avoid NoClassDefFoundError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/a30b11f4..91e1fc9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=12-13 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From amenkov at openjdk.org Wed Aug 7 08:25:36 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 7 Aug 2024 08:25:36 GMT Subject: RFR: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions [v4] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 21:06:45 GMT, Serguei Spitsyn wrote: >> The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: >> - `SetFieldAccessWatch()` >> - `ClearFieldAccessWatch()` >> - `SetFieldModificationWatch()` >> - `ClearFieldModificationWatch()` >> so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. >> >> The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. >> >> Testing: >> - TBD: submit mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > use MUTEX_DEFL instead of MUTEX_DEFN to define JvmtiVTMSTransition_lock Marked as reviewed by amenkov (Reviewer). 
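The race behind 8336846 (`recompute_enabled()` sees a virtual thread as mounted, but by the time the handshake runs the thread has already unmounted) is a classic check-then-act problem. The following is a hypothetical, much simplified C++ model of the remedy: block state transitions for the duration of both the check and the action. `VThreadModel`, `TransitionDisabler`, and `act_if_mounted` are invented names, and this is not how `JvmtiVTMSTransitionDisabler` is implemented. It assumes C++17 for `std::shared_mutex`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

class VThreadModel {
 public:
  // Mount/unmount transitions hold the gate in shared mode, so they proceed
  // freely unless a disabler holds it exclusively.
  void set_mounted(bool mounted) {
    std::shared_lock<std::shared_mutex> lock(gate_);
    mounted_.store(mounted);
  }

  bool is_mounted() const { return mounted_.load(); }

  // RAII scope: while an instance is alive, no mount/unmount transition can run.
  class TransitionDisabler {
   public:
    explicit TransitionDisabler(VThreadModel& t) : lock_(t.gate_) {}

   private:
    std::unique_lock<std::shared_mutex> lock_;
  };

 private:
  std::shared_mutex gate_;
  std::atomic<bool> mounted_{false};
};

// Check-then-act without the race: the state seen by the check cannot change
// before the action runs. The action must not call set_mounted() on `t`,
// because this thread already holds the gate exclusively.
template <typename Action>
bool act_if_mounted(VThreadModel& t, Action action) {
  VThreadModel::TransitionDisabler disabler(t);
  if (!t.is_mounted()) return false;
  action();
  return true;
}
```

Without the disabler, another thread could unmount between `is_mounted()` and `action()`, which is the shape of the failure described in the assert.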
------------- PR Review: https://git.openjdk.org/jdk/pull/20413#pullrequestreview-2223285620 From jpai at openjdk.org Wed Aug 7 08:34:32 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 7 Aug 2024 08:34:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> Message-ID: On Tue, 6 Aug 2024 07:59:12 GMT, Jaikiran Pai wrote: >> It cannot solve the NoClassDefFoundError. Besides, this new testcase is not included in linux-x86 / test (hs/tier1 serviceability). > > I see that the `NoClassDefFoundError` failure that you are mentioning is actually being reported even in the GitHub Actions job failures. It does look odd (although it might be a known jtreg issue). It will need a bit of investigation to see what's going on. I was able to reproduce this `NoClassDefFoundError` in this new test locally. I will take a look to see if I can figure out what's going on. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1706606412 From gcao at openjdk.org Wed Aug 7 08:53:31 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 7 Aug 2024 08:53:31 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 23:35:55 GMT, Andrew Haley wrote: > The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) out-of-bounds array access. While this bug is very unlikely ever to cause a failure, it should be fixed. @theRealAph Hi, I have prepared a small change for the RISC-V platform. Could it be included in this PR? Thanks.
```diff
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index e349eab3177..8bda4006992 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3973,8 +3973,8 @@ void MacroAssembler::lookup_secondary_supers_table_slow_path(Register r_super_kl
 
   // Check if bitmap is SECONDARY_SUPERS_BITMAP_FULL
   assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), "Adjust this code");
-  addi(t0, r_bitmap, (u1)1);
-  beqz(t0, L_bitmap_full);
+  subw(t0, r_array_length, (u1)(Klass::SECONDARY_SUPERS_TABLE_SIZE - 2));
+  bgtz(t0, L_bitmap_full);
 
   // NB! Our caller has checked bits 0 and 1 in the bitmap. The
   // current slot (at secondary_supers[r_array_index]) has not yet
```
------------- PR Comment: https://git.openjdk.org/jdk/pull/20483#issuecomment-2272957393 From adinn at openjdk.org Wed Aug 7 09:12:31 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 Aug 2024 09:12:31 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: <_q2KTSkxJvsoky8ZThR5S_BaaPcMsQSXydbq7vKJahY=.ecc4c30f-6682-4d04-8ae2-48bba1c9b1da@github.com> References: Message-ID: <_q2KTSkxJvsoky8ZThR5S_BaaPcMsQSXydbq7vKJahY=.ecc4c30f-6682-4d04-8ae2-48bba1c9b1da@github.com> On Tue, 6 Aug 2024 04:48:03 GMT, Vladimir Kozlov wrote: >> While working on [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519) I noticed a few ExternalAddress cases I missed in [JDK-8337396](https://bugs.openjdk.org/browse/JDK-8337396) changes. >> >> I also added asserts on x86 to catch using ExternalAddress for jump and call instructions and caught a few additional cases (Windows and arraycopy cases). >> >> Tested tier1-5,hs-stress,hs-comp > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > riscv update Marked as reviewed by adinn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/20470#pullrequestreview-2223569184 From adinn at openjdk.org Wed Aug 7 09:12:32 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 Aug 2024 09:12:32 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 16:43:19 GMT, Vladimir Kozlov wrote: >> What about when we try to save and restore an adapter in Leyden? > > Short answer: we should exclude these checks when we cache adapters in Leyden. > > Long answer: > 1. These checks are enabled only in the debug VM and only on x86. > 2. These checks have caught almost no issues since JDK 9 (I found only one: [JDK-8023465](https://bugs.openjdk.org/browse/JDK-8023465)). > 3. Enabling them creates issues because we don't add an adapter to the hash table until all stubs are generated ([contains_all_checks](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2637)), which creates duplicate adapters for the same method signature. > 4. These checks may miss an issue: they pass if the adapter is called from a stub, but that stub may itself have been called from compiled code. > > I was actually considering removing these checks but decided to keep them for now. But I don't think we need them in cached adapters in Leyden. I'm fine with that!
(it makes save and restore in Leyden much simpler :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1706660576 From adinn at openjdk.org Wed Aug 7 09:12:33 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 Aug 2024 09:12:33 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 16:58:08 GMT, Vladimir Kozlov wrote: >> `relocInfo::none` is treated as unreachable and will force using lea(): >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L12915 > > And `runtime_call_type` will be reachable if inside CodeCache: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L12897 Ok, so the PR looks good then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1706662879 From fgao at openjdk.org Wed Aug 7 10:46:11 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 7 Aug 2024 10:46:11 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part Message-ID: This patch enables BTI branch protection for the runtime part on the Linux/AArch64 platform. Motivation 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as a VM configure flag, which passes the `-mbranch-protection=standard` compilation flag to all C/C++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3].
However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: GNU_PROPERTY_AARCH64_FEATURE_1_BTI GNU_PROPERTY_AARCH64_FEATURE_1_PAC Note-1: BTI is an all-or-nothing property for a link unit [4], so libjvm.so as a whole is not BTI-enabled. Note-2: The PAC bit in the `.note.gnu.property` section is used to protect the `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. Goal Hence, this patch aims to set the PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. Implementation Task-1: find out the problematic input objects From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspected that the root cause was that the PAC/BTI feature bits are not set for some of the input objects of libjvm.so. To find these inputs, we passed the `--force-bti` linker flag [4] in a local test. This flag warns if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S Task-2: add a `.note.gnu.property` section to these assembly files As mentioned in Motivation-2, `-mbranch-protection=standard` is passed when compiling C/C++ files, but these assembly files were missed. In this patch, we also pass the `-mbranch-protection=standard` flag to the assembler (see the update in flags-cflags.m4 and flags-other.m4), and add a `.note.gnu.property` section at the end of these assembler files. With this change, we can see the PAC/BTI feature bits in the final libjvm.so.
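The rule quoted from [5] amounts to a bitwise AND over the inputs' feature bits, which is why four assembly objects without a `.note.gnu.property` section were enough to clear BTI and PAC for all of libjvm.so. Below is a minimal model, assuming an input with no note contributes no bits. The constant values match `GNU_PROPERTY_AARCH64_FEATURE_1_BTI` and `GNU_PROPERTY_AARCH64_FEATURE_1_PAC` from `<elf.h>`, but `link_unit_features` is a hypothetical helper that ignores linker options such as `--force-bti`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same values as GNU_PROPERTY_AARCH64_FEATURE_1_BTI / _PAC in <elf.h>.
constexpr uint32_t kFeatureBTI = 1u << 0;
constexpr uint32_t kFeaturePAC = 1u << 1;

// Feature bits a static linker may record for the output: a bit survives
// only if every input object carries it. An object without a
// .note.gnu.property section is modelled as contributing 0.
uint32_t link_unit_features(const std::vector<uint32_t>& inputs) {
  assert(!inputs.empty() && "a link unit needs at least one input");
  uint32_t result = ~uint32_t(0);
  for (uint32_t bits : inputs) result &= bits;
  return result & (kFeatureBTI | kFeaturePAC);
}
```

A single unannotated object zeroes both bits, which matches the all-or-nothing behaviour described in Note-1.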
Task-3: add BTI landing pads to hand-written assembly In a local test on Fedora 40 with PAC/BTI-capable hardware, we got a `SIGILL` error, a typical symptom of a BTI violation (branch target exception). The root cause was missing BTI landing pads in hand-written assembly in HotSpot. File-1 copy_aarch64.hpp: It's a switch-case statement, and we add `bti j` for these indirect jumps. File-2 atomic_linux_aarch64.S: We add `bti c` landing pads at the function entries. File-3 copy_linux_aarch64.S: There is no need to add `bti c` at the function entries since they are called via `bl`. However, the indirect jumps must be handled. File-4 safefetch_linux_aarch64.S: Similar to File-3, there is no need to handle these function entries. File-5 threadLS_linux_aarch64.S: No need to handle the function entry because `paciasp` can act as the landing pad. Evaluation 1. jtreg test We ran tier 1-3 jtreg tests on Fedora 40 + GCC 14 + the following AArch64 hardware and all tests passed. 1. w/o PAC and w/o BTI 2. w/ PAC and w/o BTI 3. w/ PAC and w/ BTI We also ran the jtreg tests on Fedora 40 + Clang 18 + hardware w/ PAC and w/ BTI. These tests passed too. 2. code size We measured about a 2% code size increase when `--enable-branch-protection` is used. This code size change looks reasonable; see the evaluation on glibc [6].
[1] https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication [2] https://bugs.openjdk.org/browse/JDK-8277204 [3] https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story [4] https://reviews.llvm.org/D62609 [5] https://github.com/ARM-software/abi-aa/blob/2a70c42d62e9c3eb5887fa50b71257f20daca6f9/aaelf64/aaelf64.rst#program-property [6] https://developer.arm.com/documentation/102433/0100/Applying-these-techniques-to-real-code ------------- Commit messages: - 8337536: AArch64: Enable BTI branch protection for runtime part Changes: https://git.openjdk.org/jdk/pull/20491/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20491&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337536 Stats: 223 lines in 7 files changed: 199 ins; 2 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/20491.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20491/head:pull/20491 PR: https://git.openjdk.org/jdk/pull/20491 From jpai at openjdk.org Wed Aug 7 12:16:33 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 7 Aug 2024 12:16:33 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> Message-ID: <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> On Wed, 7 Aug 2024 08:31:35 GMT, Jaikiran Pai wrote: >> I see that the `NoClassDefFoundError` failure that you are mentioning is actually being reported even in the GitHub Actions job failures. It does look odd (although might be a pre-known jtreg issue). It will need a bit of investigation to see what's going on. 
> > I was able to reproduce this `NoClassDefFoundError` in this new test locally. I will take a look to see if I can figure out what's going on. I looked into this locally and this is a (known) bug in jtreg https://bugs.openjdk.org/browse/CODETOOLS-7902847. What's happening here is that the tests are launched using `make test TEST=test/hotspot/jtreg/:tier1_serviceability`. One of those tests is the (pre-existing, unrelated to this PR) AgentWithVThreadTest. That test has a `@compile AgentWithVThread.java AgentWithVThreadTest.java`. This triggers compilation of those classes and also of any classes referenced by those 2 classes. One such class happens to be the test library's jdk.test.lib.Utils. This is a test library class (used/referenced indirectly in that test). jtreg ends up issuing a javac command with the destination directory set to the AgentWithVThreadTest's test-specific work directory: -d build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d So the `jdk.test.lib.Utils.class` file (along with other classes) ends up being compiled to `build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d/jdk/test/lib/Utils.class`. During this compilation, the `jdk.test.lib.util.JavaAgentBuilder` doesn't get compiled because that class isn't referenced by AgentWithVThreadTest (neither directly nor indirectly). Then during the same test execution, jtreg notices a `@run` statement: @run driver jdk.test.lib.util.JavaAgentBuilder ....
And to launch that run action, it first builds and compiles the jdk.test.lib.util.JavaAgentBuilder and since this is a test library class, jtreg ends up launching `javac` with a destination directory which is common/shared by multiple tests: -d build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib Additionally, since this is being compiled in context of the AgentWithVThreadTest, jtreg also passes the test specific work directory (`build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d`) as the classpath to the javac command. So effectively, this compilation ends up finding the `jdk.test.lib.Utils.class` in the test specific directory and doesn't recompile to the shared location. Since the `jdk.test.lib.util.JavaAgentBuilder` hasn't yet been compiled nor is located in the test specific work directory, javac ends up compiling it and placing it in the destination directory which was passed to the javac invocation and happens to be a shared directory for tests `build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib`. Effectively we now have a test library class (the `JavaAgentBuilder`) in the common shared directory and some of its referenced classes (`Utils.class`) in a test specific directory. At a later point in time, this new test `TestPinCaseWithCFLH` being proposed in this PR, gets launched and jtreg notices the `@run` statement: @run driver jdk.test.lib.util.JavaAgentBuilder ... Just like previously, since this is a test library class, jtreg tries to locate this class in the shared directory and it finds it in the shared directory `build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib` (since it was compiled to that location by an unrelated test). 
Since it finds the class, jtreg then skips compilation of that test library class, and thus the `Utils.class` (or any of the classes referenced by `JavaAgentBuilder`) isn't recompiled. jtreg then launches the test with this shared directory and this test's own work directory (which is different from the AgentWithVThreadTest's work directory) in the classpath. Since neither of these directories contains the `Utils.class`, we end up with this missing class error. As I noted, this is a known problem in jtreg. I'll see if we can do something in that area. For now though, I think one workaround is to add a `@clean jdk.test.lib.util.JavaAgentBuilder` directive before the `@run` tag, so that jdk.test.lib.util.JavaAgentBuilder and its referenced classes are compiled afresh (into the shared test library directory). This is what the change would look like in your PR:
```diff
diff --git a/test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java b/test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java
index 02755a0289f..60564115f51 100644
--- a/test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java
+++ b/test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java
@@ -34,7 +34,7 @@
  * @requires vm.jvmti
  * @modules java.base/java.lang:+open
  * @compile TestPinCaseWithCFLH.java
- * @build jdk.test.lib.Utils
+ * @clean jdk.test.lib.util.JavaAgentBuilder
  * @run driver jdk.test.lib.util.JavaAgentBuilder
  * TestPinCaseWithCFLH TestPinCaseWithCFLH.jar
  * @run main/othervm/timeout=100 -Djdk.virtualThreadScheduler.maxPoolSize=1
```
I admit it's odd to expect the test to do this, but I think this should make the test stable. Can you give it a try?
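The jtreg sequence described above can be reduced to a toy model of output directories and class lookups. In this sketch, `compile` is a hypothetical stand-in for how javac treats classes already visible on the classpath (it does not emit them again into the `-d` directory), and the class names are shortened. It only illustrates why `Utils` goes missing for `TestPinCaseWithCFLH` and why `@clean` helps; it does not reproduce jtreg's real logic.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Dir = std::set<std::string>;  // class names present in one output directory
using Deps = std::map<std::string, std::vector<std::string>>;

// True if `cls` is found in any directory on the classpath.
bool on_classpath(const std::vector<const Dir*>& cp, const std::string& cls) {
  for (const Dir* d : cp)
    if (d->count(cls) != 0) return true;
  return false;
}

// Toy javac: emit `cls` and its dependencies into `dest`, except classes that
// are already in `dest` or already visible on the classpath.
void compile(const std::string& cls, const Deps& deps, Dir& dest,
             const std::vector<const Dir*>& cp) {
  if (dest.count(cls) != 0 || on_classpath(cp, cls)) return;
  dest.insert(cls);
  auto it = deps.find(cls);
  if (it == deps.end()) return;
  for (const std::string& dep : it->second) compile(dep, deps, dest, cp);
}

// Replays the scenario; returns whether Utils is visible when
// TestPinCaseWithCFLH runs JavaAgentBuilder.
bool utils_visible_for_pin_test(bool clean_first) {
  const Deps deps = {{"AgentWithVThreadTest", {"Utils"}},
                     {"JavaAgentBuilder", {"Utils"}}};
  Dir shared_lib, agent_test_dir, pin_test_dir;

  // AgentWithVThreadTest: @compile puts Utils into its own work directory.
  compile("AgentWithVThreadTest", deps, agent_test_dir, {});
  // Same test, @run driver JavaAgentBuilder: built into the shared library
  // directory, but Utils is already on the classpath, so it is not emitted.
  compile("JavaAgentBuilder", deps, shared_lib, {&agent_test_dir});

  // TestPinCaseWithCFLH: the library class is only rebuilt if it is absent.
  if (clean_first) shared_lib.erase("JavaAgentBuilder");  // effect of @clean
  if (shared_lib.count("JavaAgentBuilder") == 0)
    compile("JavaAgentBuilder", deps, shared_lib, {&pin_test_dir});

  return on_classpath({&shared_lib, &pin_test_dir}, "Utils");
}
```

Without the clean step, `JavaAgentBuilder` is found in the shared directory but `Utils` is not. With it, both classes are rebuilt into the shared directory together.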
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1706894821 From ihse at openjdk.org Wed Aug 7 15:30:31 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 7 Aug 2024 15:30:31 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 10:40:09 GMT, Fei Gao wrote: > This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. > > However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: > > > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." 
Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: > > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S > > > Task-2: add `.note.gnu.property` section for these assembly files > > As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. > > In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update in flags-cflags.m4 and flags-other.m4), and add `.note.gnu.property` section at the end... Thank you for the detailed description! Adding this flag to ASFLAGS as well seems reasonable, but I think the logic in the configure files can be straightened out a bit. I'll look around at the files and get back to you with a more concrete recommendation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2273743971 From ihse at openjdk.org Wed Aug 7 15:51:30 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 7 Aug 2024 15:51:30 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 10:40:09 GMT, Fei Gao wrote: > This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. 
In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. > > However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: > > > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. 
We got the following list: > > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S > > > Task-2: add `.note.gnu.property` section for these assembly files > > As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. > > In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update in flags-cflags.m4 and flags-other.m4), and add `.note.gnu.property` section at the end... It turned out to be easier to write it myself than trying to explain it. Please have a look here: https://github.com/openjdk/jdk/commit/0fe840dec597bb4a819eb2025a6d56cd82f237b5 (This also contains some additional cleanup in the branch protection configure code.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2273789758 From kvn at openjdk.org Wed Aug 7 16:16:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 Aug 2024 16:16:31 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 04:48:03 GMT, Vladimir Kozlov wrote: >> While working on [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519) I noticed few ExternalAddress cases I missed in [JDK-8337396](https://bugs.openjdk.org/browse/JDK-8337396) changes. >> >> I also added asserts on x86 to catch using ExternalAddress for jumps and calls instructions and caught few additional cases (Windows and arraycopy cases). >> >> Tested tier1-5,hs-stress,hs-comp > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > riscv update Thank you, Andrew. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20470#issuecomment-2273836559 From coleenp at openjdk.org Wed Aug 7 16:37:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 Aug 2024 16:37:51 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v2] In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: > Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. > Tested with tier1 on linux and windows. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix additional -Wsign-conversion errors, contributed by Stefan. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20431/files - new: https://git.openjdk.org/jdk/pull/20431/files/01b3cf40..4ac52827 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=00-01 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20431/head:pull/20431 PR: https://git.openjdk.org/jdk/pull/20431 From shade at openjdk.org Wed Aug 7 17:24:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 17:24:31 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: <2PCvwiU60BP83sT66tkGukVE9mBPchvw1s7IaXTPnq4=.dc3c9080-f54b-434e-a8ff-5a6fca832083@github.com> On Tue, 6 Aug 2024 23:35:55 GMT, Andrew Haley wrote: > The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) out-of-bounds array access. 
While this bug is very unlikely ever to cause a failure, it should be fixed. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1734: > 1732: assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), ""); > 1733: cmpw(r_array_length, (u1)(Klass::SECONDARY_SUPERS_TABLE_SIZE - 2)); > 1734: br(GT, L_huge); Silly questions: 1. Why is it `(u1)`, when we are comparing with `cmpw` (4 bytes)? Also, should it really be unsigned? x86 code uses signed `int32_t`. 2. I was trying to see if there is anything special about `-2` here. Would it be a bit cleaner to say `GE` `Klass::SECONDARY_SUPERS_TABLE_SIZE - 1`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20483#discussion_r1707498103 From aph at openjdk.org Wed Aug 7 17:29:30 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 7 Aug 2024 17:29:30 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part In-Reply-To: References: Message-ID: <446-JAhZlwZT7eNafXxR90EqiIUuV5Xd9bMfqXTOVA4=.45e46493-f09e-4ef4-9d13-6657b7938433@github.com> On Wed, 7 Aug 2024 10:40:09 GMT, Fei Gao wrote: > This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. 
> > However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: > > > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: > > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S > > > Task-2: add `.note.gnu.property` section for these assembly files > > As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. > > In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update in flags-cflags.m4 and flags-other.m4), and add `.note.gnu.property` section at the end... Can you explain why we want to support PAC without BTI? Would anyone use such a config? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2273969126 From stevenschlansker at gmail.com Wed Aug 7 17:29:44 2024 From: stevenschlansker at gmail.com (Steven Schlansker) Date: Wed, 7 Aug 2024 10:29:44 -0700 Subject: Reliability of JVM in face of "recoverable" Errors, e.g. out of code cache space In-Reply-To: References: <26618BA9-BCF5-422D-89B5-8BEB20AF7856@gmail.com> Message-ID: <67DA75AF-D768-4836-81F0-E63927A94E10@gmail.com> > On Aug 5, 2024, at 7:08 PM, David Holmes wrote: > > Hi Steven, > > On 3/08/2024 5:11 am, Steven Schlansker wrote: >> Hi hotspot-dev, >> Please let me know if this is not an appropriate place to raise this kind of question - >> happy to move to another more appropriate list > > This does seem like an issue with method linking in hotspot and so is appropriate. I would suggest filing a bug in JBS. Thank you very much for your reply, David. I filed it in JBS with review ID 9077424. From sspitsyn at openjdk.org Wed Aug 7 17:44:36 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 7 Aug 2024 17:44:36 GMT Subject: Integrated: 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 04:09:31 GMT, Serguei Spitsyn wrote: > The JVMTI Watch Field functions do not disable VTMS transitions with the `JvmtiVTMSTransitionDisabler`: > - `SetFieldAccessWatch()` > - `ClearFieldAccessWatch()` > - `SetFieldModificationWatch()` > - `ClearFieldModificationWatch()` > so in the `recompute_enabled()` we could see that a vthread is mounted, but in the `EnterInterpOnlyModeClosure` handshake the thread could have been unmounted already. This is a root cause of failures with this assert. > > The fix is to disable transitions in the `JvmtiEventControllerPrivate::change_field_watch()` function. > > Testing: > - TBD: submit mach5 tiers 1-6 This pull request has now been integrated.
Changeset: 36d08c21 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/36d08c213d03deddf69ecb9770a3afef73a15444 Stats: 5 lines in 2 files changed: 3 ins; 1 del; 1 mod 8336846: assert(state->get_thread() == jt) failed: handshake unsafe conditions Reviewed-by: amenkov, dholmes, cjplummer, pchilanomate, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/20413 From gziemski at openjdk.org Wed Aug 7 17:47:47 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 7 Aug 2024 17:47:47 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag Message-ID: Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name. There is a bunch of other related cleanup that we can do, but I will leave it for follow-up issues, such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) ------------- Commit messages: - rename MEMFLAGS to MemType Changes: https://git.openjdk.org/jdk/pull/20497/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20497&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337563 Stats: 502 lines in 100 files changed: 1 ins; 0 del; 501 mod Patch: https://git.openjdk.org/jdk/pull/20497.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20497/head:pull/20497 PR: https://git.openjdk.org/jdk/pull/20497 From coleenp at openjdk.org Wed Aug 7 18:24:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 Aug 2024 18:24:16 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v3] In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: > Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int.
This seems trivial. > Tested with tier1 on linux and windows. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix compilation error and use pointer_delta_as_int instead. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20431/files - new: https://git.openjdk.org/jdk/pull/20431/files/4ac52827..acbdbbdc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=01-02 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20431/head:pull/20431 PR: https://git.openjdk.org/jdk/pull/20431 From coleenp at openjdk.org Wed Aug 7 19:13:02 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 Aug 2024 19:13:02 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v4] In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: > Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. > Tested with tier1 on linux and windows. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add include file. 
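The v3 commit message above mentions switching to `pointer_delta_as_int`. The general idea is to narrow a pointer difference to `int` only after checking that it fits, so the conversion is explicit and `-Wconversion` stays quiet. A sketch of that idea follows. `checked_delta_as_int` is a hypothetical helper written for illustration, not HotSpot's implementation, whose exact checks may differ.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: number of T elements from `right` up to `left`,
// narrowed to int only after range checks.
template <typename T>
int checked_delta_as_int(const T* left, const T* right) {
  assert(left >= right && "left must not precede right");
  const std::ptrdiff_t delta = left - right;  // measured in elements of T
  assert(delta <= INT_MAX && "delta does not fit in an int");
  return static_cast<int>(delta);             // explicit, checked narrowing
}
```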
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20431/files - new: https://git.openjdk.org/jdk/pull/20431/files/acbdbbdc..f14a675a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20431/head:pull/20431 PR: https://git.openjdk.org/jdk/pull/20431 From coleenp at openjdk.org Wed Aug 7 21:05:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 Aug 2024 21:05:05 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: > Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. > Tested with tier1 on linux and windows. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: See if GHA compilers like this. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20431/files - new: https://git.openjdk.org/jdk/pull/20431/files/f14a675a..22c16973 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20431&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20431/head:pull/20431 PR: https://git.openjdk.org/jdk/pull/20431 From lmesnik at openjdk.org Wed Aug 7 23:14:39 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 7 Aug 2024 23:14:39 GMT Subject: RFR: 8338010: WB_IsFrameDeoptimized miss ResourceMark Message-ID: The method WB_IsFrameDeoptimized is used only by test com/sun/jdi/EATests.java and intermittently fails with virtual thread test factory. The log explains how problem happens: Stack: [0x000000f373e00000,0x000000f373f00000] Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [jvm.dll+0xc956e1] os::win32::platform_print_native_stack+0x101 (os_windows_x86.cpp:235) V [jvm.dll+0xf59abb] VMError::report+0x149b (vmError.cpp:1010) V [jvm.dll+0xf5c15e] VMError::report_and_die+0x80e (vmError.cpp:1845) V [jvm.dll+0x55796e] report_fatal+0x7e (debug.cpp:214) V [jvm.dll+0xd4d591] ResourceArea::allocate_bytes+0x111 (resourceArea.inline.hpp:33) V [jvm.dll+0xf44bef] vframe::new_vframe+0x7f (vframe.cpp:68) V [jvm.dll+0x7fb97a] JavaThread::last_java_vframe+0x3a (javaThread.cpp:2044) V [jvm.dll+0xf8f659] WB_IsFrameDeoptimized+0x219 (whitebox.cpp:798) C 0x000001fbdebd3b96 (no source info available) Testing by running test with and without thread factory & tier1. 
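The one-line fix ("rm added") adds a `ResourceMark` to `WB_IsFrameDeoptimized`. As a result, the vframe that `JavaThread::last_java_vframe` allocates in the resource area is released when the function returns. The sketch below is only a rough model of that scoped-release pattern. The `Arena` and `ArenaMark` classes are hypothetical and much simpler than HotSpot's `ResourceArea`/`ResourceMark`: the mark records the arena's high-water mark on entry and rolls it back on exit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump-pointer arena: allocation only advances a high-water mark.
class Arena {
 public:
  explicit Arena(std::size_t capacity) : storage_(capacity), top_(0) {}
  void* allocate(std::size_t bytes) {
    assert(top_ + bytes <= storage_.size() && "arena exhausted");
    void* p = storage_.data() + top_;
    top_ += bytes;
    return p;
  }
  std::size_t used() const { return top_; }

 private:
  friend class ArenaMark;
  std::vector<unsigned char> storage_;
  std::size_t top_;
};

// RAII mark: saves the high-water mark on entry and restores it on exit,
// releasing everything allocated inside the scope.
class ArenaMark {
 public:
  explicit ArenaMark(Arena& arena) : arena_(arena), saved_top_(arena.top_) {}
  ~ArenaMark() { arena_.top_ = saved_top_; }
  ArenaMark(const ArenaMark&) = delete;
  ArenaMark& operator=(const ArenaMark&) = delete;

 private:
  Arena& arena_;
  std::size_t saved_top_;
};

// Hypothetical query that needs a temporary allocation (standing in for
// new_vframe()); the mark releases it before returning.
bool query_with_mark(Arena& arena) {
  ArenaMark mark(arena);
  void* frame_copy = arena.allocate(64);
  return frame_copy != nullptr;
}
```

Without `ArenaMark`, each call would leave 64 bytes behind, and repeated calls would eventually exhaust the arena.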
------------- Commit messages: - rm added Changes: https://git.openjdk.org/jdk/pull/20502/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20502&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338010 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20502/head:pull/20502 PR: https://git.openjdk.org/jdk/pull/20502 From coleenp at openjdk.org Wed Aug 7 23:22:31 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 Aug 2024 23:22:31 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Wed, 7 Aug 2024 21:05:05 GMT, Coleen Phillimore wrote: >> Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. >> Tested with tier1 on linux and windows. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > See if GHA compilers like this. These GHA compilers are happy with this change now, please re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20431#issuecomment-2274505732 From aph at openjdk.org Wed Aug 7 23:58:31 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 7 Aug 2024 23:58:31 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: <2PCvwiU60BP83sT66tkGukVE9mBPchvw1s7IaXTPnq4=.dc3c9080-f54b-434e-a8ff-5a6fca832083@github.com> References: <2PCvwiU60BP83sT66tkGukVE9mBPchvw1s7IaXTPnq4=.dc3c9080-f54b-434e-a8ff-5a6fca832083@github.com> Message-ID: <5RJpu7-aI_IIx_uBh8r-EGzfSZFJFuKvomrn9x9ksMk=.8168a8e8-13ae-45dd-9abb-1afe08d4edc9@github.com> On Wed, 7 Aug 2024 17:19:04 GMT, Aleksey Shipilev wrote: > Silly questions: > > 1. Why is it `(u1)`, when we are comparing with `cmpw` (4 bytes)? 
Also, should it really be unsigned? x86 code uses signed `int32_t`. Yeah, but AArch64 has a restricted range of operand sizes. There's a very long thread where we discussed all of this, but we ended up defining `cmpw` for `(u1)`. This means we never see an overflow at runtime. > 2. I was trying to see if there is anything special about `-2` here. Would it be a bit cleaner to say `GE` `Klass::SECONDARY_SUPERS_TABLE_SIZE - 1`? Mmm, maybe, but it means the same to me. It's just a performance optimization that does a linear search when the table is almost full, because in measurements it's faster to do so. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20483#discussion_r1708141161 From coleenp at openjdk.org Thu Aug 8 00:00:44 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 00:00:44 GMT Subject: RFR: 8333356: JVM crashes with "aux_index does not match even or odd indices" Message-ID: You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. Tested with tier1-4.
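Coleen's description above keeps each node's hash code only in debug builds and compares it when a resize finds a node in the wrong place. The sketch below models that behaviour under stated assumptions. `Node`, `hash_of` and `check_node` are hypothetical, `#ifndef NDEBUG` stands in for HotSpot's `ASSERT`/`DEBUG_ONLY`, and the function returns the diagnostic instead of calling `fatal`. Putting the debug and release messages in mutually exclusive branches is also one way to answer a question raised later in this thread: whether a `fatal` that follows `DEBUG_ONLY(fatal(...))` would be flagged as unreachable.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical deterministic hash, so the example is reproducible.
std::size_t hash_of(const std::string& s) {
  std::size_t h = 0;
  for (unsigned char c : s) h = h * 31 + c;
  return h;
}

// Hypothetical table node. The saved hash exists only in debug builds,
// so release builds pay no extra memory per node.
struct Node {
  std::string value;
#ifndef NDEBUG
  std::size_t saved_hash;
#endif
  explicit Node(std::string v) : value(std::move(v)) {
#ifndef NDEBUG
    saved_hash = hash_of(value);
#endif
  }
};

// During a hypothetical resize, check that the node still hashes to the
// bucket it was found in. Returns "" when consistent; otherwise returns the
// message real code would pass to a fatal error. Only one of the two
// branches below is compiled, so no build contains an unreachable report.
std::string check_node(const Node& node, std::size_t found_in_bucket,
                       std::size_t bucket_count) {
  std::size_t current = hash_of(node.value);
  if (current % bucket_count == found_in_bucket) {
    return "";
  }
#ifndef NDEBUG
  return "Node hash code changed from " + std::to_string(node.saved_hash) +
         " to " + std::to_string(current);
#else
  return "Node hash code has changed, possibly due to corruption";
#endif
}
```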
------------- Commit messages: - 8333356: JVM crashes with "aux_index does not match even or odd indices" Changes: https://git.openjdk.org/jdk/pull/20503/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20503&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333356 Stats: 69 lines in 3 files changed: 68 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20503.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20503/head:pull/20503 PR: https://git.openjdk.org/jdk/pull/20503 From aph at openjdk.org Thu Aug 8 01:15:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 8 Aug 2024 01:15:17 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: References: Message-ID: > The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) out-of-bounds array access. While this bug is very unlikely ever to cause a failure, it should be fixed. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: JDK-8337958: Out-of-bounds array access in secondary_super_cache ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20483/files - new: https://git.openjdk.org/jdk/pull/20483/files/70cbbcd4..07169e59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20483&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20483&range=00-01 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20483.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20483/head:pull/20483 PR: https://git.openjdk.org/jdk/pull/20483 From jwtang at openjdk.org Thu Aug 8 02:21:39 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Thu, 8 Aug 2024 02:21:39 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: 
<5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> Message-ID: <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVCQ_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> On Wed, 7 Aug 2024 12:13:55 GMT, Jaikiran Pai wrote: >> I was able to reproduce this `NoClassDefFoundError` in this new test locally. I will take a look to see if I can figure out what's going on. > > I looked into this locally and this is a (known) bug in jtreg https://bugs.openjdk.org/browse/CODETOOLS-7902847. > > What's happening here is that the tests are launched using make test TEST=test/hotspot/jtreg/:tier1_serviceability. One of those tests is the (pre-existing unrelated to this PR) AgentWithVThreadTest. That test has a `@compile AgentWithVThread.java AgentWithVThreadTest.java`. This triggers compilation of those classes and also any referenced classes in those 2 classes. One such class happens to be the test library's jdk.test.lib.Utils. This is a test library class (used/referenced indirectly in that test). jtreg ends up issuing a javac command with destination directory as the AgentWithVThreadTest's test specific work directory: > > > -d build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d > > > So the `jdk.test.lib.Utils.class` (along with other classes) file ends up being compiled to `build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d/jdk/test/lib/Utils.class`. 
During this compilation, the `jdk.test.lib.util.JavaAgentBuilder` doesn't get compiled because that class isn't referenced by AgentWithVThreadTest (neither directly or indirectly). > > Then during the same test execution, jtreg notices a `@run` statement: > > > @run driver jdk.test.lib.util.JavaAgentBuilder .... > > > And to launch that run action, it first builds and compiles the jdk.test.lib.util.JavaAgentBuilder and since this is a test library class, jtreg ends up launching `javac` with a destination directory which is common/shared by multiple tests: > > > -d build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib > > > Additionally, since this is being compiled in context of the AgentWithVThreadTest, jtreg also passes the test specific work directory (`build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d`) as the classpath to the javac command. So effectively, this compilation ends up finding the `jdk.test.lib.Utils.class` in the test specific directory and doesn't recompile to the shared location. Since the `jdk.test.lib.util.JavaAgentBuilder` hasn't yet been compiled nor is located in the test specific work directory, javac ends up compiling it and placing it in the destinati... I was inspired by [CODETOOLS-7901986](https://bugs.openjdk.org/browse/CODETOOLS-7901986). By adding `@build jdk.test.lib.Utils` all tests are passed. The GitHub Actions jobs are finished successfully, too. I hope I could get a final review now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1708439201 From jwtang at openjdk.org Thu Aug 8 02:26:32 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Thu, 8 Aug 2024 02:26:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVCQ_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVC Q_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> Message-ID: On Thu, 8 Aug 2024 02:18:34 GMT, Jiawei Tang wrote: >> I looked into this locally and this is a (known) bug in jtreg https://bugs.openjdk.org/browse/CODETOOLS-7902847. >> >> What's happening here is that the tests are launched using make test TEST=test/hotspot/jtreg/:tier1_serviceability. One of those tests is the (pre-existing unrelated to this PR) AgentWithVThreadTest. That test has a `@compile AgentWithVThread.java AgentWithVThreadTest.java`. This triggers compilation of those classes and also any referenced classes in those 2 classes. One such class happens to be the test library's jdk.test.lib.Utils. This is a test library class (used/referenced indirectly in that test). 
jtreg ends up issuing a javac command with destination directory as the AgentWithVThreadTest's test specific work directory: >> >> >> -d build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d >> >> >> So the `jdk.test.lib.Utils.class` (along with other classes) file ends up being compiled to `build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d/jdk/test/lib/Utils.class`. During this compilation, the `jdk.test.lib.util.JavaAgentBuilder` doesn't get compiled because that class isn't referenced by AgentWithVThreadTest (neither directly or indirectly). >> >> Then during the same test execution, jtreg notices a `@run` statement: >> >> >> @run driver jdk.test.lib.util.JavaAgentBuilder .... >> >> >> And to launch that run action, it first builds and compiles the jdk.test.lib.util.JavaAgentBuilder and since this is a test library class, jtreg ends up launching `javac` with a destination directory which is common/shared by multiple tests: >> >> >> -d build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib >> >> >> Additionally, since this is being compiled in context of the AgentWithVThreadTest, jtreg also passes the test specific work directory (`build/macosx-aarch64/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.d`) as the classpath to the javac command. So effectively, this compilation ends up finding the `jdk.test.lib.Utils.class` in the test specific directory and doesn't recompile to the shared location. Since the `jdk.test.lib.util.JavaAgentBuilder` hasn't yet been compiled nor is located in the test specific work directory, javac ends u... > > I was inspired by [CODETOOLS-7901986](https://bugs.openjdk.org/browse/CODETOOLS-7901986). 
By adding `@build jdk.test.lib.Utils` all tests are passed. The GitHub Actions jobs are finished successfully, too. I hope I could get a final review now. Thank you for your help. I already made it by adding `@build jdk.test.lib.Utils`. Do I need to try to add `@clean jdk.test.lib.util.JavaAgentBuilder` and test again? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1708443593 From jpai at openjdk.org Thu Aug 8 02:26:32 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 8 Aug 2024 02:26:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVC Q_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> Message-ID: On Thu, 8 Aug 2024 02:21:39 GMT, Jiawei Tang wrote: >> I was inspired by [CODETOOLS-7901986](https://bugs.openjdk.org/browse/CODETOOLS-7901986). By adding `@build jdk.test.lib.Utils` all tests are passed. The GitHub Actions jobs are finished successfully, too. I hope I could get a final review now. > > Thank you for your help. I already made it by adding `@build jdk.test.lib.Utils`. Do I need to try to add `@clean jdk.test.lib.util.JavaAgentBuilder` and test again? I am not too sure adding the `@build jdk.test.lib.Utils` is a good thing. This test definition nor the test code uses/references that class. So it's odd to be adding a build tag for an indirect dependent class (and only that specific class). 
I felt the `@clean jdk.test.lib.util.JavaAgentBuilder` would be a better option since that `jdk.test.lib.util.JavaAgentBuilder` class is being used by the test definition. Having said that, it's just a personal opinion and I would let hotspot and serviceability area members to decide what approach to use here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1708446392 From gcao at openjdk.org Thu Aug 8 03:41:02 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 8 Aug 2024 03:41:02 GMT Subject: RFR: 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code Message-ID: Hi, Consistent with the aarch64 issue [JDK-8337786](https://bugs.openjdk.org/browse/JDK-8337786), a similar build warning exists in riscv. Please help review this trivial change that replaces some uses of literal 0 as a null pointer constant in riscv code to instead use nullptr. ### Testing - [ ] Run hotspot:tier1 tests on SOPHON SG2042 (release) ------------- Commit messages: - 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code Changes: https://git.openjdk.org/jdk/pull/20506/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20506&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338019 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20506/head:pull/20506 PR: https://git.openjdk.org/jdk/pull/20506 From dholmes at openjdk.org Thu Aug 8 04:32:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 8 Aug 2024 04:32:31 GMT Subject: RFR: 8333356: JVM crashes with "aux_index does not match even or odd indices" In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 23:55:47 GMT, Coleen Phillimore wrote: > You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. 
This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. > Tested with tier1-4. So this doesn't fix the reported crash but just tries to provide more diagnostics - which is fine but should be done under a new bug id so that we can keep tracking test failures against 8333356. Thanks src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 684: > 682: DEBUG_ONLY(fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents." > 683: " Node hash code changed from " SIZE_FORMAT " to " SIZE_FORMAT, aux->saved_hash(), aux_hash)); > 684: fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents."); Wondering if any compiler will flag the second fatal as unreachable? ------------- PR Review: https://git.openjdk.org/jdk/pull/20503#pullrequestreview-2226837154 PR Review Comment: https://git.openjdk.org/jdk/pull/20503#discussion_r1708583490 From dholmes at openjdk.org Thu Aug 8 04:45:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 8 Aug 2024 04:45:31 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVC Q_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> Message-ID: On Thu, 8 Aug 2024 02:23:39 GMT, Jaikiran Pai wrote: >> Thank you for your help. I already made it by adding `@build jdk.test.lib.Utils`. 
Do I need to try to add `@clean jdk.test.lib.util.JavaAgentBuilder` and test again? > > I am not too sure adding the `@build jdk.test.lib.Utils` is a good thing. This test definition nor the test code uses/references that class. So it's odd to be adding a build tag for an indirect dependent class (and only that specific class). I felt the `@clean jdk.test.lib.util.JavaAgentBuilder` would be a better option since that `jdk.test.lib.util.JavaAgentBuilder` class is being used by the test definition. > > Having said that, it's just a personal opinion and I would let hotspot and serviceability area members to decide what approach to use here. So I would suggest you wait to hear from them before changing anymore. Building a test library class not actually used by the test is certainly somewhat odd. I wasn't aware of the `@clean` workaround but I see a lot of vmTestbase tests use it, so please try the `@clean` as Jai suggested. Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1708600374 From dholmes at openjdk.org Thu Aug 8 04:49:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 8 Aug 2024 04:49:35 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) If you called it `MemTypeFlag` - which to me still suggests mutually-exclusive values - then you would not need to rename all the variables with "flag" in their name later. 
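The naming discussion above turns on whether the values are combinable bits or mutually exclusive categories. The standard C++ sketch below illustrates the difference. The enumerators are hypothetical placeholders, not NMT's real list.

```cpp
#include <cstdint>

// Mutually exclusive categories: every allocation has exactly one type,
// which is what a name like MemType suggests.
enum class MemType : std::uint8_t { Heap, Code, Thread, Other, NumTypes };

// Bit flags, by contrast, are meant to be OR-ed together.
enum AccessFlags : std::uint32_t { kReadable = 1u << 0, kWritable = 1u << 1 };

// An exclusive type maps directly to an array index for per-type accounting.
struct Counters {
  std::uint64_t bytes[static_cast<int>(MemType::NumTypes)] = {};
  void record(MemType type, std::uint64_t size) {
    bytes[static_cast<int>(type)] += size;
  }
};
```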
------------- PR Review: https://git.openjdk.org/jdk/pull/20497#pullrequestreview-2226852251 From dholmes at openjdk.org Thu Aug 8 05:17:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 8 Aug 2024 05:17:35 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Wed, 7 Aug 2024 21:05:05 GMT, Coleen Phillimore wrote: >> Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. >> Tested with tier1 on linux and windows. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > See if GHA compilers like this. src/hotspot/share/oops/arrayOop.hpp line 74: > 72: #ifdef ASSERT > 73: // make sure it isn't called before UseCompressedOops is initialized. > 74: static int arrayoopdesc_hs = 0; Why not just do a `checked_cast` on the return statement? (or even a range assert and a static cast?) src/hotspot/share/utilities/byteswap.hpp line 67: > 65: struct ByteswapFallbackImpl { > 66: inline constexpr uint16_t operator()(uint16_t x) const { > 67: return checked_cast(((x & UINT16_C(0x00ff)) << 8) | ((x & UINT16_C(0xff00)) >> 8)); What is the type of the expression without the cast? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1708633394 PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1708630156 From dholmes at openjdk.org Thu Aug 8 05:27:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 8 Aug 2024 05:27:32 GMT Subject: RFR: 8338010: WB_IsFrameDeoptimized miss ResourceMark In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 23:09:35 GMT, Leonid Mesnik wrote: > The method WB_IsFrameDeoptimized is used only by test com/sun/jdi/EATests.java and intermittently fails with virtual thread test factory. 
> The log explains how problem happens: > Stack: [0x000000f373e00000,0x000000f373f00000] > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0xc956e1] os::win32::platform_print_native_stack+0x101 (os_windows_x86.cpp:235) > V [jvm.dll+0xf59abb] VMError::report+0x149b (vmError.cpp:1010) > V [jvm.dll+0xf5c15e] VMError::report_and_die+0x80e (vmError.cpp:1845) > V [jvm.dll+0x55796e] report_fatal+0x7e (debug.cpp:214) > V [jvm.dll+0xd4d591] ResourceArea::allocate_bytes+0x111 (resourceArea.inline.hpp:33) > V [jvm.dll+0xf44bef] vframe::new_vframe+0x7f (vframe.cpp:68) > V [jvm.dll+0x7fb97a] JavaThread::last_java_vframe+0x3a (javaThread.cpp:2044) > V [jvm.dll+0xf8f659] WB_IsFrameDeoptimized+0x219 (whitebox.cpp:798) > C 0x000001fbdebd3b96 (no source info available) > > Testing by running test with and without thread factory & tier1. Looks good and trivial. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20502#pullrequestreview-2226886668 From fyang at openjdk.org Thu Aug 8 06:49:31 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 8 Aug 2024 06:49:31 GMT Subject: RFR: 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 03:37:06 GMT, Gui Cao wrote: > Hi, > Same as [JDK-8337786](https://bugs.openjdk.org/browse/JDK-8337786) for aarch64, similar build warnings exist on riscv. Please help review this trivial change that replaces some uses of literal 0 as a null pointer constant in riscv code to instead use nullptr. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) Marked as reviewed by fyang (Reviewer). 
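The riscv cleanup quoted above replaces a literal `0` used as a null pointer with `nullptr`. Besides silencing GCC's and Clang's `-Wzero-as-null-pointer-constant`, `nullptr` has its own type (`std::nullptr_t`), which affects overload resolution. The `describe` overloads below are hypothetical and only illustrate that effect.

```cpp
#include <string>

std::string describe(int) { return "int"; }
std::string describe(const char*) { return "pointer"; }

// 0 is an int literal, so it selects the int overload even when a pointer
// was intended; nullptr converts only to pointer types.
std::string with_zero() { return describe(0); }
std::string with_nullptr() { return describe(nullptr); }

// Null checks read the same either way, but only nullptr avoids the warning.
bool is_null(const char* p) { return p == nullptr; }
```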
------------- PR Review: https://git.openjdk.org/jdk/pull/20506#pullrequestreview-2226993312 From jwtang at openjdk.org Thu Aug 8 06:55:10 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Thu, 8 Aug 2024 06:55:10 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v15] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: > I added a test case that reproduces the crash. I would appreciate some advice on whether the code needs changing. Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: fix the test condition to avoid NoClassDefFoundError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/91e1fc9c..f9db7801 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From mli at openjdk.org Thu Aug 8 07:01:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 8 Aug 2024 07:01:30 GMT Subject: RFR: 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 03:37:06 GMT, Gui Cao wrote: > Hi, > Same as [JDK-8337786](https://bugs.openjdk.org/browse/JDK-8337786) for aarch64, similar build warnings exist on riscv. Please help review this trivial change that replaces some uses of literal 0 as a null pointer constant in riscv code to instead use nullptr.
> > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20506#pullrequestreview-2227017025 From stefank at openjdk.org Thu Aug 8 07:41:31 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 8 Aug 2024 07:41:31 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: <1TPRAcWwX70qO_r98Lxak0LvjcWqEDE6zIa-7E-gccU=.80b9218c-959f-48e1-91b3-b592c19559e1@github.com> On Wed, 7 Aug 2024 21:05:05 GMT, Coleen Phillimore wrote: >> Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. >> Tested with tier1 on linux and windows. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > See if GHA compilers like this. Marked as reviewed by stefank (Reviewer). src/hotspot/share/runtime/atomic.hpp line 1122: > 1120: volatile uint32_t* aligned_dest > 1121: = reinterpret_cast(align_down(dest, sizeof(uint32_t))); > 1122: uint32_t offset = checked_cast(pointer_delta(dest, aligned_dest, 1)); If this works for all compilers, then that's great. I was a bit concerned that the code in the statement below would cause a warning: (sizeof(uint32_t) - 1 - offset) given that `sizeof` returns a `size_t` and `1` an `int`, but I guess the compilers are smart enough to figure out that they all fit within a uint32_t? 
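David's question about the byteswap fallback and Stefan's comment on `atomic.hpp` above both come down to C++'s implicit conversions. Operands narrower than `int` are promoted to `int` before `&`, `<<` and `|` are applied. An expression mixing `size_t`, `int` and `uint32_t` is evaluated as `size_t` on common 32- and 64-bit targets. The compile-time checks below make those types visible. `byteswap16` is a hypothetical stand-in, not the JDK's `ByteswapFallbackImpl`, and C++17 is assumed for `std::is_same_v`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical 16-bit byteswap. The operands are promoted to int, so the
// whole expression has type int; the explicit cast back to uint16_t is the
// narrowing step that -Wconversion wants to see spelled out.
constexpr std::uint16_t byteswap16(std::uint16_t x) {
  return static_cast<std::uint16_t>(((x & UINT16_C(0x00ff)) << 8) |
                                    ((x & UINT16_C(0xff00)) >> 8));
}

constexpr std::uint16_t u16 = 0x1234;
static_assert(std::is_same_v<decltype((u16 & UINT16_C(0x00ff)) << 8), int>,
              "uint16_t operands are promoted to int");
static_assert(byteswap16(0x1234) == 0x3412, "bytes swapped");

// size_t - int - uint32_t: the usual arithmetic conversions yield size_t on
// targets where size_t is at least as wide as uint32_t.
constexpr std::uint32_t offset = 1;
static_assert(std::is_same_v<decltype(sizeof(std::uint32_t) - 1 - offset),
                             std::size_t>,
              "mixed expression is evaluated as size_t");

// Storing the result back into a uint32_t is therefore a narrowing step too.
constexpr std::uint32_t byte_index =
    static_cast<std::uint32_t>(sizeof(std::uint32_t) - 1 - offset);
static_assert(byte_index == 2, "4 - 1 - 1 == 2");
```

This is why the byteswap return needs an explicit `checked_cast` (or `static_cast`) even though every operand starts out as `uint16_t`.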
------------- PR Review: https://git.openjdk.org/jdk/pull/20431#pullrequestreview-2227089884 PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1708848081 From stefank at openjdk.org Thu Aug 8 07:41:32 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 8 Aug 2024 07:41:32 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: <8R1eFwIWaExP_sbb2BmEljVylzkcwAvQRRlqPe8Ha9k=.36a60f79-2b0e-4da2-8cc5-cc515a76281f@github.com> On Thu, 8 Aug 2024 05:08:52 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> See if GHA compilers like this. > > src/hotspot/share/utilities/byteswap.hpp line 67: > >> 65: struct ByteswapFallbackImpl { >> 66: inline constexpr uint16_t operator()(uint16_t x) const { >> 67: return checked_cast(((x & UINT16_C(0x00ff)) << 8) | ((x & UINT16_C(0xff00)) >> 8)); > > What is the type of the expression without the cast? The type of the expression is `int`. https://en.cppreference.com/w/cpp/language/implicit_conversion > In particular, [arithmetic operators](https://en.cppreference.com/w/cpp/language/operator_arithmetic) do not accept types smaller than int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1708844436 From shade at openjdk.org Thu Aug 8 08:30:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 08:30:32 GMT Subject: RFR: 8338010: WB_IsFrameDeoptimized miss ResourceMark In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 23:09:35 GMT, Leonid Mesnik wrote: > The method WB_IsFrameDeoptimized is used only by test com/sun/jdi/EATests.java and intermittently fails with virtual thread test factory. 
> The log explains how problem happens: > Stack: [0x000000f373e00000,0x000000f373f00000] > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0xc956e1] os::win32::platform_print_native_stack+0x101 (os_windows_x86.cpp:235) > V [jvm.dll+0xf59abb] VMError::report+0x149b (vmError.cpp:1010) > V [jvm.dll+0xf5c15e] VMError::report_and_die+0x80e (vmError.cpp:1845) > V [jvm.dll+0x55796e] report_fatal+0x7e (debug.cpp:214) > V [jvm.dll+0xd4d591] ResourceArea::allocate_bytes+0x111 (resourceArea.inline.hpp:33) > V [jvm.dll+0xf44bef] vframe::new_vframe+0x7f (vframe.cpp:68) > V [jvm.dll+0x7fb97a] JavaThread::last_java_vframe+0x3a (javaThread.cpp:2044) > V [jvm.dll+0xf8f659] WB_IsFrameDeoptimized+0x219 (whitebox.cpp:798) > C 0x000001fbdebd3b96 (no source info available) > > Testing by running test with and without thread factory & tier1. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20502#pullrequestreview-2227201262 From jwtang at openjdk.org Thu Aug 8 08:49:34 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Thu, 8 Aug 2024 08:49:34 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVC Q_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> Message-ID: On Thu, 8 Aug 2024 04:43:19 GMT, David Holmes wrote: >> I am not too sure adding the `@build jdk.test.lib.Utils` is a good thing. This test definition nor the test code uses/references that class. 
So it's odd to be adding a build tag for an indirect dependent class (and only that specific class). I felt the `@clean jdk.test.lib.util.JavaAgentBuilder` would be a better option since that `jdk.test.lib.util.JavaAgentBuilder` class is being used by the test definition. >> >> Having said that, it's just a personal opinion and I would let hotspot and serviceability area members decide what approach to use here. So I would suggest you wait to hear from them before changing anything more. > > Building a test library class not actually used by the test is certainly somewhat odd. I wasn't aware of the `@clean` workaround but I see a lot of vmTestbase tests use it, so please try the `@clean` as Jai suggested. Thanks
`@clean jdk.test.lib.util.JavaAgentBuilder` cannot solve the problem. I find that the reason why `AgentWithVThreadTest.java` can pass is that when compiling `AgentWithVThread.java`, it uses jdk.test.lib.Utils through `import` (`import jdk.test.lib.process.ProcessTools;`, `import jdk.test.lib.Utils;` is in the ProcessTools.java), so the test dir contains jdk.test.lib.Utils. However, my new testcase doesn't use it. I tried adding `import jdk.test.lib.Utils;` to my testcase, and then it passes.
If I only run the new testcase, it passes and the work dir looks like this: If I run `AgentWithVThread.java` and then `TestPinCaseWithCFLH.java`, Utils.class is missing from test/lib/jdk/test/lib: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1708962292 From jwtang at openjdk.org Thu Aug 8 08:57:33 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Thu, 8 Aug 2024 08:57:33 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVC Q_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> Message-ID: On Thu, 8 Aug 2024 08:39:34 GMT, Jiawei Tang wrote: >> Building a test library class not actually used by the test is certainly somewhat odd. I wasn't aware of the `@clean` workaround but I see a lot of vmTestbase tests use it, so please try the `@clean` as Jai suggested. Thanks > > `@clean jdk.test.lib.util.JavaAgentBuilder` cannot solve the problem. > I find that the reason why `AgentWithVThreadTest.java` can pass is that when compiling `AgentWithVThread.java`, it uses jdk.test.lib.Utils through `import` (`import jdk.test.lib.process.ProcessTools;`, `import jdk.test.lib.Utils;` is in the ProcessTools.java), so the test dir contains jdk.test.lib.Utils. However, my new testcase doesn't use it. > > > I tried adding `import jdk.test.lib.Utils;` to my testcase, and then it passes.
> > > > If I only run the new testcase, it passes and the work dir looks like this: > > > If I run `AgentWithVThread.java` and then `TestPinCaseWithCFLH.java`, Utils.class is missing from test/lib/jdk/test/lib: > A stable way to reproduce the problem: run AgentWithVThread.java and then TestPinCaseWithCFLH.java. jtreg -v:error,fail -jdk:{JDKPATH} ./test/hotspot/jtreg/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.java jtreg -v:error,fail -jdk:{JDKPATH} ./test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1708991727 From shade at openjdk.org Thu Aug 8 09:12:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 09:12:36 GMT Subject: RFR: 8333356: JVM crashes with "aux_index does not match even or odd indices" In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 23:55:47 GMT, Coleen Phillimore wrote: > You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. > Tested with tier1-4. The change looks good, and I agree with David that we are better off doing this under a separate bug ID.
No need for new PR, just rename this one :) ------------- PR Review: https://git.openjdk.org/jdk/pull/20503#pullrequestreview-2227210554 From shade at openjdk.org Thu Aug 8 09:12:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 09:12:38 GMT Subject: RFR: 8333356: JVM crashes with "aux_index does not match even or odd indices" In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 04:28:50 GMT, David Holmes wrote: >> You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. >> Tested with tier1-4. > > src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 684: > >> 682: DEBUG_ONLY(fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents." >> 683: " Node hash code changed from " SIZE_FORMAT " to " SIZE_FORMAT, aux->saved_hash(), aux_hash)); >> 684: fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents."); > > Wondering if any compiler will flag the second fatal as unreachable? +1. It would also dedup the messages if we wrap the second concat (yay preprocessor macros): fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents." 
DEBUG_ONLY(" Node hash code changed from " SIZE_FORMAT " to " SIZE_FORMAT, aux->saved_hash(), aux_hash)); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20503#discussion_r1708948639 From jpai at openjdk.org Thu Aug 8 09:14:33 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 8 Aug 2024 09:14:33 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVC Q_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> Message-ID: On Thu, 8 Aug 2024 08:54:54 GMT, Jiawei Tang wrote: >> `@clean jdk.test.lib.util.JavaAgentBuilder` cannot solve the problem. >> I find that the reason why `AgentWithVThreadTest.java` can pass is that when compiling `AgentWithVThread.java`, it uses jdk.test.lib.Utils through `import` (`import jdk.test.lib.process.ProcessTools;`, `import jdk.test.lib.Utils;` is in the ProcessTools.java), so the test dir contains jdk.test.lib.Utils. However, my new testcase doesn't use it. >> >> >> I tryed add `import jdk.test.lib.Utils;` in my testcase, it can pass. >> >> >> >> If I only run the new testcase, it can pass and the work dir look like this: >> >> >> If I run `AgentWithVThread.java` and then `TestPinCaseWithCFLH.java`, the Utils.class is missed in test/lib/jdk/test/lib: >> > > A stable way to reproduce the problem: run AgentWithVThread.java and then TestPinCaseWithCFLH.java. 
> > > jtreg -v:error,fail -jdk:{JDKPATH} ./test/hotspot/jtreg/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.java > jtreg -v:error,fail -jdk:{JDKPATH} ./test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java When I proposed the `@clean` option, I had tried it locally and that had worked for me and I wasn't able to reproduce that issue anymore locally. However, looking at the failed GitHub actions job with the `@clean` option, I see this: #section:clean ----------messages:(5/232)---------- command: clean jdk.test.lib.util.JavaAgentBuilder reason: User specified action: run clean jdk.test.lib.util.JavaAgentBuilder started: Thu Aug 08 07:40:24 UTC 2024 finished: Thu Aug 08 07:40:24 UTC 2024 elapsed time (seconds): 0.0 ----------rerun:(2/367)*---------- cd /Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/scratch/0 && \\ rm -f /Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.d/jdk/test/lib/util/JavaAgentBuilder.class result: Passed. Clean successful #section:build ----------messages:(5/194)---------- command: build jdk.test.lib.util.JavaAgentBuilder reason: Named class compiled on demand started: Thu Aug 08 07:40:24 UTC 2024 finished: Thu Aug 08 07:40:24 UTC 2024 elapsed time (seconds): 0.0 result: Passed. 
All files up to date So jtreg in its clean action appears to have only deleted the test specific work directory: rm -f /Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.d/jdk/test/lib/util/JavaAgentBuilder.class and of course that `JavaAgentBuilder.class` won't be there and is instead present in the shared directory at `/Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib`. So jtreg didn't clean up the shared directory location and let that class stay around which effectively meant the `@clean` ended up being a no-op. I have run out of ideas to introduce a proper workaround here. The right fix of course needs to happen in jtreg, which I will see if there are ways to implement it there. For now, it looks like the `@build jdk.test.lib.Utils` approach you used and made the test pass is the only way to make this consistently pass. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1709023985 From shade at openjdk.org Thu Aug 8 09:17:33 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 09:17:33 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 01:15:17 GMT, Andrew Haley wrote: >> The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) out-of-bounds array access. While this bug is very unlikely ever to cause a failure, it should be fixed. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8337958: Out-of-bounds array access in secondary_super_cache Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20483#pullrequestreview-2227306812 From shade at openjdk.org Thu Aug 8 09:17:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 09:17:34 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: <5RJpu7-aI_IIx_uBh8r-EGzfSZFJFuKvomrn9x9ksMk=.8168a8e8-13ae-45dd-9abb-1afe08d4edc9@github.com> References: <2PCvwiU60BP83sT66tkGukVE9mBPchvw1s7IaXTPnq4=.dc3c9080-f54b-434e-a8ff-5a6fca832083@github.com> <5RJpu7-aI_IIx_uBh8r-EGzfSZFJFuKvomrn9x9ksMk=.8168a8e8-13ae-45dd-9abb-1afe08d4edc9@github.com> Message-ID: On Wed, 7 Aug 2024 23:56:14 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1734: >> >>> 1732: assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), ""); >>> 1733: cmpw(r_array_length, (u1)(Klass::SECONDARY_SUPERS_TABLE_SIZE - 2)); >>> 1734: br(GT, L_huge); >> >> Silly questions: >> 1. Why is it `(u1)`, when we are comparing with `cmpw` (4 bytes)? Also, should it really be unsigned? x86 code uses signed `int32_t`. >> 2. I was trying to see if there is anything special about `-2` here. Would it be a bit cleaner to say `GE` `Klass::SECONDARY_SUPERS_TABLE_SIZE - 1`? > >> Silly questions: >> >> 1. Why is it `(u1)`, when we are comparing with `cmpw` (4 bytes)? Also, should it really be unsigned? x86 code uses signed `int32_t`. > > Yeah, but AArch64 has a restricted range of operand sizes. There's a very long thread where we discussed all of this, but we ended up defining `cmpw` for `(u1)`. This means we never see an overflow at runtime. > >> 2. I was trying to see if there is anything special about `-2` here. Would it be a bit cleaner to say `GE` `Klass::SECONDARY_SUPERS_TABLE_SIZE - 1`? > > Mmm, maybe, but it means the same to me. It's just a performance optimization that does a linear search when the table is almost full, because in measurements it's faster to do so. All right, fine.
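The `-2`/`GT` versus `-1`/`GE` question above is plain integer arithmetic: for integer lengths, `len > S - 2` and `len >= S - 1` select exactly the same values. The sketch below checks that in C++. It only models the branch condition, not the AArch64 assembler, and the table size of 64 is an assumption for illustration; the real value is whatever `Klass::SECONDARY_SUPERS_TABLE_SIZE` is.

```cpp
#include <cassert>
#include <cstdint>

// Assumed table size for illustration only; HotSpot's real constant is
// Klass::SECONDARY_SUPERS_TABLE_SIZE.
constexpr int32_t kTableSize = 64;

// Models `cmpw(len, TABLE_SIZE - 2); br(GT, L_huge)`: a signed 32-bit
// greater-than test that sends almost-full tables to the linear search.
bool takes_huge_path_gt(int32_t len) { return len > kTableSize - 2; }

// The alternative raised in review: `GE` against TABLE_SIZE - 1.
bool takes_huge_path_ge(int32_t len) { return len >= kTableSize - 1; }
```

For every integer length the two predicates agree, so the choice is stylistic; with the assumed size of 64, arrays of 63 or more entries take the linear-search path.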
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20483#discussion_r1709029997 From luhenry at openjdk.org Thu Aug 8 09:30:32 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 8 Aug 2024 09:30:32 GMT Subject: RFR: 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 03:37:06 GMT, Gui Cao wrote: > Hi, > Same as [JDK-8337786](https://bugs.openjdk.org/browse/JDK-8337786) for aarch64, similar build warnings exist on riscv. Please help review this trivial change that replaces some uses of literal 0 as a null pointer constant in riscv code to instead use nullptr. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20506#pullrequestreview-2227338006 From jwtang at openjdk.org Thu Aug 8 09:31:17 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Thu, 8 Aug 2024 09:31:17 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v16] In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: > I added the testcase, which can reproduce the crash. I hope to get some advice on whether the code needs changing.
Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: fix the test condition to avoid NoClassDefFoundError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20373/files - new: https://git.openjdk.org/jdk/pull/20373/files/f9db7801..36e8e5cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373 PR: https://git.openjdk.org/jdk/pull/20373 From jwtang at openjdk.org Thu Aug 8 09:33:32 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Thu, 8 Aug 2024 09:33:32 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v11] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> <2vzlYj-B2DTKSdc_9UA6qArZTQCsARWsA4IDQeGT98o=.af817146-78ec-4fa8-83e6-28aa9cec7170@github.com> <2lxO3l-IDFl1Frg9Xs4MhybvCAlzbAzHRRGlmSqN3m4=.35c98147-f58b-43e8-8458-e558b48f031b@github.com> <5XxVrOyJWe1YQ81qBGoyfg025VJqmrFcLyDm0tyjwA0=.6cb24af2-8aff-40c4-ac25-83f78e2e8122@github.com> <0VAchDj2d80xrOvrrQUROr3lj0BdBAbvLVC Q_TbpYAw=.8cd1907e-4b66-47f0-8cab-d0d4049ae405@github.com> Message-ID: On Thu, 8 Aug 2024 09:11:29 GMT, Jaikiran Pai wrote: >> A stable way to reproduce the problem: run AgentWithVThread.java and then TestPinCaseWithCFLH.java. >> >> >> jtreg -v:error,fail -jdk:{JDKPATH} ./test/hotspot/jtreg/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.java >> jtreg -v:error,fail -jdk:{JDKPATH} ./test/hotspot/jtreg/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.java > > When I proposed the `@clean` option, I had tried it locally and that had worked for me and I wasn't able to reproduce that issue anymore locally. 
However, looking at the failed GitHub actions job with the `@clean` option, I see this: > > > > #section:clean > ----------messages:(5/232)---------- > command: clean jdk.test.lib.util.JavaAgentBuilder > reason: User specified action: run clean jdk.test.lib.util.JavaAgentBuilder > started: Thu Aug 08 07:40:24 UTC 2024 > finished: Thu Aug 08 07:40:24 UTC 2024 > elapsed time (seconds): 0.0 > ----------rerun:(2/367)*---------- > cd /Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/scratch/0 && \\ > rm -f /Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.d/jdk/test/lib/util/JavaAgentBuilder.class > result: Passed. Clean successful > > #section:build > ----------messages:(5/194)---------- > command: build jdk.test.lib.util.JavaAgentBuilder > reason: Named class compiled on demand > started: Thu Aug 08 07:40:24 UTC 2024 > finished: Thu Aug 08 07:40:24 UTC 2024 > elapsed time (seconds): 0.0 > result: Passed. All files up to date > > So jtreg in its clean action appears to have only deleted the test specific work directory: > > > rm -f /Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/serviceability/jvmti/vthread/TestPinCaseWithCFLH/TestPinCaseWithCFLH.d/jdk/test/lib/util/JavaAgentBuilder.class > > and of course that `JavaAgentBuilder.class` won't be there and is instead present in the shared directory at `/Users/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_serviceability/classes/0/test/lib`. So jtreg didn't clean up the shared directory location and let that class stay around which effectively meant the `@clean` ended up being a no-op. > > I have run out of ideas to introduce a proper workaround here. 
The right fix of course needs to happen in jtreg, which I will see if there are ways to implement it there. > > For now, it looks like the `@build jdk.test.lib.Utils` approach you used and made the test pass is the only way to make this consistently pass. Thanks for your help. I changed it back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1709062181 From coleenp at openjdk.org Thu Aug 8 11:10:32 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 11:10:32 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Thu, 8 Aug 2024 05:11:41 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> See if GHA compilers like this. > > src/hotspot/share/oops/arrayOop.hpp line 74: > >> 72: #ifdef ASSERT >> 73: // make sure it isn't called before UseCompressedOops is initialized. >> 74: static int arrayoopdesc_hs = 0; > > Why not just do a `checked_cast` on the return statement? (or even a range assert and a static cast?) Because restricting the types is better than checked_cast. The return type of length_offset_in_bytes() is int. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1709236293 From coleenp at openjdk.org Thu Aug 8 11:18:32 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 11:18:32 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Wed, 7 Aug 2024 21:05:05 GMT, Coleen Phillimore wrote: >> Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. 
>> Tested with tier1 on linux and windows. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > See if GHA compilers like this. Thanks for reviewing, Stefan, and for the suggestions for byteswap.hpp and atomic.hpp. ------------- PR Review: https://git.openjdk.org/jdk/pull/20431#pullrequestreview-2227565170 From coleenp at openjdk.org Thu Aug 8 11:18:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 11:18:33 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: <1TPRAcWwX70qO_r98Lxak0LvjcWqEDE6zIa-7E-gccU=.80b9218c-959f-48e1-91b3-b592c19559e1@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> <1TPRAcWwX70qO_r98Lxak0LvjcWqEDE6zIa-7E-gccU=.80b9218c-959f-48e1-91b3-b592c19559e1@github.com> Message-ID: On Thu, 8 Aug 2024 07:38:33 GMT, Stefan Karlsson wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> See if GHA compilers like this. > > src/hotspot/share/runtime/atomic.hpp line 1122: > >> 1120: volatile uint32_t* aligned_dest >> 1121: = reinterpret_cast<volatile uint32_t*>(align_down(dest, sizeof(uint32_t))); >> 1122: uint32_t offset = checked_cast<uint32_t>(pointer_delta(dest, aligned_dest, 1)); > > If this works for all compilers, then that's great. I was a bit concerned that the code in the statement below would cause a warning: > > (sizeof(uint32_t) - 1 - offset) > > > given that `sizeof` returns a `size_t` and `1` an `int`, but I guess the compilers are smart enough to figure out that they all fit within a uint32_t? gcc doesn't complain about this with -Wsign-conversion, or at least not when included via arrayOop.hpp, and the only goal of this change was to fix the new -Wconversion warning with arrayOop.hpp.
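The two conversion points discussed for 8337683 can be illustrated outside HotSpot. In this sketch, `static_cast` and plain integer arithmetic stand in for HotSpot's `checked_cast`, `align_down` and `pointer_delta`; addresses are modelled as `uintptr_t` values rather than pointers; and the big-endian byte index `sizeof(uint32_t) - 1 - offset` is an assumption based on the expression Stefan quotes, not a copy of atomic.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// (1) Integral promotion, as in the byteswap.hpp fallback: uint16_t operands are
// promoted to int before '&', '<<', '>>' and '|', so the whole expression is an int.
constexpr uint16_t kSample = 0x1234;
static_assert(std::is_same<decltype(((kSample & UINT16_C(0x00ff)) << 8) |
                                    ((kSample & UINT16_C(0xff00)) >> 8)),
                           int>::value,
              "uint16_t arithmetic is carried out in int");

// Narrow explicitly; the quoted HotSpot code uses checked_cast<uint16_t>, this sketch static_cast.
constexpr uint16_t swap16(uint16_t x) {
  return static_cast<uint16_t>(((x & UINT16_C(0x00ff)) << 8) | ((x & UINT16_C(0xff00)) >> 8));
}

// (2) Locating a byte inside its aligned 32-bit word, with addresses modelled
// as integers. Returns the bit shift of that byte within the word.
uint32_t byte_shift_in_word(uintptr_t dest, bool big_endian) {
  uintptr_t aligned_dest = dest & ~static_cast<uintptr_t>(sizeof(uint32_t) - 1);  // align_down
  uint32_t offset = static_cast<uint32_t>(dest - aligned_dest);                  // always 0..3
  // sizeof yields size_t, so this subtraction is done in size_t and narrowed explicitly.
  uint32_t index = big_endian ? static_cast<uint32_t>(sizeof(uint32_t) - 1 - offset) : offset;
  return index * 8;
}
```

The `static_assert` confirms Stefan's answer to David: without the cast, the swapped expression has type `int`. In `sizeof(uint32_t) - 1 - offset`, the `int` literal and `offset` are converted to `size_t` (on typical 64-bit targets), which is why the sketch narrows the result back to `uint32_t` explicitly.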
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1709249706 From mli at openjdk.org Thu Aug 8 12:15:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 8 Aug 2024 12:15:34 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v8] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 15:44:07 GMT, Hamlin Li wrote: >> Hi, >> Can you help review the patch? >> >> I'm also working on a base64 decode intrinsic, but there is some performance regression in some cases, and decode and encode are totally independent of each other, so I will send out the decode review in another PR when I fix the performance regression in it. >> >> Thanks. >> >> ## Test >> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32) >> >> I've tried several implementations, respectively with vector group >> * m2+m1+scalar >> * m2+scalar >> * m1+scalar >> * pure scalar >> The best one is the m2+m1 combination; it has the best performance across all source sizes.
>> >> ### K230 >> >> this implementation (m2+m1) >> >> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 >> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 >> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 >> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 >> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 >> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 >> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 >> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 >> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 >> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 >> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 >> >> >> >> vector with only m2 >> >> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 >> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 >> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 >> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 >> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 
162.25 | 2.36 | ns/op | 0.9904345146 >> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 >> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 >> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 >> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 >> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 >> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 >> >> >> >> vector with only m2 >> References: Message-ID: On Thu, 8 Aug 2024 08:32:17 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 684: >> >>> 682: DEBUG_ONLY(fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents." >>> 683: " Node hash code changed from " SIZE_FORMAT " to " SIZE_FORMAT, aux->saved_hash(), aux_hash)); >>> 684: fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents."); >> >> Wondering if any compiler will flag the second fatal as unreachable? > > +1. It would also dedup the messages if we wrap the second concat (yay preprocessor macros): > > > fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents." > DEBUG_ONLY(" Node hash code changed from " SIZE_FORMAT " to " SIZE_FORMAT, aux->saved_hash(), aux_hash)); I tried this dedup for the message and the compiler was displeased. 
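A likely reason the deduplicated form was rejected, assuming `DEBUG_ONLY` is a single-parameter macro like the stand-in below: in `fatal("..." DEBUG_ONLY(" ..." SIZE_FORMAT " to " SIZE_FORMAT, a, b))` the commas sit at the top level of `DEBUG_ONLY`'s argument, so the preprocessor sees three arguments. Wrapping each complete `fatal(...)` call, as with the `NOT_DEBUG` suggestion that follows, keeps the commas inside nested parentheses. The `DEBUG_ONLY`, `NOT_DEBUG`, `fatal` and `report_fatal` definitions here are hypothetical stand-ins (this `fatal` records the message instead of aborting), and `%zu` replaces `SIZE_FORMAT`.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Simulate a debug build; comment this out to simulate a product build.
#define ASSERT 1

// Hypothetical single-parameter macros in the style of DEBUG_ONLY/NOT_DEBUG.
#ifdef ASSERT
#define DEBUG_ONLY(code) code
#define NOT_DEBUG(code)
#else
#define DEBUG_ONLY(code)
#define NOT_DEBUG(code) code
#endif

// Hypothetical stand-in for fatal(): formats and records the message instead of aborting.
static std::string last_fatal;
static void report_fatal(const char* fmt, ...) {
  char buf[256];
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(buf, sizeof(buf), fmt, ap);
  va_end(ap);
  last_fatal = buf;
}
#define fatal(...) report_fatal(__VA_ARGS__)

#define RESIZE_MSG "Cannot resize table: Node hash code has changed possibly due to corruption of the contents."

void on_hash_mismatch(size_t saved_hash, size_t aux_hash) {
  // Does not compile: the top-level commas give DEBUG_ONLY three arguments.
  //   fatal(RESIZE_MSG DEBUG_ONLY(" Node hash code changed from %zu to %zu", saved_hash, aux_hash));
  // Compiles: the commas are inside fatal(...)'s own parentheses.
  DEBUG_ONLY(fatal(RESIZE_MSG " Node hash code changed from %zu to %zu", saved_hash, aux_hash));
  NOT_DEBUG(fatal(RESIZE_MSG));
  (void)saved_hash;  // silence unused-parameter warnings in the product variant
  (void)aux_hash;
}
```

With `ASSERT` defined, only the detailed message is produced; without it, only the short one, and no code refers to the debug-only saved hash.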
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20503#discussion_r1709724890 From shade at openjdk.org Thu Aug 8 15:16:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 15:16:32 GMT Subject: RFR: 8333356: JVM crashes with "aux_index does not match even or odd indices" In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:09:30 GMT, Coleen Phillimore wrote: >> +1. It would also dedup the messages if we wrap the second concat (yay preprocessor macros): >> >> >> fatal("Cannot resize table: Node hash code has changed possibly due to corruption of the contents." >> DEBUG_ONLY(" Node hash code changed from " SIZE_FORMAT " to " SIZE_FORMAT, aux->saved_hash(), aux_hash)); > > I tried this dedup for the message and the compiler was displeased. Oh, because `fatal` is macro as well, hrmpf. Too bad. Wrap the second `fatal` in `NOT_DEBUG` then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20503#discussion_r1709735542 From coleenp at openjdk.org Thu Aug 8 15:36:08 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 15:36:08 GMT Subject: RFR: 8333356: JVM crashes with "aux_index does not match even or odd indices" [v2] In-Reply-To: References: Message-ID: > You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: A bit less repetative message. 
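Why a changed hash surfaces as "aux_index does not match even or odd indices" can be seen in a simplified model of a resize. This sketch assumes a power-of-two table indexed by `hash & (size - 1)` that doubles when it grows; that is a common design but is not spelled out in this thread. Under that assumption, a node in old bucket `i` can only move to bucket `i` or `i + old_size`, and a node whose hash no longer matches its bucket may land in neither. `classify_split` and `SplitTarget` are hypothetical names, not HotSpot's `ConcurrentHashTable` code.

```cpp
#include <cassert>
#include <cstddef>

enum class SplitTarget { Even, Odd, Mismatch };

// Classifies where a node from old bucket `even_index` goes when a
// power-of-two table grows from `old_size` to 2 * old_size buckets.
SplitTarget classify_split(size_t hash, size_t even_index, size_t old_size) {
  const size_t new_mask = 2 * old_size - 1;
  const size_t aux_index = hash & new_mask;
  if (aux_index == even_index) return SplitTarget::Even;
  if (aux_index == even_index + old_size) return SplitTarget::Odd;
  // Cannot happen for a consistent table: the node's hash must have
  // changed since it was inserted into bucket `even_index`.
  return SplitTarget::Mismatch;
}
```

Keeping the insertion-time hash with each node in debug builds is what allows the new message to report both the saved and the recomputed value when the `Mismatch` case is hit.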
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20503/files - new: https://git.openjdk.org/jdk/pull/20503/files/a778c923..60bf3745 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20503&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20503&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20503.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20503/head:pull/20503 PR: https://git.openjdk.org/jdk/pull/20503 From rcastanedalo at openjdk.org Thu Aug 8 15:37:19 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 8 Aug 2024 15:37:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v4] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms.
> > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Also include HOTSPOT_TARGET_CPU_ARCH-based G1 ADL source file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/20ef68c8..47079ea1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Thu Aug 8 15:37:19 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 8 Aug 2024 15:37:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: Message-ID: On Sat, 29 Jun 2024 03:51:29 GMT, Amit Kumar wrote: >> make/hotspot/gensrc/GensrcAdlc.gmk line 205: >> >>> 203: ifeq ($(call check-jvm-feature, g1gc), true) >>> 204: AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \ >>> 205: $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU).ad \ >> >> on s390, `g1_s390.ad` file is not compiled with current code.
>> >> Suggestion: >> >> $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU_ARCH).ad \ > > I guess this one might be better:
>
> diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk b/make/hotspot/gensrc/GensrcAdlc.gmk
> index e34f0725397..ef9c15b2975 100644
> --- a/make/hotspot/gensrc/GensrcAdlc.gmk
> +++ b/make/hotspot/gensrc/GensrcAdlc.gmk
> @@ -203,6 +203,7 @@ ifeq ($(call check-jvm-feature, compiler2), true)
> ifeq ($(call check-jvm-feature, g1gc), true)
> AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \
> $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU).ad \
> + $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU_ARCH).ad \
> )))
> endif
>
> Build is fine with both changes, (tested on Mac-M1)

Thanks! I went with the second option (commit 47079ea1) for consistency with other collectors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1709781421 From coleenp at openjdk.org Thu Aug 8 15:40:31 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 15:40:31 GMT Subject: RFR: 8338064: Give better error for ConcurrentHashTable corruption [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:36:08 GMT, Coleen Phillimore wrote: >> You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > A bit less repetative message. I created a new issue and changed the title. I don't know if the bots will associate this PR with the new issue though.
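The debug-only hash check described in this PR can be illustrated with a toy, single-threaded bucket chain. This is a hedged sketch, not HotSpot's `ConcurrentHashTable`: `Node`, `make_node`, `lookup_verified` and `LookupResult` are hypothetical names, and the standard `NDEBUG` macro stands in for HotSpot's `ASSERT` debug-build guard. Each node remembers, in debug builds only, the hash it was inserted with, so a lookup can report a node whose recomputed hash no longer matches.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Toy chained-bucket node (hypothetical; not HotSpot's ConcurrentHashTable node).
// The saved hash exists only in debug builds, so release builds pay no extra
// memory per node.
struct Node {
  std::string key;
#ifndef NDEBUG
  std::size_t saved_hash;  // hash computed when the node was inserted
#endif
  Node* next;
};

inline std::size_t hash_of(const std::string& s) {
  return std::hash<std::string>{}(s);
}

inline Node make_node(const std::string& key, Node* next) {
  Node n;
  n.key = key;
#ifndef NDEBUG
  n.saved_hash = hash_of(key);  // remember the insertion-time hash
#endif
  n.next = next;
  return n;
}

enum class LookupResult { Found, NotFound, HashMismatch };

// Walks one bucket chain. In debug builds every visited node's hash is
// recomputed and compared with the saved one, so a corrupted node is reported
// explicitly instead of surfacing later as an obscure failure.
inline LookupResult lookup_verified(const Node* head, const std::string& key) {
  for (const Node* n = head; n != nullptr; n = n->next) {
#ifndef NDEBUG
    if (hash_of(n->key) != n->saved_hash) {
      return LookupResult::HashMismatch;
    }
#endif
    if (n->key == key) {
      return LookupResult::Found;
    }
  }
  return LookupResult::NotFound;
}
```

The sketch ignores concurrency entirely; in a real concurrent table the check would have to run under the table's own read-side protection.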
------------- PR Comment: https://git.openjdk.org/jdk/pull/20503#issuecomment-2276127984 From shade at openjdk.org Thu Aug 8 15:48:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 15:48:32 GMT Subject: RFR: 8338064: Give better error for ConcurrentHashTable corruption [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:36:08 GMT, Coleen Phillimore wrote: >> You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > A bit less repetative message. Looks fine. I think we only need to drop the PR links from the old issue, and that is it. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20503#pullrequestreview-2228271977 From coleenp at openjdk.org Thu Aug 8 16:11:31 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 16:11:31 GMT Subject: RFR: 8338064: Give better error for ConcurrentHashTable corruption [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:36:08 GMT, Coleen Phillimore wrote: >> You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. >> Tested with tier1-4. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > A bit less repetative message. Oh wow, it did it. Thanks for the code review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20503#issuecomment-2276188713 From coleenp at openjdk.org Thu Aug 8 16:13:37 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 16:13:37 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Wed, 7 Aug 2024 21:05:05 GMT, Coleen Phillimore wrote: >> Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. >> Tested with tier1 on linux and windows. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > See if GHA compilers like this. Thanks for reviewing Stefan and David. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20431#issuecomment-2276190004 From coleenp at openjdk.org Thu Aug 8 16:13:37 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 16:13:37 GMT Subject: Integrated: 8337683: Fix -Wconversion problem with arrayOop.hpp In-Reply-To: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Thu, 1 Aug 2024 18:49:34 GMT, Coleen Phillimore wrote: > Since base_offset_in_bytes and HeapWordSize are int, there's no loss of conversion in making these variables int. This seems trivial. > Tested with tier1 on linux and windows. This pull request has now been integrated. 
Changeset: 9695f095 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/9695f09581bac856ad97943cca15c65dc21d2adf Stats: 14 lines in 3 files changed: 2 ins; 0 del; 12 mod 8337683: Fix -Wconversion problem with arrayOop.hpp Reviewed-by: stefank, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/20431 From ayang at openjdk.org Thu Aug 8 16:47:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 8 Aug 2024 16:47:40 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v4] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:37:19 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Also include HOTSPOT_TARGET_CPU_ARCH-based G1 ADL source file Some naming comments/suggestions, up to you. `g1_write_barrier_post_c2` and `generate_c2_post_barrier_stub`: the latter is the "next" step if the slower path is taken. I wonder if it can be renamed to something like "...write_barrier_post_c2_stub" to make it obvious that they are related. Both "write_barrier_pre" and "pre_write_barrier" exist. It's not obvious whether that is intended (to highlight some difference) or not. ------------- Marked as reviewed by ayang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2228393022 From jbhateja at openjdk.org Thu Aug 8 17:00:05 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 Aug 2024 17:00:05 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI Message-ID: Hi All,

As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators:

. SATURATING_UADD : Saturating unsigned addition.
. SATURATING_ADD : Saturating signed addition.
. SATURATING_USUB : Saturating unsigned subtraction.
. SATURATING_SUB : Saturating signed subtraction.
. UMAX : Unsigned max.
. UMIN : Unsigned min.

The new vector operators apply only to integral types, since integral values wrap around on overflow/underflow (after setting the appropriate status flags). For floating-point types, IEEE 754 already specifies multiple schemes to handle underflow, one of which is gradual underflow, which transitions the value into the subnormal range. Similarly, overflow implicitly saturates a floating-point value to infinity.

As the name suggests, these are saturating operations: the result of the computation is strictly capped by the lower and upper bounds of the result type and does not wrap around on underflow or overflow.

Summary of changes:
- Java side implementation of the new vector operators.
- New scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
- C2 compiler IR and inline expander changes.
- Optimized x86 backend implementation for the new vector operators and their predicated counterparts.
- Extends the existing VectorAPI Jtreg test suite to cover the new operations.

Kindly review and share your feedback.

Best Regards,
PS: Intrinsification and auto-vectorization of the new core-lib API will be addressed separately in a follow-up patch.
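The lane-wise semantics listed above can be modelled with scalar helpers for a single `byte` lane. This is a hedged C++ sketch, not the Vector API or the new box-class methods: the helper names are hypothetical, and it assumes, as the description reads, that the unsigned operators treat the lane's 8 bits as a value in 0..255 and clamp to the bounds of the result type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar reference semantics for one 8-bit lane (hypothetical helper names).
// Java's byte is signed, so the unsigned variants reinterpret the same 8 bits
// as 0..255. Each helper widens to int, so the intermediate result cannot
// overflow, and then clamps to the bounds of the result type.

inline std::int8_t sat_add(std::int8_t a, std::int8_t b) {   // SATURATING_ADD
  int r = int(a) + int(b);
  return static_cast<std::int8_t>(std::min(std::max(r, -128), 127));
}

inline std::int8_t sat_sub(std::int8_t a, std::int8_t b) {   // SATURATING_SUB
  int r = int(a) - int(b);
  return static_cast<std::int8_t>(std::min(std::max(r, -128), 127));
}

inline std::int8_t sat_uadd(std::int8_t a, std::int8_t b) {  // SATURATING_UADD
  int r = int(std::uint8_t(a)) + int(std::uint8_t(b));
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(std::min(r, 255)));
}

inline std::int8_t sat_usub(std::int8_t a, std::int8_t b) {  // SATURATING_USUB
  int r = int(std::uint8_t(a)) - int(std::uint8_t(b));
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(std::max(r, 0)));
}

inline std::int8_t umax(std::int8_t a, std::int8_t b) {      // UMAX: unsigned compare
  return std::uint8_t(a) >= std::uint8_t(b) ? a : b;
}

inline std::int8_t umin(std::int8_t a, std::int8_t b) {      // UMIN: unsigned compare
  return std::uint8_t(a) <= std::uint8_t(b) ? a : b;
}

// For contrast: ordinary wrapping byte addition (modulo 256).
inline std::int8_t wrap_add(std::int8_t a, std::int8_t b) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(std::uint8_t(a) + std::uint8_t(b)));
}
```

For example, `100 + 100` wraps to `-56` with `wrap_add` but saturates to `127` with `sat_add`, and `umax(-1, 1)` returns `-1` because its bit pattern `0xFF` is the larger unsigned value.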
[1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html ------------- Commit messages: - Removed redundant comment - 8338021: Support saturating vector operators in VectorAPI Changes: https://git.openjdk.org/jdk/pull/20507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338021 Stats: 9013 lines in 67 files changed: 8923 ins; 28 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Aug 8 17:02:05 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 Aug 2024 17:02:05 GMT Subject: RFR: 8338023: Support two vector selectFrom API Message-ID: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Hi All,

As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API.

Declaration:- Vector.selectFrom(Vector v1, Vector v2)

Semantics:- Using the index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. The first and second vectors thus serve as a table whose elements are selected by the index vector. The API applies to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must lie within the valid two-vector index range [0, 2*VLEN); otherwise an IndexOutOfBoundsException is thrown.

Summary of changes:
- Java side implementation of the new selectFrom API.
- C2 compiler IR and inline expander changes.
- In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platform.
- Optimized x86 backend implementation for AVX512 and legacy targets.
- Function tests covering the new API.

The JMH micro included with this patch shows around a 10-15x gain over the existing rearrange API:
Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]

Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.244          ops/ms
SelectFromBenchmark.selectFromLongVector        1024  thrpt    2   5856.859          ops/ms
SelectFromBenchmark.selectFromLongVector        2048  thrpt    2   1513.378          ops/ms
SelectFromBenchmark.selectFromShortVector       1024  thrpt    2  17888.617          ops/ms
SelectFromBenchmark.selectFromShortVector       2048  thrpt    2   9079.565          ops/ms

Kindly review and share your feedback.
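A scalar model of the two-vector `selectFrom` semantics described above may help when reading the lowering and backend changes. This is a hedged C++ sketch, not the Vector API: `select_from` is a hypothetical name, index lanes are modelled as `int` (the real API takes the indices from a vector of the same element type), and `std::out_of_range` stands in for Java's `IndexOutOfBoundsException`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Scalar model of two-vector selection: index lane i picks element
// indexes[i] from the concatenation [v1 | v2] of the two table vectors.
template <typename T, std::size_t VLEN>
std::array<T, VLEN> select_from(const std::array<int, VLEN>& indexes,
                                const std::array<T, VLEN>& v1,
                                const std::array<T, VLEN>& v2) {
  std::array<T, VLEN> result{};
  for (std::size_t i = 0; i < VLEN; ++i) {
    const int idx = indexes[i];
    // Valid indices lie in [0, 2*VLEN); anything else is an error.
    if (idx < 0 || idx >= static_cast<int>(2 * VLEN)) {
      throw std::out_of_range("index outside [0, 2*VLEN)");
    }
    const std::size_t j = static_cast<std::size_t>(idx);
    // Indices below VLEN read from v1, the rest from v2.
    result[i] = j < VLEN ? v1[j] : v2[j - VLEN];
  }
  return result;
}
```

For example, with `VLEN = 4`, index lanes `{0, 7, 3, 4}` select `v1[0]`, `v2[3]`, `v1[3]` and `v2[0]`, which matches the description's equivalence with `v1.rearrange(this.toShuffle(), v2)` for in-range indices.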
Best Regards, Jatin [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html ------------- Commit messages: - Adding Benchmark - 8338023: Support two vector selectFrom API Changes: https://git.openjdk.org/jdk/pull/20508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338023 Stats: 2737 lines in 95 files changed: 2719 ins; 17 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Thu Aug 8 17:20:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 Aug 2024 17:20:06 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SATURATING_UADD : Saturating unsigned addition. > . SATURATING_ADD : Saturating signed addition. > . SATURATING_USUB : Saturating unsigned subtraction. > . SATURATING_SUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. 
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - Removed redundant comment - 8338021: Support saturating vector operators in VectorAPI ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/1ffe4c68..5468e72b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=00-01 Stats: 3609 lines in 32 files changed: 177 ins; 3316 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From gziemski at openjdk.org Thu Aug 8 17:50:31 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 8 Aug 2024 17:50:31 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 04:47:18 GMT, David Holmes wrote: > If you called it `MemTypeFlag` - which to me still suggests mutually-exclusive values - then you would not need to 
rename all the variables with "flag" in their name later. Hmm, not a bad idea. Are there any other opinions? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2276354395 From dlong at openjdk.org Thu Aug 8 19:33:39 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Aug 2024 19:33:39 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Thu, 8 Aug 2024 11:07:41 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/arrayOop.hpp line 74: >> >>> 72: #ifdef ASSERT >>> 73: // make sure it isn't called before UseCompressedOops is initialized. >>> 74: static int arrayoopdesc_hs = 0; >> >> Why not just do a `checked_cast` on the return statement? (or even a range assert and a static cast?) > > Because restricting the types is better than checked_cast. The return type of length_offset_in_bytes() is int. Also I think the current checked_cast still has problems with signed <--> unsigned. Another technique is to use an even more restricted type. For example, if these offset_in_bytes() functions returned uint16_t, then the value can be widened to either `int` or `size_t` without a cast. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1710159357 From lmesnik at openjdk.org Thu Aug 8 19:50:35 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 8 Aug 2024 19:50:35 GMT Subject: RFR: 8338010: WB_IsFrameDeoptimized miss ResourceMark In-Reply-To: References: Message-ID: <-tArBxSxhxgUxxbGA2xuniudCdWwSJFqNhC3o_A9TZI=.ff927a03-23e5-4c98-8258-ddd76374b6e2@github.com> On Wed, 7 Aug 2024 23:09:35 GMT, Leonid Mesnik wrote: > The method WB_IsFrameDeoptimized is used only by test com/sun/jdi/EATests.java and intermittently fails with virtual thread test factory.
> The log explains how problem happens: > Stack: [0x000000f373e00000,0x000000f373f00000] > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0xc956e1] os::win32::platform_print_native_stack+0x101 (os_windows_x86.cpp:235) > V [jvm.dll+0xf59abb] VMError::report+0x149b (vmError.cpp:1010) > V [jvm.dll+0xf5c15e] VMError::report_and_die+0x80e (vmError.cpp:1845) > V [jvm.dll+0x55796e] report_fatal+0x7e (debug.cpp:214) > V [jvm.dll+0xd4d591] ResourceArea::allocate_bytes+0x111 (resourceArea.inline.hpp:33) > V [jvm.dll+0xf44bef] vframe::new_vframe+0x7f (vframe.cpp:68) > V [jvm.dll+0x7fb97a] JavaThread::last_java_vframe+0x3a (javaThread.cpp:2044) > V [jvm.dll+0xf8f659] WB_IsFrameDeoptimized+0x219 (whitebox.cpp:798) > C 0x000001fbdebd3b96 (no source info available) > > Testing by running test with and without thread factory & tier1. @dholmes-ora, @shipilev Thank you for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20502#issuecomment-2276539021 From lmesnik at openjdk.org Thu Aug 8 19:50:35 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 8 Aug 2024 19:50:35 GMT Subject: Integrated: 8338010: WB_IsFrameDeoptimized miss ResourceMark In-Reply-To: References: Message-ID: <4qCUslIYq1zZucuUibf4Feb0bdq5FHXtyE3grmegw6o=.3e19b3f2-18f3-4f03-8a8d-4d0ea621d038@github.com> On Wed, 7 Aug 2024 23:09:35 GMT, Leonid Mesnik wrote: > The method WB_IsFrameDeoptimized is used only by test com/sun/jdi/EATests.java and intermittently fails with virtual thread test factory. 
> The log explains how problem happens: > Stack: [0x000000f373e00000,0x000000f373f00000] > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0xc956e1] os::win32::platform_print_native_stack+0x101 (os_windows_x86.cpp:235) > V [jvm.dll+0xf59abb] VMError::report+0x149b (vmError.cpp:1010) > V [jvm.dll+0xf5c15e] VMError::report_and_die+0x80e (vmError.cpp:1845) > V [jvm.dll+0x55796e] report_fatal+0x7e (debug.cpp:214) > V [jvm.dll+0xd4d591] ResourceArea::allocate_bytes+0x111 (resourceArea.inline.hpp:33) > V [jvm.dll+0xf44bef] vframe::new_vframe+0x7f (vframe.cpp:68) > V [jvm.dll+0x7fb97a] JavaThread::last_java_vframe+0x3a (javaThread.cpp:2044) > V [jvm.dll+0xf8f659] WB_IsFrameDeoptimized+0x219 (whitebox.cpp:798) > C 0x000001fbdebd3b96 (no source info available) > > Testing by running test with and without thread factory & tier1. This pull request has now been integrated. Changeset: 9f08a01c Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/9f08a01cb6ebb08f67749aabdff4efaedfaf3228 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8338010: WB_IsFrameDeoptimized miss ResourceMark Reviewed-by: dholmes, shade ------------- PR: https://git.openjdk.org/jdk/pull/20502 From coleenp at openjdk.org Thu Aug 8 20:12:35 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Aug 2024 20:12:35 GMT Subject: RFR: 8337683: Fix -Wconversion problem with arrayOop.hpp [v5] In-Reply-To: References: <8im3QKkwrXzgYQ-YIbKKn8wUN7d_cc5-uDnixO_pTCg=.2836aa9f-0138-4b05-b2d1-0a5048518700@github.com> Message-ID: On Thu, 8 Aug 2024 19:30:37 GMT, Dean Long wrote: >> Because restricting the types is better than checked_cast. The return type of length_offset_in_bytes() is int. > > Also I think the current checked_cast still has problems with signed <--> unsigned. > > Another technique to use an even more restricted type. 
For example, if these offset_in_bytes() functions returned uint16_t, then the value can be widened to either `int` or `size_t` without a cast. Yes that would be a change that we could make that would help, and these offset_in_bytes functions should return an unsigned type since they're never negative. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20431#discussion_r1710215749 From thomas.stuefe at gmail.com Thu Aug 8 20:33:55 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 8 Aug 2024 13:33:55 -0700 Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: I like it short, succinct and greppable. NMTCategory? NMTCat? That said, I can live with the current name and dread the cv backporting and support implications of this change. On Thu 8. Aug 2024 at 10:50, Gerard Ziemski wrote: > On Thu, 8 Aug 2024 04:47:18 GMT, David Holmes wrote: > > > If you called it `MemTypeFlag` - which to me still suggests mutually-exclusive values - then you would not need to rename all the variables with "flag" in their name later. > > Hmm, not a bad idea. Are there any other opinions? > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2276354395 From mli at openjdk.org Thu Aug 8 20:35:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 8 Aug 2024 20:35:08 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v9] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? > > I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. > > Thanks.
> > ## Test > benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32) > > I've tried several implementations, respectively with vector group > * m2+m1+scalar > * m2+scalar > * m1+scalar > * pure scalar > The best one is combination of m2+m1, it have best performance in all source size. > > ### K230 > > this implementation (m2+m1) > > Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic > -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 > Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 > Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 > Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 > Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 > Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 > Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 > Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 > Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 > Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 > Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 > > > > vector with only m2
From dholmes at openjdk.org Thu Aug 8 22:08:34 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 8 Aug 2024 22:08:34 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v16] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: On Thu, 8 Aug 2024 09:31:17 GMT, Jiawei Tang wrote: >> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > fix the test condition to avoid NoClassDefFoundError Okay. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2228943806 From iklam at openjdk.org Thu Aug 8 22:39:56 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 8 Aug 2024 22:39:56 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows Message-ID: We didn't support CDS archived heap object on Windows because - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now.
(Tested on Oracle CI tiers 1-7) ------------- Commit messages: - fixed whitespaces - 8338011: CDS archived heap object support for 64-bit Windows Changes: https://git.openjdk.org/jdk/pull/20514/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20514&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338011 Stats: 40 lines in 3 files changed: 23 ins; 10 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20514/head:pull/20514 PR: https://git.openjdk.org/jdk/pull/20514 From dholmes at openjdk.org Fri Aug 9 01:09:34 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 9 Aug 2024 01:09:34 GMT Subject: RFR: 8338064: Give better error for ConcurrentHashTable corruption [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:36:08 GMT, Coleen Phillimore wrote: >> You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > A bit less repetative message. Looks good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20503#pullrequestreview-2229115800 From duke at openjdk.org Fri Aug 9 02:24:33 2024 From: duke at openjdk.org (duke) Date: Fri, 9 Aug 2024 02:24:33 GMT Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option [v16] In-Reply-To: References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: On Thu, 8 Aug 2024 09:31:17 GMT, Jiawei Tang wrote: >> I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. > > Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision: > > fix the test condition to avoid NoClassDefFoundError @jia-wei-tang Your change (at version 36e8e5ccb23209780b983ad5b46bc62d8c0a9634) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2277025430 From jwtang at openjdk.org Fri Aug 9 02:31:46 2024 From: jwtang at openjdk.org (Jiawei Tang) Date: Fri, 9 Aug 2024 02:31:46 GMT Subject: Integrated: 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com> Message-ID: On Mon, 29 Jul 2024 09:36:47 GMT, Jiawei Tang wrote: > I add the testcase which can reproduce the crash. I hope that I could get some advise if the codes need changing. This pull request has now been integrated. 
Changeset: 55c50970 Author: Jiawei Tang URL: https://git.openjdk.org/jdk/commit/55c509708e9b89a7609fd41b6e5a271f250bbacd Stats: 83 lines in 2 files changed: 78 ins; 1 del; 4 mod 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option Reviewed-by: dholmes, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/20373 From gcao at openjdk.org Fri Aug 9 03:02:37 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 9 Aug 2024 03:02:37 GMT Subject: RFR: 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 03:37:06 GMT, Gui Cao wrote: > Hi, > Same as [JDK-8337786](https://bugs.openjdk.org/browse/JDK-8337786) for aarch64, similar build warnings exist on riscv. Please help review this trivial change that replaces some uses of literal 0 as a null pointer constant in riscv code to instead use nullptr. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20506#issuecomment-2277050866 From duke at openjdk.org Fri Aug 9 03:02:37 2024 From: duke at openjdk.org (duke) Date: Fri, 9 Aug 2024 03:02:37 GMT Subject: RFR: 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 03:37:06 GMT, Gui Cao wrote: > Hi, > Same as [JDK-8337786](https://bugs.openjdk.org/browse/JDK-8337786) for aarch64, similar build warnings exist on riscv. Please help review this trivial change that replaces some uses of literal 0 as a null pointer constant in riscv code to instead use nullptr. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) @zifeihan Your change (at version d32e656716a5e937682ba687510c3de110c176be) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20506#issuecomment-2277051793 From gcao at openjdk.org Fri Aug 9 03:02:37 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 9 Aug 2024 03:02:37 GMT Subject: Integrated: 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 03:37:06 GMT, Gui Cao wrote: > Hi, > Same as [JDK-8337786](https://bugs.openjdk.org/browse/JDK-8337786) for aarch64, similar build warnings exist on riscv. Please help review this trivial change that replaces some uses of literal 0 as a null pointer constant in riscv code to instead use nullptr. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) This pull request has now been integrated. Changeset: 0c1e9111 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/0c1e9111d226b601236b9826e27ecc67a8b625fb Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod 8338019: Fix simple -Wzero-as-null-pointer-constant warnings in riscv code Reviewed-by: fyang, mli, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/20506 From jkarthikeyan at openjdk.org Fri Aug 9 03:31:38 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 9 Aug 2024 03:31:38 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 17:20:06 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SATURATING_UADD : Saturating unsigned addition. >> . SATURATING_ADD : Saturating signed addition. >> . SATURATING_USUB : Saturating unsigned subtraction. >> . SATURATING_SUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. 
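For reference, the per-lane semantics of the six operators listed above can be modeled in scalar C++. This is a minimal sketch that assumes 32-bit lanes. It is not the Java fallback code or the x86 backend from the patch, and the function and constant names are hypothetical. Signed results are clamped to the type's bounds instead of wrapping. The unsigned variants reinterpret the lane bits as `uint32_t`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar models of the per-lane semantics (32-bit lanes assumed).
constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
constexpr int32_t kMax = std::numeric_limits<int32_t>::max();
constexpr uint32_t kUMax = std::numeric_limits<uint32_t>::max();

// SATURATING_ADD: exact sum in 64 bits, then clamp to the int32_t range.
constexpr int32_t saturating_add(int32_t a, int32_t b) {
  return (static_cast<int64_t>(a) + b) > kMax ? kMax
       : (static_cast<int64_t>(a) + b) < kMin ? kMin
       : static_cast<int32_t>(static_cast<int64_t>(a) + b);
}

// SATURATING_SUB: exact difference in 64 bits, then clamp.
constexpr int32_t saturating_sub(int32_t a, int32_t b) {
  return (static_cast<int64_t>(a) - b) > kMax ? kMax
       : (static_cast<int64_t>(a) - b) < kMin ? kMin
       : static_cast<int32_t>(static_cast<int64_t>(a) - b);
}

// SATURATING_UADD: unsigned addition wraps modulo 2^32; a wrapped result is
// smaller than an operand, in which case clamp to the unsigned maximum.
constexpr uint32_t saturating_uadd(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(a + b) < a ? kUMax : static_cast<uint32_t>(a + b);
}

// SATURATING_USUB: a result below zero is clamped to 0.
constexpr uint32_t saturating_usub(uint32_t a, uint32_t b) {
  return a < b ? 0u : a - b;
}

// UMAX / UMIN: compare the lane bits as unsigned values.
constexpr uint32_t umax(uint32_t a, uint32_t b) { return a > b ? a : b; }
constexpr uint32_t umin(uint32_t a, uint32_t b) { return a < b ? a : b; }

// Compile-time checks of the clamping behaviour.
static_assert(saturating_add(kMax, 1) == kMax, "signed add clamps at INT32_MAX");
static_assert(saturating_sub(kMin, 1) == kMin, "signed sub clamps at INT32_MIN");
static_assert(saturating_uadd(0xFFFFFFF0u, 0x20u) == kUMax, "unsigned add clamps at 2^32 - 1");
static_assert(saturating_usub(5u, 7u) == 0u, "unsigned sub clamps at 0");
static_assert(umax(0x80000000u, 1u) == 0x80000000u, "UMAX compares as unsigned, not signed");
```

The last check shows why UMAX and UMIN are separate operators. Read as a signed value, 0x80000000 is INT32_MIN, so a signed max would return 1.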
>> >> >> New vector operators are applicable only to integral types, since their values wrap around in overflowing/underflowing scenarios after setting the appropriate status flags. For floating-point types, as per the IEEE 754 spec, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value into the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extend the existing VectorAPI Jtreg test suite to cover the new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Removed redundant comment > - 8338021: Support saturating vector operators in VectorAPI src/hotspot/share/opto/type.cpp line 495: > 493: TypeInt::POS1 = TypeInt::make(1,max_jint, WidenMin); // Positive values > 494: TypeInt::INT = TypeInt::make(min_jint,max_jint, WidenMax); // 32-bit integers > 495: TypeInt::UINT = TypeInt::make(0, max_juint, WidenMin); // Unsigned ints This would make an illegal type, right? Since `TypeInt` is signed using `max_juint` as the hi value would end up as signed -1, resulting in the type `0..-1`, an empty type. I wonder if there's a better way to handle this, since in the type system empty types are in a sense equivalent to `TOP`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1710642379 From fyang at openjdk.org Fri Aug 9 03:51:31 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 9 Aug 2024 03:51:31 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v9] In-Reply-To: References: Message-ID: <6hLPCDfOAhPesnozdkUtBteQGP5c6Dnl-ZfXpQLacSA=.9026bd07-b6c5-4380-b9b5-13493b38897b@github.com> On Thu, 8 Aug 2024 20:35:08 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> >> I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. >> >> Thanks. >> >> ## Test >> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32) >> >> I've tried several implementations, respectively with vector group >> * m2+m1+scalar >> * m2+scalar >> * m1+scalar >> * pure scalar >> The best one is combination of m2+m1, it have best performance in all source size. 
>> >> ### K230 >> >> this implementation (m2+m1) >> >> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 >> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 >> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 >> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 >> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 >> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 >> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 >> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 >> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 >> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 >> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 >> >> >> >> vector with only m2 >> 2332: assert(!dst.rspec().reloc()->is_data(), "should not use ExternalAddress for jump"); Should these assert be added for other platforms as well? ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20470#pullrequestreview-2229681432 PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1711063820 From mli at openjdk.org Fri Aug 9 09:15:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 9 Aug 2024 09:15:39 GMT Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6] In-Reply-To: References: <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com> Message-ID: <2ZHFNTpxh16pIzrUHk1Qa9vx7pEvyJeTVMidGtcu04k=.02639246-2846-47bd-9526-847ff694be8e@github.com> On Mon, 29 Jul 2024 05:00:13 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - merge master >> - Merge branch 'master' into baes64-encode-integrated >> - move label >> - refine code >> - use pure scalar version when rvv is not supported >> - clean code >> - Initial commit > > Hi, will take a look. BTW: Have you resolved the performance issue of base64 decode instrinsic? @RealFYang @luhenry Thanks for your reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2277509809 From mli at openjdk.org Fri Aug 9 09:15:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 9 Aug 2024 09:15:40 GMT Subject: Integrated: 8314125: RISC-V: implement Base64 intrinsic - encoding In-Reply-To: References: Message-ID: <834oTK-Lnzs8oiaW0xy3vi7_xL3bSNNgWXNiYGIfUX4=.6a79b2ed-248b-4d87-9694-de55e64271ef@github.com> On Mon, 1 Jul 2024 14:13:26 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > > I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it. > > Thanks. 
> > ## Test > benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32) > > I've tried several implementations, respectively with vector group > * m2+m1+scalar > * m2+scalar > * m1+scalar > * pure scalar > The best one is combination of m2+m1, it have best performance in all source size. > > ### K230 > > this implementation (m2+m1) > > Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic > -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063 > Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443 > Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364 > Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856 > Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146 > Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423 > Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862 > Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469 > Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631 > Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698 > Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876 > > > > vector with only m2 > Both "write_barrier_pre" and "pre_write_barrier" exist. It's not obvious whether that is intended (to highlight some diff) or not. This is accidental, as far as I can see. `write_barrier_pre` is the pre-existing name for the interpreter barrier generation functions, I would rather leave it as-is to avoid making this changeset even larger. 
Instead, I have renamed the helper functions `g1_pre_write_barrier()` and `g1_post_write_barrier()` to `write_barrier_pre()` and `write_barrier_post()`, for consistency (and dropped `g1_` since it is obvious from the context) in commit 1834bf4. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2277770042 From coleenp at openjdk.org Fri Aug 9 12:02:39 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 9 Aug 2024 12:02:39 GMT Subject: RFR: 8338064: Give better error for ConcurrentHashTable corruption [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:36:08 GMT, Coleen Phillimore wrote: >> You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > A bit less repetative message. Thanks for the code review, David and Aleksey. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20503#issuecomment-2277785577 From coleenp at openjdk.org Fri Aug 9 12:02:40 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 9 Aug 2024 12:02:40 GMT Subject: Integrated: 8338064: Give better error for ConcurrentHashTable corruption In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 23:55:47 GMT, Coleen Phillimore wrote: > You get this message if the nodes of the concurrent hash table are corrupted or somehow don't yield the same hashcode as when the node was entered in the table. This change makes the error message less obscure, and in debug mode compares the two hash codes. 
The hash code is kept in the table in only debug mode, because we don't want the table to take a lot more memory. > Tested with tier1-4. This pull request has now been integrated. Changeset: 069e0ea6 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/069e0ea69f43960164d3e077d2c7b950cde77927 Stats: 69 lines in 3 files changed: 68 ins; 0 del; 1 mod 8338064: Give better error for ConcurrentHashTable corruption Reviewed-by: dholmes, shade ------------- PR: https://git.openjdk.org/jdk/pull/20503 From rcastanedalo at openjdk.org Fri Aug 9 12:03:37 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 9 Aug 2024 12:03:37 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> On Sun, 21 Jul 2024 08:21:39 GMT, Martin Doerr wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags > > src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 86: > >> 84: // an indirect memory operand) to reduce C2's scheduling and register >> 85: // allocation pressure (fewer Mach nodes). The same holds for g1StoreN and >> 86: // g1EncodePAndStoreN. > > I'm not convinced that this is beneficial. We're wasting a temp register just for an addition? I agree that using indirect memory operands is the most readable choice, and is slightly less wasteful from a register usage perspective. 
However, when I tried this choice a couple of months ago, I observed timeouts in some CTW runs, which as far as I remember were caused when LCM processed huge basic blocks with lots of memory writes (e.g. arising from static initializations of large String arrays such as in [here](https://github.com/apache/lucene/blob/ea562f6ef2b32fe6eadf57c6381d9a69acb043c7/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemData1.java#L47-L748)), in combination with C2 stress options. In these scenarios, the large number of additional Mach nodes seemed to cause the timeouts. I settled for materializing the store address internally to guard against such corner cases. I did not see any significant performance difference between the two choices in my benchmark results. I would like to study whether LCM can be made more robust in this scenario, which would enable using indirect memory operands here, but I think this would be best addressed in a separate RFE. Would it be OK by now to extend the code comment with the details provided in the above explanation? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1711337413 From duke at openjdk.org Fri Aug 9 12:57:44 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 9 Aug 2024 12:57:44 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Message-ID: Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. Tested with tiers 1-3. 
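The "manual alignment" approach described in the 8337938 RFR can be sketched as follows. Over-allocate with a plain malloc-style call, then round the address up to the requested alignment. This is a hypothetical helper, not the actual ZUtils code. `std::malloc` stands in for `os::malloc`, which is what lets NMT account for the memory inside HotSpot. The sketch also shows why the result cannot be freed: the original pointer is discarded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not the real ZUtils::alloc_aligned); std::malloc
// stands in for os::malloc.
void* alloc_aligned_unfreeable(std::size_t alignment, std::size_t size) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0 &&
         "alignment must be a power of two");
  // Over-allocate so that an aligned block of `size` bytes always fits.
  if (size > SIZE_MAX - (alignment - 1)) {
    return nullptr;  // size + alignment - 1 would overflow
  }
  void* const raw = std::malloc(size + alignment - 1);
  if (raw == nullptr) {
    return nullptr;
  }
  // Round the address up to the next multiple of `alignment`.
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  // The returned pointer usually differs from `raw`, and `raw` is not kept
  // anywhere, so this memory must never be passed to free().
  return reinterpret_cast<void*>((addr + mask) & ~mask);
}
```

This trade-off is only acceptable because, as the RFR notes, memory obtained through ZUtils::alloc_aligned is never freed. A freeable variant would have to store `raw` somewhere, for example just before the aligned address.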
------------- Commit messages: - Remove trailing whitespace - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Changes: https://git.openjdk.org/jdk/pull/20523/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20523&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337938 Stats: 101 lines in 6 files changed: 13 ins; 83 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20523.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20523/head:pull/20523 PR: https://git.openjdk.org/jdk/pull/20523 From stefank at openjdk.org Fri Aug 9 13:30:32 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 Aug 2024 13:30:32 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 12:47:18 GMT, Joel Sikstr?m wrote: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2230191554 From fgao at openjdk.org Fri Aug 9 13:37:54 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 9 Aug 2024 13:37:54 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v2] In-Reply-To: References: Message-ID: > This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. 
In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. > > However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: > > > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. 
We got the following list: > > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S > > > Task-2: add `.note.gnu.property` section for these assembly files > > As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. > > In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update in flags-cflags.m4 and flags-other.m4), and add `.note.gnu.property` section at the end... Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Clean up makefile - Merge branch 'master' into enable-bti-runtime - 8337536: AArch64: Enable BTI branch protection for runtime part This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. Motivation 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. 
However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: ``` GNU_PROPERTY_AARCH64_FEATURE_1_BTI GNU_PROPERTY_AARCH64_FEATURE_1_PAC ``` Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. Note-2: The PAC bit in the `.note.gnu.property` section is used to protect the `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. Goal Hence, this patch aims to set the PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. Implementation Task-1: find the problematic input objects From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspected that the root cause was that the PAC/BTI feature bits were not set for some of the input objects of libjvm.so. To find these inputs, we passed the `--force-bti` linker flag [4] in a local test. This flag makes the linker warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: ``` src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S ``` Task-2: add a `.note.gnu.property` section to these assembly files As mentioned in Motivation item 2, `-mbranch-protection=standard` is passed when compiling C/C++ files, but these assembly files were missed. In this patch, we also pass the `-mbranch-protection=standard` flag to the assembler (see the update in flags-cflags.m4 and flags-other.m4), and add a `.note.gnu.property` section at the end of these assembly files. With this change, we can see the PAC/BTI feature bits in the final libjvm.so.
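The linker rule quoted from [5] in Task-1 explains why four unmarked `.S` files were enough to clear both bits for all of libjvm.so. The small C++ model below makes this concrete. It is a sketch with hypothetical names. The bit values mirror `GNU_PROPERTY_AARCH64_FEATURE_1_BTI` (bit 0) and `GNU_PROPERTY_AARCH64_FEATURE_1_PAC` (bit 1) from glibc's `<elf.h>`, which are carried in the `GNU_PROPERTY_AARCH64_FEATURE_1_AND` program property. It models only the AND-combination, not note parsing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit values mirror GNU_PROPERTY_AARCH64_FEATURE_1_BTI / _PAC in <elf.h>.
constexpr uint32_t kFeatureBti = 1u << 0;
constexpr uint32_t kFeaturePac = 1u << 1;

// Hypothetical model of the static-linker rule: a feature bit is set in the
// output only if every input object sets it. An object without a
// .note.gnu.property section (e.g. an assembly file built without the note)
// contributes 0.
uint32_t combined_feature_1_and(const std::vector<uint32_t>& inputs) {
  assert(!inputs.empty() && "a link needs at least one input object");
  uint32_t out = ~0u;
  for (uint32_t bits : inputs) {
    out &= bits;  // any input lacking a bit clears it for the whole output
  }
  return out;
}
```

If one element of `inputs` is 0, the result is 0. This matches the observation that libjvm.so lacked both bits until the assembly files gained the note.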
Task-3: add BTI landing pads for hand-written assembly In a local test on Fedora 40 with PAC/BTI-capable hardware, we got a `SIGILL` error, which is the typical symptom of a BTI violation (branch target exception). The root cause is missing BTI landing pads in the hand-written assembly in HotSpot. File-1 copy_aarch64.hpp: It's a switch-case statement, and we add `bti j` for these indirect jumps. File-2 atomic_linux_aarch64.S: We add `bti c` landing pads at the function entries. File-3 copy_linux_aarch64.S: There is no need to add `bti c` at the function entries since they are called via `bl`, but we do need to handle the indirect jumps. File-4 safefetch_linux_aarch64.S: Similar to File-3, there is no need to handle these function entries. File-5 threadLS_linux_aarch64.S: No need to handle the function entry because `paciasp` can act as the landing pad. Evaluation 1. jtreg test We ran tier 1-3 jtreg tests on Fedora 40 + GCC 14 + the following AArch64 hardware and all tests passed. ``` 1. w/o PAC and w/o BTI 2. w/ PAC and w/o BTI 3. w/ PAC and w/ BTI ``` We also ran the jtreg tests on Fedora 40 + Clang 18 + hardware w/ PAC and w/ BTI. The tests passed too. 2. code size We observed about a 2% code size increase when `--enable-branch-protection` is used. This code size change looks reasonable. See the evaluation on glibc [6].
[1] https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication [2] https://bugs.openjdk.org/browse/JDK-8277204 [3] https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story [4] https://reviews.llvm.org/D62609 [5] https://github.com/ARM-software/abi-aa/blob/2a70c42d62e9c3eb5887fa50b71257f20daca6f9/aaelf64/aaelf64.rst#program-property [6] https://developer.arm.com/documentation/102433/0100/Applying-these-techniques-to-real-code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20491/files - new: https://git.openjdk.org/jdk/pull/20491/files/d1506d7d..114953da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20491&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20491&range=00-01 Stats: 7788 lines in 182 files changed: 2326 ins; 4845 del; 617 mod Patch: https://git.openjdk.org/jdk/pull/20491.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20491/head:pull/20491 PR: https://git.openjdk.org/jdk/pull/20491 From fgao at openjdk.org Fri Aug 9 13:37:54 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 9 Aug 2024 13:37:54 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part In-Reply-To: References: Message-ID: <5_sk0lKoqEJXEC15_5m3NTft9fQ1nOJFeTdEWLGHAVw=.ed44303e-af6e-4d85-a0e7-e0be1ce8136c@github.com> On Wed, 7 Aug 2024 10:40:09 GMT, Fei Gao wrote: > This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. 
For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. > > However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: > > > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: > > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S > > > Task-2: add `.note.gnu.property` section for these assembly files > > As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. 
> > In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update in flags-cflags.m4 and flags-other.m4), and add `.note.gnu.property` section at the end... > It turned out to be easier to write it myself than trying to explain it. Please have a look here: [0fe840d](https://github.com/openjdk/jdk/commit/0fe840dec597bb4a819eb2025a6d56cd82f237b5) > > (This also contains some additional cleanup in the branch protection configure code.) Thanks for your review and suggestions @magicus . Updated in the new commit :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2277959315 From mdoerr at openjdk.org Fri Aug 9 14:08:34 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 9 Aug 2024 14:08:34 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> Message-ID: On Fri, 9 Aug 2024 12:00:26 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 86: >> >>> 84: // an indirect memory operand) to reduce C2's scheduling and register >>> 85: // allocation pressure (fewer Mach nodes). The same holds for g1StoreN and >>> 86: // g1EncodePAndStoreN. >> >> I'm not convinced that this is beneficial. We're wasting a temp register just for an addition? > > I agree that using indirect memory operands is the most readable choice, and is slightly less wasteful from a register usage perspective. However, when I tried this choice a couple of months ago, I observed timeouts in some CTW runs, which as far as I remember were caused when LCM processed huge basic blocks with lots of memory writes (e.g.
arising from static initializations of large String arrays such as in [here](https://github.com/apache/lucene/blob/ea562f6ef2b32fe6eadf57c6381d9a69acb043c7/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemData1.java#L47-L748)), in combination with C2 stress options. In these scenarios, the large number of additional Mach nodes seemed to cause the timeouts. I settled for materializing the store address internally to guard against such corner cases. I did not see any significant performance difference between the two choices in my benchmark results. > > I would like to study whether LCM can be made more robust in this scenario, which would enable using indirect memory operands here, but I think this would be best addressed in a separate RFE. Would it be OK by now to extend the code comment with the details provided in the above explanation? Ok, doing it in a separate RFE is fine with me. This sounds like a C2 problem which should get investigated. It may cause other performance problems, too. Maybe a native profiler can show what takes too much time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1711536279 From fgao at openjdk.org Fri Aug 9 14:14:31 2024 From: fgao at openjdk.org (Fei Gao) Date: Fri, 9 Aug 2024 14:14:31 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part In-Reply-To: <446-JAhZlwZT7eNafXxR90EqiIUuV5Xd9bMfqXTOVA4=.45e46493-f09e-4ef4-9d13-6657b7938433@github.com> References: <446-JAhZlwZT7eNafXxR90EqiIUuV5Xd9bMfqXTOVA4=.45e46493-f09e-4ef4-9d13-6657b7938433@github.com> Message-ID: <21FL0PReEC5ZBPAFIGUhEmsnso1lLdf8W86qIjxHSPs=.1fe33a30-5d14-43f3-ab18-cbd4eb668ca6@github.com> On Wed, 7 Aug 2024 17:27:00 GMT, Andrew Haley wrote: > Can you explain why we want to support PAC without BTI? Would anyone use such a config? Thanks for reviewing @theRealAph . Sorry, I don't quite understand your question. Do you mean why we currently only support PAC? 
PAC is mandatory from Armv8.3 for ROP attacks, while BTI is mandatory from Armv8.5 for JOP attacks. JDK currently has PAC enabled, but not BTI. Or do you mean if we need the option to just support one of them? Now we enable BTI and PAC at the same time by configuring `--enable-branch-protection` and disable them without the flag, i.e. both or nothing. GCC supports all options to give maximum flexibility, just in case anyone wants it. What do you think? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2278041901 From ihse at openjdk.org Fri Aug 9 14:23:33 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 9 Aug 2024 14:23:33 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v2] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 13:37:54 GMT, Fei Gao wrote: >> This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. >> >> Motivation >> >> 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. >> >> 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. >> >> However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: >> >> >> GNU_PROPERTY_AARCH64_FEATURE_1_BTI >> GNU_PROPERTY_AARCH64_FEATURE_1_PAC >> >> >> Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. >> >> Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. 
It's independent of whether the relocatable objects use PAC or not. >> >> Goal >> >> Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. >> >> Implementation >> >> Task-1: find out the problematic input objects >> >> From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. >> >> In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: >> >> >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S >> >> >> Task-2: add `.note.gnu.property` section for these assembly files >> >> As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. >> >> In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update i... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Clean up makefile > - Merge branch 'master' into enable-bti-runtime > - 8337536: AArch64: Enable BTI branch protection for runtime part > > This patch enables BTI branch protection for runtime part on > Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. 
> User-level packages can gain additional hardening by compiling with the > GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as > one VM configure flag, which would pass `-mbranch-protection=standard` > compilation flags to all c/c++ files. Note that `standard` turns on both > `pac-ret` and `bti` branch protections. For more details about code > reuse attacks and hardware-assisted branch protections on AArch64, see > [3]. > > However, we checked the `.note.gnu.property` section of all the shared > libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so > didn't set these two target feature bits: > > ``` > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > ``` > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, > libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect > `.got.plt` table. It's independent of whether the relocatable objects > use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the > `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set > the feature bit in the output object or image only if all the input > objects have the corresponding feature bit set." Hence we suspect that > the root cause is probably that the PAC/BTI feature bits are not set > only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag > [4] in my local test. This linker flag would warn if any input object > does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following > list: > > ``` > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > ... Thanks. I'll leave it for someone else to review the build changes, as they are effectively written by me now. 
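The linker rule quoted from [5] in this thread amounts to a bitwise AND over every input object, which is why four assembly files without a `.note.gnu.property` section were enough to leave libjvm.so without the BTI/PAC bits. Below is a minimal C++ sketch of that rule. The two bit values are the ones the AArch64 ELF ABI assigns to `GNU_PROPERTY_AARCH64_FEATURE_1_BTI` and `GNU_PROPERTY_AARCH64_FEATURE_1_PAC`. The `InputObject` type, the `link_feature_bits()` function and the object file names are hypothetical illustrations, not linker code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Feature bits carried in GNU_PROPERTY_AARCH64_FEATURE_1_AND (AArch64 ELF ABI).
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_BTI = 1u << 0;
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_PAC = 1u << 1;

// Hypothetical model of one relocatable input object: its file name and the
// feature bits from its .note.gnu.property section (0 if the note is missing).
struct InputObject {
  const char* name;
  uint32_t feature_1_and;
};

// Output bits = AND over all inputs: a bit survives only if every input sets it.
// Only the BTI and PAC bits are modeled here.
uint32_t link_feature_bits(const std::vector<InputObject>& inputs) {
  assert(!inputs.empty() && "a link unit needs at least one input object");
  uint32_t result = GNU_PROPERTY_AARCH64_FEATURE_1_BTI |
                    GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
  for (const InputObject& obj : inputs) {
    result &= obj.feature_1_and;
  }
  return result;
}
```

With one input lacking the note, the result is 0. Once that input is also assembled with the note (which is what the patch does for the four `.S` files), the result is 0x3 and both bits reach the output.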
------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2278061225 From duke at openjdk.org Fri Aug 9 15:23:48 2024 From: duke at openjdk.org (duke) Date: Fri, 9 Aug 2024 15:23:48 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v13] In-Reply-To: References: Message-ID: <3r3BVhGKPPKpcHX9Xfz2nBwWnod3-FrolykXcf9EQPc=.65a6e150-f33d-4dd6-a5e3-3058427c821f@github.com> On Thu, 6 Oct 2022 06:28:04 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated instruct to use kmovw @smita-kamath Your change (at version a00c3ecdab6b2c8ca6883e92bb51e3fa99544a17) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/9781#issuecomment-1275006152 From epeter at openjdk.org Fri Aug 9 15:23:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Aug 2024 15:23:48 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v13] In-Reply-To: References: <66be8SJdxPOqmqsQ1YIwS4zM4GwPerypGIf8IbfxhRs=.1d03c94a-f3e5-40ae-999e-bdd5f328170d@github.com> Message-ID: On Tue, 11 Oct 2022 17:00:53 GMT, Smita Kamath wrote: >> I started new testing. > > @vnkozlov Thank you for reviewing the patch. @smita-kamath I think I just found another regression of this feature: https://bugs.openjdk.org/browse/JDK-8338126 Can you please have a look? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/9781#issuecomment-2278194773 From phh at openjdk.org Fri Aug 9 17:10:39 2024 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 9 Aug 2024 17:10:39 GMT Subject: RFR: 8337657: AArch64: No need for acquire fence in safepoint poll during JNI calls In-Reply-To: <44fCyKKdCaJQB8k82GuR5LnwERGsRta7cVpasF0kVvc=.6f38152b-22a1-4316-9835-0a1e2a9a78c5@github.com> References: <44fCyKKdCaJQB8k82GuR5LnwERGsRta7cVpasF0kVvc=.6f38152b-22a1-4316-9835-0a1e2a9a78c5@github.com> Message-ID: On Thu, 1 Aug 2024 13:36:26 GMT, Dmitry Chuyko wrote: > This is a tiny change to improve JNI calls performance on AArch64. In SharedRuntime::generate_native_wrapper() and TemplateInterpreterGenerator::generate_native_entry() safepoint_poll is made with acquire=true. It comes from the aarch64 implementation of Thread-local handshakes [0], [1]. Presently, it is no longer required [2]. > > Turning LDAR into regular load has significant performance effect. For instance, NativeCall benchmarks [3] by @simonis on Graviton 2 show following improvements: > > -XX:-UseSystemMemoryBarrier (current default) > > > NativeCall.callingEmptyNative 8.04% > NativeCall.callingJniCriticalArray 1.01% > NativeCall.callingJniCriticalEmpty 6.73% > NativeCall.callingStaticEmpty 10.47% > NativeCall.methodCallingNativeWithArgs 10.73% > NativeCall.methodCallingNativeWithManyArgs 9.41% > NativeCall.staticMethodCallingStaticNativeIntStub 3.68% > NativeCall.staticMethodCallingStaticNativeNoTiered 9.86% > NativeCall.staticMethodCallingStaticNativeWithManyArgs 4.81% > > > -XX:+UseSystemMemoryBarrier > > > NativeCall.callingEmptyNative 33.70% > NativeCall.callingJniCriticalArray 3.64% > NativeCall.callingJniCriticalEmpty 34.15% > NativeCall.callingStaticArray 3.02% > NativeCall.callingStaticEmpty 34.25% > NativeCall.methodCallingNativeWithArgs 35.98% > NativeCall.methodCallingNativeWithManyArgs 34.42% > NativeCall.staticMethodCallingStaticNativeIntStub 15.66% > 
NativeCall.staticMethodCallingStaticNativeNoTiered 35.27% > NativeCall.staticMethodCallingStaticNativeWithManyArgs 32.99% > > > Similar improvements are observed on different CPUs. > > It is especially interesting that -XX:+UseSystemMemoryBarrier variant began to show improvements in cases where there was parity. > > Testing: tier1-3 on linux-aarch64. > > [0] https://bugs.openjdk.org/browse/JDK-8189596 > [1] https://mail.openjdk.org/pipermail/hotspot-dev/2017-November/029264.html > [2] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-July/078715.html > [3] https://github.com/simonis/Java2Native/tree/main/examples/jmh/java2native Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20420#pullrequestreview-2230681517 From kvn at openjdk.org Fri Aug 9 17:12:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 9 Aug 2024 17:12:38 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 04:48:03 GMT, Vladimir Kozlov wrote: >> While working on [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519) I noticed few ExternalAddress cases I missed in [JDK-8337396](https://bugs.openjdk.org/browse/JDK-8337396) changes. >> >> I also added asserts on x86 to catch using ExternalAddress for jumps and calls instructions and caught few additional cases (Windows and arraycopy cases). >> >> Tested tier1-5,hs-stress,hs-comp > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > riscv update Thank you, Tobias, for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20470#issuecomment-2278378503 From kvn at openjdk.org Fri Aug 9 17:12:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 9 Aug 2024 17:12:40 GMT Subject: RFR: 8337797: Additional ExternalAddress cleanup [v2] In-Reply-To: References: Message-ID: <2J5YvZHmf9MIxf5mtJUzOvNWPQ-Ygo2DlVjo4pBJ-VY=.184df002-e76d-4361-8d24-d908957b9a93@github.com> On Fri, 9 Aug 2024 08:43:24 GMT, Tobias Hartmann wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> riscv update > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2332: > >> 2330: void MacroAssembler::jump(AddressLiteral dst, Register rscratch) { >> 2331: assert(rscratch != noreg || always_reachable(dst), "missing"); >> 2332: assert(!dst.rspec().reloc()->is_data(), "should not use ExternalAddress for jump"); > > Should these assert be added for other platforms as well? It is not easy for me to find instructions in other ports into which to add such assert. For example, aarch64 use several instructions to load address for branch. I hope engineers responsible for ports will add such assert where it is needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20470#discussion_r1711820620 From kvn at openjdk.org Fri Aug 9 17:12:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 9 Aug 2024 17:12:40 GMT Subject: Integrated: 8337797: Additional ExternalAddress cleanup In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 18:11:55 GMT, Vladimir Kozlov wrote: > While working on [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519) I noticed few ExternalAddress cases I missed in [JDK-8337396](https://bugs.openjdk.org/browse/JDK-8337396) changes. > > I also added asserts on x86 to catch using ExternalAddress for jumps and calls instructions and caught few additional cases (Windows and arraycopy cases). 
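The relocation distinction behind these asserts can be modeled in a few lines. The sketch below is a hypothetical, heavily simplified C++ model and does not reproduce HotSpot's `AddressLiteral`, `ExternalAddress` or `RuntimeAddress` classes. An external address carries a data relocation (external word), a runtime address carries a runtime-call relocation, and `jump()` rejects the former in the same spirit as the new x86 check `assert(!dst.rspec().reloc()->is_data(), ...)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified model; HotSpot's real classes differ.
enum class RelocKind { external_word, runtime_call };

struct AddressLiteralModel {
  uintptr_t target;
  RelocKind kind;
  // Data relocations describe memory that code loads from, not code to branch to.
  bool is_data() const { return kind == RelocKind::external_word; }
};

// Analogue of ExternalAddress: an address outside the code cache, used for data loads.
AddressLiteralModel make_external_address(uintptr_t p) {
  return {p, RelocKind::external_word};
}

// Analogue of RuntimeAddress: an entry point that generated code calls or jumps to.
AddressLiteralModel make_runtime_address(uintptr_t p) {
  return {p, RelocKind::runtime_call};
}

struct MacroAssemblerModel {
  int emitted_jumps = 0;
  void jump(const AddressLiteralModel& dst) {
    // Same intent as the new x86 assert in MacroAssembler::jump().
    assert(!dst.is_data() && "should not use ExternalAddress for jump");
    ++emitted_jumps;
  }
};
```

Calling `jump(make_external_address(...))` in a debug build would trip the assert. This mirrors how the x86 asserts surfaced the remaining Windows and arraycopy cases.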
> > Tested tier1-5,hs-stress,hs-comp This pull request has now been integrated. Changeset: 60fa08fc Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/60fa08fcfe5c6551ee3120330ade93e45df618c7 Stats: 26 lines in 10 files changed: 1 ins; 5 del; 20 mod 8337797: Additional ExternalAddress cleanup Reviewed-by: adinn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/20470 From dchuyko at openjdk.org Fri Aug 9 17:59:37 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 9 Aug 2024 17:59:37 GMT Subject: Integrated: 8337657: AArch64: No need for acquire fence in safepoint poll during JNI calls In-Reply-To: <44fCyKKdCaJQB8k82GuR5LnwERGsRta7cVpasF0kVvc=.6f38152b-22a1-4316-9835-0a1e2a9a78c5@github.com> References: <44fCyKKdCaJQB8k82GuR5LnwERGsRta7cVpasF0kVvc=.6f38152b-22a1-4316-9835-0a1e2a9a78c5@github.com> Message-ID: On Thu, 1 Aug 2024 13:36:26 GMT, Dmitry Chuyko wrote: > This is a tiny change to improve JNI calls performance on AArch64. In SharedRuntime::generate_native_wrapper() and TemplateInterpreterGenerator::generate_native_entry() safepoint_poll is made with acquire=true. It comes from the aarch64 implementation of Thread-local handshakes [0], [1]. Presently, it is no longer required [2]. > > Turning LDAR into regular load has significant performance effect. 
For instance, NativeCall benchmarks [3] by @simonis on Graviton 2 show following improvements: > > -XX:-UseSystemMemoryBarrier (current default) > > > NativeCall.callingEmptyNative 8.04% > NativeCall.callingJniCriticalArray 1.01% > NativeCall.callingJniCriticalEmpty 6.73% > NativeCall.callingStaticEmpty 10.47% > NativeCall.methodCallingNativeWithArgs 10.73% > NativeCall.methodCallingNativeWithManyArgs 9.41% > NativeCall.staticMethodCallingStaticNativeIntStub 3.68% > NativeCall.staticMethodCallingStaticNativeNoTiered 9.86% > NativeCall.staticMethodCallingStaticNativeWithManyArgs 4.81% > > > -XX:+UseSystemMemoryBarrier > > > NativeCall.callingEmptyNative 33.70% > NativeCall.callingJniCriticalArray 3.64% > NativeCall.callingJniCriticalEmpty 34.15% > NativeCall.callingStaticArray 3.02% > NativeCall.callingStaticEmpty 34.25% > NativeCall.methodCallingNativeWithArgs 35.98% > NativeCall.methodCallingNativeWithManyArgs 34.42% > NativeCall.staticMethodCallingStaticNativeIntStub 15.66% > NativeCall.staticMethodCallingStaticNativeNoTiered 35.27% > NativeCall.staticMethodCallingStaticNativeWithManyArgs 32.99% > > > Similar improvements are observed on different CPUs. > > It is especially interesting that -XX:+UseSystemMemoryBarrier variant began to show improvements in cases where there was parity. > > Testing: tier1-3 on linux-aarch64. > > [0] https://bugs.openjdk.org/browse/JDK-8189596 > [1] https://mail.openjdk.org/pipermail/hotspot-dev/2017-November/029264.html > [2] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-July/078715.html > [3] https://github.com/simonis/Java2Native/tree/main/examples/jmh/java2native This pull request has now been integrated. 
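In C++ terms, this change roughly corresponds to the difference between an acquire load and a relaxed load of the thread's poll word. The sketch below is only an analogy. HotSpot emits this poll directly in generated code, and `ThreadModel`, `arm_poll`, `poll_acquire` and `poll_relaxed` are hypothetical names. On AArch64, compilers typically lower the acquire load to LDAR (or LDAPR when RCpc is targeted) and the relaxed load to a plain LDR. Why the acquire is no longer needed on this path is argued in the hotspot-compiler-dev thread linked as [2] in the PR description and is not demonstrated here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a thread's polling word; nonzero means "take the slow path".
struct ThreadModel {
  std::atomic<uintptr_t> poll_word{0};
};

// Another thread requests a safepoint/handshake by arming the poll word.
void arm_poll(ThreadModel& t) {
  t.poll_word.store(1, std::memory_order_release);
}

// Old style: acquire load, typically LDAR (or LDAPR) on AArch64.
bool poll_acquire(const ThreadModel& t) {
  return t.poll_word.load(std::memory_order_acquire) != 0;
}

// New style: plain load, typically LDR on AArch64.
bool poll_relaxed(const ThreadModel& t) {
  return t.poll_word.load(std::memory_order_relaxed) != 0;
}
```

Both functions observe the same value once the word is armed. The difference lies only in the ordering they impose on later memory accesses, and that ordering is what the benchmark numbers above are paying for.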
Changeset: 358d77da Author: Dmitry Chuyko URL: https://git.openjdk.org/jdk/commit/358d77dafbe0e35d5b20340fccddc0fb8f3db82a Stats: 19 lines in 2 files changed: 0 ins; 15 del; 4 mod 8337657: AArch64: No need for acquire fence in safepoint poll during JNI calls Reviewed-by: phh ------------- PR: https://git.openjdk.org/jdk/pull/20420 From svkamath at openjdk.org Fri Aug 9 18:02:49 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Fri, 9 Aug 2024 18:02:49 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v13] In-Reply-To: References: <66be8SJdxPOqmqsQ1YIwS4zM4GwPerypGIf8IbfxhRs=.1d03c94a-f3e5-40ae-999e-bdd5f328170d@github.com> Message-ID: <_PlJd1cbdiMGb1yUCWWZDf13xpTIBH2FtPcJ62VduhE=.36f54fc6-2c4f-488d-8d41-874cbf24d722@github.com> On Fri, 9 Aug 2024 15:21:22 GMT, Emanuel Peter wrote: >> @vnkozlov Thank you for reviewing the patch. > > @smita-kamath I think I just found another regression of this feature: https://bugs.openjdk.org/browse/JDK-8338126 > Can you please have a look? @eme64, Sure will look into it. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/9781#issuecomment-2278462816 From vlivanov at openjdk.org Fri Aug 9 18:07:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 9 Aug 2024 18:07:34 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 01:15:17 GMT, Andrew Haley wrote: >> The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) out-of-bounds array access. While this bug is very unlikely ever to cause a failure, it should be fixed. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8337958: Out-of-bounds array access in secondary_super_cache Testing results are clean. 
------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20483#pullrequestreview-2230784867 From ccheung at openjdk.org Fri Aug 9 18:18:33 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 9 Aug 2024 18:18:33 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 19:16:20 GMT, Ioi Lam wrote: > We didn't support CDS archived heap object on Windows because > > - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. > - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. > > Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. > > (Tested on Oracle CI tiers 1-7) Looks good. src/hotspot/share/cds/heapShared.cpp line 1557: > 1555: // At runtime, these classes are initialized before X's archived fields > 1556: // are restored by HeapShared::initialize_from_archived_subgraph(). > 1557: int i; This cleanup seems unrelated to the fix but I think it's fine to include it. ------------- Marked as reviewed by ccheung (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20514#pullrequestreview-2230802915 PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1711944103 From erikj at openjdk.org Fri Aug 9 18:38:32 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Fri, 9 Aug 2024 18:38:32 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v2] In-Reply-To: References: Message-ID: <3fLHxTC8pKjO7NWHedvizKrEJZG6CSpJlOFgf27m2hw=.301e8f28-030c-4109-96dc-f4efc2fa918c@github.com> On Fri, 9 Aug 2024 13:37:54 GMT, Fei Gao wrote: >> This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. >> >> Motivation >> >> 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. >> >> 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. >> >> However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: >> >> >> GNU_PROPERTY_AARCH64_FEATURE_1_BTI >> GNU_PROPERTY_AARCH64_FEATURE_1_PAC >> >> >> Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. >> >> Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. >> >> Goal >> >> Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. 
>> >> Implementation >> >> Task-1: find out the problematic input objects >> >> From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. >> >> In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: >> >> >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S >> >> >> Task-2: add `.note.gnu.property` section for these assembly files >> >> As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. >> >> In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update i... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Clean up makefile > - Merge branch 'master' into enable-bti-runtime > - 8337536: AArch64: Enable BTI branch protection for runtime part > > This patch enables BTI branch protection for runtime part on > Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. > User-level packages can gain additional hardening by compiling with the > GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. 
In JDK-8277204 [2], `--enable-branch-protection` was introduced as > one VM configure flag, which would pass `-mbranch-protection=standard` > compilation flags to all c/c++ files. Note that `standard` turns on both > `pac-ret` and `bti` branch protections. For more details about code > reuse attacks and hardware-assisted branch protections on AArch64, see > [3]. > > However, we checked the `.note.gnu.property` section of all the shared > libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so > didn't set these two target feature bits: > > ``` > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > ``` > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, > libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect > `.got.plt` table. It's independent of whether the relocatable objects > use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the > `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set > the feature bit in the output object or image only if all the input > objects have the corresponding feature bit set." Hence we suspect that > the root cause is probably that the PAC/BTI feature bits are not set > only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag > [4] in my local test. This linker flag would warn if any input object > does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following > list: > > ``` > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > ... Build changes look good. ------------- Marked as reviewed by erikj (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20491#pullrequestreview-2230834719 From gziemski at openjdk.org Fri Aug 9 21:27:40 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 9 Aug 2024 21:27:40 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) We could do: `#define MEMFLAGS MemType` to help out with backports, and leave it up to the runtime/compiler/gc teams when to pull the trigger to switch over when they are ready, if that helps? Any backport would have to deal with this change to a degree though unfortunately. I would want this change in runtime/nmt at the minimum, though personally I would prefer if we adopted it everywhere as in my proposed fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2278776049 From gziemski at openjdk.org Fri Aug 9 21:33:43 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 9 Aug 2024 21:33:43 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
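On the backport concern, a C++ type alias is one way to keep the old spelling compiling alongside the new one. For ordinary uses it behaves like the proposed `#define MEMFLAGS MemType`, but it follows normal C++ scoping rules instead of textual substitution. The sketch below is hypothetical: the three-value enum is a trimmed-down stand-in, not NMT's actual definition.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical, trimmed-down stand-in for the NMT memory type enum.
enum class MemType : uint8_t { mtNone, mtThread, mtCode };

// Transitional alias so code that still says MEMFLAGS (e.g. backports) compiles.
using MEMFLAGS = MemType;

static_assert(std::is_same<MEMFLAGS, MemType>::value,
              "the alias names the same type, so no conversions are needed");

// A function written against the old name accepts the new enumerators unchanged.
const char* type_name(MEMFLAGS type) {
  switch (type) {
    case MemType::mtNone:   return "None";
    case MemType::mtThread: return "Thread";
    case MemType::mtCode:   return "Code";
  }
  assert(false && "unhandled MemType");
  return "unknown";
}
```

Once no callers spell `MEMFLAGS` any more, deleting the single `using` line completes the rename.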
> > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) The current proposed fix renames from MEMFLAGS to MemType, other proposed names are: - MemTypeFlag - NMTCategory - NMTCat I also like: - NMTMemType Any other suggestions? I don't think I will be following up with argument renaming after this, though, whatever the final name we agree on... ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2278781358 From dean.long at oracle.com Fri Aug 9 22:05:27 2024 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 9 Aug 2024 15:05:27 -0700 Subject: Reliability of JVM in face of "recoverable" Errors, e.g. out of code cache space In-Reply-To: <67DA75AF-D768-4836-81F0-E63927A94E10@gmail.com> References: <26618BA9-BCF5-422D-89B5-8BEB20AF7856@gmail.com> <67DA75AF-D768-4836-81F0-E63927A94E10@gmail.com> Message-ID: <29685591-8852-4702-8949-4b352805043b@oracle.com> On 8/7/24 10:29 AM, Steven Schlansker wrote: >> On Aug 5, 2024, at 7:08?PM, David Holmes wrote: >> >> Hi Steven, >> >> On 3/08/2024 5:11 am, Steven Schlansker wrote: >>> Hi hotspot-dev, >>> Please let me know me if this is not an appropriate place to raise this kind of question - >>> happy to move to another more appropriate list >> This does seem like an issue with method linking in hotspot and so is appropriate. I would suggest filing a bug in JBS. > Thank you very much for your reply David. > I filed into JBS with review ID 9077424 > I can't find https://bugs.openjdk.org/browse/JDK-9077424.? Recent JBS JDK issues start with "833". dl From stevenschlansker at gmail.com Fri Aug 9 22:28:49 2024 From: stevenschlansker at gmail.com (Steven Schlansker) Date: Fri, 9 Aug 2024 15:28:49 -0700 Subject: Reliability of JVM in face of "recoverable" Errors, e.g. 
out of code cache space In-Reply-To: <29685591-8852-4702-8949-4b352805043b@oracle.com> References: <26618BA9-BCF5-422D-89B5-8BEB20AF7856@gmail.com> <67DA75AF-D768-4836-81F0-E63927A94E10@gmail.com> <29685591-8852-4702-8949-4b352805043b@oracle.com> Message-ID: > On Aug 9, 2024, at 3:05 PM, dean.long at oracle.com wrote: > > On 8/7/24 10:29 AM, Steven Schlansker wrote: > >>> On Aug 5, 2024, at 7:08 PM, David Holmes wrote: >>> >>> Hi Steven, >>> >>> On 3/08/2024 5:11 am, Steven Schlansker wrote: >>>> Hi hotspot-dev, >>>> Please let me know me if this is not an appropriate place to raise this kind of question - >>>> happy to move to another more appropriate list >>> This does seem like an issue with method linking in hotspot and so is appropriate. I would suggest filing a bug in JBS. >> Thank you very much for your reply David. >> I filed into JBS with review ID 9077424 >> > > I can't find https://bugs.openjdk.org/browse/JDK-9077424. Recent JBS JDK issues start with "833". As far as I understand, "normal people" like myself cannot actually file JDK- bugs - I used the bugreport web interface and was assigned an "internal review ID" instead. Hopefully someone at Oracle will accept my report and actually create a JDK- bug. From daniel.daugherty at oracle.com Fri Aug 9 22:33:29 2024 From: daniel.daugherty at oracle.com (daniel.daugherty at oracle.com) Date: Fri, 9 Aug 2024 16:33:29 -0600 Subject: Reliability of JVM in face of "recoverable" Errors, e.g.
out of code cache space In-Reply-To: References: <26618BA9-BCF5-422D-89B5-8BEB20AF7856@gmail.com> <67DA75AF-D768-4836-81F0-E63927A94E10@gmail.com> <29685591-8852-4702-8949-4b352805043b@oracle.com> Message-ID: <8aea95de-a55a-4c52-bedf-b5e170853951@oracle.com> On 8/9/24 4:28 PM, Steven Schlansker wrote: > >> On Aug 9, 2024, at 3:05 PM, dean.long at oracle.com wrote: >> >> On 8/7/24 10:29 AM, Steven Schlansker wrote: >> >>>> On Aug 5, 2024, at 7:08 PM, David Holmes wrote: >>>> >>>> Hi Steven, >>>> >>>> On 3/08/2024 5:11 am, Steven Schlansker wrote: >>>>> Hi hotspot-dev, >>>>> Please let me know me if this is not an appropriate place to raise this kind of question - >>>>> happy to move to another more appropriate list >>>> This does seem like an issue with method linking in hotspot and so is appropriate. I would suggest filing a bug in JBS. >>> Thank you very much for your reply David. >>> I filed into JBS with review ID 9077424 >>> >> I can't find https://bugs.openjdk.org/browse/JDK-9077424. Recent JBS JDK issues start with "833". > As far as I understand, "normal people" like myself cannot actually file JDK- bugs - I used the bugreport > web interface and was assigned an "internal review ID" instead. > > Hopefully someone at Oracle will accept my report and actually create a JDK- bug. It is here: https://bugs.openjdk.org/browse/JI-9077424 Dan From dean.long at oracle.com Sat Aug 10 00:17:31 2024 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 9 Aug 2024 17:17:31 -0700 Subject: Reliability of JVM in face of "recoverable" Errors, e.g.
out of code cache space In-Reply-To: <8aea95de-a55a-4c52-bedf-b5e170853951@oracle.com> References: <26618BA9-BCF5-422D-89B5-8BEB20AF7856@gmail.com> <67DA75AF-D768-4836-81F0-E63927A94E10@gmail.com> <29685591-8852-4702-8949-4b352805043b@oracle.com> <8aea95de-a55a-4c52-bedf-b5e170853951@oracle.com> Message-ID: <57f3eb22-4219-4c5e-93b5-9fe3a6137556@oracle.com> On 8/9/24 3:33 PM, daniel.daugherty at oracle.com wrote: > > > On 8/9/24 4:28 PM, Steven Schlansker wrote: >> >>> On Aug 9, 2024, at 3:05 PM, dean.long at oracle.com wrote: >>> >>> On 8/7/24 10:29 AM, Steven Schlansker wrote: >>> >>>>> On Aug 5, 2024, at 7:08 PM, David Holmes >>>>> wrote: >>>>> >>>>> Hi Steven, >>>>> >>>>> On 3/08/2024 5:11 am, Steven Schlansker wrote: >>>>>> Hi hotspot-dev, >>>>>> Please let me know me if this is not an appropriate place to >>>>>> raise this kind of question - >>>>>> happy to move to another more appropriate list >>>>> This does seem like an issue with method linking in hotspot and so >>>>> is appropriate. I would suggest filing a bug in JBS. >>>> Thank you very much for your reply David. >>>> I filed into JBS with review ID 9077424 >>>> >>> I can't find https://bugs.openjdk.org/browse/JDK-9077424. Recent JBS >>> JDK issues start with "833". >> As far as I understand, "normal people" like myself cannot actually >> file JDK- bugs - I used the bugreport >> web interface and was assigned an "internal review ID" instead. >> >> Hopefully someone at Oracle will accept my report and actually create >> a JDK- bug. > > It is here: > https://bugs.openjdk.org/browse/JI-9077424 > > Dan Thanks, Dan. I should have tried that.
dl From stuefe at openjdk.org Sat Aug 10 17:01:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 10 Aug 2024 17:01:39 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 19:16:20 GMT, Ioi Lam wrote: > We didn't support CDS archived heap object on Windows because > > - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. > - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. > > Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. > > (Tested on Oracle CI tiers 1-7) This looks good to me. Mostly questions from my side. src/hotspot/share/cds/filemap.cpp line 2181: > 2179: // for mapped region as it is part of the reserved java heap, which is already recorded. > 2180: char* addr = (char*)_mapped_heap_memregion.start(); > 2181: char* base = map_memory(_fd, _full_path, r->file_offset(), So, do I understand this correctly, this always failed on Windows, since we attempt to map into the reserved region of the already existing Java heap? src/hotspot/share/cds/filemap.cpp line 2188: > 2186: /* do_commit = */ true)) { > 2187: dealloc_heap_region(); > 2188: log_error(cds)("Failed to read archived heap region at " INTPTR_FORMAT, p2i(addr)); Very minor bikeshedding, we don't try to read the heap region at this address but load it into memory at that address. Up to you if you change anything. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20514#pullrequestreview-2231475971 PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1712672522 PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1712672803 From stuefe at openjdk.org Sat Aug 10 17:01:40 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 10 Aug 2024 17:01:40 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows In-Reply-To: References: Message-ID: On Sat, 10 Aug 2024 16:40:55 GMT, Thomas Stuefe wrote: >> We didn't support CDS archived heap object on Windows because >> >> - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. >> - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. >> >> Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. >> >> (Tested on Oracle CI tiers 1-7) > > src/hotspot/share/cds/filemap.cpp line 2181: > >> 2179: // for mapped region as it is part of the reserved java heap, which is already recorded. >> 2180: char* addr = (char*)_mapped_heap_memregion.start(); >> 2181: char* base = map_memory(_fd, _full_path, r->file_offset(), > > So, do I understand this correctly, this always failed on Windows, since we attempt to map into the reserved region of the already existing Java heap? No, wait, we never would have entered this path since `ArchiveHeapLoader::can_map()` is false on Windows. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1712673191 From stuefe at openjdk.org Sat Aug 10 17:01:40 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 10 Aug 2024 17:01:40 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows In-Reply-To: References: Message-ID: On Sat, 10 Aug 2024 16:47:32 GMT, Thomas Stuefe wrote: >> src/hotspot/share/cds/filemap.cpp line 2181: >> >>> 2179: // for mapped region as it is part of the reserved java heap, which is already recorded. >>> 2180: char* addr = (char*)_mapped_heap_memregion.start(); >>> 2181: char* base = map_memory(_fd, _full_path, r->file_offset(), >> >> So, do I understand this correctly, this always failed on Windows, since we attempt to map into the reserved region of the already existing Java heap? > > No, wait, we never would have entered this path since `ArchiveHeapLoader::can_map()` is false on Windows. But then a follow-up question, what you do now in `map_heap_region_impl` for Windows, would that not be the same as the `ArchiveHeapLoader::load_heap_region` path? And another question, sorry, unrelated to this PR: I see we always attempt to load the heap region regardless of here (note how its outside the INCLUDE_CDS_JAVA_HEAP block): https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/hotspot/share/cds/metaspaceShared.cpp#L1200 I wonder whether this is wrong. If it is wrong, its benign? Do we even include heap region in CDS dumps on Windows? 
(Sorry for loading this PR with questions) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1712673566 From iklam at openjdk.org Sun Aug 11 21:05:34 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 11 Aug 2024 21:05:34 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows In-Reply-To: References: Message-ID: <2KESRMgnLpnr8bQcyxFhBtezYwsNlcAJrPX6PeL9_JA=.85684a0b-282e-4147-ad9e-a72c85d31a79@github.com> On Fri, 9 Aug 2024 18:14:58 GMT, Calvin Cheung wrote: >> We didn't support CDS archived heap object on Windows because >> >> - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. >> - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. >> >> Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. >> >> (Tested on Oracle CI tiers 1-7) > > src/hotspot/share/cds/heapShared.cpp line 1557: > >> 1555: // At runtime, these classes are initialized before X's archived fields >> 1556: // are restored by HeapShared::initialize_from_archived_subgraph(). >> 1557: int i; > > This cleanup seems unrelated to the fix but I think it's fine to include it. This is necessary for the Windows build, which would fail because the `i` variable declared on this line is never used. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1713058088 From iklam at openjdk.org Sun Aug 11 21:17:08 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 11 Aug 2024 21:17:08 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows [v2] In-Reply-To: References: Message-ID: > We didn't support CDS archived heap object on Windows because > > - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. > - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. > > Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. > > (Tested on Oracle CI tiers 1-7) Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @tstuefe review -- changed error message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20514/files - new: https://git.openjdk.org/jdk/pull/20514/files/8c79b888..0745467c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20514&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20514&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20514/head:pull/20514 PR: https://git.openjdk.org/jdk/pull/20514 From iklam at openjdk.org Sun Aug 11 21:17:08 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 11 Aug 2024 21:17:08 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows [v2] In-Reply-To: References: Message-ID: <7gXHGM7cA-LVvtV9KrhKUXLzD-nOkZe1_I9cBRLyBn8=.9d2e0a46-f4be-47e0-aea4-15b5a89ef577@github.com> On Sat, 10 Aug 2024 16:50:46 GMT, Thomas Stuefe wrote: > But then a 
follow-up question, what you do now in `map_heap_region_impl` for Windows, would that not be the same as the `ArchiveHeapLoader::load_heap_region` path? No. `ArchiveHeapLoader::can_load()` and `ArchiveHeapLoader::can_map()` are not capabilities of the platform, but capabilities of the GC. - `can_map()` is hard-coded for G1. - `can_load()` returns `Universe::heap()->can_load_archived_objects() && UseCompressedOops`. This returns true only on Serial, Parallel and Epsilon. So on Windows, when G1 is used, we go into this funny "mapping" mode except that the "mapping" is really implemented with `os::read()`. > And another question, sorry, unrelated to this PR: > > I see we always attempt to load the heap region regardless of here (note how its outside the INCLUDE_CDS_JAVA_HEAP block): > > https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/hotspot/share/cds/metaspaceShared.cpp#L1200 > > I wonder whether this is wrong. If it is wrong, its benign? Do we even include heap region in CDS dumps on Windows? > > (Sorry for loading this PR with questions) It's benign. If the heap region is not in the CDS archive, the `static_mapinfo->map_or_load_heap_region()` call does nothing. 
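Ioi's point that the Windows/G1 "mapping" path is really a read into memory the collector has already reserved can be pictured with a small portable C++ sketch. It is not HotSpot code: `read_region_into` is a hypothetical name, and stdio `fseek`/`fread` stand in for HotSpot's `os::read()` and related file routines. The caller owns `dest` (for example, a range inside an already-reserved heap), and the function only copies bytes into it, so no new OS mapping is created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not HotSpot's API): fill memory the caller already owns
// with bytes copied from a file, instead of asking the OS to map the file there.
// Returns false on a seek error or a short read, so the caller can fall back.
bool read_region_into(std::FILE* file, long file_offset, char* dest, std::size_t size) {
  if (std::fseek(file, file_offset, SEEK_SET) != 0) {
    return false;
  }
  // fread copies into the caller's buffer; the buffer's address is unchanged
  // and no file mapping is created.
  return std::fread(dest, 1, size, file) == size;
}
```

Returning `false` on a short read gives the caller a clean point to release the region and report the failure, in the spirit of the `dealloc_heap_region()` plus `log_error(cds)` sequence quoted from `filemap.cpp` in this thread.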
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1713059295 From iklam at openjdk.org Sun Aug 11 21:17:08 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 11 Aug 2024 21:17:08 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows [v2] In-Reply-To: References: Message-ID: On Sat, 10 Aug 2024 16:43:51 GMT, Thomas Stuefe wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @tstuefe review -- changed error message > > src/hotspot/share/cds/filemap.cpp line 2188: > >> 2186: /* do_commit = */ true)) { >> 2187: dealloc_heap_region(); >> 2188: log_error(cds)("Failed to read archived heap region at " INTPTR_FORMAT, p2i(addr)); > > Very minor bikeshedding, we don't try to read the heap region at this address but load it into memory at that address. Up to you if you change anything. I changed the message to "Failed to read archived heap region into 0x12345678...". Does that sound better? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1713059616 From kbarrett at openjdk.org Mon Aug 12 05:21:32 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 12 Aug 2024 05:21:32 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 12:47:18 GMT, Joel Sikström wrote: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. Looks good, except for some copyrights needing update. ------------- Marked as reviewed by kbarrett (Reviewer).
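The technique in Joel's PR description, allocating through a tracked allocator and aligning the result by hand, can be sketched as follows. This is an illustration under stated assumptions, not the ZGC change: `std::malloc` stands in for HotSpot's NMT-aware `os::malloc`, the alignment is assumed to be a power of two, and `align_up`/`alloc_aligned_unfreeable` are hypothetical names. It also shows why the result cannot be handed back to `free()`: the aligned pointer usually differs from the one the allocator returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Round value up to the next multiple of alignment (assumed to be a power of two).
std::uintptr_t align_up(std::uintptr_t value, std::size_t alignment) {
  assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
  return (value + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
}

// Over-allocate by (alignment - 1) bytes so that an aligned block of `size`
// bytes always fits, then return the aligned address inside the raw block.
// The result generally differs from what malloc returned, so it must NOT be
// passed to free(): acceptable only for memory that is never released.
void* alloc_aligned_unfreeable(std::size_t alignment, std::size_t size) {
  void* raw = std::malloc(size + alignment - 1);
  if (raw == nullptr) {
    return nullptr;
  }
  return reinterpret_cast<void*>(align_up(reinterpret_cast<std::uintptr_t>(raw), alignment));
}
```

Over-allocating by `alignment - 1` bytes is the smallest padding that guarantees the aligned block fits; the cost is that the original pointer is discarded, which the description accepts because `ZUtils::alloc_aligned` memory is never freed.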
PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2232062157 From amitkumar at openjdk.org Mon Aug 12 05:25:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 12 Aug 2024 05:25:33 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 11:48:17 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Give barrier generation helper functions a more consistent name is there issue if we replace this code: if (in_bytes(SATBMarkQueue::byte_width_of_active()) == 4) { __ ldrw(rscratch1, in_progress); } else { assert(in_bytes(SATBMarkQueue::byte_width_of_active()) == 1, "Assumption"); __ ldrb(rscratch1, in_progress); } in method `G1BarrierSetAssembler::gen_write_ref_array_pre_barrier` with `generate_queue_test_and_insertion(masm, rthread, rscratch1)` ? Though you have to move the `gen_write_ref_array_pre_barrier` on top otherwise compiler wouldn't be able to find it.
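The branch in Amit's snippet chooses, at the time the barrier code is generated, between a 32-bit load (`ldrw`) and a zero-extending byte load (`ldrb`) of the SATB "marking active" flag, based on `SATBMarkQueue::byte_width_of_active()`. A hedged C++ model of what the emitted load reads is below. `load_active_flag` is a hypothetical name, and unlike the assembler it makes the width decision at run time, purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Model of the two load variants: read a flag stored in either 4 bytes or 1 byte.
// memcpy avoids type-punning and alignment problems in portable C++.
std::uint32_t load_active_flag(const void* addr, std::size_t width_in_bytes) {
  if (width_in_bytes == 4) {
    std::uint32_t word;
    std::memcpy(&word, addr, sizeof word);  // like ldrw: full 32-bit load
    return word;
  }
  assert(width_in_bytes == 1 && "Assumption");
  std::uint8_t byte;
  std::memcpy(&byte, addr, sizeof byte);
  return byte;  // like ldrb: the byte is zero-extended into the register
}
```

Either way, the barrier only needs to know whether the loaded value is zero (marking inactive) or non-zero, which is why the two widths are interchangeable from the fast path's point of view.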
------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2232065079 From duke at openjdk.org Mon Aug 12 06:29:48 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 12 Aug 2024 06:29:48 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into zgc_zutils_alloc_aligned - Updated copyright years - Remove trailing whitespace - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20523/files - new: https://git.openjdk.org/jdk/pull/20523/files/cb942b1b..d227e0de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20523&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20523&range=00-01 Stats: 551 lines in 25 files changed: 357 ins; 77 del; 117 mod Patch: https://git.openjdk.org/jdk/pull/20523.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20523/head:pull/20523 PR: https://git.openjdk.org/jdk/pull/20523 From jbhateja at openjdk.org Mon Aug 12 06:32:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 12 Aug 2024 06:32:33 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 03:28:53 GMT, Jasmine Karthikeyan wrote: >> Jatin Bhateja has updated the
pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 >> - Removed redundant comment >> - 8338021: Support saturating vector operators in VectorAPI > > src/hotspot/share/opto/type.cpp line 495: > >> 493: TypeInt::POS1 = TypeInt::make(1,max_jint, WidenMin); // Positive values >> 494: TypeInt::INT = TypeInt::make(min_jint,max_jint, WidenMax); // 32-bit integers >> 495: TypeInt::UINT = TypeInt::make(0, max_juint, WidenMin); // Unsigned ints > > This would make an illegal type, right? Since `TypeInt` is signed using `max_juint` as the hi value would end up as signed -1, resulting in the type `0..-1`, an empty type. I wonder if there's a better way to handle this, since in the type system empty types are in a sense equivalent to `TOP`. @jaskarth , its usage in existing patch is limited to [type comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542). My plan is to address intrinsification of new core lib APIs, associated value range folding optimization (since unsigned numbers have different value range of [0, MAX_VALUE) vs signed [-MIN_VALUE/2, +MAX_VALUE/2) numbers) and auto-vectorization in a follow up patch. **Notes on C2 type system:** Unlike Type::FLOAT, integral type ranges are specified using _lo and _hi value range, these ranges are pruned using flow functions associated with each operation IR. Constraining the value ranges allows logic pruning, e.g. in1[TypeInt] & 0x7FFFFFFF will chop off -ve values ranges from in1, thus a constrol structure like . `if (in1 < 0) { true_path ; } else { false_path; } ` which uses in1 as a flow condition will sweepout the true path. C2 type system only maintains value ranges for integral types i.e. 
long and int, any sub-word type which as per JVM specification has an int storage "word" only constrains the value range of TypeInt. A type which represent a constant value has same _hi and _lo value. Floating point types Type::FLOAT / DOUBLE cannot maintain upper / lower value ranges due to rounding constraints. Thus a C2 type system maintains a separate type TypeF and TypeD which are singletons and represent a constant value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1713220777 From kbarrett at openjdk.org Mon Aug 12 07:12:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 12 Aug 2024 07:12:37 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: <6QCBXfFDwLUTkzz9hezTYFbRwUExGs3HOyNrXEDshks=.08f22e57-c52b-4374-9925-4f66ceed25a0@github.com> On Mon, 12 Aug 2024 06:29:48 GMT, Joel Sikström wrote: >> Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. >> >> Tested with tiers 1-3. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into zgc_zutils_alloc_aligned > - Updated copyright years > - Remove trailing whitespace > - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Looks good. ------------- Marked as reviewed by kbarrett (Reviewer).
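Two points raised in the 8338021 thread above can be illustrated with a toy model: Jasmine Karthikeyan's concern that `TypeInt::make(0, max_juint, ...)` yields an empty `0..-1` range, and Jatin Bhateja's description of integer types as `_lo`/`_hi` ranges narrowed by per-operation transfer functions. This is not C2's `TypeInt`. `IntRange`, `and_with_nonneg_mask` and `make_from_unsigned_hi` are hypothetical names, and the transfer function is just one conservative choice.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Toy model of a C2-style integer type: the closed range [lo, hi] of int32 values.
// A range with lo > hi contains no values (an "empty" type).
struct IntRange {
  std::int32_t lo;
  std::int32_t hi;
  bool is_empty() const { return lo > hi; }
  bool is_con() const { return lo == hi; }  // a constant has lo == hi
};

// Conservative transfer function for (x & mask) when mask is a non-negative
// constant: the result is never negative and never exceeds mask. If x is known
// to be non-negative, the result also never exceeds x.
IntRange and_with_nonneg_mask(IntRange x, std::int32_t mask) {
  assert(mask >= 0);
  if (x.is_empty()) {
    return x;
  }
  std::int32_t hi = mask;
  if (x.lo >= 0 && x.hi < hi) {
    hi = x.hi;
  }
  return IntRange{0, hi};
}

// Building an "unsigned int" range from signed 32-bit bounds: 0xFFFFFFFF
// converted to int32 is -1 (two's complement; guaranteed since C++20),
// so [0, max_juint] collapses to the empty range [0, -1].
IntRange make_from_unsigned_hi(std::uint32_t hi) {
  return IntRange{0, static_cast<std::int32_t>(hi)};
}
```

With the full `int` range as input, masking with `0x7FFFFFFF` yields `[0, 0x7FFFFFFF]`, so a later `< 0` test on the masked value could be folded away, which is the kind of pruning Jatin describes.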
PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2232190875 From kbarrett at openjdk.org Mon Aug 12 07:14:00 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 12 Aug 2024 07:14:00 GMT Subject: RFR: 8338155: Fix -Wzero-as-null-pointer-constant warnings involving PTHREAD_MUTEX_INITIALIZER Message-ID: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> Please review this change to remove -Wzero-as-null-pointer-constant warnings involving the use of PTHREAD_MUTEX_INITIALIZER. We obviously can't change the initializer macro, and we can't avoid it. So we suppress that warning where the initializer is used. This involved adding a suppression pragma macro for that warning, which we haven't needed for any of the previous work on removing them. Hopefully we won't need it for many (or any) other places, but there are still a few places triggering that warning, and not all of them are otherwise simple to resolve. Testing: mach5 tier1 ------------- Commit messages: - mutex initializers Changes: https://git.openjdk.org/jdk/pull/20537/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20537&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338155 Stats: 20 lines in 4 files changed: 16 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20537/head:pull/20537 PR: https://git.openjdk.org/jdk/pull/20537 From aph at openjdk.org Mon Aug 12 07:41:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 07:41:36 GMT Subject: Integrated: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: <9vQWW2Ij7bkcY6IirwkR0YLhF_Tj5PYwMVury6aHu8A=.5b23fc79-295e-4e4c-88e3-2ab2984b456d@github.com> On Tue, 6 Aug 2024 23:35:55 GMT, Andrew Haley wrote: > The fix for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), secondary_super_cache does not scale well, has a rare (and benign) 
out-of-bounds array access. While this bug is very unlikely ever to cause a failure, it should be fixed. This pull request has now been integrated. Changeset: 03204600 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/03204600c596214895ef86581eba9722f76d39b3 Stats: 15 lines in 4 files changed: 3 ins; 1 del; 11 mod 8337958: Out-of-bounds array access in secondary_super_cache Reviewed-by: vlivanov, shade ------------- PR: https://git.openjdk.org/jdk/pull/20483 From aph at openjdk.org Mon Aug 12 07:50:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 07:50:05 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v18] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. 
> > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to to a slow path subroutine. > Instead, the slow patch is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 55 commits: - Merge from JDK head. - Cleanup - Fix shared code - Fix shared code - use assert rather than guarantee - Untabify - Reorganize x86 - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work - Fix AArch64 - Review comments - ... 
and 45 more: https://git.openjdk.org/jdk/compare/03204600...5b46b38d ------------- Changes: https://git.openjdk.org/jdk/pull/19989/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=17 Stats: 1046 lines in 20 files changed: 774 ins; 140 del; 132 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From rcastanedalo at openjdk.org Mon Aug 12 08:38:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 08:38:37 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 05:23:06 GMT, Amit Kumar wrote: > is there issue if we replace this code: > > ``` > if (in_bytes(SATBMarkQueue::byte_width_of_active()) == 4) { > __ ldrw(rscratch1, in_progress); > } else { > assert(in_bytes(SATBMarkQueue::byte_width_of_active()) == 1, "Assumption"); > __ ldrb(rscratch1, in_progress); > } > ``` > > in method `G1BarrierSetAssembler::gen_write_ref_array_pre_barrier` with `generate_queue_test_and_insertion(masm, rthread, rscratch1)` ? > > Though you have to move the `gen_write_ref_array_pre_barrier` on top otherwise compiler wouldn't be able to find it. Thanks for the suggestion Amit! this refactoring would work (assuming you mean `generate_pre_barrier_fast_path` instead of `generate_queue_test_and_insertion`), however I am hesitant to apply it because 1) it would further increase the size of the changelog and hence the burden of reviewing it and 2) it is not a clear maintainability win: some engineers prefer a little bit of code duplication to preserve the assembly code flow (see discussion [here](https://github.com/openjdk/jdk/pull/19746#discussion_r1645713269)).
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283395013 From rcastanedalo at openjdk.org Mon Aug 12 08:46:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 08:46:16 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
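The description mentions the barrier tests that `G1BarrierSetAssembler` refactors into dedicated functions. For the post-write barrier, the classic G1 filters are: skip stores whose target and new value lie in the same heap region, skip null stores, and skip cards that are already dirty; only then is the card dirtied and handed to refinement. The C++ sketch below models those filters on plain integer addresses. The region size (1 MiB), card size (512 bytes), card encodings, and the names `CardTableModel` and `post_barrier` are illustrative assumptions, and the real barrier may include further steps (for example, a young-card check and a StoreLoad fence in some versions) that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative geometry and encodings (assumptions, not HotSpot constants).
constexpr unsigned kLogRegionBytes = 20;  // 1 MiB regions
constexpr unsigned kLogCardBytes = 9;     // 512-byte cards
constexpr std::uint8_t kCleanCard = 0xff;
constexpr std::uint8_t kDirtyCard = 0x00;

struct CardTableModel {
  std::uintptr_t heap_base;
  std::vector<std::uint8_t> cards;
  std::vector<std::size_t> dirty_queue;  // cards handed to "refinement"

  CardTableModel(std::uintptr_t base, std::size_t heap_bytes)
      : heap_base(base), cards(heap_bytes >> kLogCardBytes, kCleanCard) {}
};

// Returns true when the store reached the slow path (card dirtied + queued).
bool post_barrier(CardTableModel& ct, std::uintptr_t store_addr,
                  std::uintptr_t new_val) {
  // Filter 1: a store within one region cannot create a cross-region pointer.
  if (((store_addr ^ new_val) >> kLogRegionBytes) == 0) return false;
  // Filter 2: storing null creates no reference at all.
  if (new_val == 0) return false;
  std::size_t card = (store_addr - ct.heap_base) >> kLogCardBytes;
  assert(card < ct.cards.size());
  // Filter 3: an already-dirty card has already been queued in this model.
  if (ct.cards[card] == kDirtyCard) return false;
  ct.cards[card] = kDirtyCard;
  ct.dirty_queue.push_back(card);
  return true;
}
```

Ordering the cheap address-only test first means most same-region stores never touch the card table at all, which is why the region check typically comes before the card load.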
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Further motivate the choice of internal store address materialization in x64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/1834bf41..d21104ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=04-05 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Aug 12 08:46:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 08:46:16 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> Message-ID: On Fri, 9 Aug 2024 14:05:43 GMT, Martin Doerr wrote: >> I agree that using indirect memory operands is the most readable choice, and is slightly less wasteful from a register usage perspective. However, when I tried this choice a couple of months ago, I observed timeouts in some CTW runs, which as far as I remember were caused when LCM processed huge basic blocks with lots of memory writes (e.g. arising from static initializations of large String arrays such as the one [here](https://github.com/apache/lucene/blob/ea562f6ef2b32fe6eadf57c6381d9a69acb043c7/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemData1.java#L47-L748)), in combination with C2 stress options. In these scenarios, the large number of additional Mach nodes seemed to cause the timeouts.
I settled on materializing the store address internally to guard against such corner cases. I did not see any significant performance difference between the two choices in my benchmark results. >> >> I would like to study whether LCM can be made more robust in this scenario, which would enable using indirect memory operands here, but I think this would be best addressed in a separate RFE. Would it be OK for now to extend the code comment with the details provided in the above explanation? > > Ok, doing it in a separate RFE is fine with me. This sounds like a C2 problem which should get investigated. It may cause other performance problems, too. Maybe a native profiler can show what takes too much time. Thanks, Martin. I have added this to my list of follow-up tasks and extended the comment in the code with some more details (commit d21104ca8). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1713372749 From aph at openjdk.org Mon Aug 12 08:48:59 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 08:48:59 GMT Subject: [jdk23] RFR: 8337958: Out-of-bounds array access in secondary_super_cache Message-ID: Hi all, This pull request contains a backport of commit [03204600](https://github.com/openjdk/jdk/commit/03204600c596214895ef86581eba9722f76d39b3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Andrew Haley on 12 Aug 2024 and was reviewed by Vladimir Ivanov and Aleksey Shipilev. Thanks!
------------- Commit messages: - Backport 03204600c596214895ef86581eba9722f76d39b3 Changes: https://git.openjdk.org/jdk/pull/20545/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20545&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337958 Stats: 13 lines in 3 files changed: 3 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20545.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20545/head:pull/20545 PR: https://git.openjdk.org/jdk/pull/20545 From amitkumar at openjdk.org Mon Aug 12 08:50:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 12 Aug 2024 08:50:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:35:57 GMT, Roberto Castañeda Lozano wrote: > > is there an issue if we replace this code: > > ``` > > if (in_bytes(SATBMarkQueue::byte_width_of_active()) == 4) { > > __ ldrw(rscratch1, in_progress); > > } else { > > assert(in_bytes(SATBMarkQueue::byte_width_of_active()) == 1, "Assumption"); > > __ ldrb(rscratch1, in_progress); > > } > > ``` > > > > in method `G1BarrierSetAssembler::gen_write_ref_array_pre_barrier` with `generate_queue_test_and_insertion(masm, rthread, rscratch1)`? > > Though you would have to move the `gen_write_ref_array_pre_barrier` on top, otherwise the compiler wouldn't be able to find it. > > Thanks for the suggestion, Amit! This refactoring would work (assuming you mean `generate_pre_barrier_fast_path` instead of `generate_queue_test_and_insertion`); however, I am hesitant to apply it because 1) it would further increase the size of the changelog and hence the burden of reviewing it, and 2) it is not a clear maintainability win: some engineers prefer a little bit of code duplication to preserve the assembly code flow (see discussion [here](https://github.com/openjdk/jdk/pull/19746#discussion_r1645713269)). Ha! Makes sense. Are you planning to rebase it onto master?
Nothing important, but there were a couple of failures which are fixed after this PR, so it will make the test results a bit cleaner for us. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283418237 From alanb at openjdk.org Mon Aug 12 09:24:31 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 12 Aug 2024 09:24:31 GMT Subject: [jdk23] RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:43:34 GMT, Andrew Haley wrote: > Hi all, > > This pull request contains a backport of commit [03204600](https://github.com/openjdk/jdk/commit/03204600c596214895ef86581eba9722f76d39b3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Andrew Haley on 12 Aug 2024 and was reviewed by Vladimir Ivanov and Aleksey Shipilev. > > Thanks! jdk23u, not jdk23, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20545#issuecomment-2283483051 From aph at openjdk.org Mon Aug 12 09:34:34 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 09:34:34 GMT Subject: [jdk23] RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: <4I18fDonUiNRLQoL2mB9v6T-R_us0BL8zXW9VsKXNbk=.6e797e8c-5429-439d-bb32-5c24a3880923@github.com> On Mon, 12 Aug 2024 09:22:02 GMT, Alan Bateman wrote: > jdk23u, not jdk23, right? Oh, Dear God. I searched for jdk23, and that was all it suggested. I guess I'll try again.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20545#issuecomment-2283501055 From aph at openjdk.org Mon Aug 12 09:37:31 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 09:37:31 GMT Subject: [jdk23] RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: <4I18fDonUiNRLQoL2mB9v6T-R_us0BL8zXW9VsKXNbk=.6e797e8c-5429-439d-bb32-5c24a3880923@github.com> References: <4I18fDonUiNRLQoL2mB9v6T-R_us0BL8zXW9VsKXNbk=.6e797e8c-5429-439d-bb32-5c24a3880923@github.com> Message-ID: On Mon, 12 Aug 2024 09:31:43 GMT, Andrew Haley wrote: > > jdk23u, not jdk23, right? > > Oh, Dear God. I searched for jdk23, and that was all it suggested. I guess I'll try again. There's no such branch. What should I do? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20545#issuecomment-2283506389 From shade at openjdk.org Mon Aug 12 10:04:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 10:04:50 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. 
`@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all; fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstre... Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: - Use TestFramework bootclasspath instead of develop option - Merge branch 'master' into JDK-8333791-stable-field-barrier - Merge branch 'master' into JDK-8333791-stable-field-barrier - Merge branch 'master' into JDK-8333791-stable-field-barrier - Merge branch 'master' into JDK-8333791-stable-field-barrier - Variant 2: Only final-field like semantics for stable inits - Variant 3: Handle everything, including reads by compilers ------------- Changes: https://git.openjdk.org/jdk/pull/19635/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19635&range=02 Stats: 1067 lines in 14 files changed: 1028 ins; 20 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/19635.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19635/head:pull/19635 PR: https://git.openjdk.org/jdk/pull/19635 From shade at openjdk.org Mon Aug 12 10:04:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 10:04:50 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2] In-Reply-To: References: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com> Message-ID: On Mon, 5 Aug 2024 14:03:43 GMT, Christian Hagedorn wrote: >> Yeah, OK, let's do IR Framework update separately. I am planning to have this fix backported, so I would like to have test infra fixes also be more or less cleanly backportable. @chhagedorn, are you taking point on this, or should I? > > Sounds good, I can take care of it tomorrow. I will ping you in the PR once it is out. Redone the tests to use newly available TestFramework API. Retracted the develop flag. Made sure tests work fine in both fastdebug and release modes. See new commit. 
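The statement that `@Stable` is "intrinsically safe under races" rests on the pattern used by `Enum.hashCode`: a field starts at its default value, any thread may compute and store the same result, and a reader sees either the default (and recomputes) or the final value. The sketch below renders that idea in C++ rather than Java. `LazyHash` is a hypothetical class, and relaxed atomics are sufficient here only because the cached value is a self-contained integer. Publishing a pointer to an object initialized before the store would additionally need ordering (release/acquire in C++), roughly the role the final-field-like StoreStore barrier plays in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical analog of a lazily computed @Stable-style field.
// 0 is the "default" value meaning "not computed yet".
class LazyHash {
 public:
  explicit LazyHash(std::string name) : name_(std::move(name)) {}

  std::uint32_t hash() const {
    std::uint32_t h = cached_.load(std::memory_order_relaxed);
    if (h == 0) {
      h = compute();
      // Benign race: concurrent callers store the same value.
      cached_.store(h, std::memory_order_relaxed);
    }
    return h;
  }

 private:
  std::uint32_t compute() const {
    std::uint32_t h = 17;
    for (unsigned char c : name_) h = h * 31 + c;
    return h == 0 ? 1 : h;  // never return the "unset" marker
  }

  std::string name_;
  mutable std::atomic<std::uint32_t> cached_{0};
};
```

Two `LazyHash` objects built from the same string always report the same hash, whichever thread happens to compute it first.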
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1713469955 From stuefe at openjdk.org Mon Aug 12 10:41:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 12 Aug 2024 10:41:33 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows [v2] In-Reply-To: References: Message-ID: On Sun, 11 Aug 2024 21:17:08 GMT, Ioi Lam wrote: >> We didn't support CDS archived heap object on Windows because >> >> - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. >> - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. >> >> Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. >> >> (Tested on Oracle CI tiers 1-7) > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @tstuefe review -- changed error message This looks good. Thanks for changing the error message. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20514#pullrequestreview-2232620901 From stuefe at openjdk.org Mon Aug 12 10:41:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 12 Aug 2024 10:41:34 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows [v2] In-Reply-To: <7gXHGM7cA-LVvtV9KrhKUXLzD-nOkZe1_I9cBRLyBn8=.9d2e0a46-f4be-47e0-aea4-15b5a89ef577@github.com> References: <7gXHGM7cA-LVvtV9KrhKUXLzD-nOkZe1_I9cBRLyBn8=.9d2e0a46-f4be-47e0-aea4-15b5a89ef577@github.com> Message-ID: On Sun, 11 Aug 2024 21:11:31 GMT, Ioi Lam wrote: >> But then a follow-up question, what you do now in `map_heap_region_impl` for Windows, would that not be the same as the `ArchiveHeapLoader::load_heap_region` path? >> >> And another question, sorry, unrelated to this PR: >> >> I see we always attempt to load the heap region regardless, here (note how it's outside the INCLUDE_CDS_JAVA_HEAP block): https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/hotspot/share/cds/metaspaceShared.cpp#L1200 >> >> I wonder whether this is wrong. If it is wrong, is it benign? Do we even include the heap region in CDS dumps on Windows? >> >> (Sorry for loading this PR with questions) > >> But then a follow-up question, what you do now in `map_heap_region_impl` for Windows, would that not be the same as the `ArchiveHeapLoader::load_heap_region` path? > > No. `ArchiveHeapLoader::can_load()` and `ArchiveHeapLoader::can_map()` are not capabilities of the platform, but capabilities of the GC. > > - `can_map()` is hard-coded for G1. > - `can_load()` returns `Universe::heap()->can_load_archived_objects() && UseCompressedOops`. This returns true only on Serial, Parallel and Epsilon. > > So on Windows, when G1 is used, we go into this funny "mapping" mode except that the "mapping" is really implemented with `os::read()`.
> >> And another question, sorry, unrelated to this PR: >> >> I see we always attempt to load the heap region regardless, here (note how it's outside the INCLUDE_CDS_JAVA_HEAP block): >> >> https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/hotspot/share/cds/metaspaceShared.cpp#L1200 >> >> I wonder whether this is wrong. If it is wrong, is it benign? Do we even include the heap region in CDS dumps on Windows? >> >> (Sorry for loading this PR with questions) > > It's benign. If the heap region is not in the CDS archive, the `static_mapinfo->map_or_load_heap_region()` call does nothing. Thank you for explaining. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20514#discussion_r1713516486 From shade at openjdk.org Mon Aug 12 10:53:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 10:53:37 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows [v2] In-Reply-To: References: Message-ID: On Sun, 11 Aug 2024 21:17:08 GMT, Ioi Lam wrote: >> We didn't support CDS archived heap object on Windows because >> >> - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. >> - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. >> >> Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. >> >> (Tested on Oracle CI tiers 1-7) > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @tstuefe review -- changed error message OK, so on Windows, we just _read_ things at the requested address without _mapping_ it. Sounds fine. ------------- Marked as reviewed by shade (Reviewer).
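The thread explains that with G1 on Windows the archived heap region is "mapped" by reading the file contents into memory the collector has already reserved, rather than by mapping the file over it. The C++ sketch below shows that read-instead-of-map idea with standard streams. `load_region_by_reading` is a hypothetical helper, the caller's buffer stands in for the pre-reserved heap range, and this is not HotSpot's `os::read()`-based code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: copy `size` bytes starting at `offset` in the archive
// file into an already-reserved region. Returns true only if the whole
// region was filled; a short read is treated as failure.
bool load_region_by_reading(const std::string& path, std::streamoff offset,
                            char* region, std::size_t size) {
  assert(region != nullptr);
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  in.seekg(offset);
  if (!in) return false;
  in.read(region, static_cast<std::streamsize>(size));
  return static_cast<std::size_t>(in.gcount()) == size;
}
```

The trade-off is that reading copies every byte up front, whereas a file mapping can be populated lazily by the OS; in exchange, the destination can be any memory the collector has already set aside.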
PR Review: https://git.openjdk.org/jdk/pull/20514#pullrequestreview-2232643847 From stefank at openjdk.org Mon Aug 12 11:01:36 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Aug 2024 11:01:36 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 06:29:48 GMT, Joel Sikström wrote: >> Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. >> >> Tested with tiers 1-3. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
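The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into zgc_zutils_alloc_aligned > - Updated copyright years > - Remove trailing whitespace > - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2232648180 From duke at openjdk.org Mon Aug 12 11:01:36 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 12 Aug 2024 11:01:36 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: <-fmLaWio8oZygaYx76ekJpOU7ZOAP9NZT943MOf0AdY=.ac0f1ffa-d9a4-4b45-a580-681d194235f8@github.com> On Mon, 12 Aug 2024 10:53:12 GMT, Stefan Karlsson wrote: >> Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into zgc_zutils_alloc_aligned >> - Updated copyright years >> - Remove trailing whitespace >> - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT > > Marked as reviewed by stefank (Reviewer). Thank you for the reviews! @stefank @kimbarrett ------------- PR Comment: https://git.openjdk.org/jdk/pull/20523#issuecomment-2283655289 From duke at openjdk.org Mon Aug 12 11:01:36 2024 From: duke at openjdk.org (duke) Date: Mon, 12 Aug 2024 11:01:36 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 06:29:48 GMT, Joel Sikström wrote: >> Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. >> >> Tested with tiers 1-3. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into zgc_zutils_alloc_aligned > - Updated copyright years > - Remove trailing whitespace > - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT @jsikstro Your change (at version d227e0dee82abb51f23ffbb6c2e199248a273123) is now ready to be sponsored by a Committer.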
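The integrated change replaces `posix_memalign`/`_aligned_malloc` with `os::malloc` plus manual alignment, accepting that the result can no longer be freed. A minimal C++ sketch of that technique follows. It uses `std::malloc` in place of HotSpot's NMT-aware `os::malloc`, `alloc_aligned_unfreeable` is a hypothetical name, and the code is not the actual `ZUtils::alloc_aligned` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Over-allocate, then round the pointer up to the requested alignment.
// The returned pointer may differ from what malloc returned, so it must
// never be passed to free(); this is acceptable only for memory that is
// kept for the lifetime of the process.
void* alloc_aligned_unfreeable(std::size_t alignment, std::size_t size) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
  void* raw = std::malloc(size + alignment - 1);
  if (raw == nullptr) return nullptr;
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  return reinterpret_cast<void*>((addr + mask) & ~mask);
}
```

Over-allocating by `alignment - 1` bytes guarantees that an aligned address with `size` usable bytes exists inside the block. If the memory ever had to be released, the raw pointer would need to be stored somewhere (for example, in a small header just before the aligned address).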
------------- PR Comment: https://git.openjdk.org/jdk/pull/20523#issuecomment-2283657113 From duke at openjdk.org Mon Aug 12 11:01:37 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 12 Aug 2024 11:01:37 GMT Subject: Integrated: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 12:47:18 GMT, Joel Sikström wrote: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. This pull request has now been integrated. Changeset: a6c06307 Author: Joel Sikström Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/a6c0630737bbf2f2e6c64863ff9b43c50c4742b6 Stats: 104 lines in 6 files changed: 13 ins; 83 del; 8 mod 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Reviewed-by: stefank, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/20523 From dholmes at openjdk.org Mon Aug 12 11:15:31 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 12 Aug 2024 11:15:31 GMT Subject: RFR: 8338155: Fix -Wzero-as-null-pointer-constant warnings involving PTHREAD_MUTEX_INITIALIZER In-Reply-To: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> References: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> Message-ID: On Mon, 12 Aug 2024 07:08:56 GMT, Kim Barrett wrote: > Please review this change to remove -Wzero-as-null-pointer-constant warnings > involving the use of PTHREAD_MUTEX_INITIALIZER. We obviously can't change the > initializer macro, and we can't avoid it. So we suppress that warning where > the initializer is used.
> > This involved adding a suppression pragma macro for that warning, which we > haven't needed for any of the previous work on removing them. Hopefully we > won't need it for many (or any) other places, but there are still a few places > triggering that warning, and not all of them are otherwise simple to resolve. > > Testing: mach5 tier1 I would expect there is an upstream bug to fix this; it seems crazy that we have to work around it. Changes look fine though. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20537#pullrequestreview-2232692694 From aph at openjdk.org Mon Aug 12 11:51:31 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 11:51:31 GMT Subject: [jdk23] RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: <4I18fDonUiNRLQoL2mB9v6T-R_us0BL8zXW9VsKXNbk=.6e797e8c-5429-439d-bb32-5c24a3880923@github.com> Message-ID: On Mon, 12 Aug 2024 09:34:34 GMT, Andrew Haley wrote: > > > jdk23u, not jdk23, right? > > > > > > Oh, Dear God. I searched for jdk23, and that was all it suggested. I guess I'll try again. > > There's no such branch. What should I do? Ah, I found it. It looks like jdk23u is in its own repo. I don't know why; I thought we'd moved to branches. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20545#issuecomment-2283757820 From aph at openjdk.org Mon Aug 12 12:01:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 12:01:36 GMT Subject: [jdk23] Withdrawn: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:43:34 GMT, Andrew Haley wrote: > Hi all, > > This pull request contains a backport of commit [03204600](https://github.com/openjdk/jdk/commit/03204600c596214895ef86581eba9722f76d39b3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by Andrew Haley on 12 Aug 2024 and was reviewed by Vladimir Ivanov and Aleksey Shipilev. > > Thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20545 From rcastanedalo at openjdk.org Mon Aug 12 12:13:42 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 12:13:42 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:48:24 GMT, Amit Kumar wrote: > Are you planning to rebase it onto master? Nothing important, but there were a couple of failures which are fixed after this PR, so it will make the test results a bit cleaner for us. Actually, I have refrained from updating to the latest mainline changes to avoid interfering with the porting work while it is in progress, but if there is consensus among the port maintainers I would be happy to update the changeset regularly. @TheRealMDoerr @feilongjiang @offamitkumar @snazarkin what do you prefer? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283802801 From mdoerr at openjdk.org Mon Aug 12 12:25:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Aug 2024 12:25:41 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:46:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Further motivate the choice of internal store address materialization in x64 I'm a bit concerned about regular updates. We should at least check if all platforms are in good shape before merging. JDK head looks good at the moment, so I'd appreciate an update. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283828888 From stefank at openjdk.org Mon Aug 12 12:26:41 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Aug 2024 12:26:41 GMT Subject: [jdk23] RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: <4I18fDonUiNRLQoL2mB9v6T-R_us0BL8zXW9VsKXNbk=.6e797e8c-5429-439d-bb32-5c24a3880923@github.com> Message-ID: On Mon, 12 Aug 2024 11:49:13 GMT, Andrew Haley wrote: > I don't know why; I thought we'd moved to branches.
FWIW, I think the JDK Updates maintainers wanted to wait for the transition to branches: https://mail.openjdk.org/pipermail/jdk-dev/2024-March/008847.html ------------- PR Comment: https://git.openjdk.org/jdk/pull/20545#issuecomment-2283830722 From aph-open at littlepinkcloud.com Mon Aug 12 13:20:32 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Mon, 12 Aug 2024 14:20:32 +0100 Subject: [jdk23] RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: <4I18fDonUiNRLQoL2mB9v6T-R_us0BL8zXW9VsKXNbk=.6e797e8c-5429-439d-bb32-5c24a3880923@github.com> Message-ID: <95082d0d-58b3-4ec3-96dd-183e0308bf15@littlepinkcloud.com> On 8/12/24 13:26, Stefan Karlsson wrote: > On Mon, 12 Aug 2024 11:49:13 GMT, Andrew Haley wrote: > >> I don't know why; I thought we'd moved to branches. > > FWIW, I think the JDK Updates maintainers wanted to wait for the transition to branches: > > https://mail.openjdk.org/pipermail/jdk-dev/2024-March/008847.html OK, thanks. Much confusion here: 23 is a branch, 23u is a repo... Go figure! :-) -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sgehwolf at openjdk.org Mon Aug 12 13:44:35 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 12 Aug 2024 13:44:35 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v4] In-Reply-To: References: Message-ID: On Thu, 11 Jul 2024 16:46:13 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 due to the way [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. It is therefore immediately problem-listed.
It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding these tests so that this does not regress again. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support Please keep it open, bot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2284030728 From rcastanedalo at openjdk.org Mon Aug 12 14:00:36 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 14:00:36 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 12:23:24 GMT, Martin Doerr wrote: > I'm a bit concerned about regular updates. We should at least check if all platforms are in good shape before merging. JDK head looks good at the moment, so I'd appreciate an update. OK, I will test and push a merge of jdk-24+10 (Thu Aug 8) in the next few days, unless @feilongjiang or @snazarkin object. We can then check in a few weeks if another update is required.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2284064522 From mdoerr at openjdk.org Mon Aug 12 14:06:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Aug 2024 14:06:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:46:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Further motivate the choice of internal store address materialization in x64 src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 203: > 201: // Do we need to load the previous value? > 202: if (obj != noreg) { > 203: __ load_heap_oop(pre_val, Address(obj, 0), noreg, noreg, AS_RAW); How do we handle implicit null checks for which `obj` is null? Note that we may expect the store instruction to trigger SIGSEGV. Does it work correctly if the SIGSEGV is triggered here, in the pre-barrier instead?
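For readers less familiar with what the two G1 barriers check, here is a toy, single-threaded model (not HotSpot code). It assumes a heap of fixed-size regions starting at address 0 and a byte-per-card table; `ToyG1Heap`, `kRegionShift`, `kCardShift`, and the card values are hypothetical, and details such as young-card filtering, memory fences, and per-thread queue buffers are omitted. In this model the old field value is read inside the pre-barrier, after the marking-active check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of G1's write barriers. Addresses are plain integers and 0
// plays the role of null.
using Addr = std::uintptr_t;

constexpr unsigned kRegionShift = 20;  // hypothetical 1 MiB regions
constexpr unsigned kCardShift = 9;     // hypothetical 512-byte cards
constexpr std::uint8_t kCleanCard = 0xff;
constexpr std::uint8_t kDirtyCard = 0x00;

struct ToyG1Heap {
  bool marking_active = false;           // concurrent marking in progress?
  std::vector<Addr> satb_queue;          // old values logged by the pre-barrier
  std::vector<std::uint8_t> cards;       // one byte per card
  std::vector<std::size_t> dirty_cards;  // cards handed to refinement

  explicit ToyG1Heap(std::size_t heap_bytes)
      : cards(heap_bytes >> kCardShift, kCleanCard) {}

  // Pre-barrier (SATB): while marking, remember the value being overwritten.
  void pre_barrier(const Addr& slot) {
    if (!marking_active) return;  // common fast path: no load at all
    Addr old_val = slot;          // load the previous field value
    if (old_val == 0) return;     // nothing to record for null
    satb_queue.push_back(old_val);
  }

  // Post-barrier: dirty the card of a field that now points to another region.
  void post_barrier(Addr field, Addr new_val) {
    if (((field ^ new_val) >> kRegionShift) == 0) return;  // same region
    if (new_val == 0) return;                              // null store
    std::size_t card = field >> kCardShift;
    if (cards[card] == kDirtyCard) return;                 // already dirty
    cards[card] = kDirtyCard;
    dirty_cards.push_back(card);
  }

  // A reference store wrapped in both barriers; `slot` stands in for the field.
  void store_ref(Addr field, Addr& slot, Addr new_val) {
    pre_barrier(slot);
    slot = new_val;
    post_barrier(field, new_val);
  }
};
```

With late expansion, C2 sees each store as one barrier-aware instruction, and the filtering steps above are what the emitted machine code has to perform around the actual store.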
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1713842991 From aboldtch at openjdk.org Mon Aug 12 14:41:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 12 Aug 2024 14:41:22 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v10] In-Reply-To: References: Message-ID: <7BlK_mX-oDELURZVR0Haq7NCPUv18Q4Dk7Nblicdvn4=.1e06f523-0943-4dcd-9807-584d48b239ed@github.com> > When inflating a monitor, the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not overwrite and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance-neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held.
More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf... Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 36 additional commits since the last revision: - Remove extra whitespace in UseObjectMonitorTableTest.java - Inline _table - Rename ObjectMonitorWorld to ObjectMonitorTable - Update comment basicLock.hpp - Remove const for InflateCause parameters in lightweightSynchronizer - Use [inc/dec]_no_safepoint_count directly instead of a conditionally created NoSafepointVerifier - Remove unnecessary assert - Rename _table_count to _items_count - Revert instanceKlass.cpp comment change - Merge tag 'jdk-24+10' into JDK-8315884 Added tag jdk-24+10 for changeset 16df9c33 - ... 
and 26 more: https://git.openjdk.org/jdk/compare/0cb42d5f...92a88366 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/ebf11542..92a88366 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=08-09 Stats: 42239 lines in 1406 files changed: 24413 ins; 11582 del; 6244 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Mon Aug 12 14:41:23 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 12 Aug 2024 14:41:23 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <3m5N_Fh65MVy7vRvO0wq3qFlzxjbCLHhbTBJe8OJorw=.eb61b3bd-5aca-45cd-8e88-389ae86a599b@github.com> References: <3m5N_Fh65MVy7vRvO0wq3qFlzxjbCLHhbTBJe8OJorw=.eb61b3bd-5aca-45cd-8e88-389ae86a599b@github.com> Message-ID: On Tue, 23 Jul 2024 13:20:27 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 77: >> >>> 75: using ConcurrentTable = ConcurrentHashTable; >>> 76: >>> 77: ConcurrentTable* _table; >> >> So you have a class ObjectMonitorWorld, which references the ConcurrentTable, which, internally also has its actual table. This is 3 dereferences to get to the actual table, if I counted correctly. I'd try to eliminate the outermost ObjectMonitorWorld class, or at least make it a global flat structure instead of a reference to a heap-allocated object. I think, because this is a structure that is global and would exist throughout the lifetime of the Java program anyway, it might be worth figuring out how to do the actual ConcurrentHashTable flat in the global structure, too. > > This is a really good suggestion and might help a lot with the performance problems that we see with the table with heavily contended locking. 
I think we should change this in a follow-on patch (which I'll work on). I inlined the table in the surrounding object as it is a trivial change. Removing both indirections and creating static storage would require more work (some conditional deferred creation, similar to an optional). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1713909990 From aboldtch at openjdk.org Mon Aug 12 14:41:23 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 12 Aug 2024 14:41:23 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: References: Message-ID: On Tue, 23 Jul 2024 16:44:06 GMT, Coleen Phillimore wrote: >> I wanted to avoid having to add `NoSafepointVerifier` implementation details in the synchroniser code. I guess `ContinuationWrapper` already does this. >> >> Simply creating a `NoSafepointVerifier` when you expect no safepoint is more obvious to me, shows the intent better. > > This looks strange to me also, but it'd be better than changing the no_safepoint_count directly, since NSV handles when the current thread isn't a JavaThread, so you'd have to duplicate that in this VerifyThreadState code too. > > NoSafepointVerifier::NoSafepointVerifier() : _thread(Thread::current()) { > if (_thread->is_Java_thread()) { > JavaThread::cast(_thread)->inc_no_safepoint_count(); > } > } It was the call to `[inc/dec]_no_safepoint_count` I wanted to avoid. But I will switch the conditionally created NSV to the `[inc/dec]_no_safepoint_count` calls instead. >> Yeah. The only effect it has is that you cannot reassign the variable. It was the style taken from [synchronizer.hpp](https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.hpp) where all `InflateCause` parameters are const. > > Do you get this for inflate_fast_locked_object also? Yes. I'll just remove the const from all lightweightSynchronizer parameters.
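The trade-off discussed here (a scoped verifier versus calling `[inc/dec]_no_safepoint_count` directly) can be sketched with a small RAII guard modeled on the quoted `NoSafepointVerifier` constructor. `ToyThread` and `ScopedNoSafepointCheck` are hypothetical stand-ins, not HotSpot types; the guard keeps the "only for Java threads" check in one place and guarantees the decrement matches the increment.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's Thread/JavaThread distinction.
struct ToyThread {
  bool is_java_thread;
  int no_safepoint_count = 0;

  explicit ToyThread(bool java) : is_java_thread(java) {}
  void inc_no_safepoint_count() { ++no_safepoint_count; }
  void dec_no_safepoint_count() { --no_safepoint_count; }
};

// RAII guard in the spirit of the quoted NoSafepointVerifier constructor:
// it only adjusts the counter for Java threads, and the destructor undoes
// exactly what the constructor did.
class ScopedNoSafepointCheck {
 public:
  explicit ScopedNoSafepointCheck(ToyThread& t) : _thread(t) {
    if (_thread.is_java_thread) _thread.inc_no_safepoint_count();
  }
  ~ScopedNoSafepointCheck() {
    if (_thread.is_java_thread) _thread.dec_no_safepoint_count();
  }
  ScopedNoSafepointCheck(const ScopedNoSafepointCheck&) = delete;
  ScopedNoSafepointCheck& operator=(const ScopedNoSafepointCheck&) = delete;

 private:
  ToyThread& _thread;
};
```

Calling the counters directly avoids creating a guard object conditionally, but the caller then has to know the thread kind and pair every increment with a decrement by hand.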
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1713909267 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1713909417 From aboldtch at openjdk.org Mon Aug 12 14:41:23 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 12 Aug 2024 14:41:23 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <1FImJurji3MUi1rauLpFYqETg45LmnlxLrRijzXBukg=.7125982a-3507-4711-922e-2c7c9706d87c@github.com> References: <1FImJurji3MUi1rauLpFYqETg45LmnlxLrRijzXBukg=.7125982a-3507-4711-922e-2c7c9706d87c@github.com> Message-ID: On Wed, 17 Jul 2024 06:48:03 GMT, David Holmes wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Remove try_read >> - Add explicit to single parameter constructors >> - Remove superfluous access specifier >> - Remove unused include >> - Update assert message OMCache::set_monitor >> - Fix indentation >> - Remove outdated comment LightweightSynchronizer::exit >> - Remove logStream include >> - Remove strange comment >> - Fix javaThread include > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 60: > >> 58: >> 59: // ConcurrentHashTable storing links from objects to ObjectMonitors >> 60: class ObjectMonitorWorld : public CHeapObj { > > OMWorld describes the project not the hashtable, this should be called ObjectMonitorTable or some such. I agree. 
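To make the "table storing links from objects to ObjectMonitors" concrete, here is a deliberately simple sketch: a side table keyed by object identity, so inflation does not need to overwrite the header word, with a tiny per-thread cache in front of it. A mutex-protected `std::unordered_map` stands in for HotSpot's lock-free `ConcurrentHashTable`, and `ToyMonitor`, `ToyMonitorTable`, and `ToyMonitorCache` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

struct ToyMonitor {
  const void* owner_object;  // the object this monitor is associated with
};

// Side table mapping objects to monitors. A mutex-protected unordered_map
// stands in for a lock-free concurrent hash table.
class ToyMonitorTable {
 public:
  // Returns the existing monitor for obj, or installs `fresh` and returns it.
  ToyMonitor* get_or_insert(const void* obj, ToyMonitor* fresh) {
    std::lock_guard<std::mutex> lock(_mutex);
    auto result = _map.emplace(obj, fresh);
    return result.first->second;
  }

  ToyMonitor* lookup(const void* obj) {
    std::lock_guard<std::mutex> lock(_mutex);
    auto it = _map.find(obj);
    return it == _map.end() ? nullptr : it->second;
  }

  // Called on deflation; cached copies elsewhere must be dropped as well.
  void remove(const void* obj) {
    std::lock_guard<std::mutex> lock(_mutex);
    _map.erase(obj);
  }

 private:
  std::mutex _mutex;
  std::unordered_map<const void*, ToyMonitor*> _map;
};

// Small per-thread cache consulted before the shared table (round-robin).
class ToyMonitorCache {
 public:
  ToyMonitor* find(const void* obj) const {
    for (const Entry& e : _entries) {
      if (e.obj == obj) return e.monitor;
    }
    return nullptr;
  }

  void put(const void* obj, ToyMonitor* m) {
    _entries[_next] = Entry{obj, m};
    _next = (_next + 1) % _entries.size();
  }

  void clear() { _entries.fill(Entry{}); }

 private:
  struct Entry {
    const void* obj = nullptr;
    ToyMonitor* monitor = nullptr;
  };
  std::array<Entry, 4> _entries{};
  std::size_t _next = 0;
};
```

Each extra pointer hop between the caller and the bucket array costs a dependent load on every uncached lookup, which is why flattening the table into its owner, as discussed in this thread, can matter.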
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1713909770 From aboldtch at openjdk.org Mon Aug 12 14:41:23 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 12 Aug 2024 14:41:23 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: References: Message-ID: <2CVx8D98FuzuqhebUabz7AACiTl_pJCV6v1-cr6YzV0=.0d4dc6ca-c911-47d2-a13c-35686d7b3bf9@github.com> On Mon, 15 Jul 2024 00:45:25 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 477: >> >>> 475: if (obj->mark_acquire().has_monitor()) { >>> 476: if (_length > 0 && _contended_oops[_length-1] == obj) { >>> 477: // assert(VM_Version::supports_recursive_lightweight_locking(), "must be"); >> >> Uncomment or remove assert? > > Yeah not sure why it was ever uncommented. To me it seems like that the assert should be invariant. But will investigate. I probably wanted to remove this. It is a tautology on all platforms but arm32 (or other with zero) right now. So removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1713909109 From aboldtch at openjdk.org Mon Aug 12 15:58:14 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 12 Aug 2024 15:58:14 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v11] In-Reply-To: References: Message-ID: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. 
This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance-neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
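The fast-locking transition quoted above (CAS on the lock bits, 0b01 unlocked to 0b00 fast-locked, with 0b10 meaning inflated) can be sketched with `std::atomic`. Only the bit patterns come from the description; `ToyHeader`, `try_fast_lock`, and `try_fast_unlock` are hypothetical, and ownership tracking (the lock-stack), recursion, spinning, and inflation are omitted, so this toy lets any caller unlock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Lock-bit patterns as quoted in the PR description (low two header bits).
constexpr std::uintptr_t kLockMask = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked = 0b01;
constexpr std::uintptr_t kInflated = 0b10;

struct ToyHeader {
  std::atomic<std::uintptr_t> mark;
  explicit ToyHeader(std::uintptr_t m) : mark(m) {}
};

enum class LockResult { kLocked, kContended, kInflated };

// One fast-lock attempt: CAS the lock bits from 0b01 to 0b00, leaving the
// other header bits untouched.
LockResult try_fast_lock(ToyHeader& h) {
  std::uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
  std::uintptr_t bits = old_mark & kLockMask;
  if (bits == kInflated) return LockResult::kInflated;   // use the monitor
  if (bits != kUnlocked) return LockResult::kContended;  // spin or inflate
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  if (h.mark.compare_exchange_strong(old_mark, new_mark,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed)) {
    return LockResult::kLocked;
  }
  return LockResult::kContended;  // header changed concurrently
}

// Fast unlock: CAS the lock bits back from 0b00 to 0b01.
bool try_fast_unlock(ToyHeader& h) {
  std::uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) return false;
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return h.mark.compare_exchange_strong(old_mark, new_mark,
                                        std::memory_order_release,
                                        std::memory_order_relaxed);
}
```

Because only the two lock bits change, the rest of the header (and, with `UseObjectMonitorTable`, the whole header even when inflated) stays intact, which is the property the PR description emphasizes.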
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Missing DEBUG_ONLY ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/92a88366..d020bc9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From iklam at openjdk.org Mon Aug 12 16:03:37 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 12 Aug 2024 16:03:37 GMT Subject: RFR: 8338011: CDS archived heap object support for 64-bit Windows [v2] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 10:38:51 GMT, Thomas Stuefe wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @tstuefe review -- changed error message > > This looks good. Thanks for changing the error message. Thanks @tstuefe @calvinccheung @shipilev for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20514#issuecomment-2284360898 From iklam at openjdk.org Mon Aug 12 16:06:38 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 12 Aug 2024 16:06:38 GMT Subject: Integrated: 8338011: CDS archived heap object support for 64-bit Windows In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 19:16:20 GMT, Ioi Lam wrote: > We didn't support CDS archived heap object on Windows because > > - The Windows implementation of `os::map_memory()` cannot map the contents of a file into a region that's already reserved by the garbage collector. > - We had a high failure rate for mapping the CDS archive on Windows due to ASLR, sometimes as high as 50%. 
So it didn't seem worth the effort (mainly testing) to support archived heap objects on Windows. > > Both of the above issues were fixed in [JDK-8231610](https://bugs.openjdk.org/browse/JDK-8231610), so we should add the support to Windows now. > > (Tested on Oracle CI tiers 1-7) This pull request has now been integrated. Changeset: f84240bc Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/f84240bca80d2ff01e198bb67931ad4725a5b334 Stats: 40 lines in 3 files changed: 23 ins; 10 del; 7 mod 8338011: CDS archived heap object support for 64-bit Windows Reviewed-by: stuefe, shade, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/20514 From aph at openjdk.org Mon Aug 12 16:32:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Aug 2024 16:32:05 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v19] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: <7EEEdr5j3kpKgATSlwBSRTMXZ0fvsVbZPBAXDyhxPSQ=.455f963a-371a-4092-8067-8c3424460e88@github.com> > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. 
> This means that the HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase.
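To show why popcount sits on the hot path, here is a simplified model of a hashed secondary-supers lookup: a 64-bit occupancy bitmap plus the supertypes packed in slot order, where the number of occupied slots below a class's hash slot gives its index. This is an assumption-laden sketch, not HotSpot's exact layout or probing scheme; `ToyKlass`, `ToySecondarySupers`, and `popcount64` are hypothetical, and the popcount is the classic portable SWAR variant of the kind a software fallback might use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Portable SWAR population count (no popcount instruction required).
inline unsigned popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}

// Toy class with a precomputed 6-bit hash slot (0..63).
struct ToyKlass {
  int id;
  unsigned hash_slot;
};

// Occupancy bitmap plus supertypes packed in slot order (hypothetical model).
struct ToySecondarySupers {
  std::uint64_t bitmap = 0;
  std::vector<const ToyKlass*> packed;

  explicit ToySecondarySupers(std::vector<const ToyKlass*> supers)
      : packed(std::move(supers)) {
    std::stable_sort(packed.begin(), packed.end(),
                     [](const ToyKlass* a, const ToyKlass* b) {
                       return a->hash_slot < b->hash_slot;
                     });
    for (const ToyKlass* k : packed) bitmap |= std::uint64_t{1} << k->hash_slot;
  }

  bool contains(const ToyKlass* target) const {
    std::uint64_t bit = std::uint64_t{1} << target->hash_slot;
    if ((bitmap & bit) == 0) return false;  // fast negative answer
    // Occupied slots below ours give the index into the packed array.
    unsigned index = popcount64(bitmap & (bit - 1));
    if (index < packed.size() && packed[index] == target) return true;
    // Slot collisions shift entries; fall back to a linear scan.
    return std::find(packed.begin(), packed.end(), target) != packed.end();
  }
};
```

The common cases (a miss on an empty slot, or a hit without collisions) cost one bitmap test and one popcount, so a cheap software popcount keeps the runtime path reasonable even without the hardware instruction.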
The pull request now contains 57 commits: - Fix merge - Merge branch 'clean' into JDK-8331658-work - Merge from JDK head. - Cleanup - Fix shared code - Fix shared code - use assert rather than guarantee - Untabify - Reorganize x86 - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work - ... and 47 more: https://git.openjdk.org/jdk/compare/f84240bc...77de087e ------------- Changes: https://git.openjdk.org/jdk/pull/19989/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=18 Stats: 1047 lines in 20 files changed: 774 ins; 140 del; 133 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From vlivanov at openjdk.org Mon Aug 12 16:50:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 12 Aug 2024 16:50:36 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 10:04:50 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. 
`@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
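The "safe under races" argument and the final-field-like publication requirement can be illustrated by analogy in C++ (Java's memory model and `@Stable` are not C++ constructs, so this is only a loose sketch). `Holder`, `compute_hash`, `hash_of`, `publish`, and `observe` are hypothetical. The lazily cached value is benign under races because a reader sees either the default 0 (and recomputes the same value) or the final value; separately, safe publication only needs the constructor's stores ordered before the pointer store, which a release store provides here.

```cpp
#include <atomic>
#include <cassert>

// C++ analogy only.
struct Holder {
  const int value;                     // written once in the constructor
  std::atomic<unsigned> cached_hash;  // lazily computed cache; 0 = unset

  explicit Holder(int v) : value(v), cached_hash(0) {}
};

// Deterministic hash that is never 0, so 0 can mean "not computed yet".
inline unsigned compute_hash(int v) {
  unsigned h = static_cast<unsigned>(v) * 2654435761u;
  return h | 1u;
}

// Racy lazy initialization: a reader sees either 0 (and recomputes the same
// value) or the final value. Relaxed ordering suffices here because the
// cached value does not guard any other data.
inline unsigned hash_of(Holder& h) {
  unsigned cached = h.cached_hash.load(std::memory_order_relaxed);
  if (cached == 0) {
    cached = compute_hash(h.value);
    h.cached_hash.store(cached, std::memory_order_relaxed);
  }
  return cached;
}

// Safe publication: constructor stores must become visible before the
// pointer does; a release store provides that ordering in this analogy.
std::atomic<Holder*> g_published{nullptr};

inline void publish(Holder* h) { g_published.store(h, std::memory_order_release); }

inline Holder* observe() { return g_published.load(std::memory_order_acquire); }
```

The analogy suggests why ordering constructor stores before publication is enough for stable fields, rather than a full release barrier at the exit of every method that writes one.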
The pull request now contains seven commits: > > - Use TestFramework bootclasspath instead of develop option > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19635#pullrequestreview-2233569223 From liach at openjdk.org Mon Aug 12 18:32:36 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 12 Aug 2024 18:32:36 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 10:04:50 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. 
The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - Use TestFramework bootclasspath instead of develop option > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers This updated patch to the new test framework looks clean. I think the current `@Stable` bundles a few distinct usages together: 1. lazy variables or arrays - addressed by the `StableValue` JEP 2. frozen arrays - there's an inactive frozen array proposal 3. constant folding outside of trusted packages - addressed by strict final fields (nullable types JEP) I hope we can gradually roll out the 3 features to benefit all Java users. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19635#pullrequestreview-2233759582 From coleenp at openjdk.org Mon Aug 12 18:58:44 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 12 Aug 2024 18:58:44 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v11] In-Reply-To: References: Message-ID: <5r33aU_dXazxa8Lahw6JCsbt5aLwcjCp1N6vkcTm_yI=.f47905ec-aa19-447f-ba28-658c6c94003a@github.com> On Mon, 12 Aug 2024 15:58:14 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures. This signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord.
>> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Missing DEBUG_ONLY src/hotspot/share/runtime/lightweightSynchronizer.cpp line 341: > 339: }; > 340: > 341: ObjectMonitorTable* LightweightSynchronizer::_omworld = nullptr; I preferred my version where ObjectMonitorTable was AllStatic, which gets rid of _omworld-> everywhere, but the internal table is the pointer. You can remove more omworld names also. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1714238075 From gziemski at openjdk.org Mon Aug 12 19:48:32 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 12 Aug 2024 19:48:32 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 04:47:18 GMT, David Holmes wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave it for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > If you called it `MemTypeFlag` - which to me still suggests mutually-exclusive values - then you would not need to rename all the variables with "flag" in their name later. @dholmes-ora @tstuefe @jdksjolen Where are we here? I have renamed MEMFLAGS to MemType. Is this fine, or do you wish to see a different way to handle this?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2284784541 From dlong at openjdk.org Mon Aug 12 20:01:33 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 12 Aug 2024 20:01:33 GMT Subject: RFR: 8338155: Fix -Wzero-as-null-pointer-constant warnings involving PTHREAD_MUTEX_INITIALIZER In-Reply-To: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> References: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> Message-ID: <2NvmJkp95VnOGf17j2vn0MtM45qJciocLUhlcHBQvdk=.6879d97f-5668-47be-ac49-48e169fda983@github.com> On Mon, 12 Aug 2024 07:08:56 GMT, Kim Barrett wrote: > Please review this change to remove -Wzero-as-null-pointer-constant warnings > involving the use of PTHREAD_MUTEX_INITIALIZER. We obviously can't change the > initializer macro, and we can't avoid it. So we suppress that warning where > the initializer is used. > > This involved adding a suppression pragma macro for that warning, which we > haven't needed for any of the previous work on removing them. Hopefully we > won't need it for many (or any) other places, but there are still a few places > triggering that warning, and not all of them are otherwise simple to resolve. > > Testing: mach5 tier1 Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20537#pullrequestreview-2233908098 From psandoz at openjdk.org Mon Aug 12 22:06:30 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 12 Aug 2024 22:06:30 GMT Subject: RFR: 8338023: Support two vector selectFrom API In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 8 Aug 2024 06:57:28 GMT, Jatin Bhateja wrote: > Hi All, > > As per the discussion on the panama-dev mailing list[1], the patch adds support for the following new two-vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table whose elements are selected based on the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy targets. > - Functional tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... The results look promising. I can provide guidance on the specification e.g., we can specify the behavior in terms of rearrange, with the addition of throwing on out of bounds indexes. Regarding the throwing of exceptions, some wider context will help to know where we are heading before we finalize the specification. I believe we are considering changing the default throwing behavior for index out of bounds to wrapping, thereby we can avoid bounds checks. If that is the case we should wait until that is done then update rather than submitting a CSR just yet? I see you created a specific intrinsic, which will avoid the cost of shuffle creation. Should we apply the same approach (in a subsequent PR) to the single argument shuffle? Or perhaps if we manage to optimize shuffles and change the default wrapping we don't require a specific intrinsic and can just use defer to rearrange? 
------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2234095541 From psandoz at openjdk.org Mon Aug 12 22:36:48 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 12 Aug 2024 22:36:48 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 17:20:06 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], the patch adds support for the following new vector operators. >> >> >> . SATURATING_UADD : Saturating unsigned addition. >> . SATURATING_ADD : Saturating signed addition. >> . SATURATING_USUB : Saturating unsigned subtraction. >> . SATURATING_SUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> The new vector operators apply only to integral types, since integral values wrap around in overflowing/underflowing scenarios after setting appropriate status flags. For floating-point types, as per the IEEE 754 spec there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback.
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Removed redundant comment > - 8338021: Support saturating vector operators in VectorAPI Naming-wise, for the scalar methods I recommend the pattern `op{Saturating}{Unsigned}`, which fits better with naming patterns used elsewhere, where we tend to be literal. For the vector operations we should refer to unsigned consistently with the unsigned compare operation names. Here we can be more terse. That makes me wonder if we should use `U` consistently for unsigned and `S` for saturating, e.g. `SUADD`, `UGT`, `UMAX` etc. Then that flows into the names used in `VectorSupport.java` and `vectorSupport.hpp`.
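A scalar sketch helps make the difference between wrapping and saturating lanes concrete. The class below models the six proposed operators for `byte` lanes: signed saturation clamps the exact `int` result to `[-128, 127]`, unsigned saturation treats lanes as `0..255`, and unsigned max/min compare lanes as unsigned. The method names follow the `op{Saturating}{Unsigned}` pattern suggested in the review; they are illustrative only and not an existing JDK API.

```java
/**
 * Byte-lane sketch of SATURATING_ADD/SUB, SATURATING_UADD/USUB and UMAX/UMIN.
 * Method names are hypothetical, following the op{Saturating}{Unsigned} pattern.
 */
final class SaturatingByteOps {

    // Signed saturating add: compute exactly in int, then clamp to [-128, 127].
    static byte addSaturating(byte a, byte b) {
        return clampSigned(a + b);
    }

    // Signed saturating subtract.
    static byte subSaturating(byte a, byte b) {
        return clampSigned(a - b);
    }

    // Unsigned saturating add: lanes are 0..255, result capped at 255 (0xFF).
    static byte addSaturatingUnsigned(byte a, byte b) {
        int sum = Byte.toUnsignedInt(a) + Byte.toUnsignedInt(b);
        return (byte) Math.min(sum, 0xFF);
    }

    // Unsigned saturating subtract: result floored at 0.
    static byte subSaturatingUnsigned(byte a, byte b) {
        int diff = Byte.toUnsignedInt(a) - Byte.toUnsignedInt(b);
        return (byte) Math.max(diff, 0);
    }

    // Unsigned max/min: compare lanes as 0..255 rather than -128..127.
    static byte maxUnsigned(byte a, byte b) {
        return Byte.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static byte minUnsigned(byte a, byte b) {
        return Byte.compareUnsigned(a, b) <= 0 ? a : b;
    }

    private static byte clampSigned(int v) {
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, v));
    }

    public static void main(String[] args) {
        byte a = 100, b = 100;
        System.out.println("wrapping add:   " + (byte) (a + b));       // -56
        System.out.println("saturating add: " + addSaturating(a, b)); // 127
        System.out.println("unsigned saturating add 200+100: "
                + Byte.toUnsignedInt(addSaturatingUnsigned((byte) 200, (byte) 100))); // 255
    }
}
```

A lane-wise vector operator would apply the same per-lane function; doing the arithmetic in `int` before clamping is what keeps the scalar fallback free of overflow.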
> > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) MemType still makes all the "flag" variable names look weird IMO. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2285163251 From dholmes at openjdk.org Tue Aug 13 05:49:27 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 13 Aug 2024 05:49:27 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int Message-ID: This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. Testing: - tiers 1-4 - GHA ------------- Commit messages: - unnecessary cast - Fix comments - Fix off-by-one error - Rollback the GetLargeStringUTFLength addition. 
- Rollback the GetLargeStringUTFLength addition. - Initial commit before splitting out UTF8 changes Changes: https://git.openjdk.org/jdk/pull/20560/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338257 Stats: 243 lines in 16 files changed: 116 ins; 5 del; 122 mod Patch: https://git.openjdk.org/jdk/pull/20560.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560 PR: https://git.openjdk.org/jdk/pull/20560 From kbarrett at openjdk.org Tue Aug 13 07:31:01 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 13 Aug 2024 07:31:01 GMT Subject: Integrated: 8338155: Fix -Wzero-as-null-pointer-constant warnings involving PTHREAD_MUTEX_INITIALIZER In-Reply-To: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> References: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> Message-ID: On Mon, 12 Aug 2024 07:08:56 GMT, Kim Barrett wrote: > Please review this change to remove -Wzero-as-null-pointer-constant warnings > involving the use of PTHREAD_MUTEX_INITIALIZER. We obviously can't change the > initializer macro, and we can't avoid it. So we suppress that warning where > the initializer is used. > > This involved adding a suppression pragma macro for that warning, which we > haven't needed for any of the previous work on removing them. Hopefully we > won't need it for many (or any) other places, but there are still a few places > triggering that warning, and not all of them are otherwise simple to resolve. > > Testing: mach5 tier1 This pull request has now been integrated. 
Changeset: 73f7a5f1 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/73f7a5f15dbba54a98f3916ff1190520ac07874d Stats: 20 lines in 4 files changed: 16 ins; 0 del; 4 mod 8338155: Fix -Wzero-as-null-pointer-constant warnings involving PTHREAD_MUTEX_INITIALIZER Reviewed-by: dholmes, dlong ------------- PR: https://git.openjdk.org/jdk/pull/20537 From kbarrett at openjdk.org Tue Aug 13 07:30:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 13 Aug 2024 07:30:59 GMT Subject: RFR: 8338155: Fix -Wzero-as-null-pointer-constant warnings involving PTHREAD_MUTEX_INITIALIZER In-Reply-To: References: <-VzpSBjPqMAteyuXwn__dmUdsw0aw9GIr1cF5Fl-rkE=.686a6d6f-aa48-4c3e-98cc-cc5dde9a26ca@github.com> Message-ID: On Mon, 12 Aug 2024 11:13:13 GMT, David Holmes wrote: >> Please review this change to remove -Wzero-as-null-pointer-constant warnings >> involving the use of PTHREAD_MUTEX_INITIALIZER. We obviously can't change the >> initializer macro, and we can't avoid it. So we suppress that warning where >> the initializer is used. >> >> This involved adding a suppression pragma macro for that warning, which we >> haven't needed for any of the previous work on removing them. Hopefully we >> won't need it for many (or any) other places, but there are still a few places >> triggering that warning, and not all of them are otherwise simple to resolve. >> >> Testing: mach5 tier1 > > I would expect there is an upstream bug to fix this - seems crazy we have to work around it. > > Changes look fine though.
> > Thanks Thanks for reviews @dholmes-ora and @dean-long ------------- PR Comment: https://git.openjdk.org/jdk/pull/20537#issuecomment-2285536811 From fgao at openjdk.org Tue Aug 13 08:37:58 2024 From: fgao at openjdk.org (Fei Gao) Date: Tue, 13 Aug 2024 08:37:58 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v2] In-Reply-To: <3fLHxTC8pKjO7NWHedvizKrEJZG6CSpJlOFgf27m2hw=.301e8f28-030c-4109-96dc-f4efc2fa918c@github.com> References: <3fLHxTC8pKjO7NWHedvizKrEJZG6CSpJlOFgf27m2hw=.301e8f28-030c-4109-96dc-f4efc2fa918c@github.com> Message-ID: On Fri, 9 Aug 2024 18:35:55 GMT, Erik Joelsson wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Clean up makefile >> - Merge branch 'master' into enable-bti-runtime >> - 8337536: AArch64: Enable BTI branch protection for runtime part >> >> This patch enables BTI branch protection for runtime part on >> Linux/aarch64 platform. >> >> Motivation >> >> 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. >> User-level packages can gain additional hardening by compiling with the >> GCC/Clang flag `-mbranch-protection=flag`. See [1]. >> >> 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as >> one VM configure flag, which would pass `-mbranch-protection=standard` >> compilation flags to all c/c++ files. Note that `standard` turns on both >> `pac-ret` and `bti` branch protections. For more details about code >> reuse attacks and hardware-assisted branch protections on AArch64, see >> [3]. 
>> >> However, we checked the `.note.gnu.property` section of all the shared >> libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so >> didn't set these two target feature bits: >> >> ``` >> GNU_PROPERTY_AARCH64_FEATURE_1_BTI >> GNU_PROPERTY_AARCH64_FEATURE_1_PAC >> ``` >> >> Note-1: BTI is an all or nothing property for a link unit [4]. That is, >> libjvm.so is not BTI-enabled. >> >> Note-2: PAC bit in `.note.gnu.property` section is used to protect >> `.got.plt` table. It's independent of whether the relocatable objects >> use PAC or not. >> >> Goal >> >> Hence, this patch aims to set PAC/BTI feature bits of the >> `.note.gnu.property` section for libjvm.so. >> >> Implementation >> >> Task-1: find out the problematic input objects >> >> From [5], "Static linkers processing ELF relocatable objects must set >> the feature bit in the output object or image only if all the input >> objects have the corresponding feature bit set." Hence we suspect that >> the root cause is probably that the PAC/BTI feature bits are not set >> only for some input objects of libjvm.so. >> >> In order to find out these inputs, we passed `--force-bti` linker flag >> [4] in my local test. This linker flag would warn if any input object >> does not have GNU_PROPERTY_AARCH64_FEATU... > > Build changes look good. 
Thanks for your review @erikj79 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2285678720 From adinn at openjdk.org Tue Aug 13 11:42:26 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 Aug 2024 11:42:26 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime Message-ID: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime ------------- Commit messages: - 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime Changes: https://git.openjdk.org/jdk/pull/20566/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337987 Stats: 2795 lines in 40 files changed: 1287 ins; 1458 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/20566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20566/head:pull/20566 PR: https://git.openjdk.org/jdk/pull/20566 From aboldtch at openjdk.org Tue Aug 13 12:59:55 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 12:59:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com> References: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com> Message-ID: On Tue, 16 Jul 2024 12:36:08 GMT, Roman Kennke wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Remove try_read >> - Add explicit to single parameter constructors >> - Remove superfluous access specifier >> - Remove unused include >> - Update assert message OMCache::set_monitor >> - Fix indentation >> - Remove outdated comment LightweightSynchronizer::exit >> - Remove logStream include >> - Remove strange comment >> 
- Fix javaThread include > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674: > >> 672: >> 673: // Search for obj in cache. >> 674: bind(loop); > > Same loop transformation would be possible here. I tried the following (see diff below) and it shows about a 5-10% regression in most of the `LockUnlock.testInflated*` micros. I also tried with just `num_unrolled = 1` and saw the same regression. Maybe there was some other pattern you were thinking of. There are probably architecture and platform differences. This can and should probably be explored in a follow-up PR. diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp index 5dbfdbc225d..4e6621cfece 100644 --- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp +++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp @@ -663,25 +663,28 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist const int num_unrolled = 2; for (int i = 0; i < num_unrolled; i++) { - cmpptr(obj, Address(t)); - jccb(Assembler::equal, monitor_found); - increment(t, in_bytes(OMCache::oop_to_oop_difference())); + Label next; + cmpptr(obj, Address(t, OMCache::oop_to_oop_difference() * i)); + jccb(Assembler::notEqual, next); + increment(t, in_bytes(OMCache::oop_to_oop_difference() * i)); + jmpb(monitor_found); + bind(next); } + increment(t, in_bytes(OMCache::oop_to_oop_difference() * (num_unrolled - 1))); Label loop; // Search for obj in cache. bind(loop); - - // Check for match. - cmpptr(obj, Address(t)); - jccb(Assembler::equal, monitor_found); - + // Advance. + increment(t, in_bytes(OMCache::oop_to_oop_difference())); // Search until null encountered, guaranteed _null_sentinel at end. cmpptr(Address(t), 1); jcc(Assembler::below, slow_path); // 0 check, but with ZF=0 when *t == 0 - increment(t, in_bytes(OMCache::oop_to_oop_difference())); - jmpb(loop); + + // Check for match. + cmpptr(obj, Address(t)); + jccb(Assembler::notEqual, loop); // Cache hit.
bind(monitor_found); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715249312 From shade at openjdk.org Tue Aug 13 13:13:30 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 13:13:30 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 10:04:50 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. 
AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Use TestFramework bootclasspath instead of develop option > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers Thanks! Last call for comments. If there are no other comments, I am going to integrate this within the next 24 hours.
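The kind of store this PR is about is easiest to see in the lazy-initialization idiom that the linked `Enum` code uses. Below is a minimal sketch of that benign-race pattern, modeled loosely on the linked code rather than copied from it. The class `LazyHash` is hypothetical, and the `jdk.internal.vm.annotation.Stable` annotation is only mentioned in a comment because it is not exported to user code. A racing reader either sees the default `0` and recomputes an identical value, or sees the computed value: this is the "initialized or default" property the PR description relies on. As I read the PR description, C2 previously emitted a `Release` barrier at the exit of any method performing such a store (here `cachedHash()`), whereas Variant 2 only gives stable-field stores final-field-like treatment.

```java
import java.util.Arrays;

/**
 * Sketch of a benign-race lazily cached value, the usage pattern behind many
 * @Stable fields. Hypothetical class, not JDK code.
 */
final class LazyHash {
    private final String name;

    // In the JDK this kind of field would carry @Stable; that annotation is
    // internal to java.base, so it is omitted here. 0 means "not computed yet".
    private int hash;

    LazyHash(String name) {
        this.name = name;
    }

    int cachedHash() {
        int h = hash;        // single racy read into a local
        if (h == 0) {
            h = compute();   // idempotent: racing threads compute the same value
            hash = h;        // plain store outside any constructor
        }
        return h;
    }

    private int compute() {
        int h = name.hashCode();
        return h != 0 ? h : 1; // never publish the 0 "not computed" sentinel
    }

    public static void main(String[] args) throws InterruptedException {
        LazyHash shared = new LazyHash("stable");
        int[] seen = new int[4];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> seen[slot] = shared.cachedHash());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() makes each thread's write to seen[] visible here
        }
        // All slots hold the same value, whichever thread won the race.
        System.out.println(Arrays.toString(seen));
    }
}
```

The correctness argument depends on two properties of this sketch: the value is a primitive `int` (no partially constructed object can leak through the field), and `compute()` is deterministic, so losing the race only costs a recomputation.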
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2286219478 From aboldtch at openjdk.org Tue Aug 13 13:29:23 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 13:29:23 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v12] In-Reply-To: References: Message-ID: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures. This signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or the critical sections are too long for spinning to be feasible, inf... Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8315884 - Remove top comment - Missing DEBUG_ONLY - Remove extra whitespace in UseObjectMonitorTableTest.java - Inline _table - Rename ObjectMonitorWorld to ObjectMonitorTable - Update comment basicLock.hpp - Remove const for InflateCause parameters in lightweightSynchronizer - Use [inc/dec]_no_safepoint_count directly instead of a conditionally created NoSafepointVerifier - Remove unnecessary assert - ...
and 29 more: https://git.openjdk.org/jdk/compare/76e33b6c...b96b916a ------------- Changes: https://git.openjdk.org/jdk/pull/20067/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=11 Stats: 3612 lines in 69 files changed: 2699 ins; 314 del; 599 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Tue Aug 13 13:29:24 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 13:29:24 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 15:32:45 GMT, Roman Kennke wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update arguments.cpp > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 58: > >> 56: >> 57: // >> 58: // Lightweight synchronization. > > This comment doesn't really say anything. Either remove it, or add a nice summary of how LW locking and OM table stuff works. Removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715291832 From aboldtch at openjdk.org Tue Aug 13 13:33:55 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 13:33:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: References: Message-ID: On Tue, 23 Jul 2024 20:21:05 GMT, Coleen Phillimore wrote: >> Only legacy locking uses the displaced header, I believe, which isn't clear in this code at all. This seems like a fix. We should probably assert that only legacy locking uses this field as a displaced header. > > Update: yes, this code change does assert if you use BasicLock's displaced header for locking modes other than LM_LEGACY. This is correct. The `displaced_header` in the BasicLock is only used by Legacy. 
Which is more strongly asserted now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715299608 From aboldtch at openjdk.org Tue Aug 13 13:37:03 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 13:37:03 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: References: Message-ID: On Tue, 23 Jul 2024 13:19:02 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 62: >> >>> 60: class ObjectMonitorWorld : public CHeapObj { >>> 61: struct Config { >>> 62: using Value = ObjectMonitor*; >> >> Does this alias really help? We don't state the type that many times and it looks odd to end up with a mix of `Value` and `ObjectMonitor*` in the same code. > > This alias is present in the other CHT implementations, albeit as a typedef in StringTable and SymbolTable, so this follows the pattern and allows cut/paste of the allocate_node, get_hash, and other functions. It is required by the `ConcurrentHashTable` implementation. https://github.com/openjdk/jdk/blob/ebf1154292aa5e78c9eb9ddb26a1a3f9885c2ef8/src/hotspot/share/utilities/concurrentHashTable.hpp#L43-L45 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715304376 From aboldtch at openjdk.org Tue Aug 13 14:01:56 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 14:01:56 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: References: Message-ID: On Tue, 23 Jul 2024 16:36:18 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 102: >> >>> 100: assert(*value != nullptr, "must be"); >>> 101: return (*value)->object_is_cleared(); >>> 102: } >> >> The `is_dead` functions seem oddly placed given they do not relate to the object stored in the wrapper. Why are they here? And what is the difference between `object_is_cleared` and `object_is_dead` (as used by `LookupMonitor`)?
> > This is a good question. When we look up the Monitor, we don't want to find any that the GC has marked dead, so that's why we call object_is_dead. When we look up with the object to find the Monitor, the object won't be dead (since we're using it to look up). But we don't want to find one that we've cleared because the Monitor was deflated? I don't see where we would clear it though. We clear the WeakHandle in the destructor after the Monitor has been removed from the table. What @coleenp said is correct. One is just a null check, the other interacts with the GC and the object's lifecycle. But as also mentioned, we never clear any of these WeakHandles, so this currently always returns false (at least from what I can see). This load should already be cached or in a register, because a lookup always loads this value to check whether it matches. So there should be no performance cost here, but it is unnecessary. I'll remove this. The `object_is_dead` is only used when removing, where we can take the extra cost. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715347334 From aboldtch at openjdk.org Tue Aug 13 14:06:55 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 14:06:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: References: Message-ID: On Tue, 23 Jul 2024 14:27:34 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 105: >> >>> 103: }; >>> 104: >>> 105: class LookupMonitor : public StackObj { >> >> I'm not understanding why we need this little wrapper class. > > It's a two-way lookup. The plain Lookup class is used to look up the Monitor given the object. This LookupMonitor class is used to look up the object given the Monitor. The CHT takes these wrapper classes. Maybe we should rename LookupObject to be more clear? As @coleenp said, there are two lookups.
One is used when you have the object and want to get or insert a new ObjectMonitor; the other when the deflator has an ObjectMonitor and wants to deflate it and remove the object-ObjectMonitor association. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715356203 From aboldtch at openjdk.org Tue Aug 13 14:15:18 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 14:15:18 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v13] In-Reply-To: References: Message-ID:
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Remove object_is_cleared ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/b96b916a..53f833bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=11-12 Stats: 6 lines in 3 files changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From rcastanedalo at openjdk.org Tue Aug 13 14:23:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 Aug 2024 14:23:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 14:03:53 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Further motivate the choice of internal store address materialization in x64 > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 203: > >> 201: // Do we need to load the previous value? >> 202: if (obj != noreg) { >> 203: __ load_heap_oop(pre_val, Address(obj, 0), noreg, noreg, AS_RAW); > > How do we handle implicit null checks for which `obj` is null? Note that we may expect the store instruction to trigger SIGSEGV. Does it work correctly if we trigger the SIGSEGV, here in the pre barrier? Good question! I have checked (no pun intended) and it turns out C2 never uses stores (or other memory access operations) with late-expanded barriers to perform implicit null checks. This is accidental, due to the fact that all these memory operations use MachTemp nodes in the C2 code generation stage (to reserve registers for their ADL TEMP operands).
C2's implicit null check analysis requires that all inputs of a candidate memory operation dominate the null check [1], which fails if the operation uses MachTemp nodes, since these are always placed in the same basic block as their user [2]. Note that this optimization triggers very rarely, if at all, for memory operations in the current early barrier expansion model, since the additional control flow of the barrier code obfuscates the analysis. For late barrier expansion, the analysis could be easily extended to recognize and hoist MachTemp nodes together with their user memory operation that is a candidate to implement the implicit null check [3], but that would require extending the barrier assembly emission step to populate the implicit null exception table correctly. Since this seems non-trivial, and would also affect other garbage collectors (ZGC), I suggest simply asserting for now that we do not generate implicit null checks for memory operations with barrier data (as in [4]), and leaving full support for implicit null checks for these G1 and ZGC operations to a future RFE. What do you think?
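As a rough model of the dominance requirement described above: the candidate memory operation is hoisted into the null-check block, so each of its inputs must be defined in a block that dominates that block. A MachTemp scheduled together with its user below the check fails this test. All types and functions here are hypothetical; C2's actual analysis in `lcm.cpp` handles many more cases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Heavily simplified, hypothetical model; these are not C2's Block/Node types.
struct Block {
  int id;
  const Block* idom;  // immediate dominator; nullptr for the entry block
};

// True if block a dominates block b (a appears on b's dominator chain).
bool dominates(const Block* a, const Block* b) {
  for (const Block* cur = b; cur != nullptr; cur = cur->idom) {
    if (cur == a) return true;
  }
  return false;
}

struct Input {
  std::string name;
  const Block* defined_in;  // block where the input value is produced
};

// A memory operation may be hoisted to serve as the implicit null check only
// if every input is already available in the block that holds the check.
bool can_be_implicit_null_check(const std::vector<Input>& inputs,
                                const Block* null_check_block) {
  for (const Input& in : inputs) {
    if (!dominates(in.defined_in, null_check_block)) return false;
  }
  return true;
}
```

In this model, a store whose only input is a base pointer defined above the check qualifies, while adding a temp defined in the not-null successor block disqualifies it, mirroring why late-expanded barrier operations are currently never chosen.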
[1] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328 [2] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/gcm.cpp#L1397-L1404 [3] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba [4] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba#diff-554fddca91406a67dc0f8faee12dc30c709181685a0add7f4ba9ae5ace68f192R2031-R2032 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1715387255 From aboldtch at openjdk.org Tue Aug 13 14:50:06 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 14:50:06 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v14] In-Reply-To: References: Message-ID: Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Remove _omworld - Make ObjectMonitorTable AllStatic - Revert "Inline _table" This reverts commit 937f531364bb7025998d3a80f9ea93cbc8c9650f.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/53f833bc..123a2683 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=12-13 Stats: 64 lines in 3 files changed: 4 ins; 5 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Tue Aug 13 14:52:03 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 14:52:03 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: References: Message-ID:
Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Remove the last OMWorld references - Rename omworldtable_work to object_monitor_table_work ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/123a2683..7946d148 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=13-14 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Tue Aug 13 14:52:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 14:52:04 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v11] In-Reply-To: <5r33aU_dXazxa8Lahw6JCsbt5aLwcjCp1N6vkcTm_yI=.f47905ec-aa19-447f-ba28-658c6c94003a@github.com> References: <5r33aU_dXazxa8Lahw6JCsbt5aLwcjCp1N6vkcTm_yI=.f47905ec-aa19-447f-ba28-658c6c94003a@github.com> Message-ID: <4P75lMkZcWHH-gmLGywQxkgTIyAVhnURw7pLX1_5_Tk=.462bc748-1875-4e42-9a27-04ac2968c124@github.com> On Mon, 12 Aug 2024 18:55:41 GMT, Coleen Phillimore wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing DEBUG_ONLY > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 341: > >> 339: }; >> 340: >> 341: ObjectMonitorTable* LightweightSynchronizer::_omworld = nullptr; > > I preferred my version where ObjectMonitorTable was AllStatic, which gets rid of _omworld-> everywhere, but the internal table is the pointer. You can remove more omworld names also. I implemented the AllStatic table, and removed all mentions of OMWorld.
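Coleen's suggestion (a class with only static members whose single piece of state is a pointer to the internal table) can be sketched as follows. Callers then write `MonitorTable::lookup(obj)` rather than going through an instance pointer such as `_omworld->`. This is a minimal sketch under stated assumptions: `MonitorTable` and `Monitor` are hypothetical names, HotSpot's `AllStatic` base class is approximated by deleting the constructor, and `std::unordered_map` (which is not thread-safe) stands in for `ConcurrentHashTable`.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for ObjectMonitor.
struct Monitor {
  int id;
};

// "All static" table wrapper: no instances, only static functions.
class MonitorTable {
 public:
  MonitorTable() = delete;

  // Allocates the underlying table once, e.g. during VM initialization.
  static void create() {
    assert(_table == nullptr && "create() must only be called once");
    _table = new std::unordered_map<const void*, Monitor*>();
  }

  static void insert(const void* obj, Monitor* m) { (*_table)[obj] = m; }

  // Returns the monitor associated with obj, or nullptr if there is none.
  static Monitor* lookup(const void* obj) {
    auto it = _table->find(obj);
    return it == _table->end() ? nullptr : it->second;
  }

 private:
  // The class itself is never instantiated; only the internal table is a pointer.
  static std::unordered_map<const void*, Monitor*>* _table;
};

std::unordered_map<const void*, Monitor*>* MonitorTable::_table = nullptr;
```

Keeping the table behind a pointer preserves deferred creation (the table only exists after `create()`), which is the part Axel noted would need extra work if the storage itself were made static.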
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715432994 From coleenp at openjdk.org Tue Aug 13 14:56:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 13 Aug 2024 14:56:56 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: References: <3m5N_Fh65MVy7vRvO0wq3qFlzxjbCLHhbTBJe8OJorw=.eb61b3bd-5aca-45cd-8e88-389ae86a599b@github.com> Message-ID: On Mon, 12 Aug 2024 14:37:50 GMT, Axel Boldt-Christmas wrote: >> This is a really good suggestion and might help a lot with the performance problems that we see with the table with heavily contended locking. I think we should change this in a follow-on patch (which I'll work on). > > I inlined the table in the surrounding object as it is a trivial change. > > Removing both indirections and creating static storage would require more work (some conditional deferred creation, similar to an optional). Sadly, it doesn't help with performance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715443138 From coleenp at openjdk.org Tue Aug 13 15:04:54 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 13 Aug 2024 15:04:54 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 14:52:03 GMT, Axel Boldt-Christmas wrote:
> > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Remove the last OMWorld references > - Rename omworldtable_work to object_monitor_table_work Two tiny nits. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 172: > 170: static void create() { > 171: _table = new ConcurrentTable(initial_log_size(), > 172: max_log_size(), nit, can you line up these parameters? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 579: > 577: } > 578: > 579: Extra blank line. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2235807775 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715446552 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715449461 From shade at openjdk.org Tue Aug 13 16:03:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 16:03:42 GMT Subject: RFR: 8338314: JFR: Split JFRCheckpoint VM operation Message-ID: Investigating JFR crashes is a bit tedious, as the Events section in `hs_err` shows just:
Event: 3.006 Executing VM operation: JFRCheckpoint
Event: 3.006 Executing VM operation: JFRCheckpoint done
What that `JFRCheckpoint` is doing is unclear, because it can do two separate things: clear or write. It would be good if we could disambiguate the two.
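This minimal sketch of the disambiguation idea uses a hypothetical base class rather than HotSpot's real VM operation classes. When one operation type can perform either action, the name written to the event log cannot tell them apart, whereas one type per action makes the name self-describing. `JFRSafepointClear` matches the output quoted in this PR; `JFRSafepointWrite` is only an assumed name for the other flavor.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a VM operation; HotSpot's VM_Operation differs.
class VMOperationSketch {
 public:
  virtual ~VMOperationSketch() = default;
  virtual const char* name() const = 0;
};

// Before: one operation type; the logged name is the same for both actions.
class CheckpointOp : public VMOperationSketch {
 public:
  explicit CheckpointOp(bool clear) : _clear(clear) {}
  const char* name() const override { return "JFRCheckpoint"; }
  bool clears() const { return _clear; }  // action chosen at construction

 private:
  bool _clear;
};

// After: one type per action, so the name alone identifies what ran.
class SafepointClearOp : public VMOperationSketch {
 public:
  const char* name() const override { return "JFRSafepointClear"; }
};

class SafepointWriteOp : public VMOperationSketch {
 public:
  const char* name() const override { return "JFRSafepointWrite"; }  // assumed name
};

// Formats a line in the style of the hs_err Events section.
std::string log_line(const VMOperationSketch& op) {
  return std::string("Executing VM operation: ") + op.name();
}
```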
Since there are only two flavors of checkpoint, I think we can just split the VMOp into two more precisely named ones, which gives us, e.g.:
Event: 2.462 Executing VM operation: JFRSafepointClear
Event: 2.463 Executing VM operation: JFRSafepointClear done
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20570&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338314 Stats: 23 lines in 3 files changed: 12 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20570/head:pull/20570 PR: https://git.openjdk.org/jdk/pull/20570 From rkennke at openjdk.org Tue Aug 13 16:05:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 13 Aug 2024 16:05:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: References: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com> Message-ID: On Tue, 13 Aug 2024 12:57:23 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674: >> >>> 672: >>> 673: // Search for obj in cache. >>> 674: bind(loop); >> >> Same loop transformation would be possible here. > > I tried the following (see diff below) and it shows about a 5-10% regression in most of the `LockUnlock.testInflated*` micros. I also tried with just `num_unrolled = 1` and saw the same regression. Maybe there was some other pattern you were thinking of. There are probably architecture and platform differences. This can and should probably be explored in a follow-up PR.
> 
> diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
> index 5dbfdbc225d..4e6621cfece 100644
> --- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
> +++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
> @@ -663,25 +663,28 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist
> 
>      const int num_unrolled = 2;
>      for (int i = 0; i < num_unrolled; i++) {
> -      cmpptr(obj, Address(t));
> -      jccb(Assembler::equal, monitor_found);
> -      increment(t, in_bytes(OMCache::oop_to_oop_difference()));
> +      Label next;
> +      cmpptr(obj, Address(t, OMCache::oop_to_oop_difference() * i));
> +      jccb(Assembler::notEqual, next);
> +      increment(t, in_bytes(OMCache::oop_to_oop_difference() * i));
> +      jmpb(monitor_found);
> +      bind(next);
>      }
> +    increment(t, in_bytes(OMCache::oop_to_oop_difference() * (num_unrolled - 1)));
> 
>      Label loop;
> 
>      // Search for obj in cache.
>      bind(loop);
> -
> -    // Check for match.
> -    cmpptr(obj, Address(t));
> -    jccb(Assembler::equal, monitor_found);
> -
> +    // Advance.
> +    increment(t, in_bytes(OMCache::oop_to_oop_difference()));
>      // Search until null encountered, guaranteed _null_sentinel at end.
>      cmpptr(Address(t), 1);
>      jcc(Assembler::below, slow_path); // 0 check, but with ZF=0 when *t == 0
> -    increment(t, in_bytes(OMCache::oop_to_oop_difference()));
> -    jmpb(loop);
> +
> +    // Check for match.
> +    cmpptr(obj, Address(t));
> +    jccb(Assembler::notEqual, loop);
> 
>      // Cache hit.
>      bind(monitor_found);

Yeah it's probably not very important.
But it's not quite what I had in mind, I was thinking more something like (aarch64 version, untested, may be wrong): diff --git a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp index 19af03d3488..05bbb5760b8 100644 --- a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp +++ b/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp @@ -302,14 +302,14 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist Label monitor_found; // Load cache address - lea(t3_t, Address(rthread, JavaThread::om_cache_oops_offset())); + lea(t3_t, Address(rthread, JavaThread::om_cache_oops_offset() - OMCache::oop_to_oop_difference())); const int num_unrolled = 2; for (int i = 0; i < num_unrolled; i++) { + increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); ldr(t1, Address(t3_t)); cmp(obj, t1); br(Assembler::EQ, monitor_found); - increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); } Label loop; @@ -317,16 +317,14 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist // Search for obj in cache. bind(loop); - // Check for match. - ldr(t1, Address(t3_t)); - cmp(obj, t1); - br(Assembler::EQ, monitor_found); + increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); + ldr(t1, Address(t3_t)); // Search until null encountered, guaranteed _null_sentinel at end. - increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); - cbnz(t1, loop); - // Cache Miss, NE set from cmp above, cbnz does not set flags - b(slow_path); + cbz(t1, slow_path); + // Check for match. 
+      cmp(obj, t1);
+      br(Assembler::NE, loop);
 
       bind(monitor_found);
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715557161 From aboldtch at openjdk.org Tue Aug 13 16:30:12 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Aug 2024 16:30:12 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: References: Message-ID: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contention reference counter is being held.
More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf... Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Whitespace and nits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/7946d148..68681c84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=14-15 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From gziemski at openjdk.org Tue Aug 13 16:32:49 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 13 Aug 2024 16:32:49 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 01:16:45 GMT, David Holmes wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`.
>> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave it for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > MemType still makes all the "flag" variable names look weird IMO. @dholmes-ora @tstuefe @jdksjolen Is everyone OK with MemTypeFlag? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2286658061 From wkemper at openjdk.org Tue Aug 13 16:39:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Aug 2024 16:39:50 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v2] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 17:13:06 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828).
>> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move constant to separate class to unbreak Windows builds Changes requested by wkemper (Committer). src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2540: > 2538: > 2539: // If the trailing region is not full, we need to adjust its top. > 2540: size_t tail = (size % ShenandoahHeapRegion::region_size_words()); Not sure we need this. The free set adjusts top for all humongous regions in the allocation (including the tail): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp#L885. What's more, I believe `new_top` is only used during a full GC to sort out `top` once compaction is complete. 
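The quoted hunk at shenandoahHeap.cpp line 2540 computes the remainder that decides the trailing region's `top`. A minimal sketch of that arithmetic follows, assuming sizes in heap words and a non-zero allocation size; `HumongousLayout` and `humongous_layout` are hypothetical names, not Shenandoah code. Every region but the last is full, and the last holds `size % region_words` words, or a full region when the remainder is zero. Per the review comment above, Shenandoah's free set already performs this adjustment for humongous allocations, which is the reviewer's argument for not repeating it here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type for illustration; not Shenandoah code.
struct HumongousLayout {
  std::size_t num_regions;        // regions spanned by the allocation
  std::size_t last_region_words;  // words in use in the trailing region
};

// Assumes size_words > 0 and region_words > 0, both measured in heap words.
HumongousLayout humongous_layout(std::size_t size_words, std::size_t region_words) {
  // Ceiling division: a partially filled trailing region still counts.
  std::size_t num = (size_words + region_words - 1) / region_words;
  // Same remainder as the quoted `size % ShenandoahHeapRegion::region_size_words()`.
  std::size_t tail = size_words % region_words;
  // A zero remainder means the trailing region is completely full.
  return {num, tail == 0 ? region_words : tail};
}
```

For example, a 10-word allocation with 4-word regions spans 3 regions, and the last one is filled to 2 words.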
------------- PR Review: https://git.openjdk.org/jdk/pull/20468#pullrequestreview-2236051231 PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1715601238 From adinn at openjdk.org Tue Aug 13 17:36:44 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 Aug 2024 17:36:44 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v2] In-Reply-To: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: > 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime Andrew Dinn has updated the pull request incrementally with six additional commits since the last revision: - copy macro across - typo - fix frame layouts for x86_32 - fix throw exception stub generation on zero - fix some header includes and defintions - fix issues with ports ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20566/files - new: https://git.openjdk.org/jdk/pull/20566/files/7f9f0c42..ea4ec0c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=00-01 Stats: 105 lines in 10 files changed: 54 ins; 45 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20566/head:pull/20566 PR: https://git.openjdk.org/jdk/pull/20566 From rkennke at openjdk.org Tue Aug 13 17:48:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 13 Aug 2024 17:48:54 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v2] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 17:13:06 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. 
There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move constant to separate class to unbreak Windows builds Nice work! Only a minor nit/question. src/hotspot/share/cds/archiveHeapWriter.hpp line 126: > 124: // depends on -Xmx, but can never be smaller than 1 * M. > 125: // (TODO: Perhaps change to 256K to be compatible with Shenandoah) > 126: static constexpr int MIN_GC_REGION_ALIGNMENT = 1 * M; Couldn't you just move up the constant to the public section? Also, I'm not sure what's the point of it being constexpr (rather than just const), but that is pre-existing. ------------- Changes requested by rkennke (Reviewer).
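The quoted description equates Shenandoah regions >= 1M with heaps >= 2G. The sketch below reproduces that arithmetic under a simplified model of region sizing: max heap divided by a target region count, clamped to a minimum and maximum, then rounded down to a power of two. The constants (2048 target regions, 256K minimum, 32M maximum) are assumptions from memory, not values taken from this thread, and the rounding step is simplified. Under this model, 2G / 2048 = 1M, which reaches the quoted 1M `MIN_GC_REGION_ALIGNMENT`, while a 1G heap gives 512K regions, which does not.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t K = 1024;
constexpr std::size_t M = K * K;
constexpr std::size_t G = K * M;

// Assumed defaults (from memory); check shenandoah_globals.hpp before relying on them.
constexpr std::size_t kTargetNumRegions = 2048;
constexpr std::size_t kMinRegionSize = 256 * K;
constexpr std::size_t kMaxRegionSize = 32 * M;

// Value quoted above for MIN_GC_REGION_ALIGNMENT in archiveHeapWriter.hpp.
constexpr std::size_t kMinGcRegionAlignment = 1 * M;

// Largest power of two that is <= v (v must be >= 1).
std::size_t round_down_pow2(std::size_t v) {
  std::size_t p = 1;
  while (p <= v / 2) p *= 2;
  return p;
}

// Simplified model: heap / target region count, clamped, rounded down to a power of two.
std::size_t model_region_size(std::size_t max_heap) {
  std::size_t r = max_heap / kTargetNumRegions;
  if (r < kMinRegionSize) r = kMinRegionSize;
  if (r > kMaxRegionSize) r = kMaxRegionSize;
  return round_down_pow2(r);
}

// True when, under this model, one region is at least as large as the archive alignment.
bool region_covers_archive_alignment(std::size_t max_heap) {
  return model_region_size(max_heap) >= kMinGcRegionAlignment;
}
```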
PR Review: https://git.openjdk.org/jdk/pull/20468#pullrequestreview-2236189386 PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1715688606 From szaldana at openjdk.org Tue Aug 13 19:26:09 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 13 Aug 2024 19:26:09 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename Message-ID: Hi all, This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681), enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. As mentioned in the comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492), where we enabled support for `%p` in filenames specified in jcmd. With this patch, I propose: - Expanding the utility function `Arguments::copy_expand_pid` into `Arguments::copy_expand_arguments` to handle `%p` expansion for the pid and `%t` expansion for timestamps. - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly cover filename expansion, supporting `%t` for jcmd as well. Testing: - [x] Added test cases pass on all platforms (verified with a GHA job). - [x] Tier 1 passes with GHA. Looking forward to hearing your thoughts!
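The proposed `%p`/`%t` filename expansion can be illustrated with a minimal C++ sketch. This is not `Arguments::copy_expand_pid` or the proposed `Arguments::copy_expand_arguments`, and `expand_filename_sketch` is a hypothetical name. The timestamp is passed in as a ready-made string because the PR text does not specify its format. Treating `%%` as a literal `%` and copying any other `%x` sequence unchanged are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper; not HotSpot's Arguments::copy_expand_pid/copy_expand_arguments.
// Expands "%p" to the pid, "%t" to a caller-supplied timestamp, and "%%" to "%".
// Any other "%x" sequence, and a trailing lone '%', is copied unchanged.
std::string expand_filename_sketch(const std::string& pattern, long pid,
                                   const std::string& timestamp) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    char c = pattern[i];
    if (c != '%' || i + 1 == pattern.size()) {
      out += c;  // ordinary character, or '%' at the very end
      continue;
    }
    char next = pattern[i + 1];
    if (next == 'p') {
      out += std::to_string(pid);
      ++i;  // consume the 'p'
    } else if (next == 't') {
      out += timestamp;
      ++i;  // consume the 't'
    } else if (next == '%') {
      out += '%';
      ++i;  // consume the second '%'
    } else {
      out += c;  // unknown sequence: keep '%'; `next` is copied on the next iteration
    }
  }
  return out;
}
```

For example, `expand_filename_sketch("java_%p_%t.hprof", 1234, "2024-08-13_19-26-09")` yields `java_1234_2024-08-13_19-26-09.hprof`. Keeping the expansion in a single helper is what allows the heap dump path and jcmd output filenames to share it.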
Thanks, Sonia ------------- Commit messages: - 8204681: Option to include timestamp in hprof filename Changes: https://git.openjdk.org/jdk/pull/20568/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20568&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8204681 Stats: 361 lines in 12 files changed: 273 ins; 72 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/20568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20568/head:pull/20568 PR: https://git.openjdk.org/jdk/pull/20568 From szaldana at openjdk.org Tue Aug 13 19:26:09 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 13 Aug 2024 19:26:09 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 15:07:17 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681), enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. > > As mentioned in the comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492), where we enabled support for `%p` in filenames specified in jcmd. > > With this patch, I propose: > - Expanding the utility function `Arguments::copy_expand_pid` into `Arguments::copy_expand_arguments` to handle `%p` expansion for the pid and `%t` expansion for timestamps. > - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. > - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly cover filename expansion, supporting `%t` for jcmd as well. > > Testing: > - [x] Added test cases pass on all platforms (verified with a GHA job). > - [x] Tier 1 passes with GHA. > > Looking forward to hearing your thoughts!
> > Thanks, > Sonia cc @kevinjwalls ------------- PR Comment: https://git.openjdk.org/jdk/pull/20568#issuecomment-2286971556 From shade at openjdk.org Tue Aug 13 19:34:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 19:34:51 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v2] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:34:00 GMT, William Kemper wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move constant to separate class to unbreak Windows builds > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2540: > >> 2538: >> 2539: // If the trailing region is not full, we need to adjust its top. >> 2540: size_t tail = (size % ShenandoahHeapRegion::region_size_words()); > > Not sure we need this. The free set adjusts top for all humongous regions in the allocation (including the tail): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp#L885. What's more, I believe `new_top` is only used during a full GC to sort out `top` once compaction is complete. Right, d'oh. Not sure what I was thinking there. I'll drop the block, since the whole thing apparently works without adjusting the top. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1715823681 From shade at openjdk.org Tue Aug 13 19:51:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 19:51:10 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v3] In-Reply-To: References: Message-ID: > This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. 
> > Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). > > Additional testing: > - [x] New test > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Review comments - Merge branch 'master' into JDK-8293650-shenandoah-archives - Move constant to separate class to unbreak Windows builds - Touchups in test - Basic implementation, works well, passes tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20468/files - new: https://git.openjdk.org/jdk/pull/20468/files/3a8fa655..361a67db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=01-02 Stats: 11518 lines in 407 files changed: 4509 ins; 5515 del; 1494 mod Patch: https://git.openjdk.org/jdk/pull/20468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20468/head:pull/20468 PR: https://git.openjdk.org/jdk/pull/20468 From shade at openjdk.org Tue Aug 13 19:51:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 19:51:10 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v3] In-Reply-To: References: Message-ID: <-ESSW0ZYc_J37C2REktneBwf3ybOs_RUntRbKQZYy1U=.388e8471-2f5c-4949-866b-4b3511739c22@github.com> On Tue, 13 Aug 2024 17:41:13 GMT, Roman Kennke 
wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Review comments >> - Merge branch 'master' into JDK-8293650-shenandoah-archives >> - Move constant to separate class to unbreak Windows builds >> - Touchups in test >> - Basic implementation, works well, passes tests > > src/hotspot/share/cds/archiveHeapWriter.hpp line 126: > >> 124: // depends on -Xmx, but can never be smaller than 1 * M. >> 125: // (TODO: Perhaps change to 256K to be compatible with Shenandoah) >> 126: static constexpr int MIN_GC_REGION_ALIGNMENT = 1 * M; > > Couldn't you just move up the constant to the public section? > Also, I'm not sure what's the point of it being constexpr (rather than just const), but that is pre-existing. Yeah, I had problems with that. But I think I just misplaced the constant the last time around. Let me see what Windows GHA runs have to say about the code in the new commit.
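On the `constexpr` versus `const` question: for an integral static member initialized in the class body, both forms can be used in constant expressions. `constexpr` makes the compile-time requirement explicit. Since C++17, a `static constexpr` data member is also implicitly `inline`, so odr-using it (for example, binding it to a `const int&`) does not require a separate out-of-class definition. The sketch below shows the constant declared in the public section and used from another class. `ArchiveWriterSketch` and `GcSketch` are hypothetical stand-ins for `ArchiveHeapWriter` and a collector, and the local `M` stands in for HotSpot's global.

```cpp
#include <cassert>

// Stand-in for HotSpot's global M; an assumption for this standalone sketch.
constexpr int M = 1024 * 1024;

// Hypothetical stand-in for ArchiveHeapWriter; not the real class.
class ArchiveWriterSketch {
public:
  // Public, so other classes can name it directly.
  // constexpr guarantees a compile-time constant; since C++17 it is also
  // implicitly inline, so an odr-use needs no out-of-class definition.
  static constexpr int MIN_GC_REGION_ALIGNMENT = 1 * M;
};

// Hypothetical consumer, e.g. a collector checking its region size.
class GcSketch {
public:
  // The constant is usable in a compile-time check.
  static_assert(ArchiveWriterSketch::MIN_GC_REGION_ALIGNMENT % 4096 == 0,
                "alignment is expected to be a multiple of 4K");

  static bool region_is_large_enough(long region_bytes) {
    return region_bytes >= ArchiveWriterSketch::MIN_GC_REGION_ALIGNMENT;
  }
};
```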
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1715839754 From kvn at openjdk.org Tue Aug 13 20:01:49 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 Aug 2024 20:01:49 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v2] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Tue, 13 Aug 2024 17:36:44 GMT, Andrew Dinn wrote: >> 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime > > Andrew Dinn has updated the pull request incrementally with six additional commits since the last revision: > > - copy macro across > - typo > - fix frame layouts for x86_32 > - fix throw exception stub generation on zero > - fix some header includes and defintions > - fix issues with ports Please add a description to the PR. src/hotspot/cpu/x86/stubGenerator_x86_32.cpp line 3917: > 3915: StubRoutines::x86::_d2l_wrapper = generate_d2i_wrapper(T_LONG, CAST_FROM_FN_PTR(address, SharedRuntime::d2l)); > 3916: > 3917: CAST_FROM_FN_PTR(address, SharedRuntime::throw_delayed_StackOverflowError)); This does not seem correct. And I see a corresponding linux-x86 (32-bit) build failure in GHA. ------------- PR Review: https://git.openjdk.org/jdk/pull/20566#pullrequestreview-2236444699 PR Review Comment: https://git.openjdk.org/jdk/pull/20566#discussion_r1715843933 From mdoerr at openjdk.org Tue Aug 13 20:44:53 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 Aug 2024 20:44:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 14:21:01 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 203: >> >>> 201: // Do we need to load the previous value?
>>> 202: if (obj != noreg) { >>> 203: __ load_heap_oop(pre_val, Address(obj, 0), noreg, noreg, AS_RAW); >> >> How do we handle implicit null checks for which `obj` is null? Note that we may expect the store instruction to trigger SIGSEGV. Does it work correctly if we trigger the SIGSEGV here, in the pre-barrier? > > Good question! I have checked (no pun intended) and it turns out C2 never uses stores (or other memory access operations) with late-expanded barriers to perform implicit null checks. This is accidental, due to the fact that all these memory operations use MachTemp nodes in the C2 code generation stage (to reserve registers for their ADL TEMP operands). C2's implicit null check analysis requires that all inputs of a candidate memory operation dominate the null check [1], which fails if the operation uses MachTemp nodes, since these are always placed in the same basic block [2]. > > Note that this optimization triggers very rarely, if at all, for memory operations in the current early barrier expansion model, since the additional control flow of the barrier code obfuscates the analysis. For late barrier expansion, the analysis could be easily extended to recognize and hoist MachTemp nodes together with their user memory operation that is a candidate to implement the implicit null check [3], but that would require extending the barrier assembly emission step to populate the implicit null exception table correctly. Since this seems non-trivial, and would also affect other garbage collectors (ZGC), I suggest that for now we simply assert that we do not generate implicit null checks for memory operations with barrier data (as in [4]), and leave full support for implicit null checks for these G1 and ZGC operations to a future RFE. What do you think?
> > [1] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328 > [2] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/gcm.cpp#L1397-L1404 > [3] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba > [4] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba#diff-554fddca91406a67dc0f8faee12dc30c709181685a0add7f4ba9ae5ace68f192R2031-R2032 Thanks for figuring it out! Makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1715904218 From adinn at openjdk.org Tue Aug 13 21:38:10 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 Aug 2024 21:38:10 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v2] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Tue, 13 Aug 2024 19:52:05 GMT, Vladimir Kozlov wrote: >> Andrew Dinn has updated the pull request incrementally with six additional commits since the last revision: >> >> - copy macro across >> - typo >> - fix frame layouts for x86_32 >> - fix throw exception stub generation on zero >> - fix some header includes and defintions >> - fix issues with ports > > src/hotspot/cpu/x86/stubGenerator_x86_32.cpp line 3917: > >> 3915: StubRoutines::x86::_d2l_wrapper = generate_d2i_wrapper(T_LONG, CAST_FROM_FN_PTR(address, SharedRuntime::d2l)); >> 3916: >> 3917: CAST_FROM_FN_PTR(address, SharedRuntime::throw_delayed_StackOverflowError)); > > This does not seem correct. And I see a corresponding linux-x86 (32-bit) build failure in GHA. Sorry, that was an accidental paste while editing the code. Should be fixed now.
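Martin's question above concerns the G1 pre-barrier loading the previous field value before the store. The sketch below is a simplified, hypothetical model of a SATB (snapshot-at-the-beginning) pre-write barrier, not G1's generated assembly. As far as I know, the real barrier checks a thread-local marking-active flag, fills a per-thread buffer by index, and calls into the runtime when the buffer is full; a `std::vector` stands in for that here. The model shows that, with marking active, the barrier's load of the old value is the first access through the field address. That is why a null base would fault in the pre-barrier rather than at the store.

```cpp
#include <cassert>
#include <vector>

// Hypothetical heap object type for the model.
struct Object {};

// Hypothetical per-thread SATB state; a vector replaces G1's indexed buffer.
struct SatbQueue {
  bool marking_active = false;
  std::vector<Object*> buffer;
};

// Simplified model of `*field = new_val` guarded by a SATB pre-write barrier.
void store_with_satb_barrier(SatbQueue& q, Object** field, Object* new_val) {
  if (q.marking_active) {
    // Load the value about to be overwritten. With marking active, this load
    // (not the store below) is the first access through `field`.
    Object* pre_val = *field;
    if (pre_val != nullptr) {
      // Record it so concurrent marking still sees the snapshot-time graph.
      q.buffer.push_back(pre_val);
    }
  }
  *field = new_val;  // the actual reference store
}
```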
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20566#discussion_r1715953549 From adinn at openjdk.org Tue Aug 13 21:38:09 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 Aug 2024 21:38:09 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> > Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: fix accidental paste ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20566/files - new: https://git.openjdk.org/jdk/pull/20566/files/ea4ec0c3..f12aea03 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20566/head:pull/20566 PR: https://git.openjdk.org/jdk/pull/20566 From dcubed at openjdk.org Tue Aug 13 22:15:59 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 13 Aug 2024 22:15:59 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:30:12 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`.
This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contention reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API.
There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Whitespace and nits I finished my first-pass crawl through these changes. I need to mull on these changes a bit before I make another pass. I think there's only one real bug buried in all of my comments. src/hotspot/share/runtime/objectMonitor.cpp line 377: > 375: > 376: if (cur == current) { > 377: // TODO-FIXME: check for integer overflow! BUGID 6557169. Thanks for removing this comment. JDK-6557169 was closed as "Will Not Fix" in 2017. src/hotspot/share/runtime/synchronizer.cpp line 970: > 968: if (value == 0) value = 0xBAD; > 969: assert(value != markWord::no_hash, "invariant"); > 970: return value; In the elided part above this line, we have: if (value == 0) value = 0xBAD; assert(value != markWord::no_hash, "invariant"); so my memory about zero being returnable as a hash value is wrong. src/hotspot/share/runtime/synchronizer.cpp line 977: > 975: > 976: markWord mark = obj->mark_acquire(); > 977: for(;;) { nit - please insert a space before `(` src/hotspot/share/runtime/synchronizer.cpp line 997: > 995: // Since the monitor isn't in the object header, it can simply be installed.
> 996: if (UseObjectMonitorTable) { > 997: return install_hash_code(current, obj); Perhaps:

    if (UseObjectMonitorTable) {
      // Since the monitor isn't in the object header, the hash can simply be
      // installed in the object header.
      return install_hash_code(current, obj);

src/hotspot/share/runtime/synchronizer.cpp line 1271: > 1269: _no_progress_cnt >= NoAsyncDeflationProgressMax) { > 1270: double remainder = (100.0 - MonitorUsedDeflationThreshold) / 100.0; > 1271: size_t new_ceiling = ceiling / remainder + 1; Why was the `new_ceiling` calculation changed? I think the `new_ceiling` value is going to be lower than the old ceiling value. src/hotspot/share/runtime/synchronizer.inline.hpp line 83: > 81: > 82: > 83: #endif // SHARE_RUNTIME_SYNCHRONIZER_INLINE_HPP nit - please delete one of the blank lines. src/hotspot/share/runtime/vframe.cpp line 252: > 250: if (mark.has_monitor()) { > 251: ObjectMonitor* mon = ObjectSynchronizer::read_monitor(current, monitor->owner(), mark); > 252: if (// if the monitor is null we must be in the process of locking nit - please add a space after `(` test/hotspot/gtest/runtime/test_objectMonitor.cpp line 36: > 34: EXPECT_EQ(in_bytes(ObjectMonitor::metadata_offset()), 0) > 35: << "_metadata at a non 0 offset. metadata_offset = " > 36: << in_bytes(ObjectMonitor::metadata_offset()); nit - the indent should be four spaces instead of five spaces.
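A few numbers help check the `new_ceiling` expression quoted at synchronizer.cpp line 1271. The helper below is a hypothetical free-function version of lines 1270-1271 and assumes the threshold lies in [0, 100). Under that assumption `remainder` is in (0, 1], so `ceiling / remainder` is never smaller than `ceiling`, and the result always exceeds the incoming value: a threshold of 0 gives `ceiling + 1`, a threshold of 50 gives `2 * ceiling + 1`, and higher thresholds grow it further. Whether the result is lower than what the previous formula produced depends on that formula, which is not quoted here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical free-function version of the quoted lines 1270-1271.
// Assumes 0 <= used_threshold_percent < 100, so `remainder` is in (0, 1].
std::size_t raised_ceiling(std::size_t ceiling, double used_threshold_percent) {
  double remainder = (100.0 - used_threshold_percent) / 100.0;
  // ceiling / remainder is computed in double and truncated back to size_t,
  // matching the implicit conversion in `size_t new_ceiling = ...`.
  return static_cast<std::size_t>(ceiling / remainder + 1);
}
```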
test/hotspot/gtest/runtime/test_objectMonitor.cpp line 44: > 42: EXPECT_GE((size_t) in_bytes(ObjectMonitor::recursions_offset() - ObjectMonitor::owner_offset()), cache_line_size) > 43: << "the _owner and _recursions fields are closer " > 44: << "than a cache line which permits false sharing."; nit - the indent should be four spaces instead of five spaces. test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java line 148: > 146: static final int MAX_RECURSION_COUNT = 10; > 147: static final double RECURSION_CHANCE = .25; > 148: final Random random = new Random(); The test should output a seed value so that the user knows what random seed value was in use if/when the test fails. Also add a way to specify the seed value from the command line for reproducibility. test/micro/org/openjdk/bench/vm/lang/LockUnlock.java line 201: > 199: > 200: /** Perform two synchronized after each other on the same object. */ > 201: @Benchmark Please align L200 with L201. ------------- Changes requested by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2234133776 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715917040 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715955877 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715957258 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715958513 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715965577 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715976433 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715978312 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715981270 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715981568 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715981881 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715986844 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715990141 From dcubed at openjdk.org Tue Aug 13 22:16:01 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 13 Aug 2024 22:16:01 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v11] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 15:58:14 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. 
This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... 
> > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Missing DEBUG_ONLY src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 632: > 630: bool success = false; > 631: if (LockingMode == LM_LEGACY) { > 632: // Traditional lightweight locking. The if-statement is for legacy locking so the comment about lightweight locking seems wrong. src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 736: > 734: bool success = false; > 735: if (LockingMode == LM_LEGACY) { > 736: // traditional lightweight locking The if-statement is for legacy locking so the comment about lightweight locking seems wrong. src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 1671: > 1669: bool success = false; > 1670: if (LockingMode == LM_LEGACY) { > 1671: // traditional lightweight locking The if-statement is for legacy locking so the comment about lightweight locking seems wrong. src/hotspot/share/prims/jvmtiEnvBase.cpp line 1503: > 1501: > 1502: if (mon != nullptr) { > 1503: assert(mon != nullptr, "must have monitor"); With the new if-statement on L1502, the assert is not needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1714439929 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1714440641 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1714441506 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1714448544 From dcubed at openjdk.org Tue Aug 13 22:16:10 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 13 Aug 2024 22:16:10 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: References: Message-ID: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> On Tue, 13 Aug 2024 14:52:03 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. 
>> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Remove the last OMWorld references > - Rename omworldtable_work to object_monitor_table_work src/hotspot/share/runtime/basicLock.inline.hpp line 45: > 43: return reinterpret_cast(get_metadata()); > 44: #else > 45: // Other platforms does not make use of the cache yet, nit typo: s/does not/do not/ src/hotspot/share/runtime/basicLock.inline.hpp line 54: > 52: inline void BasicLock::clear_object_monitor_cache() { > 53: assert(UseObjectMonitorTable, "must be"); > 54: set_metadata(0); Should this be a literal `0` or should it be `nullptr`? Update: The metadata field is of type `uintptr_t`. Got it. src/hotspot/share/runtime/deoptimization.cpp line 1650: > 1648: mon_info->lock()->set_bad_metadata_deopt(); > 1649: } > 1650: #endif I like this!
src/hotspot/share/runtime/globals.hpp line 1964: > 1962: \ > 1963: product(int, LightweightFastLockingSpins, 13, DIAGNOSTIC, \ > 1964: "Specifies the number of time lightweight fast locking will " \ nit typo: s/number of time/number of times/ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 34: > 32: #include "oops/oop.inline.hpp" > 33: #include "runtime/atomic.hpp" > 34: #include "memory/allStatic.hpp" nit: this include is out of order. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 43: > 41: #include "runtime/mutexLocker.hpp" > 42: #include "runtime/objectMonitor.hpp" > 43: #include "runtime/objectMonitor.inline.hpp" Shouldn't have both includes here. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 81: > 79: oop _obj; > 80: > 81: public: nit - please indent by one more space. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 86: > 84: uintx get_hash() const { > 85: uintx hash = _obj->mark().hash(); > 86: assert(hash != 0, "should have a hash"); Hmmm... I can remember seeing hash values of zero in some of my older legacy inflation stress runs. Is a hash value of zero not a thing with lightweight locking? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 104: > 102: ObjectMonitor* _monitor; > 103: > 104: public: nit - please indent by one more space. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 126: > 124: > 125: static void dec_items_count() { > 126: Atomic::inc(&_items_count); Shouldn't this be `Atomic::dec`? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 130: > 128: > 129: static double get_load_factor() { > 130: return (double)_items_count/(double)_table_size; nit - please add spaces around `/` operator. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 169: > 167: } > 168: > 169: public: nit - please indent by one space. 
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 184: > 182: bool has_monitor = obj->mark().has_monitor(); > 183: assert(has_monitor == (monitor != nullptr), > 184: "Inconsistency between markWord and OMW table has_monitor: %s monitor: " PTR_FORMAT, Do you still want to use the name "OMW table"? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 213: > 211: > 212: static bool should_shrink() { > 213: // No implemented; nit typo: s/No/Not/ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 322: > 320: oop obj = om->object_peek(); > 321: st->print("monitor " PTR_FORMAT " ", p2i(om)); > 322: st->print("object " PTR_FORMAT, p2i(obj)); The monitor output style is to use `=` and commas, e.g. `monitor=, object=` src/hotspot/share/runtime/lightweightSynchronizer.cpp line 341: > 339: > 340: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor_from_table(oop object, JavaThread* current, bool* inserted) { > 341: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); Do you want to assert: `inserted != nullptr`? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 367: > 365: ResourceMark rm(current); > 366: log_trace(monitorinflation)("inflate: object=" INTPTR_FORMAT ", mark=" > 367: INTPTR_FORMAT ", type='%s' cause %s", p2i(object), nit typo: s/cause %s/cause=%s/ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 414: > 412: > 413: intptr_t hash = obj->mark().hash(); > 414: assert(hash != 0, "must be set when claiming the object monitor"); Hmmm... I can remember seeing hash values of zero in some of my older legacy inflation stress runs. Is a hash value of zero not a thing with lightweight locking? src/hotspot/share/runtime/lightweightSynchronizer.cpp line 468: > 466: oop obj = *o; > 467: if (obj->mark_acquire().has_monitor()) { > 468: if (_length > 0 && _contended_oops[_length-1] == obj) { nit - please add space around `-` operator. 
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 501: > 499: // Make room on lock_stack > 500: if (lock_stack.is_full()) { > 501: // Inflate contented objects nit typo: s/contented/contended/ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 545: > 543: bool _no_safepoint; > 544: > 545: public: nit - please indent by one space. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 664: > 662: > 663: // Used when deflation is observed. Progress here requires progress > 664: // from the deflator. After observing the that the deflator is not nit typo: s/observing the that the deflator/observing that the deflator/ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 760: > 758: > 759: // LightweightSynchronizer::inflate_locked_or_imse is used to to get an inflated > 760: // ObjectMonitor* with LM_LIGHTWEIGHT. It is used from contexts which requires nit typo: s/used from contexts which requires/used from contexts which require/ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 773: > 771: JavaThread* current = THREAD; > 772: > 773: for(;;) { nit: please add space before `(`. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 778: > 776: // No lock, IMSE. > 777: THROW_MSG_(vmSymbols::java_lang_IllegalMonitorStateException(), > 778: "current thread is not owner", nullptr); nit - please indent by one more space. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 785: > 783: // Fast locked by other thread, IMSE. > 784: THROW_MSG_(vmSymbols::java_lang_IllegalMonitorStateException(), > 785: "current thread is not owner", nullptr); nit - please indent by one more space. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 799: > 797: if (lock_stack.contains(obj)) { > 798: // Current thread owns the lock but someone else inflated > 799: // fix owner and pop lock stack Please consider: // Current thread owns the lock but someone else inflated it. // Fix owner and pop lock stack. 
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 805: > 803: // Fast locked (and inflated) by other thread, or deflation in progress, IMSE. > 804: THROW_MSG_(vmSymbols::java_lang_IllegalMonitorStateException(), > 805: "current thread is not owner", nullptr); nit - please indent by one more space. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1045: > 1043: if (monitor->is_being_async_deflated()) { > 1044: // The MonitorDeflation thread is deflating the monitor. The locking thread > 1045: // must spin until further progress have been made. nit typo: s/progress have been made./progress has been made./ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1075: > 1073: // lock, then we make the locking_thread thread > 1074: // the ObjectMonitor owner and remove the > 1075: // lock from the locking_thread thread's lock stack. nit typos: s/locking_thread thread/locking_thread/ in three places. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1190: > 1188: > 1189: #ifndef _LP64 > 1190: // Only for 32bit which have limited support for fast locking outside the runtime. nit typo: s/which have limited support/which has limited support/ src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1194: > 1192: // Recursive lock successful. > 1193: current->inc_held_monitor_count(); > 1194: // Clears object monitor cache, because ? What does this comment mean? src/hotspot/share/runtime/lightweightSynchronizer.hpp line 36: > 34: > 35: class LightweightSynchronizer : AllStatic { > 36: private: nit: please indent by 1 space. src/hotspot/share/runtime/lightweightSynchronizer.hpp line 41: > 39: > 40: static ObjectMonitor* add_monitor(JavaThread* current, ObjectMonitor* monitor, oop obj); > 41: static bool remove_monitor(Thread* current, oop obj, ObjectMonitor* monitor); Hmmm... `add_monitor` has `monitor` and then `obj` params. `remove_monitor` has `obj` and then `monitor` params. Why not use the same order? 
src/hotspot/share/runtime/lightweightSynchronizer.hpp line 57: > 55: static bool resize_table(JavaThread* current); > 56: > 57: private: nit: please indent by 1 space. src/hotspot/share/runtime/lightweightSynchronizer.hpp line 61: > 59: static bool fast_lock_spin_enter(oop obj, LockStack& lock_stack, JavaThread* current, bool observed_deflation); > 60: > 61: public: nit: please indent by 1 space. src/hotspot/share/runtime/lightweightSynchronizer.hpp line 69: > 67: static ObjectMonitor* inflate_locked_or_imse(oop object, ObjectSynchronizer::InflateCause cause, TRAPS); > 68: static ObjectMonitor* inflate_fast_locked_object(oop object, JavaThread* locking_thread, JavaThread* current, ObjectSynchronizer::InflateCause cause); > 69: static ObjectMonitor* inflate_and_enter(oop object, JavaThread* locking_thread, JavaThread* current, ObjectSynchronizer::InflateCause cause); All of these are "inflate" functions, but: - two of them have `object` parameter next to the `cause` parameter - two of them have `object` parameter first - one of them has `current` parameter before the other thread parameter (`inflating_thread`) - two of them have the `current` parameter after the other thread parameter (`locking_thread`) Please consider making the parameter order consistent. src/hotspot/share/runtime/lockStack.hpp line 130: > 128: class OMCache { > 129: friend class VMStructs; > 130: public: Please indent by one space. src/hotspot/share/runtime/lockStack.hpp line 133: > 131: static constexpr int CAPACITY = 8; > 132: > 133: private: Please indent by one space. src/hotspot/share/runtime/lockStack.hpp line 140: > 138: const oop _null_sentinel = nullptr; > 139: > 140: public: Please indent by one space. src/hotspot/share/runtime/objectMonitor.cpp line 669: > 667: install_displaced_markword_in_object(obj); > 668: } > 669: } This can be cleaned up by putting L664 and L665 on the same line, fixing the indents on L666-7 and deleting L668. 
src/hotspot/share/runtime/objectMonitor.cpp line 682: > 680: // deflation process. > 681: void ObjectMonitor::install_displaced_markword_in_object(const oop obj) { > 682: assert(!UseObjectMonitorTable, "Lightweight has no dmw"); Perhaps: `assert(!UseObjectMonitorTable, "ObjectMonitorTable has no dmw");` src/hotspot/share/runtime/objectMonitor.hpp line 388: > 386: // Deflation support > 387: bool deflate_monitor(Thread* current); > 388: private: nit - please indent by one space src/hotspot/share/runtime/serviceThread.cpp line 44: > 42: #include "runtime/jniHandles.hpp" > 43: #include "runtime/serviceThread.hpp" > 44: #include "runtime/lightweightSynchronizer.hpp" Include is in the wrong place. src/hotspot/share/runtime/synchronizer.cpp line 404: > 402: > 403: bool ObjectSynchronizer::quick_enter_legacy(oop obj, JavaThread* current, > 404: BasicLock * lock) { nit - please fix this indent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715569720 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715571831 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715582675 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715589490 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715617774 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715618379 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715621030 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715623399 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715640861 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715642307 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715643617 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715647133 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715649276 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20067#discussion_r1715651080 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715661781 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715663965 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715662800 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715675411 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715678877 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715680995 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715682621 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715690997 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715696004 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715697549 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715698164 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715698568 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715700114 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715700801 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715716901 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715720857 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715724929 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715726115 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715603974 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715601643 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715604333 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715604553 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715614530 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20067#discussion_r1715880925 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715881116 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715881379 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715922871 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715924876 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715900092 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715926418 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715940064 From dcubed at openjdk.org Tue Aug 13 22:16:10 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 13 Aug 2024 22:16:10 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 14:56:32 GMT, Coleen Phillimore wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove the last OMWorld references >> - Rename omworldtable_work to object_monitor_table_work > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 172: > >> 170: static void create() { >> 171: _table = new ConcurrentTable(initial_log_size(), >> 172: max_log_size(), > > nit, can you line up these parameters? Or put them all on L171 if that doesn't make it too long... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715648059 From dcubed at openjdk.org Tue Aug 13 22:16:10 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 13 Aug 2024 22:16:10 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: References: Message-ID: On Mon, 15 Jul 2024 00:45:10 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 401: >> >>> 399: >>> 400: if (inserted) { >>> 401: // Hopefully the performance counters are allocated on distinct >> >> It doesn't look like the counters are on distinct cache lines (see objectMonitor.hpp, lines 212ff). If this is a concern, file a bug to investigate it later? The comment here is a bit misplaced, IMO. > > It originates from https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.cpp#L1543 > > I think we just kept it and did not think more about it. > > Not sure what it is referring to. Maybe @dcubed-ojdk knows more, they originated from him (9 years old comment). I don't think we ever got around to experimenting with putting the perf counters on distinct cache lines. We've always had bigger fish to fry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715669185 From dcubed at openjdk.org Tue Aug 13 22:16:10 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 13 Aug 2024 22:16:10 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> References: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> Message-ID: On Tue, 13 Aug 2024 16:49:42 GMT, Daniel D. 
Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove the last OMWorld references >> - Rename omworldtable_work to object_monitor_table_work > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 86: > >> 84: uintx get_hash() const { >> 85: uintx hash = _obj->mark().hash(); >> 86: assert(hash != 0, "should have a hash"); > > Hmmm... I can remember seeing hash values of zero in some > of my older legacy inflation stress runs. Is a hash value of zero > not a thing with lightweight locking? Update: My memory was wrong. When zero is encountered as a hash value, it is replaced with `0xBAD`. > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 414: > >> 412: >> 413: intptr_t hash = obj->mark().hash(); >> 414: assert(hash != 0, "must be set when claiming the object monitor"); > > Hmmm... I can remember seeing hash values of zero in some > of my older legacy inflation stress runs. Is a hash value of zero > not a thing with lightweight locking? Update: My memory was wrong. When zero is encountered as a hash value, it is replaced with `0xBAD`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715952007 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715952460 From dcubed at openjdk.org Tue Aug 13 22:21:55 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 13 Aug 2024 22:21:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:30:12 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. 
In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. 
There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Whitespace and nits Just for clarity, I think this is the one real bug that I found: src/hotspot/share/runtime/lightweightSynchronizer.cpp line 126: > 124: > 125: static void dec_items_count() { > 126: Atomic::inc(&_items_count); Shouldn't this be `Atomic::dec`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2287237680 From kbarrett at openjdk.org Tue Aug 13 22:24:57 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 13 Aug 2024 22:24:57 GMT Subject: RFR: 8338330: Fix -Wzero-as-null-pointer-constant warnings from THROW_XXX_0 Message-ID: Please review this change to add THROW_ARG_NULL and THROW_HANDLE_NULL macros, and use them instead of the corresponding THROW_XXX_0 macros in contexts where a pointer value is needed. This removes some -Wzero-as-null-pointer-constant warnings. There aren't many uses of either (only one of the HANDLE variant). An alternative would have been to change the callers to use the unsuffixed variant with a nullptr value argument. Adding the macros is consistent with other THROW variants, and seems a little bit more readable. 
Testing: mach5 tier1 ------------- Commit messages: - add THROW_xxx_NULL Changes: https://git.openjdk.org/jdk/pull/20574/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20574&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338330 Stats: 14 lines in 3 files changed: 3 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20574/head:pull/20574 PR: https://git.openjdk.org/jdk/pull/20574 From kbarrett at openjdk.org Tue Aug 13 22:45:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 13 Aug 2024 22:45:59 GMT Subject: RFR: 8338331: Fix -Wzero-as-null-pointer-constant warnings from CHECK_0 in jni.cpp Message-ID: Please review this change to some macros in jni.cpp. These macros were using CHECK_0 when calling functions that can "throw" exceptions. However, the return type involved is provided by a macro argument, and is a pointer type for some uses of the macros. This triggered -Wzero-as-null-pointer-constant warnings when enabled. To remove the warnings, these CHECK_0() uses are changed to CHECK_() with an argument expression that constructs and value-initializes a temporary of that return type, e.g. `ResultType{}`. Value-initialization of a scalar type is zero-initialization. Zero-initialization of a scalar type initializes it to the value obtained by converting a literal 0 to that type. So a zero of the appropriate type for arithmetic types. For pointer types it's initialized to nullptr, without triggering the warning.
Testing: mach5 tier1 ------------- Commit messages: - CHECK_0 in jni.cpp Changes: https://git.openjdk.org/jdk/pull/20575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20575&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338331 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20575/head:pull/20575 PR: https://git.openjdk.org/jdk/pull/20575 From dlong at openjdk.org Tue Aug 13 22:47:48 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 13 Aug 2024 22:47:48 GMT Subject: RFR: 8338330: Fix -Wzero-as-null-pointer-constant warnings from THROW_XXX_0 In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:19:45 GMT, Kim Barrett wrote: > Please review this change to add THROW_ARG_NULL and THROW_HANDLE_NULL macros, > and use them instead of the corresponding THROW_XXX_0 macros in contexts where > a pointer value is needed. This removes some -Wzero-as-null-pointer-constant > warnings. > > There aren't many uses of either (only one of the HANDLE variant). An > alternative would have been to change the callers to use the unsuffixed > variant with a nullptr value argument. Adding the macros is consistent with > other THROW variants, and seems a little bit more readable. > > Testing: mach5 tier1 Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20574#pullrequestreview-2236758685 From dholmes at openjdk.org Tue Aug 13 22:53:48 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 13 Aug 2024 22:53:48 GMT Subject: RFR: 8338331: Fix -Wzero-as-null-pointer-constant warnings from CHECK_0 in jni.cpp In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:41:19 GMT, Kim Barrett wrote: > Please review this change to some macros in jni.cpp. These macros were using > CHECK_0 when calling functions that can "throw" exceptions. 
However, the > return type involved is provided by a macro argument, and is a pointer type > for some uses of the macros. This triggered -Wzero-as-null-pointer-constant > warnings when enabled. > > To remove the warnings, these CHECK_0() uses are changed to CHECK_() with an > argument expression that constructs and value-initializes a temporary of that > return type, e.g. `ResultType{}`. Value-initialization of a scalar type is > zero-initialization. Zero-initialization of a scalar type initializes it to > the value obtained by converting a literal 0 to that type. So a zero of the > appropriate type for arithmetic types. For pointer types it's initialized to > nullptr, without triggering the warning. > > Testing: mach5 tier1 Okay. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20575#pullrequestreview-2236774787 From dholmes at openjdk.org Tue Aug 13 22:55:48 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 13 Aug 2024 22:55:48 GMT Subject: RFR: 8338330: Fix -Wzero-as-null-pointer-constant warnings from THROW_XXX_0 In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:19:45 GMT, Kim Barrett wrote: > Please review this change to add THROW_ARG_NULL and THROW_HANDLE_NULL macros, > and use them instead of the corresponding THROW_XXX_0 macros in contexts where > a pointer value is needed. This removes some -Wzero-as-null-pointer-constant > warnings. > > There aren't many uses of either (only one of the HANDLE variant). An > alternative would have been to change the callers to use the unsuffixed > variant with a nullptr value argument. Adding the macros is consistent with > other THROW variants, and seems a little bit more readable. > > Testing: mach5 tier1 Okay. Thanks ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20574#pullrequestreview-2236776583 From jbhateja at openjdk.org Wed Aug 14 04:59:23 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 14 Aug 2024 04:59:23 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v3] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. > > > . SATURATING_UADD : Saturating unsigned addition. > . SATURATING_ADD : Saturating signed addition. > . SATURATING_USUB : Saturating unsigned subtraction. > . SATURATING_SUB : Saturating signed subtraction. > . UMAX : Unsigned max. > . UMIN : Unsigned min. > > > The new vector operators apply only to integral types, since integral values wrap around in overflow/underflow scenarios after setting the appropriate status flags. For floating-point types, the IEEE 754 specification allows multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates a floating-point value to infinity. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflow or overflow scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback.
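As a rough scalar model of the six operators listed above, shown for 8-bit lanes (an illustrative C++ sketch, not the Vector API's Java implementation or its fallback code; all function names are hypothetical): results are clamped to the lane type's bounds instead of wrapping around, and UMAX/UMIN compare the lane bits as unsigned.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Clamp an int to the signed 8-bit range [-128, 127].
std::int8_t clamp_to_int8(int r) {
  constexpr int lo = std::numeric_limits<std::int8_t>::min();
  constexpr int hi = std::numeric_limits<std::int8_t>::max();
  return static_cast<std::int8_t>(r < lo ? lo : (r > hi ? hi : r));
}

// SATURATING_ADD / SATURATING_SUB: compute in int (cannot overflow), then clamp.
std::int8_t sat_add(std::int8_t a, std::int8_t b) { return clamp_to_int8(int(a) + int(b)); }
std::int8_t sat_sub(std::int8_t a, std::int8_t b) { return clamp_to_int8(int(a) - int(b)); }

// SATURATING_UADD: unsigned addition capped at 0xFF.
std::uint8_t sat_uadd(std::uint8_t a, std::uint8_t b) {
  unsigned r = unsigned(a) + unsigned(b);
  return static_cast<std::uint8_t>(r > 0xFFu ? 0xFFu : r);
}

// SATURATING_USUB: unsigned subtraction floored at 0.
std::uint8_t sat_usub(std::uint8_t a, std::uint8_t b) {
  return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// UMAX / UMIN: signed lane values compared by their unsigned bit patterns,
// so -1 (0xFF) is the largest value.
std::int8_t umax(std::int8_t a, std::int8_t b) {
  return static_cast<std::uint8_t>(a) >= static_cast<std::uint8_t>(b) ? a : b;
}
std::int8_t umin(std::int8_t a, std::int8_t b) {
  return static_cast<std::uint8_t>(a) <= static_cast<std::uint8_t>(b) ? a : b;
}

// For contrast: ordinary two's-complement addition wraps around.
std::int8_t wrap_add(std::int8_t a, std::int8_t b) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(a) + static_cast<std::uint8_t>(b));
}
```

For example, `sat_add(100, 100)` is 127 while `wrap_add(100, 100)` is -56. The difference between the two is the wrap-around behaviour the saturating operators are meant to avoid.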
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/5468e72b..8c9bfeca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=01-02 Stats: 776 lines in 34 files changed: 0 ins; 0 del; 776 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From dholmes at openjdk.org Wed Aug 14 05:47:55 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 14 Aug 2024 05:47:55 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: References: Message-ID: <9PPJKBEF_-5HvSd4fnN6Maat1JVsHmxZy-dR7EqDCPw=.4e5d28fb-7dba-43b4-9452-3d9847ed7fa7@github.com> On Tue, 13 Aug 2024 16:30:12 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor, the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not overwrite and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
>> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to...
> > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Whitespace and nits src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 84: > 82: > 83: if (LockingMode == LM_LIGHTWEIGHT) { > 84: lightweight_lock(disp_hdr, obj, hdr, temp, rscratch2, slow_case); Given the declaration: void MacroAssembler::lightweight_lock(Register basic_lock, Register obj, Register t1, Register t2, Register t3, Label& slow) it looks odd to pass `disp_hdr` here - is that variable just mis-named? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716339494 From aboldtch at openjdk.org Wed Aug 14 05:54:12 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 05:54:12 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v17] In-Reply-To: References: Message-ID: <8201AEgxCNq6r17tpMSHJ_oZZKSfCRNLKb0GgQEs1RQ=.953ac465-04e3-47ba-aace-92c1a5ef85e7@github.com> > When inflating a monitor, the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not overwrite and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default).
> > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
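The "Fast Locking" transition quoted above can be modelled in isolation. It is a CAS on the two lock bits: 0b01 (Unlocked) <-> 0b00 (Fast Locked), with 0b10 meaning Inflated. This is a simplified sketch using `std::atomic` on a bare word, not HotSpot's `markWord` or `LightweightSynchronizer` code. The constant and function names are hypothetical, and per-thread ownership tracking, spinning, and the inflation path are deliberately omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Low two bits of the modelled header word (hypothetical names).
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Try 0b01 (Unlocked) -> 0b00 (Fast Locked) with one CAS that preserves all
// other bits. Returns false if the word was not unlocked or the CAS lost a
// race; a real caller would then spin or fall back to inflation.
bool try_fast_lock(std::atomic<std::uintptr_t>& word) {
  std::uintptr_t old_word = word.load(std::memory_order_relaxed);
  if ((old_word & kLockMask) != kUnlocked) {
    return false;  // already fast locked (0b00) or inflated (0b10)
  }
  std::uintptr_t new_word = (old_word & ~kLockMask) | kFastLocked;
  return word.compare_exchange_strong(old_word, new_word,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Reverse transition 0b00 -> 0b01. Fails if the word no longer shows the
// fast-locked state (for example because it was inflated in the meantime).
bool try_fast_unlock(std::atomic<std::uintptr_t>& word) {
  std::uintptr_t old_word = word.load(std::memory_order_relaxed);
  if ((old_word & kLockMask) != kFastLocked) {
    return false;
  }
  std::uintptr_t new_word = (old_word & ~kLockMask) | kUnlocked;
  return word.compare_exchange_strong(old_word, new_word,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Only the lock bits change. That matches the point of the proposal: locking never overwrites or displaces the rest of the header word.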
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Fix items count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/68681c84..41ac7d37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Wed Aug 14 06:00:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 06:00:58 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v11] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 22:40:06 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing DEBUG_ONLY > > src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 632: > >> 630: bool success = false; >> 631: if (LockingMode == LM_LEGACY) { >> 632: // Traditional lightweight locking. > > The if-statement is for legacy locking so the comment about lightweight > locking seems wrong. Yeah. It is an old comment from before LM_LIGHTWEIGHT existed. Will change to call it fast locking. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716348558 From aboldtch at openjdk.org Wed Aug 14 06:06:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 06:06:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> References: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> Message-ID: On Tue, 13 Aug 2024 16:34:17 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove the last OMWorld references >> - Rename omworldtable_work to object_monitor_table_work > > src/hotspot/share/runtime/lightweightSynchronizer.hpp line 41: > >> 39: >> 40: static ObjectMonitor* add_monitor(JavaThread* current, ObjectMonitor* monitor, oop obj); >> 41: static bool remove_monitor(Thread* current, oop obj, ObjectMonitor* monitor); > > Hmmm... `add_monitor` has `monitor` and then `obj` params. > `remove_monitor` has `obj` and then `monitor` params. Why > not use the same order? Yeah, why not? :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716353596 From qxing at openjdk.org Wed Aug 14 06:31:00 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 14 Aug 2024 06:31:00 GMT Subject: Integrated: 8336163: Remove declarations of some debug-only methods in release build In-Reply-To: References: Message-ID: On Thu, 11 Jul 2024 03:37:06 GMT, Qizheng Xing wrote: > Some of the methods are defined only in debug mode, but their declarations still exist in release mode. > > This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail. This pull request has now been integrated. 
Changeset: 3dd07b91 Author: Qizheng Xing Committer: Eric Liu URL: https://git.openjdk.org/jdk/commit/3dd07b91bbf644aa867452806e9388089fa97548 Stats: 23 lines in 5 files changed: 18 ins; 4 del; 1 mod 8336163: Remove declarations of some debug-only methods in release build Reviewed-by: dholmes, eliu, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20131 From aboldtch at openjdk.org Wed Aug 14 06:35:01 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 06:35:01 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> References: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> Message-ID: <2IX51ehxuHCfMr4W5i1RtKnBCqv-zU_8K3sLCYwgkoY=.3ad558a2-abce-4a6a-b53c-2750d7349925@github.com> On Tue, 13 Aug 2024 17:05:38 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove the last OMWorld references >> - Rename omworldtable_work to object_monitor_table_work > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 126: > >> 124: >> 125: static void dec_items_count() { >> 126: Atomic::inc(&_items_count); > > Shouldn't this be `Atomic::dec`? Yes it should be. Surprised I never saw this. (Even though that using the service thread to grow was the very last thing changed before the PR, the resizing was handled differently before, by the deflator thread.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716377509 From stefank at openjdk.org Wed Aug 14 06:56:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 14 Aug 2024 06:56:49 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:30:01 GMT, Gerard Ziemski wrote: > Is everyone OK with MemTypeFlag? 
It's quite unfortunate to have a three-word type for something this prolific in our code base. Why not go with `MemType` and change variable names from `flag` to `mt`? static char* map_memory_to_file(size_t size, int fd, MEMFLAGS flag = mtNone); would then become: static char* map_memory_to_file(size_t size, int fd, MemType mt = mtNone); ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2287987726 From fyang at openjdk.org Wed Aug 14 07:00:50 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 14 Aug 2024 07:00:50 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> Message-ID: On Tue, 13 Aug 2024 21:38:09 GMT, Andrew Dinn wrote: >> Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location.
> > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix accidental paste Hi Andrew, I find that we need the following add-on change for riscv:

diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
index dc89e489b24..bed24e442e8 100644
--- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
+++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
@@ -66,6 +66,12 @@
 
 #define __ masm->
 
+#ifdef PRODUCT
+#define BLOCK_COMMENT(str) /* nothing */
+#else
+#define BLOCK_COMMENT(str) __ block_comment(str)
+#endif
+
 const int StackAlignmentInSlots = StackAlignmentInBytes / VMRegImpl::stack_slot_size;
 
 class RegisterSaver {
@@ -2742,7 +2748,7 @@ static void jfr_epilogue(MacroAssembler* masm) {
 // For c2: c_rarg0 is junk, call to runtime to write a checkpoint.
 // It returns a jobject handle to the event writer.
 // The handle is dereferenced and the return value is the event writer oop.
-static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() {
+RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() {
   enum layout {
     fp_off,
     fp_off2,
@@ -2780,7 +2786,7 @@ static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() {
 }
 
 // For c2: call to return a leased buffer.
-static RuntimeStub* SharedRuntime::generate_jfr_return_lease() {
+RuntimeStub* SharedRuntime::generate_jfr_return_lease() {
   enum layout {
     fp_off,
     fp_off2,

------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2287993001 From aboldtch at openjdk.org Wed Aug 14 07:11:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 07:11:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> References: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> Message-ID: On Tue, 13 Aug 2024 18:13:30 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove the last OMWorld references >> - Rename omworldtable_work to object_monitor_table_work > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1194: > >> 1192: // Recursive lock successful. >> 1193: current->inc_held_monitor_count(); >> 1194: // Clears object monitor cache, because ? > > What does this comment mean? I probably meant to write something about how x86_32 still uses the cache, so it is important that the CacheSetter clears it here. I'll remove the comment and add one on the CacheSetter explaining why it is necessary.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716414912 From aboldtch at openjdk.org Wed Aug 14 07:25:56 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 07:25:56 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> References: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> Message-ID: <91YBmcYITOdHN6xHTbP-ZVABu4d-FNZYM6W630F9egw=.f75a47f5-bbc4-42d8-ba9b-941bea0ce584@github.com> On Tue, 13 Aug 2024 21:05:29 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove the last OMWorld references >> - Rename omworldtable_work to object_monitor_table_work > > src/hotspot/share/runtime/serviceThread.cpp line 44: > >> 42: #include "runtime/jniHandles.hpp" >> 43: #include "runtime/serviceThread.hpp" >> 44: #include "runtime/lightweightSynchronizer.hpp" > > Include is in the wrong place. I'll sort the whole list. `#include "runtime/serviceThread.hpp"` was in the wrong place. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716430696 From jbhateja at openjdk.org Wed Aug 14 07:43:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 14 Aug 2024 07:43:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 12 Aug 2024 22:03:44 GMT, Paul Sandoz wrote: > The results look promising. I can provide guidance on the specification e.g., we can specify the behavior in terms of rearrange, with the addition of throwing on out of bounds indexes. 
> > Regarding the throwing of exceptions, some wider context will help to know where we are heading before we finalize the specification. I believe we are considering changing the default throwing behavior for index out of bounds to wrapping, thereby we can avoid bounds checks. If that is the case we should wait until that is done then update rather than submitting a CSR just yet? > > I see you created a specific intrinsic, which will avoid the cost of shuffle creation. Should we apply the same approach (in a subsequent PR) to the single argument shuffle? Or perhaps if we manage to optimize shuffles and change the default wrapping we don't require a specific intrinsic and can just defer to rearrange? Hi @PaulSandoz, Thanks for your comments. With this new API we intend to enforce a stricter specification w.r.t. index values, so that we can emit a lean instruction sequence and avoid spending cycles massaging inputs into a consumable form, i.e. avoid redundant wrapping and unwrapping operations. The existing [two vector rearrange API](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#rearrange(jdk.incubator.vector.VectorShuffle,jdk.incubator.vector.Vector)) has a flexible specification which allows out-of-bounds shuffle indexes to be wrapped into an exceptional index with a negative value. Even if we optimize the existing two vector rearrange implementation, we will still need to emit additional instructions to generate indexes which lie within the two vector range [0, 2*VLEN). I see this as a specialized API, like vector compress/expand, which caters to targets like x86-AVX512+ and aarch64-SVE that offer a direct instruction for two vector lookups. Maybe the API nomenclature can be refined to better reflect its semantics, i.e. from selectFrom to twoVectorLookup?
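The stricter index semantics described in this reply can be pinned down with a scalar reference model. Each result lane is taken from the concatenation of two source vectors, and every index must lie in [0, 2*VLEN). The function name, the choice of which source supplies the lower half of the index range, and throwing `std::out_of_range` for bad indexes are assumptions made for illustration. The message does not fix these details, and the eventual Vector API signature may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical scalar model of a two-vector lookup over N lanes (N = VLEN).
// Indexes 0..N-1 select from `lo`, indexes N..2N-1 select from `hi`;
// anything outside [0, 2N) is rejected instead of being wrapped.
template <typename T, std::size_t N>
std::array<T, N> two_vector_lookup(const std::array<int, N>& idx,
                                   const std::array<T, N>& lo,
                                   const std::array<T, N>& hi) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    int j = idx[i];
    if (j < 0 || j >= static_cast<int>(2 * N)) {
      throw std::out_of_range("lookup index outside [0, 2*VLEN)");
    }
    std::size_t k = static_cast<std::size_t>(j);
    result[i] = k < N ? lo[k] : hi[k - N];
  }
  return result;
}
```

Hardware with a native two-table permute can implement this loop as a single instruction. Indexes therefore never need to be massaged into range first, which is the efficiency argument made above.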
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2288062038 From aboldtch at openjdk.org Wed Aug 14 08:05:00 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 08:05:00 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 21:45:57 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace and nits > > src/hotspot/share/runtime/synchronizer.cpp line 1271: > >> 1269: _no_progress_cnt >= NoAsyncDeflationProgressMax) { >> 1270: double remainder = (100.0 - MonitorUsedDeflationThreshold) / 100.0; >> 1271: size_t new_ceiling = ceiling / remainder + 1; > > Why was the `new_ceiling` calculation changed? > I think the `new_ceiling` value is going to be lower than the old ceiling value. The old calculation made no sense if the goal was, after NoAsyncDeflationProgressMax deflation cycles with no deflation, to stop trying until we get more monitors. With the current calculation we can keep running these no-progress cycles for quite a number of iterations. The remainder is a value in the range [0,1], so the ceiling will always grow by at least 1. (When remainder == 1: `new_ceiling = ceiling + 1;`) I do however realise that the new calculation needs to handle remainder == 0.0 (MonitorUsedDeflationThreshold = 100). But the whole NoAsyncDeflationProgressMax makes little sense if MonitorUsedDeflationThreshold is 0, as the value for new ceiling would only stop deflation if it is large enough that the `size_t monitor_usage = (monitors_used * 100LL) / ceiling;` gets truncated to 0 (less than 1%). But this is a leftover from the deflation changes, so I will probably just remove this change and create an RFE for reevaluating this.
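The following sketch makes the arithmetic in this exchange concrete. It shows the quoted `new_ceiling` formula alongside the truncating usage computation. `threshold_percent` stands in for MonitorUsedDeflationThreshold. The guard for a threshold of 100 (remainder == 0.0) is a hypothetical choice, because the message only notes that this case needs handling.

```cpp
#include <cassert>
#include <cstddef>

// Quoted formula: remainder = (100 - threshold) / 100; new = ceiling / remainder + 1.
std::size_t new_ceiling_for(std::size_t ceiling, double threshold_percent) {
  double remainder = (100.0 - threshold_percent) / 100.0;  // in [0, 1]
  if (remainder <= 0.0) {
    return ceiling + 1;  // hypothetical fallback: avoid dividing by zero
  }
  // Dividing by a value in (0, 1] never shrinks the ceiling, and the +1
  // guarantees growth of at least one.
  return static_cast<std::size_t>(ceiling / remainder) + 1;
}

// Usage percentage with integer division, as in the quoted code:
// anything below 1% truncates to 0.
std::size_t monitor_usage(std::size_t monitors_used, std::size_t ceiling) {
  return (monitors_used * 100) / ceiling;
}
```

With a threshold of 50 the ceiling roughly doubles, and with a threshold of 0 it grows by exactly one. The second case is why a threshold of 0 only stops deflation once usage truncates to 0%.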
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716479164 From rcastanedalo at openjdk.org Wed Aug 14 08:29:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 08:29:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v7] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Rename 'HeapRegionBounds' to 'G1HeapRegionBounds' - Merge jdk-24+10 - Further motivate the choice of internal store address materialization in x64 - Give barrier generation helper functions a more consistent name - Also include HOTSPOT_TARGET_CPU_ARCH-based G1 ADL source file - Flatten barrier assembly generation code by removing helpers individual barrier tests and operations - Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags - Implement JEP 475 Co-authored-by: Erik Österlund, Siyao Liu, and Kim Barrett ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/d21104ca..88d28b9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=05-06 Stats: 99129 lines in 2523 files changed: 60137 ins; 27053 del; 11939 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Aug 14 08:29:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 08:29:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 13:57:27 GMT, Roberto Castañeda Lozano wrote: > OK, I will test and push a merge of jdk-24+10 (Thu Aug 8) in the next days, unless @feilongjiang or @snazarkin object. We can then check in a few weeks if another update is required. Done.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2288146342 From aboldtch at openjdk.org Wed Aug 14 08:39:36 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 08:39:36 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v18] In-Reply-To: References: Message-ID: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf... Axel Boldt-Christmas has updated the pull request incrementally with 30 additional commits since the last revision: - Add and print random seed UseObjectMonitorTableTest.java - Align comment LockUnlock.java - Fix test_objectMonitor.cpp indent - Revert monitors_used_above_threshold changes - Fix whitespace synchronizer.{inline.hpp,cpp} - Update `ObjectSynchronizer::FastHashCode` comment - Fix quick_enter parameter order - Fix serviceThread.cpp include order - Update `ObjectMonitor::install_displaced_markword_in_object` assert text - Fix merge else if scopes ObjectMonitor::deflate_monitor - ...
and 20 more: https://git.openjdk.org/jdk/compare/41ac7d37...3f29e6d6 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/41ac7d37..3f29e6d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=16-17 Stats: 119 lines in 18 files changed: 11 ins; 16 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Wed Aug 14 08:46:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 08:46:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: <9PPJKBEF_-5HvSd4fnN6Maat1JVsHmxZy-dR7EqDCPw=.4e5d28fb-7dba-43b4-9452-3d9847ed7fa7@github.com> References: <9PPJKBEF_-5HvSd4fnN6Maat1JVsHmxZy-dR7EqDCPw=.4e5d28fb-7dba-43b4-9452-3d9847ed7fa7@github.com> Message-ID: On Wed, 14 Aug 2024 05:45:27 GMT, David Holmes wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace and nits > > src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 84: > >> 82: >> 83: if (LockingMode == LM_LIGHTWEIGHT) { >> 84: lightweight_lock(disp_hdr, obj, hdr, temp, rscratch2, slow_case); > > Given the declaration: > > void MacroAssembler::lightweight_lock(Register basic_lock, Register obj, Register t1, Register t2, Register t3, Label& slow) > > it looks odd to pass `disp_hdr` here - is that variable just mis-named? Yeah, because the BasicLock and the displaced header / metadata in BasicLock are all at the same address, so they have been used interchangeably in C1, C2 and the interpreter. It should probably be fixed, but maybe in a separate PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716539547 From aboldtch at openjdk.org Wed Aug 14 08:51:56 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 08:51:56 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9] In-Reply-To: References: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com> Message-ID: On Tue, 13 Aug 2024 16:03:16 GMT, Roman Kennke wrote: >> I tried the following (see diff below) and it shows about a 5-10% regression in most of the `LockUnlock.testInflated*` micros. I also tried with just `num_unrolled = 1` and saw the same regression. Maybe there was some other pattern you were thinking of. There are probably architecture and platform differences. This can and should probably be explored in a follow-up PR. >> >> >> >> diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >> index 5dbfdbc225d..4e6621cfece 100644 >> --- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >> +++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >> @@ -663,25 +663,28 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist >> >> const int num_unrolled = 2; >> for (int i = 0; i < num_unrolled; i++) { >> - cmpptr(obj, Address(t)); >> - jccb(Assembler::equal, monitor_found); >> - increment(t, in_bytes(OMCache::oop_to_oop_difference())); >> + Label next; >> + cmpptr(obj, Address(t, OMCache::oop_to_oop_difference() * i)); >> + jccb(Assembler::notEqual, next); >> + increment(t, in_bytes(OMCache::oop_to_oop_difference() * i)); >> + jmpb(monitor_found); >> + bind(next); >> } >> + increment(t, in_bytes(OMCache::oop_to_oop_difference() * (num_unrolled - 1))); >> >> Label loop; >> >> // Search for obj in cache. >> bind(loop); >> - >> - // Check for match. >> - cmpptr(obj, Address(t)); >> - jccb(Assembler::equal, monitor_found); >> - >> + // Advance.
>> + increment(t, in_bytes(OMCache::oop_to_oop_difference())); >> // Search until null encountered, guaranteed _null_sentinel at end. >> cmpptr(Address(t), 1); >> jcc(Assembler::below, slow_path); // 0 check, but with ZF=0 when *t == 0 >> - increment(t, in_bytes(OMCache::oop_to_oop_difference())); >> - jmpb(loop); >> + >> + // Check for match. >> + cmpptr(obj, Address(t)); >> + jccb(Assembler::notEqual, loop); >> >> // Cache hit. >> bind(monitor_found); > > Yeah, it's probably not very important. But it's not quite what I had in mind; I was thinking of something more like this (aarch64 version, untested, may be wrong): > > > diff --git a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp > index 19af03d3488..05bbb5760b8 100644 > --- a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp > @@ -302,14 +302,14 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist > Label monitor_found; > > // Load cache address > - lea(t3_t, Address(rthread, JavaThread::om_cache_oops_offset())); > + lea(t3_t, Address(rthread, JavaThread::om_cache_oops_offset() - OMCache::oop_to_oop_difference())); > > const int num_unrolled = 2; > for (int i = 0; i < num_unrolled; i++) { > + increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); > ldr(t1, Address(t3_t)); > cmp(obj, t1); > br(Assembler::EQ, monitor_found); > - increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); > } > > Label loop; > @@ -317,16 +317,14 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist > // Search for obj in cache. > bind(loop); > > - // Check for match. > - ldr(t1, Address(t3_t)); > - cmp(obj, t1); > - br(Assembler::EQ, monitor_found); > + increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); > > + ldr(t1, Address(t3_t)); > // Search until null encountered, guaranteed _null_sentinel at end.
> - increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); > - cbnz(t1, loop); > - // Cache Miss, NE set from cmp above, cbnz does not set flags > - b(slow_path); > + cbz(t1, slow_path); > + // Check for match. > + cmp(obj, t1); > + br(Assembler::NE, loop); > > bind(monitor_found); I see, just the loop. I read it as: a cache hit should not take conditional branches. The `num_unrolled = 1` variant effectively became what you suggest, and showed similar regressions (but with only one unrolled lookup). It was also only tested on one specific machine. But let us leave it to a follow-up RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716546168 From amitkumar at openjdk.org Wed Aug 14 09:03:18 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 14 Aug 2024 09:03:18 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache Message-ID: Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache ------------- Commit messages: - fix for ppc & s390x Changes: https://git.openjdk.org/jdk/pull/20578/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20578&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338365 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20578.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20578/head:pull/20578 PR: https://git.openjdk.org/jdk/pull/20578 From fjiang at openjdk.org Wed Aug 14 09:12:53 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 14 Aug 2024 09:12:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 12:10:28 GMT, Roberto Castañeda Lozano wrote: > > Are you planning to rebase it onto master? Nothing important, but there were a couple of failures which are fixed after this PR.
So it will make the test results a bit cleaner for us. > > Actually, I have refrained from updating to the latest mainline changes to avoid interfering with the porting work while it is in progress, but if there is consensus among the port maintainers I would be happy to update the changeset regularly. @TheRealMDoerr @feilongjiang @offamitkumar @snazarkin what do you prefer? I have already merged upstream commits on my local branch, so I'm fine with regular updates. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2288247680 From adinn at openjdk.org Wed Aug 14 09:22:27 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 14 Aug 2024 09:22:27 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v4] In-Reply-To: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: > Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location.
Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: fix riscv port issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20566/files - new: https://git.openjdk.org/jdk/pull/20566/files/f12aea03..16482052 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=02-03 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20566/head:pull/20566 PR: https://git.openjdk.org/jdk/pull/20566 From adinn at openjdk.org Wed Aug 14 09:22:27 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 14 Aug 2024 09:22:27 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> Message-ID: <8YHlnN9Sp2ngMRcg7Wp8hn4vVmixiqPRNex02J3wsW4=.25c2c2af-196c-4066-a7a0-99f689c54de8@github.com> On Wed, 14 Aug 2024 06:58:09 GMT, Fei Yang wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> fix accidental paste > > Hi Andrew, I find that we need following add-on change for riscv: > > > diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp > index dc89e489b24..bed24e442e8 100644 > --- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp > +++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp > @@ -66,6 +66,12 @@ > > #define __ masm-> > > +#ifdef PRODUCT > +#define BLOCK_COMMENT(str) /* nothing */ > +#else > +#define BLOCK_COMMENT(str) __ block_comment(str) > +#endif > + > const int StackAlignmentInSlots = StackAlignmentInBytes / VMRegImpl::stack_slot_size; > > class 
RegisterSaver { > @@ -2742,7 +2748,7 @@ static void jfr_epilogue(MacroAssembler* masm) { > // For c2: c_rarg0 is junk, call to runtime to write a checkpoint. > // It returns a jobject handle to the event writer. > // The handle is dereferenced and the return value is the event writer oop. > -static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { > +RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { > enum layout { > fp_off, > fp_off2, > @@ -2780,7 +2786,7 @@ static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { > } > > // For c2: call to return a leased buffer. > -static RuntimeStub* SharedRuntime::generate_jfr_return_lease() { > +RuntimeStub* SharedRuntime::generate_jfr_return_lease() { > enum layout { > fp_off, > fp_off2, Hi @RealFYang > Hi Andrew, I find that we need following add-on change for riscv: Thanks for checking. Should be fixed by the latest push. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2288265207 From aboldtch at openjdk.org Wed Aug 14 09:24:34 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 09:24:34 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v19] In-Reply-To: References: Message-ID: <4g1C6IMPhW60t2IygFaP6KBhaOtr0VEepPM2D6fWMPE=.91c3d202-a5e1-4669-9846-e89e6a360e91@github.com> > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table.
Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
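The "Fast Locking" notes quoted in the PR description describe a CAS on the two lock bits of the mark word: 0b01 (unlocked) becomes 0b00 (fast locked), and 0b10 means inflated. A minimal C++ sketch of that transition with `std::atomic` follows. The constants and function names are hypothetical, and this toy deliberately omits ownership tracking, which lightweight locking keeps on a per-thread lock stack rather than in the mark word.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model of the two low lock bits of a mark word; not HotSpot's markWord API.
constexpr uintptr_t kLockMask   = 0b11;
constexpr uintptr_t kFastLocked = 0b00;
constexpr uintptr_t kUnlocked   = 0b01;
constexpr uintptr_t kInflated   = 0b10;

// CAS the lock bits 0b01 -> 0b00, leaving every other bit untouched.
// Any other observed state (fast locked, inflated) returns false; a real VM
// would then spin, inflate, or take the slow path.
bool try_fast_lock(std::atomic<uintptr_t>& mark) {
  uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) return false;
  uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Reverse transition 0b00 -> 0b01 on unlock (no owner check in this toy).
bool try_fast_unlock(std::atomic<uintptr_t>& mark) {
  uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) return false;
  uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Because only the lock bits change, the remaining mark-word bits survive a lock/unlock round trip. That is the property that lets this scheme avoid displacing the mark word.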
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Use jdk.test.lib.Utils.getRandomInstance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/3f29e6d6..4d67422f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=17-18 Stats: 8 lines in 1 file changed: 4 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Wed Aug 14 09:37:59 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 Aug 2024 09:37:59 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 21:56:29 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace and nits > > src/hotspot/share/runtime/vframe.cpp line 252: > >> 250: if (mark.has_monitor()) { >> 251: ObjectMonitor* mon = ObjectSynchronizer::read_monitor(current, monitor->owner(), mark); >> 252: if (// if the monitor is null we must be in the process of locking > > nit - please add a space after `(` Should I align the rest of the lines? Adding the extra space here looks strange to me. But the inlined comments look strange as well. This is all pre-existing code that just moved around a bit.
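The `OMCache` review thread earlier in this digest compares ways of unrolling the per-thread cache probe in `fast_lock_lightweight`. Stripped of register allocation and branch layout, the search can be modeled in C++ as a few straight-line probes followed by a loop that stops at a guaranteed null sentinel. This is a sketch under assumptions: `ToyOMCache`, `CacheEntry`, and `lookup` are hypothetical names, and the real cache lives at fixed offsets from the thread rather than in a `std::array`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; the real OMCache layout and names differ.
struct Monitor { int id; };
struct CacheEntry { const void* oop; Monitor* monitor; };

template <size_t N>
struct ToyOMCache {
  // N usable slots plus one trailing entry that always stays null: the
  // sentinel that guarantees every search terminates.
  std::array<CacheEntry, N + 1> entries{};  // value-initialized to all-null
};

// Returns the cached monitor for obj, or nullptr on a miss (slow path).
template <size_t N>
Monitor* lookup(const ToyOMCache<N>& cache, const void* obj) {
  constexpr size_t kUnrolled = 2;  // mirrors num_unrolled in the review diffs
  size_t i = 0;
  // Straight-line probes: a hit here needs no loop back-edge.
  for (; i < kUnrolled && i < N; i++) {
    if (cache.entries[i].oop == obj) return cache.entries[i].monitor;
  }
  // Then scan until the first null entry (at the latest, the sentinel).
  for (; cache.entries[i].oop != nullptr; i++) {
    if (cache.entries[i].oop == obj) return cache.entries[i].monitor;
  }
  return nullptr;
}
```

The sentinel removes the need for a separate bounds check in the loop, which is the same property the assembly versions rely on when they compare against null to reach `slow_path`.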
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1716616351 From amitkumar at openjdk.org Wed Aug 14 09:41:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 14 Aug 2024 09:41:49 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 08:58:20 GMT, Amit Kumar wrote: > Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache Tier1 (fastdebug) test passed with these settings: diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp index 61efc0b9376..5e91b4e22ca 100644 --- a/src/hotspot/share/runtime/globals.hpp +++ b/src/hotspot/share/runtime/globals.hpp @@ -1975,16 +1975,16 @@ const int ObjectAlignmentInBytes = 8; "rewriting/transformation independently of the JVMTI " \ "can_{retransform/redefine}_classes capabilities.") \ \ - product(bool, UseSecondarySupersCache, true, DIAGNOSTIC, \ + product(bool, UseSecondarySupersCache, false, DIAGNOSTIC, \ "Use secondary supers cache during subtype checks.") \ \ - product(bool, UseSecondarySupersTable, false, DIAGNOSTIC, \ + product(bool, UseSecondarySupersTable, true, DIAGNOSTIC, \ "Use hash table to lookup secondary supers.") \ \ - product(bool, VerifySecondarySupers, false, DIAGNOSTIC, \ + product(bool, VerifySecondarySupers, true, DIAGNOSTIC, \ "Check that linear and hashed secondary lookups return the same result.") \ \ - product(bool, StressSecondarySupers, false, DIAGNOSTIC, \ + product(bool, StressSecondarySupers, true, DIAGNOSTIC, \ "Use a terrible hash function in order to generate many collisions.") \ // end of RUNTIME_FLAGS @TheRealMDoerr would you please take a look :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20578#issuecomment-2288302795 PR Comment: https://git.openjdk.org/jdk/pull/20578#issuecomment-2288303554 From shade at 
openjdk.org Wed Aug 14 10:01:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 10:01:09 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v4] In-Reply-To: References: Message-ID: > This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. > > Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). > > Additional testing: > - [x] New test > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Work around 32-bit build failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20468/files - new: https://git.openjdk.org/jdk/pull/20468/files/361a67db..7486a1e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=02-03 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20468/head:pull/20468 PR: https://git.openjdk.org/jdk/pull/20468 From shade at openjdk.org Wed Aug 14 10:01:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 10:01:09 GMT Subject: RFR: 
8293650: Shenandoah: Support archived heap objects [v4] In-Reply-To: <-ESSW0ZYc_J37C2REktneBwf3ybOs_RUntRbKQZYy1U=.388e8471-2f5c-4949-866b-4b3511739c22@github.com> References: <-ESSW0ZYc_J37C2REktneBwf3ybOs_RUntRbKQZYy1U=.388e8471-2f5c-4949-866b-4b3511739c22@github.com> Message-ID: On Tue, 13 Aug 2024 19:47:51 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/cds/archiveHeapWriter.hpp line 126: >> >>> 124: // depends on -Xmx, but can never be smaller than 1 * M. >>> 125: // (TODO: Perhaps change to 256K to be compatible with Shenandoah) >>> 126: static constexpr int MIN_GC_REGION_ALIGNMENT = 1 * M; >> >> Couldn't you just move up the constant to the public section? >> Also, I'm not sure what's the point of it being constexpr (rather than just const), but that is pre-existing. > > Yeah, I had problems with that. But I think I just misplaced the constant the last time around. Let me see what Windows GHA runs have to say about the code in the new commit. Ah, now I remember the real reason I did this: I needed the constant out of the `#if INCLUDE_CDS_JAVA_HEAP` block. `INCLUDE_CDS_JAVA_HEAP` is only defined for `#if INCLUDE_CDS && INCLUDE_G1GC && defined(_LP64)`. So it breaks 32-bit builds. This will become moot after we remove the Shenandoah check for region alignment, so I just worked around it in the Shenandoah code for now.
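The Shenandoah CDS PR states that archived-heap loading currently works only with regions of at least 1M, "in other words, with the heaps >=2G", because `MIN_GC_REGION_ALIGNMENT` is `1 * M`. The arithmetic behind that equivalence can be sketched as follows. The sizing rule is an assumption, not quoted from the PR: roughly 2048 target regions, clamped to [256K, 32M], rounded down to a power of two. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizing parameters (illustrative, not read from HotSpot).
constexpr size_t K = 1024, M = K * K, G = K * M;
constexpr size_t kTargetRegions = 2048;
constexpr size_t kMinRegion = 256 * K;
constexpr size_t kMaxRegion = 32 * M;
constexpr size_t kArchiveAlignment = 1 * M;  // MIN_GC_REGION_ALIGNMENT

// Largest power of two that is <= x (x >= 1).
constexpr size_t round_down_pow2(size_t x) {
  size_t p = 1;
  while (p * 2 <= x) p *= 2;
  return p;
}

// Region size derived from the maximum heap size under the assumptions above.
constexpr size_t region_size_for(size_t max_heap) {
  size_t r = max_heap / kTargetRegions;
  if (r < kMinRegion) r = kMinRegion;
  if (r > kMaxRegion) r = kMaxRegion;
  return round_down_pow2(r);
}

// The archived heap can be mapped only if a region covers the archive alignment.
constexpr bool can_load_archived_heap(size_t max_heap) {
  return region_size_for(max_heap) >= kArchiveAlignment;
}
```

Under these assumptions, 2G / 2048 = 1M is exactly the alignment, so any smaller heap yields sub-1M regions. That is why lowering the minimum alignment to 256K, as tracked in JDK-8337828, would unblock smaller heaps.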
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1716648043 From yzheng at openjdk.org Wed Aug 14 10:02:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 Aug 2024 10:02:51 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v4] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Wed, 14 Aug 2024 09:22:27 GMT, Andrew Dinn wrote: >> Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix riscv port issues src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 330: > 328: static_field(StubRoutines, _verify_oop_count, jint) \ > 329: \ > 330: static_field(StubRoutines, _throw_delayed_StackOverflowError_entry, address) \ Please add the following symbol export: diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp b/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp index 8fdb96a3038..0e96cea6596 100644 --- a/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp +++ b/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp @@ -50,6 +50,7 @@ class CompilerToVM { static address SharedRuntime_deopt_blob_unpack_with_exception_in_tls; static address SharedRuntime_deopt_blob_uncommon_trap; static address SharedRuntime_polling_page_return_handler; + static address SharedRuntime_throw_delayed_StackOverflowError_blob; static address nmethod_entry_barrier; static int thread_disarmed_guard_value_offset; diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp index 2116133e56e..27031bf55fe 100644 --- a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp +++
b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp @@ -68,6 +68,7 @@ address CompilerToVM::Data::SharedRuntime_deopt_blob_unpack; address CompilerToVM::Data::SharedRuntime_deopt_blob_unpack_with_exception_in_tls; address CompilerToVM::Data::SharedRuntime_deopt_blob_uncommon_trap; address CompilerToVM::Data::SharedRuntime_polling_page_return_handler; +address CompilerToVM::Data::SharedRuntime_throw_delayed_StackOverflowError_blob; address CompilerToVM::Data::nmethod_entry_barrier; int CompilerToVM::Data::thread_disarmed_guard_value_offset; @@ -158,6 +159,7 @@ void CompilerToVM::Data::initialize(JVMCI_TRAPS) { SharedRuntime_deopt_blob_unpack_with_exception_in_tls = SharedRuntime::deopt_blob()->unpack_with_exception_in_tls(); SharedRuntime_deopt_blob_uncommon_trap = SharedRuntime::deopt_blob()->uncommon_trap(); SharedRuntime_polling_page_return_handler = SharedRuntime::polling_page_return_handler_blob()->entry_point(); + SharedRuntime_throw_delayed_StackOverflowError_blob = SharedRuntime::throw_delayed_StackOverflowError_entry(); BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod(); if (bs_nm != nullptr) { diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 130f1032e65..e0c56ccabee 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -68,6 +68,8 @@ static_field(CompilerToVM::Data, SharedRuntime_deopt_blob_uncommon_trap, address) \ static_field(CompilerToVM::Data, SharedRuntime_polling_page_return_handler, \ address) \ + static_field(CompilerToVM::Data, SharedRuntime_throw_delayed_StackOverflowError_blob, \ + address) \ \ static_field(CompilerToVM::Data, nmethod_entry_barrier, address) \ static_field(CompilerToVM::Data, thread_disarmed_guard_value_offset, int) \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20566#discussion_r1716650487 From mdoerr at openjdk.org Wed Aug 14 10:05:47 2024 From: mdoerr at openjdk.org 
(Martin Doerr) Date: Wed, 14 Aug 2024 10:05:47 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 08:58:20 GMT, Amit Kumar wrote: > Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache LGTM. Thanks for removing the obsolete comment. I'll test it overnight. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20578#pullrequestreview-2237714252 From rcastanedalo at openjdk.org Wed Aug 14 12:38:51 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 12:38:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: <259a7NXcZVtVnc3vlOTN2eF4zPq3U_QBKDLNnvE1OJw=.894d8054-8947-40c2-a62d-1dd387477013@github.com> On Wed, 14 Aug 2024 09:10:10 GMT, Feilong Jiang wrote: > I have already merged upstream commits on my local branch, so I'm fine with regular updates. Thanks, let's go with this version and see if we need a new update in a few weeks (or, perhaps, all platforms have been ported by then). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2288628007 From shade at openjdk.org Wed Aug 14 12:55:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 12:55:50 GMT Subject: RFR: 8338330: Fix -Wzero-as-null-pointer-constant warnings from THROW_XXX_0 In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:19:45 GMT, Kim Barrett wrote: > Please review this change to add THROW_ARG_NULL and THROW_HANDLE_NULL macros, > and use them instead of the corresponding THROW_XXX_0 macros in contexts where > a pointer value is needed. This removes some -Wzero-as-null-pointer-constant > warnings. > > There aren't many uses of either (only one of the HANDLE variant).
An > alternative would have been to change the callers to use the unsuffixed > variant with a nullptr value argument. Adding the macros is consistent with > other THROW variants, and seems a little bit more readable. > > Testing: mach5 tier1 Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20574#pullrequestreview-2238075033 From shade at openjdk.org Wed Aug 14 13:00:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 13:00:54 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v2] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: On Tue, 16 Jul 2024 17:12:08 GMT, Aleksey Shipilev wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). >> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in the VM. The Java side of `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. >> >> I also inlined the `select_method` definition, which allows for a bit more straightforward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8336468-reflection-init-checks > - Remove unnecessary handle-izing > - Fix > - Fix Not now, bot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20192#issuecomment-2288672706 From shade at openjdk.org Wed Aug 14 13:06:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 13:06:51 GMT Subject: RFR: 8338331: Fix -Wzero-as-null-pointer-constant warnings from CHECK_0 in jni.cpp In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:41:19 GMT, Kim Barrett wrote: > Please review this change to some macros in jni.cpp. These macros were using > CHECK_0 when calling functions that can "throw" exceptions. However, the > return type involved is provided by a macro argument, and is a pointer type > for some uses of the macros. This triggered -Wzero-as-null-pointer-constant > warnings when enabled. > > To remove the warnings, these CHECK_0() uses are changed to CHECK_() with an > argument expression that constructs and value-initializes a temporary of that > return type, e.g. `ResultType{}`. Value-initialization of a scalar type is > zero-initialization. Zero-initialization of a scalar type initializes it to > the value obtained by converting a literal 0 to that type. For arithmetic types this yields a zero of the > appropriate type. For pointer types it's initialized to > nullptr, without triggering the warning. > > Testing: mach5 tier1 Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20575#pullrequestreview-2238104942 From rcastanedalo at openjdk.org Wed Aug 14 13:11:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 13:11:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 20:42:36 GMT, Martin Doerr wrote: >> Good question!
I have checked (no pun intended) and it turns out C2 never uses stores (or other memory access operations) with late-expanded barriers to perform implicit null checks. This is accidental, due to the fact that all these memory operations use MachTemp nodes in the C2 code generation stage (to reserve registers for their ADL TEMP operands). C2's implicit null check analysis requires that all inputs of a candidate memory operation dominate the null check [1], which fails if the operation uses MachTemp nodes, since these are always placed in the same basic block [2]. >> >> Note that this optimization triggers very rarely, if at all, for memory operations in the current early barrier expansion model, since the additional control flow of the barrier code obfuscates the analysis. For late barrier expansion, the analysis could be easily extended to recognize and hoist MachTemp nodes together with their user memory operation that is a candidate to implement the implicit null check [3], but that would require extending the barrier assembly emission step to populate the implicit null exception table correctly. Since this seems non-trivial, and would also affect other garbage collectors (ZGC), I suggest simply asserting for now that we do not generate implicit null checks for memory operations with barrier data (as in [4]), and leaving full support for implicit null checks for these G1 and ZGC operations to a future RFE. What do you think? >> >> [1] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328 >> [2] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/gcm.cpp#L1397-L1404 >> [3] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba >> [4] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba#diff-554fddca91406a67dc0f8faee12dc30c709181685a0add7f4ba9ae5ace68f192R2031-R2032 > > Thanks for figuring it out!
Makes sense. Added the assertion in commit 554de779. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1716895420 From rcastanedalo at openjdk.org Wed Aug 14 13:11:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 13:11:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v8] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Assert that no implicit null checks are generated for memory accesses with barriers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/88d28b9f..554de779 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=06-07 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From adinn at openjdk.org Wed Aug 14 13:21:50 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 14 Aug 2024 13:21:50 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v4] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Wed, 14 Aug 2024 10:00:19 GMT, Yudi Zheng wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> fix riscv port issues > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 330: > >> 328: static_field(StubRoutines, _verify_oop_count, jint) \ >> 329: \ >> 330: static_field(StubRoutines, _throw_delayed_StackOverflowError_entry, address) \ > > Please add the following symbol exporting > > diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp b/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp > index 8fdb96a3038..0e96cea6596 100644 > --- a/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp > +++ b/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp > @@ -50,6 +50,7 @@ class CompilerToVM { > static address SharedRuntime_deopt_blob_unpack_with_exception_in_tls; > static address SharedRuntime_deopt_blob_uncommon_trap; > static address
SharedRuntime_polling_page_return_handler; > + static address SharedRuntime_throw_delayed_StackOverflowError_blob; > > static address nmethod_entry_barrier; > static int thread_disarmed_guard_value_offset; > diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp > index 2116133e56e..27031bf55fe 100644 > --- a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp > +++ b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp > @@ -68,6 +68,7 @@ address CompilerToVM::Data::SharedRuntime_deopt_blob_unpack; > address CompilerToVM::Data::SharedRuntime_deopt_blob_unpack_with_exception_in_tls; > address CompilerToVM::Data::SharedRuntime_deopt_blob_uncommon_trap; > address CompilerToVM::Data::SharedRuntime_polling_page_return_handler; > +address CompilerToVM::Data::SharedRuntime_throw_delayed_StackOverflowError_blob; > > address CompilerToVM::Data::nmethod_entry_barrier; > int CompilerToVM::Data::thread_disarmed_guard_value_offset; > @@ -158,6 +159,7 @@ void CompilerToVM::Data::initialize(JVMCI_TRAPS) { > SharedRuntime_deopt_blob_unpack_with_exception_in_tls = SharedRuntime::deopt_blob()->unpack_with_exception_in_tls(); > SharedRuntime_deopt_blob_uncommon_trap = SharedRuntime::deopt_blob()->uncommon_trap(); > SharedRuntime_polling_page_return_handler = SharedRuntime::polling_page_return_handler_blob()->entry_point(); > + SharedRuntime_throw_delayed_StackOverflowError_blob = SharedRuntime::throw_delayed_StackOverflowError_entry(); > > BarrierSetNMethod* bs_nm = BarrierSet::b... Hi @mur47x111. Thanks for looking at this PR. I deleted the static field declaration from `vmStructs_jvmci.cpp` because I found no other mention of `throw_delayed_StackOverflowError_entry` under `src/hotspot/share/jvmci`. So, I assumed it was not being used by any JVMCI clients. I am happy to push a patch with your proposed changes. 
However, I just wanted to check whether you are sure you want to use the name `SharedRuntime_throw_delayed_StackOverflowError_blob` in the new declarations of the `CompilerToVM` field. I am asking because when I moved the exception stubs from `StubGenerator` to `SharedRuntime` I also changed the way the stubs are named, stored and accessed. The original field `StubRoutines::_throw_delayed_StackOverflowError_entry` stored the entry point for the throw routine and was of type `address`. The getter method that returns the entry address simply returns the field value. The relocated field `SharedRuntime::_throw_delayed_StackOverflowError_blob` stores a pointer to a `RuntimeBlob`, the one that contains the throw routine. That is why the field name ends with `_blob`, not `_entry`. The entry address is not stored directly in class `SharedRuntime`. In the new code the getter method computes the address by calling the blob's `entry_point()` method. So your patch seems to mix things up by using the name `SharedRuntime_throw_delayed_StackOverflowError_blob` for the `CompilerToVM` field but declaring it with type `address`. Would it make more sense to use the name `SharedRuntime_throw_delayed_StackOverflowError_entry` for this field?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20566#discussion_r1716914885 From yzheng at openjdk.org Wed Aug 14 13:27:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 Aug 2024 13:27:51 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v4] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Wed, 14 Aug 2024 13:19:35 GMT, Andrew Dinn wrote: >> src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 330: >> >>> 328: static_field(StubRoutines, _verify_oop_count, jint) \ >>> 329: \ >>> 330: static_field(StubRoutines, _throw_delayed_StackOverflowError_entry, address) \ >> >> Please add the following symbol exporting >> >> diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp b/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp >> index 8fdb96a3038..0e96cea6596 100644 >> --- a/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp >> +++ b/src/hotspot/share/jvmci/jvmciCompilerToVM.hpp >> @@ -50,6 +50,7 @@ class CompilerToVM { >> static address SharedRuntime_deopt_blob_unpack_with_exception_in_tls; >> static address SharedRuntime_deopt_blob_uncommon_trap; >> static address SharedRuntime_polling_page_return_handler; >> + static address SharedRuntime_throw_delayed_StackOverflowError_blob; >> >> static address nmethod_entry_barrier; >> static int thread_disarmed_guard_value_offset; >> diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp >> index 2116133e56e..27031bf55fe 100644 >> --- a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp >> +++ b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp >> @@ -68,6 +68,7 @@ address CompilerToVM::Data::SharedRuntime_deopt_blob_unpack; >> address CompilerToVM::Data::SharedRuntime_deopt_blob_unpack_with_exception_in_tls; >> address CompilerToVM::Data::SharedRuntime_deopt_blob_uncommon_trap; >> address 
CompilerToVM::Data::SharedRuntime_polling_page_return_handler; >> +address CompilerToVM::Data::SharedRuntime_throw_delayed_StackOverflowError_blob; >> >> address CompilerToVM::Data::nmethod_entry_barrier; >> int CompilerToVM::Data::thread_disarmed_guard_value_offset; >> @@ -158,6 +159,7 @@ void CompilerToVM::Data::initialize(JVMCI_TRAPS) { >> SharedRuntime_deopt_blob_unpack_with_exception_in_tls = SharedRuntime::deopt_blob()->unpack_with_exception_in_tls(); >> SharedRuntime_deopt_blob_uncommon_trap = SharedRuntime::deopt_blob()->uncommon_trap(); >> SharedRuntime_polling_page_return_handler = SharedRuntime::polling_page_return_handler_blob()->entry_point(); >> + SharedRuntime_throw_delayed_StackOverflowError_blob = SharedRuntime::throw_delayed_St... > > Hi @mur47x111. Thanks for looking at this PR. > > I deleted the static field declaration from `vmStructs_jvmci.cpp` because I found no other mention of `throw_delayed_StackOverflowError_entry` under `src/hotspot/share/jvmci`. So, I assumed it was not being used by any JVMCI clients. > > I am happy to push a patch with your proposed changes. However, I just wanted to check whether you are sure you want to use the name `SharedRuntime_throw_delayed_StackOverflowError_blob` in the new declarations of the `CompilerToVM` field. > > I am asking because when I moved the exception stubs from `StubGenerator` to `SharedRuntime` I also changed the way the stubs are named, stored and accessed. > > The original field `StubRoutines::_throw_delayed_StackOverflowError_entry` stored the entry point for the throw routine and was of type `address`. The getter method that returns the entry address simply returns the field value. > > The relocated field `SharedRuntime::_throw_delayed_StackOverflowError_blob` stores a pointer to a `RuntimeBlob`, the one that contains the throw routine. That is why the field name ends with `_blob`, not `_entry`. The entry address is not stored directly in class `SharedRuntime`.
In the new code the getter method computes the address by calling the blob's `entry_point()` method. > > So your patch seems to mix things up by using the name `SharedRuntime_throw_delayed_StackOverflowError_blob` for the `CompilerToVM` field but declaring it with type `address`. Would it make more sense to use the name `SharedRuntime_throw_delayed_StackOverflowError_entry` for this field? Right, `SharedRuntime_throw_delayed_StackOverflowError_entry` is better. Thanks for the explanation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20566#discussion_r1716924978 From adinn at openjdk.org Wed Aug 14 13:39:23 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 14 Aug 2024 13:39:23 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v5] In-Reply-To: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> > Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location.
Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: fix up jvmci static field declarations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20566/files - new: https://git.openjdk.org/jdk/pull/20566/files/16482052..d0ba9688 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=03-04 Stats: 5 lines in 3 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20566/head:pull/20566 PR: https://git.openjdk.org/jdk/pull/20566 From adinn at openjdk.org Wed Aug 14 13:39:23 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 14 Aug 2024 13:39:23 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v5] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Wed, 14 Aug 2024 13:25:39 GMT, Yudi Zheng wrote: >> Hi @mur47x111. Thanks for looking at this PR. >> >> I deleted the static field declaration from `vmStructs_jvmci.cpp` because I found no other mention of `throw_delayed_StackOverflowError_entry` under `src/hotspot/share/jvmci`. So, I assumed it was not being used by any JVMCI clients. >> >> I am happy to push a patch with your proposed changes. However, I just wanted to check whether you are sure you want to use the name `SharedRuntime_throw_delayed_StackOverflowError_blob` in the new declarations of the `CompilerToVM` field. >> >> I am asking because when I moved the exception stubs from `StubGenerator` to `SharedRuntime` I also changed the way the stubs are named, stored and accessed. >> >> The original field `StubRoutines::_throw_delayed_StackOverflowError_entry` stored the entry point for the throw routine and was of type `address`.
The getter method that returns the entry address simply returns the field value. >> >> The relocated field `SharedRuntime::_throw_delayed_StackOverflowError_blob` stores a pointer to a `RuntimeBlob`, the one that contains the throw routine. That is why the field name ends with `_blob`, not `_entry`. The entry address is not stored directly in class `SharedRuntime`. In the new code the getter method computes the address by calling the blob's `entry_point()` method. >> >> So your patch seems to mix things up by using the name `SharedRuntime_throw_delayed_StackOverflowError_blob` for the `CompilerToVM` field but declaring it with type `address`. Would it make more sense to use the name `SharedRuntime_throw_delayed_StackOverflowError_entry` for this field? > > Right, `SharedRuntime_throw_delayed_StackOverflowError_entry` is better. Thanks for the explanation! Ok, I pushed a version of your patch modified to use that name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20566#discussion_r1716943650 From dcubed at openjdk.org Wed Aug 14 15:10:56 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 14 Aug 2024 15:10:56 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v16] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 09:35:10 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/vframe.cpp line 252: >> >>> 250: if (mark.has_monitor()) { >>> 251: ObjectMonitor* mon = ObjectSynchronizer::read_monitor(current, monitor->owner(), mark); >>> 252: if (// if the monitor is null we must be in the process of locking >> >> nit - please add a space after `(` > > Should I align the rest of the lines? Adding the extra space here looks strange to me. But the inlined comments look strange as well. This is all pre-existing code that just moved around a bit. I'm just not fond of a `// comment` butting right against code. Your call on whether to leave it alone or how to reformat it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1717114224 From gziemski at openjdk.org Wed Aug 14 15:42:51 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 14 Aug 2024 15:42:51 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 06:54:22 GMT, Stefan Karlsson wrote: > > Is everyone OK with MemTypeFlag? > > It's quite unfortunate to have a three-word type for something this prolific in our code base. Why not go with `MemType` and change variable names from `flag` to `mt`? > > ``` > static char* map_memory_to_file(size_t size, int fd, MEMFLAGS flag = mtNone); > ``` > > would then become: > > ``` > static char* map_memory_to_file(size_t size, int fd, MemType mt = mtNone); > ``` My initial choice was exactly that, but then I backed off from renaming the arguments because of how big and intrusive the change seemed. David seems to prefer `MemTypeFlag`, so that we don't have to rename all the arguments, and I see a point in that, but it wouldn't be my first choice. Thomas seems to prefer `NMTCat`, which for some reason I just don't like much, despite its NMT prefix. If we could find a compromise that we all can live with, despite it not being exactly what every single person wants, then that would be great. We could do this in separate steps: Initial effort (this fix): we rename `MEMFLAGS` to `MemType`. Follow-up effort(s): we either rename all arguments in one big push (intrusive), or we do it one file (or a group of related files, like NMT) at a time, in follow-ups or whenever we touch a file for some related fix. Eventually we would get there, which is better than what we have right now IMHO. Is this a reasonable compromise for everyone?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2289134648 From thomas.stuefe at gmail.com Wed Aug 14 16:14:36 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 14 Aug 2024 18:14:36 +0200 Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: I am out sick after JVMLS, typical post-conference flu. There is no need to rush this, right? I really dislike MemType for being so generic. MEMFLAGS is a handle into the NMT subsystem. It has no meaning beyond NMT. Yet, it is spread all over the code base. Therefore I really would like an NMT prefix, whatever the name then is. Clear and easy to grep. A very generic name like MemType or similar will clash with many similar-sounding specifiers from other places. That may not sound like much of an issue if you are only working within a single hotspot subsystem; but if you work in all corners of the JDK, you come to like clear, succinct names, and an important part of clarity is scope, and the scope here is NMT. Just my 5 cents. On Wed 14. Aug 2024 at 17:43, Gerard Ziemski wrote: > On Wed, 14 Aug 2024 06:54:22 GMT, Stefan Karlsson > wrote: > > > > Is everyone OK with MemTypeFlag? > > > > It's quite unfortunate to have a three-word type for something this > prolific in our code base. Why not go with `MemType` and change variable > names from `flag` to `mt`? > > > > ``` > > static char* map_memory_to_file(size_t size, int fd, MEMFLAGS flag = > mtNone); > > ``` > > > > would then become: > > > > ``` > > static char* map_memory_to_file(size_t size, int fd, MemType mt = > mtNone); > > ``` > > My initial choice was exactly that, but then I backed off from renaming > the arguments because of how big and intrusive the change seemed. > > David seems to prefer `MemTypeFlag`, so that we don't have to rename all > the arguments, and I see a point in that, but it wouldn't be my first choice.
> > Thomas seems to prefer `NMTCat`, which for some reason I just don't like > much, despite its NMT prefix. > > If we could find a compromise that we all can live with, despite it not > being exactly what every single person wants, then that would be great. We > could do this in separate steps: > > Initial effort (this fix): we rename `MEMFLAGS` to `MemType`. > > Follow-up effort(s): we either rename all arguments in one big push > (intrusive), or we do it one file (or a group of related files, like NMT) > at a time, in follow-ups or whenever we touch a file for some related fix. > Eventually we would get there, which is better than what we have right now > IMHO. > > Is this a reasonable compromise for everyone? > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2289134648 From wkemper at openjdk.org Wed Aug 14 16:48:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 14 Aug 2024 16:48:50 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v4] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 10:01:09 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828).
>> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Work around 32-bit build failure Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20468#pullrequestreview-2238706842 From rkennke at openjdk.org Wed Aug 14 16:56:51 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 14 Aug 2024 16:56:51 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v4] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 10:01:09 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). 
>> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Work around 32-bit build failure src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2509: > 2507: // Pull the constant here, since it is only available when INCLUDE_CDS_JAVA_HEAP is defined. > 2508: const size_t min_gc_region_align = 1 * M; > 2509: #if INCLUDE_CDS_JAVA_HEAP It makes me wonder if the whole body should perhaps go inside the INCLUDE_CDS_JAVA_HEAP block. It is only ever called from ArchiveHeapLoader, and that doesn't even exist without INCLUDE_CDS_JAVA_HEAP. I leave that up to you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1717265883 From shade at openjdk.org Wed Aug 14 17:30:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 17:30:05 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: Message-ID: > This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. > > Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828).
> > Additional testing: > - [x] New test > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Wrap the whole thing in CDS define ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20468/files - new: https://git.openjdk.org/jdk/pull/20468/files/7486a1e7..d8984aa4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=03-04 Stats: 13 lines in 1 file changed: 5 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20468/head:pull/20468 PR: https://git.openjdk.org/jdk/pull/20468 From shade at openjdk.org Wed Aug 14 17:30:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 17:30:06 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v4] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 16:53:40 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Work around 32-bit build failure > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2509: > >> 2507: // Pull the constant here, since it is only available when INCLUDE_CDS_JAVA_HEAP is defined. >> 2508: const size_t min_gc_region_align = 1 * M; >> 2509: #if INCLUDE_CDS_JAVA_HEAP > > It makes me wonder if the whole body should perhaps go inside the INCLUDE_CDS_JAVA_HEAP block. It is only ever called from ArchiveHeapLoader, and that doesn't even exist without INCLUDE_CDS_JAVA_HEAP. I leave that up to you. That should actually be cleaner. Let's see what 32-bit builds say.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1717308901 From shade at openjdk.org Wed Aug 14 17:54:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 17:54:51 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v4] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 17:26:17 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2509: >> >>> 2507: // Pull the constant here, since it is only available when INCLUDE_CDS_JAVA_HEAP is defined. >>> 2508: const size_t min_gc_region_align = 1 * M; >>> 2509: #if INCLUDE_CDS_JAVA_HEAP >> >> It makes me wonder if the whole body should perhaps go inside the INCLUDE_CDS_JAVA_HEAP block. It is only ever called from ArchiveHeapLoader, and that doesn't even exist without INCLUDE_CDS_JAVA_HEAP. I leave that up to you. > > That should actually be cleaner. Let's see what 32-bit builds say. Yup, looks like that worked. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1717343327 From duke at openjdk.org Wed Aug 14 18:00:06 2024 From: duke at openjdk.org (duke) Date: Wed, 14 Aug 2024 18:00:06 GMT Subject: Withdrawn: 8314488: Compile the JDK as C++17 In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 01:41:16 GMT, Julian Waters wrote: > Compile the JDK as C++17, enabling the use of all C++17 language features This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/14988 From shade at openjdk.org Wed Aug 14 18:00:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 18:00:51 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 17:30:05 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah.
There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap the whole thing in CDS define @iklam, are you OK to move the `MIN_GC_REGION_ALIGNMENT` like this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20468#issuecomment-2289483974 From rkennke at openjdk.org Wed Aug 14 18:30:51 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 14 Aug 2024 18:30:51 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: Message-ID: <6HJGTyyvtGoBeqp1FzUVpzY1JloYNHxao66m9DzQaj4=.098327d9-b2c8-45d5-b28a-fc152220f4cb@github.com> On Wed, 14 Aug 2024 17:30:05 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. 
>> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap the whole thing in CDS define Looks good to me now. Thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20468#pullrequestreview-2238915598 From dcubed at openjdk.org Wed Aug 14 21:07:59 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 14 Aug 2024 21:07:59 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v19] In-Reply-To: <4g1C6IMPhW60t2IygFaP6KBhaOtr0VEepPM2D6fWMPE=.91c3d202-a5e1-4669-9846-e89e6a360e91@github.com> References: <4g1C6IMPhW60t2IygFaP6KBhaOtr0VEepPM2D6fWMPE=.91c3d202-a5e1-4669-9846-e89e6a360e91@github.com> Message-ID: On Wed, 14 Aug 2024 09:24:34 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and a `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contention reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord.
>> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Use jdk.test.lib.Utils.getRandomInstance() Just a couple of comments this time. I originally reviewed v10 and this time I reviewed v10..v18. src/hotspot/share/runtime/synchronizer.hpp line 126: > 124: > 125: static bool quick_notify(oopDesc* obj, JavaThread* current, bool All); > 126: Why add the extra blank line? ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2239178344 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1717549253 From dcubed at openjdk.org Wed Aug 14 21:08:00 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 14 Aug 2024 21:08:00 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> References: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> Message-ID: On Tue, 13 Aug 2024 17:24:19 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove the last OMWorld references >> - Rename omworldtable_work to object_monitor_table_work > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 341: > >> 339: >> 340: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor_from_table(oop object, JavaThread* current, bool* inserted) { >> 341: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); > > Do you want to assert: `inserted != nullptr`? What was the resolution? I don't see a reply or a change here. 
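For readers following the fast-locking description quoted in this thread: a minimal sketch, in portable C++ with `std::atomic`, of a CAS that flips only the two lock bits (0b01 unlocked to 0b00 fast locked, and back) while leaving the rest of the header word untouched. `HeaderSketch`, `try_fast_lock`, and `try_fast_unlock` are hypothetical names, not HotSpot's `markWord` API. The sketch deliberately omits ownership tracking (which HotSpot's lightweight locking handles separately, via the per-thread lock stack), spinning, and inflation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed encoding, taken from the quoted description: the two low bits of
// the header word are the lock bits.
constexpr uintptr_t kLockMask   = 0b11;
constexpr uintptr_t kFastLocked = 0b00;
constexpr uintptr_t kUnlocked   = 0b01;
constexpr uintptr_t kInflated   = 0b10;

// Hypothetical stand-in for an object header word.
struct HeaderSketch {
  std::atomic<uintptr_t> mark{kUnlocked};
};

// Try 0b01 (unlocked) -> 0b00 (fast locked). Only the lock bits change, so
// the remaining header bits stay in place and nothing is displaced.
bool try_fast_lock(HeaderSketch& h) {
  uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) {
    return false;  // fast locked or inflated: caller may spin or inflate
  }
  uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return h.mark.compare_exchange_strong(old_mark, new_mark,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
}

// Reverse transition 0b00 (fast locked) -> 0b01 (unlocked).
bool try_fast_unlock(HeaderSketch& h) {
  uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) {
    return false;  // e.g. inflated (0b10): the monitor path must unlock
  }
  uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return h.mark.compare_exchange_strong(old_mark, new_mark,
                                        std::memory_order_release,
                                        std::memory_order_relaxed);
}
```

Returning `false` instead of retrying keeps the fast path simple. Per the quoted description, a caller that observes 0b00 may spin briefly, and one that observes 0b10 (or gives up spinning) takes the monitor path.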
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1717543005 From iklam at openjdk.org Thu Aug 15 00:50:50 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 15 Aug 2024 00:50:50 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 17:30:05 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap the whole thing in CDS define Marked as reviewed by iklam (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20468#pullrequestreview-2239392410 From iklam at openjdk.org Thu Aug 15 00:50:50 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 15 Aug 2024 00:50:50 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 17:58:04 GMT, Aleksey Shipilev wrote: > @iklam, are you OK to move the `MIN_GC_REGION_ALIGNMENT` like this? The changes in the CDS code look fine to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20468#issuecomment-2290169886 From iklam at openjdk.org Thu Aug 15 00:57:59 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 15 Aug 2024 00:57:59 GMT Subject: RFR: 8337828: CDS: Trim down minimum GC region alignment In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 17:41:41 GMT, Aleksey Shipilev wrote: > CDS currently follows G1's minimum region size to guess which alignment to use when dumping the heap. The comment near the constant rightfully recognizes it would be convenient for Shenandoah to trim the alignment down to 256K (Shenandoah's min region size). If we do this, we will improve the heap sizes [JDK-8293650](https://bugs.openjdk.org/browse/JDK-8293650) can operate at. > > Unless I am missing something else, trimming down the min region alignment has impact on the size of the objects we can store in CDS archive. Conveniently, `-Xlog:cds+heap` prints the object size stats for us, and it looks like we are way under the 256K limit: > > > $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:-UseCompressedOops -Xshare:dump -Xlog:cds+heap > ...
> [0.921s][info][cds,heap] 0 objects are <= 8 bytes (total 0 bytes, avg 0.0 bytes) > [0.921s][info][cds,heap] 2550 objects are <= 16 bytes (total 40800 bytes, avg 16.0 bytes) > [0.921s][info][cds,heap] 14325 objects are <= 32 bytes (total 431896 bytes, avg 30.1 bytes) > [0.921s][info][cds,heap] 6572 objects are <= 64 bytes (total 301304 bytes, avg 45.8 bytes) > [0.921s][info][cds,heap] 1225 objects are <= 128 bytes (total 113112 bytes, avg 92.3 bytes) > [0.921s][info][cds,heap] 2173 objects are <= 256 bytes (total 384024 bytes, avg 176.7 bytes) > [0.921s][info][cds,heap] 143 objects are <= 512 bytes (total 47720 bytes, avg 333.7 bytes) > [0.921s][info][cds,heap] 40 objects are <= 1024 bytes (total 26872 bytes, avg 671.8 bytes) > [0.921s][info][cds,heap] 19 objects are <= 2048 bytes (total 29656 bytes, avg 1560.8 bytes) > [0.921s][info][cds,heap] 9 objects are <= 4096 bytes (total 20744 bytes, avg 2304.9 bytes) > [0.921s][info][cds,heap] 4 objects are <= 8192 bytes (total 20536 bytes, avg 5134.0 bytes) > [0.921s][info][cds,heap] 3 objects are <= 16384 bytes (total 30168 bytes, avg 10056.0 bytes) > [0.921s][info][cds,heap] 2 objects are <= 32768 bytes (total 32800 bytes, avg 16400.0 bytes) > [0.921s][info][cds,heap] 0 objects are <= 65536 bytes (total 0 bytes, avg 0.0 bytes) > [0.921s][info][cds,heap] 1 objects are <= 131072 bytes (total 66848 bytes, avg 66848.0 bytes) > [0.921s][info][cds,heap] 0 objects are <= 262144 bytes (total 0 bytes, avg 0.0 bytes) > [0.921s][info][cds,heap] 0 huge objects (tot... I think this is fine. Currently we archive only a very specific set of heap objects. In JDK-8315737 we will archive more kinds of object graphs, such as those used by invokedynamic call sites, but they are still more or less under our control and won't use very large objects. I don't foresee that we will archive arbitrary (user-defined) object graphs any time soon, without coming up with a clean API to do so. 
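The `-Xlog:cds+heap` histogram quoted above is bucketed by powers of two, and it supports a simple check. Since an object larger than one alignment unit would necessarily straddle a region boundary, every archived object must be at most `MIN_GC_REGION_ALIGNMENT` bytes. The following is a sketch of that check; `size_bucket` and `all_fit` are hypothetical helpers, and the sample sizes used with them come from the quoted log (the largest archived object is 66848 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power-of-two bucket (starting at 8 bytes) that holds `bytes`,
// matching the "objects are <= N bytes" rows printed by -Xlog:cds+heap.
size_t size_bucket(size_t bytes) {
  size_t bucket = 8;
  while (bucket < bytes) {
    bucket *= 2;
  }
  return bucket;
}

// Necessary condition for archiving: no object may exceed one alignment
// unit, since a larger object would have to straddle a region boundary.
bool all_fit(const std::vector<size_t>& object_sizes, size_t min_region_align) {
  for (size_t s : object_sizes) {
    if (s > min_region_align) {
      return false;
    }
  }
  return true;
}
```

For example, `size_bucket(66848)` lands in the 131072-byte row, matching the single object reported there, and that object is still well under a 256K (262144-byte) alignment.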
In time, perhaps CDS heap objects will transition to JDK-8326035 which will eliminate the need for MIN_GC_REGION_ALIGNMENT which limits the size of individual archived heap objects. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20469#pullrequestreview-2239400600 From dholmes at openjdk.org Thu Aug 15 01:37:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 15 Aug 2024 01:37:50 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > I really dislike MemType for its very genericness. MEMFLAGS is a handle into the NMT subsystem. It has no meaning beyond NMT. Yet, it is spread all over the code base. Therefore I really would like an NMT prefix, whatever the name then is. I agree with @tstuefe here. `MemFlag` and `MemType` sound far too general when this is NMT specific. My preference to keep the "flags" part of the type was to avoid needing to rename many parameters. The usage of `MEMFLAGS flags` is quite extensive. I would not want to see a partial approach here where we end up with a non-flag type name but a flag variable name. `NMTTypeFlag` would I hope satisfy Thomas's requirement and avoid the need to do variable renames. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2290293698 From ysr at openjdk.org Thu Aug 15 02:14:55 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 15 Aug 2024 02:14:55 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 17:30:05 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap the whole thing in CDS define src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2503: > 2501: // We need to make sure it looks like regular allocation to the rest of GC. > 2502: > 2503: // CDS code would guarantee no objects straggle multiple regions, as long as straggle -> straddle ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1717776848 From ysr at openjdk.org Thu Aug 15 02:14:55 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 15 Aug 2024 02:14:55 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: Message-ID: <-VMDmWFhJnNdTmb5O9bb2l64i47hodl2ngHAjjlheMI=.5e6d83aa-c053-4456-914b-10b4903d685b@github.com> On Thu, 15 Aug 2024 01:48:59 GMT, Y. Srinivas Ramakrishna wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Wrap the whole thing in CDS define > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2503: > >> 2501: // We need to make sure it looks like regular allocation to the rest of GC. >> 2502: >> 2503: // CDS code would guarantee no objects straggle multiple regions, as long as > > straggle -> straddle Noob question: can one prove that CDS archived objects will never exceed the size of a region? Or does the archive writer refuse to create a dump in that case? I will assume the latter for the purposes of the following remark: When you walk over the objects allocated in the verifier loop further below at lines 2539-2543, I wonder if you should check that you always have a legitimate object starting and ending each alignment boundary, just to be super-paranoid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1717791226 From eliu at openjdk.org Thu Aug 15 02:28:49 2024 From: eliu at openjdk.org (Eric Liu) Date: Thu, 15 Aug 2024 02:28:49 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v2] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 13:37:54 GMT, Fei Gao wrote: >> This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. >> >> Motivation >> >> 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. >> >> 2. 
In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. >> >> However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: >> >> >> GNU_PROPERTY_AARCH64_FEATURE_1_BTI >> GNU_PROPERTY_AARCH64_FEATURE_1_PAC >> >> >> Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. >> >> Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. >> >> Goal >> >> Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. >> >> Implementation >> >> Task-1: find out the problematic input objects >> >> From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. >> >> In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. 
We got the following list: >> >> >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S >> >> >> Task-2: add `.note.gnu.property` section for these assembly files >> >> As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. >> >> In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update i... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Clean up makefile > - Merge branch 'master' into enable-bti-runtime > - 8337536: AArch64: Enable BTI branch protection for runtime part > > This patch enables BTI branch protection for runtime part on > Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. > User-level packages can gain additional hardening by compiling with the > GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as > one VM configure flag, which would pass `-mbranch-protection=standard` > compilation flags to all c/c++ files. Note that `standard` turns on both > `pac-ret` and `bti` branch protections. For more details about code > reuse attacks and hardware-assisted branch protections on AArch64, see > [3]. 
> > However, we checked the `.note.gnu.property` section of all the shared > libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so > didn't set these two target feature bits: > > ``` > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > ``` > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, > libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect > `.got.plt` table. It's independent of whether the relocatable objects > use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the > `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set > the feature bit in the output object or image only if all the input > objects have the corresponding feature bit set." Hence we suspect that > the root cause is probably that the PAC/BTI feature bits are not set > only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag > [4] in my local test. This linker flag would warn if any input object > does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following > list: > > ``` > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > ... Marked as reviewed by eliu (Committer). src/hotspot/cpu/aarch64/copy_aarch64.hpp line 67: > 65: " .align 5;\n" \ > 66: "0:" \ > 67: " hint #0x24; // bti j\n" \ LGTM. Only a few indent issues. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20491#pullrequestreview-2239522676 PR Review Comment: https://git.openjdk.org/jdk/pull/20491#discussion_r1717797701 From jkarthikeyan at openjdk.org Thu Aug 15 03:03:49 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 15 Aug 2024 03:03:49 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 06:29:03 GMT, Jatin Bhateja wrote: > its usage in existing patch is limited to [type comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542) Ah, that makes sense to me. I took a closer look and I think since the patch is creating a `VectorReinterpret` node after unsigned vector nodes, it might be able to avoid cases where the type might get filtered/joined, like with `PhiNode::Value`. That might lead to errors since `empty_type->filter(other_type) == TOP`. It's unfortunate that it's not really possible to disambiguate between an empty type and an unsigned range, which would allow us to solve this elegantly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1717820103 From rrich at openjdk.org Thu Aug 15 04:44:48 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 15 Aug 2024 04:44:48 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 08:58:20 GMT, Amit Kumar wrote: > Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache Looks good. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20578#pullrequestreview-2239641880 From aboldtch at openjdk.org Thu Aug 15 06:01:02 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 15 Aug 2024 06:01:02 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v15] In-Reply-To: References: <1fb1K_XEPWOFdZohDLyQmgXhulJfSelL9Ib0fpkmVFI=.c3beb140-6cb1-43db-ae6c-547d997c554b@github.com> Message-ID: On Wed, 14 Aug 2024 20:58:24 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 341: >> >>> 339: >>> 340: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor_from_table(oop object, JavaThread* current, bool* inserted) { >>> 341: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); >> >> Do you want to assert: `inserted != nullptr`? > > What was the resolution? I don't see a reply or a change here. _Must have missed pressing Comment_ The assert does not seem necessary, this member function is local to LightweightSynchronizer (private) and it will crash hard if it is called with a bad pointer for the out parameter. It could have been a reference type here instead. I am not sure what is the best when it comes to out parameters, all styles have pros and cons. We seem to use all different combinations throughout hotspot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1717996898 From stefank at openjdk.org Thu Aug 15 06:11:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 15 Aug 2024 06:11:13 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 01:35:29 GMT, David Holmes wrote: > I agree with @tstuefe here. MemFlag and MemType sound far too general when this is NMT specific. Yes, it is not very specific, but it also not hard to learn and then know what this type is all about. > My preference to keep the "flags" part of the type was to avoid needing to rename many parameters. 
The usage of MEMFLAGS flags is quite extensive. I would not want to see a partial approach here where we end up with a non-flag type name but a flag variable name. I think we should rename all the 'flags' variables in the same change. > NMTTypeFlag would I hope satisfy Thomas's requirement and avoid the need to do variable renames. * To me, that's really not an appealing name for a type that is going to be used by all parts of the HotSpot code base. I much more prefer a shorter name that is easy on the eyes, then a longer and more specific name that is an eyesore. * And even as a longer name, it doesn't tell what it is going to be used for. What is a Native Memory Tracker Type Flag? * I don't want us to select a bad name so that we don't have to change the variable names. * Whatever we choose we also need to consider the mt prefix of things like mtGC, mtClass, etc. With all that said, I hope it is clear that we various reviewers have different opinions around this and that we don't integrate this before we have some kind of consensus about the way forward with this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2290734669 From aboldtch at openjdk.org Thu Aug 15 06:12:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 15 Aug 2024 06:12:22 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v20] In-Reply-To: References: Message-ID: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. 
This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and a `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contention reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Remove newline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20067/files - new: https://git.openjdk.org/jdk/pull/20067/files/4d67422f..e287445d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=18-19 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067 PR: https://git.openjdk.org/jdk/pull/20067 From aboldtch at openjdk.org Thu Aug 15 06:12:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 15 Aug 2024 06:12:22 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v19] In-Reply-To: References: <4g1C6IMPhW60t2IygFaP6KBhaOtr0VEepPM2D6fWMPE=.91c3d202-a5e1-4669-9846-e89e6a360e91@github.com> Message-ID: On Wed, 14 Aug 2024 21:02:47 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Use jdk.test.lib.Utils.getRandomInstance() > > src/hotspot/share/runtime/synchronizer.hpp line 126: > >> 124: >> 125: static bool quick_notify(oopDesc* obj, JavaThread* current, bool All); >> 126: > > Why add the extra blank line? The `quick` grouping seemed inconsistent. Was probably thinking about moving down `quick_enter_legacy`. But in the end, there is no obvious grouping. The newline must have slipped through, I'll revert it. 
Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1718006715 From dholmes at openjdk.org Thu Aug 15 06:33:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 15 Aug 2024 06:33:53 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:07:49 GMT, Stefan Karlsson wrote: > What is a Native Memory Tracker Type Flag? It is a flag telling us the type of native memory being tracked. > Whatever we choose, we also need to consider the mt prefix of things like mtGC, mtClass, etc. And what does that stand for: memory type? memory tracker? Arguably they should have been nmtGC etc. > I think we should rename all the 'flags' variables in the same change. Okay. That's a big change, but I'd prefer it to any half-way measures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2290754399 From jbhateja at openjdk.org Thu Aug 15 07:02:53 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Aug 2024 07:02:53 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 03:01:00 GMT, Jasmine Karthikeyan wrote: >> @jaskarth, its usage in the existing patch is limited to [type comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542). >> >> My plan is to address intrinsification of the new core lib APIs, the associated value range folding optimization (since unsigned numbers have a different value range, [0, 2^N - 1] for N-bit values, than signed numbers, [-2^(N-1), 2^(N-1) - 1]), and auto-vectorization in a follow-up patch. >> >> **Notes on C2 type system:** >> Unlike Type::FLOAT, integral types specify their value range using _lo and _hi bounds; these ranges are pruned using the flow functions associated with each IR operation. Constraining the value ranges allows logic pruning, e.g.
in1[TypeInt] & 0x7FFFFFFF will chop off the negative value ranges from in1, so a control structure like `if (in1 < 0) { true_path; } else { false_path; }`, which uses in1 as its flow condition, will sweep out the true path. >> >> The C2 type system only maintains value ranges for integral types, i.e. long and int; any sub-word type, which per the JVM specification has an int storage "word", only constrains the value range of TypeInt. >> >> A type which represents a constant value has the same _hi and _lo value. >> >> Floating-point types Type::FLOAT / DOUBLE cannot maintain upper / lower value ranges due to rounding constraints. >> Thus the C2 type system maintains separate types, TypeF and TypeD, which are singletons and represent a constant value. > >> its usage in the existing patch is limited to [type comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542) > > Ah, that makes sense to me. I took a closer look and I think since the patch is creating a `VectorReinterpret` node after unsigned vector nodes, it might be able to avoid cases where the type might get filtered/joined, like with `PhiNode::Value`. That might lead to errors since `empty_type->filter(other_type) == TOP`. It's unfortunate that it's not really possible to disambiguate between an empty type and an unsigned range, which would allow us to solve this elegantly. @jaskarth, the central idea behind introducing VectorReinterpretNode after unsigned vector IR is to facilitate the unboxing-boxing optimization; this explicit reinterpretation ensures type compatibility between the value being boxed and the box type, which is always a signed vector type.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1718044262 From kbarrett at openjdk.org Thu Aug 15 07:20:58 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Aug 2024 07:20:58 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:07:49 GMT, Stefan Karlsson wrote: > > I agree with @tstuefe here. MemFlag and MemType sound far too general when this is NMT-specific. > > Yes, it is not very specific, but it is also not hard to learn and then know what this type is all about. > > > My preference to keep the "flags" part of the type was to avoid needing to rename many parameters. The usage of MEMFLAGS flags is quite extensive. I would not want to see a partial approach here where we end up with a non-flag type name but a flag variable name. > > I think we should rename all the 'flags' variables in the same change. > > > NMTTypeFlag would, I hope, satisfy Thomas's requirement and avoid the need to do variable renames. > > * To me, that's really not an appealing name for a type that is going to be used by all parts of the HotSpot code base. I much prefer a shorter name that is easy on the eyes over a longer and more specific name that is an eyesore. > > * And even as a longer name, it doesn't tell what it is going to be used for. What is a Native Memory Tracker Type Flag? > > * I don't want us to select a bad name so that we don't have to change the variable names. > > * Whatever we choose, we also need to consider the mt prefix of things like mtGC, mtClass, etc. > > > With all that said, I hope it is clear that we, the various reviewers, have different opinions around this, and that we don't integrate this before we have some kind of consensus about the way forward. I strongly agree with everything @stefank said above.
I also think some of the suggestions that have been offered are not improvements over the status quo, or not enough of an improvement to be worth the code churn. The only thing I have against MemType is that "type" is pretty overloaded. I spent some time with a thesaurus, and the only alternative I came up with that I liked was MemGroup, but it fails the "mt" prefix consideration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2290801845 From stefank at openjdk.org Thu Aug 15 07:48:48 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 15 Aug 2024 07:48:48 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:31:14 GMT, David Holmes wrote: > > Whatever we choose, we also need to consider the mt prefix of things like mtGC, mtClass, etc. > > And what does that stand for: memory type? memory tracker? Arguably they should have been nmtGC etc. The memflags.hpp file says that it's a memory type: +#define MEMORY_TYPES_DO(f) \ + /* Memory type by sub systems. It occupies lower byte. */ \ + f(mtJavaHeap, "Java Heap") /* Java heap */ \ My guess is that at some point there was an incomplete rename from "memory type". ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2290832943 From aph at openjdk.org Thu Aug 15 08:01:49 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 15 Aug 2024 08:01:49 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 08:58:20 GMT, Amit Kumar wrote: > Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3323: > 3321: > 3322: // The bitmap is full to bursting.
> 3323: z_cghi(r_array_length, Klass::SECONDARY_SUPERS_BITMAP_FULL - 2); Suggestion: z_chi(r_array_length, Klass::SECONDARY_SUPERS_BITMAP_FULL - 2); This probably doesn't matter, but it's a 32-bit length. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20578#discussion_r1718091900 From kbarrett at openjdk.org Thu Aug 15 08:01:50 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Aug 2024 08:01:50 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 07:46:34 GMT, Stefan Karlsson wrote: > > > Whatever we choose, we also need to consider the mt prefix of things like mtGC, mtClass, etc. > > > > > > And what does that stand for: memory type? memory tracker? Arguably they should have been nmtGC etc. > > The memflags.hpp file says that it's a memory type: > > ``` > +#define MEMORY_TYPES_DO(f) \ > + /* Memory type by sub systems. It occupies lower byte. */ \ > + f(mtJavaHeap, "Java Heap") /* Java heap */ \ > ``` > > My guess is that at some point there was an incomplete rename from "memory type". Note that we used to have `typedef MemType MEMFLAGS;` ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2290846873 From kbarrett at openjdk.org Thu Aug 15 08:01:51 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Aug 2024 08:01:51 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <-NKUnExfzAqwhaGAtlJQD81b6wtoEnOhaXF4R779--s=.6d31a953-da4f-4875-8950-6971148ff639@github.com> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name.
> > There is a bunch of other related cleanup that we can do, but I will leave it for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) The suggestion of "nmtGC" from @daholmes initially looked somewhat appealing to me. But that then suggests "NMTType" or (better?) "NMTGroup" or something like that. But I don't much like the look of, or typing, those acronyms. (Note that NMTGroup is HotSpot style, not NmtGroup.) So I still prefer "MemType". ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2290847405 From aph at openjdk.org Thu Aug 15 09:05:53 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 15 Aug 2024 09:05:53 GMT Subject: RFR: 8337958: Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 08:50:51 GMT, Gui Cao wrote: > @theRealAph Hi, I have prepared a small change for riscv platform. Can we take a ride? Thanks. Just for my curiosity rather than a real concern, why is this not a `bgt`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20483#issuecomment-2290930408 From mdoerr at openjdk.org Thu Aug 15 09:09:48 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 15 Aug 2024 09:09:48 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 08:58:20 GMT, Amit Kumar wrote: > Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache Tier 1 has passed on Linux and AIX on PPC64.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20578#issuecomment-2290936633 From mdoerr at openjdk.org Thu Aug 15 09:09:49 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 15 Aug 2024 09:09:49 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: <7tNqkzotNx_JCIfpJoT1mE37F8wavJ_Ukpf05-WvDPU=.7801e973-4c1e-485a-baa7-73011058ba87@github.com> On Thu, 15 Aug 2024 07:59:20 GMT, Andrew Haley wrote: >> Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3323: > >> 3321: >> 3322: // The bitmap is full to bursting. >> 3323: z_cghi(r_array_length, Klass::SECONDARY_SUPERS_BITMAP_FULL - 2); > > Suggestion: > > z_chi(r_array_length, Klass::SECONDARY_SUPERS_BITMAP_FULL - 2); > > This probably doesn't matter, but it's a 32-bit length. Correct, chi would be cleaner. cghi works too, because the length is loaded as a 32-bit value. (The length is loaded as an unsigned 32-bit value with zero extension. Not sure if this is ideal, but a negative length should not occur AFAIK.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20578#discussion_r1718162545 From shade at openjdk.org Thu Aug 15 09:47:53 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 09:47:53 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 16:48:15 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: >> >> - Use TestFramework bootclasspath instead of develop option >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Variant 2: Only final-field like semantics for stable inits >> - Variant 3: Handle everything, including reads by compilers > > Looks good. Actually, maybe someone wants to run it through their testing pipelines as well? @iwanowww, @chhagedorn, @TobiHartmann? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2290984998 From shade at openjdk.org Thu Aug 15 10:00:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 10:00:23 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: <-VMDmWFhJnNdTmb5O9bb2l64i47hodl2ngHAjjlheMI=.5e6d83aa-c053-4456-914b-10b4903d685b@github.com> References: <-VMDmWFhJnNdTmb5O9bb2l64i47hodl2ngHAjjlheMI=.5e6d83aa-c053-4456-914b-10b4903d685b@github.com> Message-ID: On Thu, 15 Aug 2024 02:11:53 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2503: >> >>> 2501: // We need to make sure it looks like regular allocation to the rest of GC. >>> 2502: >>> 2503: // CDS code would guarantee no objects straggle multiple regions, as long as >> >> straggle -> straddle > > Noob question: can one prove that CDS archived objects will never exceed the size of a region? Or does the archive writer refuse to create a dump in that case? I will assume the latter for the purposes of the following remark: > > When you walk over the objects allocated in the verifier loop further below at lines 2539-2543, I wonder if you should check that you always have a legitimate object starting and ending at each alignment boundary, just to be super-paranoid. Typo fixed.
The verifier checks that objects are within their regions: https://github.com/openjdk/jdk/blob/da7311bbe37c2b9632b117d52a77c659047820b7/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp#L143-L144 I think we reasoned that check is too expensive for the regular `shenandoah_assert_correct`. But there is a less-used `shenandoah_assert_in_correct_region` that we can co-opt for this check. For testing, I disabled the `MIN_ALIGNMENT` bailout and caught the expected failure: # # Internal Error (/Users/shipilev/Work/shipilev-jdk/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:2541), pid=99715, tid=9987 # Error: Shenandoah assert_in_correct_region failed; Object end should be within the active area of the region Referenced from: no interior location recorded (probably a plain heap scan, or detached oop) Object: 0x00000007c003ffe8 - safe print, no details region: | 0|R |BTE 7c0000000, 7c0040000, 7c0040000|TAMS 7c0000000|UWM 7c0000000|U 256K|T 0B|G 0B|S 256K|L 0B|CP 0 Raw heap memory: 0x00000007c003ffc8: 6f6e2074 61682074 61206576 7365206e t not have an es 0x00000007c003ffd8: 616d6974 20646574 61727564 6e6f6974 timated duration 0x00000007c003ffe8: 00000001 00000000 001b64c8 0000001b .........d...... 0x00000007c003fff8: 74696e49 206c6169 Initial Looking more at this, I think there might be an interaction with the humongous threshold if it is smaller than the region size. If CDS allocates an object that is larger than our humongous threshold, this new code would effectively allow a "humongous" object to reside in a regular region. CDS would not know about this, because it thinks the largest possible object is `MIN_GC_REGION_ALIGNMENT`. I think we should just remove the tunable to avoid any corner cases: https://bugs.openjdk.org/browse/JDK-8338444.
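The invariant at stake above is that an object loaded by CDS must end within the region where it starts. With equally sized regions laid out from a heap base, that reduces to comparing the region index of the object's first and last byte. A minimal sketch, assuming byte addresses and a uniform region size; `region_index` and `within_single_region` are hypothetical helpers, not Shenandoah's verifier API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index of the region containing addr, for equally sized regions from heap_base.
size_t region_index(uintptr_t heap_base, uintptr_t addr, size_t region_size) {
  return (addr - heap_base) / region_size;
}

// True if [obj_start, obj_start + obj_size) lies entirely within one region,
// i.e. the object does not straddle a region boundary.
bool within_single_region(uintptr_t heap_base, uintptr_t obj_start,
                          size_t obj_size, size_t region_size) {
  assert(obj_size > 0 && "objects have a non-zero size");
  uintptr_t last_byte = obj_start + obj_size - 1;
  return region_index(heap_base, obj_start, region_size) ==
         region_index(heap_base, last_byte, region_size);
}
```

In the hs_err output above, the region ends at 0x7c0040000 and the object starts at 0x00000007c003ffe8, 24 bytes earlier, so this check would fail for any object larger than 24 bytes at that address.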
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1718210279 From shade at openjdk.org Thu Aug 15 10:00:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 10:00:23 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v6] In-Reply-To: References: Message-ID: <6Y1uh2FMg3cWFZOl09XPnTBf3KyR81cMdwkwlReWA3A=.49b9fe0a-433d-4ae3-afba-5363e077a7db@github.com> > This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it in, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. > > The current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with heaps >= 2G. It would be better to trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828).
> > Additional testing: > - [x] New test > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20468/files - new: https://git.openjdk.org/jdk/pull/20468/files/d8984aa4..1d2d45da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=04-05 Stats: 12 lines in 2 files changed: 8 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20468/head:pull/20468 PR: https://git.openjdk.org/jdk/pull/20468 From aph at openjdk.org Thu Aug 15 10:27:48 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 15 Aug 2024 10:27:48 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: <7tNqkzotNx_JCIfpJoT1mE37F8wavJ_Ukpf05-WvDPU=.7801e973-4c1e-485a-baa7-73011058ba87@github.com> References: <7tNqkzotNx_JCIfpJoT1mE37F8wavJ_Ukpf05-WvDPU=.7801e973-4c1e-485a-baa7-73011058ba87@github.com> Message-ID: On Thu, 15 Aug 2024 09:05:13 GMT, Martin Doerr wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3323: >> >>> 3321: >>> 3322: // The bitmap is full to bursting. >>> 3323: z_cghi(r_array_length, Klass::SECONDARY_SUPERS_BITMAP_FULL - 2); >> >> Suggestion: >> >> z_chi(r_array_length, Klass::SECONDARY_SUPERS_BITMAP_FULL - 2); >> >> This probably doesn't matter, but it's a 32-bit length. > > Correct, chi would be cleaner. cghi works too, because the length is loaded as a 32-bit value. (The length is loaded as an unsigned 32-bit value with zero extension. Not sure if this is ideal, but a negative length should not occur AFAIK.)
That's what I was thinking. Using anything other than `chi` is confusing to the reader. (Well, it was confusing to this reader, anyway.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20578#discussion_r1718235248 From chagedorn at openjdk.org Thu Aug 15 11:22:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 Aug 2024 11:22:52 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 10:04:50 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at the exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for a `@Stable` field write, while only a `StoreStore` for a `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent of the Release barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in its initialized state and folds it, or it sees a default value for the component and leaves it alone.
>> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is an all-around good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, so I fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Use TestFramework bootclasspath instead of develop option > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers Already done, sorry, forgot to report. Ran t1-4, hs_precheckin_comp, hs_comp_stress - looked good!
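The race-tolerance argument in the quoted description (a reader either sees the initialized value or the default value, and in the latter case does the work itself) is the same one behind lazily initialized caches such as the `Enum` code linked above. A rough C++ analogue follows, using relaxed atomics for a benign race; `CachedHash` and `compute_hash` are hypothetical, and the sketch does not model C2's constant folding or the barrier placement that this PR changes.

```cpp
#include <atomic>
#include <cstdint>

// Lazily computed, race-tolerant cache. 0 plays the role of the default
// ("not yet initialized") value of a @Stable int field.
class CachedHash {
 public:
  explicit CachedHash(uint32_t seed) : seed_(seed) {}

  uint32_t get() {
    uint32_t h = hash_.load(std::memory_order_relaxed);
    if (h == 0) {
      // Default value observed: compute it ourselves. Racing threads
      // compute the same value, so it does not matter who stores last.
      h = compute_hash(seed_);
      hash_.store(h, std::memory_order_relaxed);
    }
    return h;
  }

 private:
  static uint32_t compute_hash(uint32_t seed) {
    uint32_t h = seed * 2654435761u;  // deterministic multiplicative hash
    return h == 0 ? 1 : h;            // never yield the "uninitialized" value
  }

  const uint32_t seed_;
  std::atomic<uint32_t> hash_{0};
};
```

Relaxed ordering is enough here only because the cached value is a self-contained integer computed deterministically, so racing writers store the same value. Publishing a pointer to freshly initialized data would additionally need ordering between the initializing stores and the publishing store, which is the concern behind giving `@Stable` fields written in constructors final-field-like semantics.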
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2291098065 From shade at openjdk.org Thu Aug 15 11:27:00 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 11:27:00 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 10:04:50 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> Currently, C2 puts a `Release` barrier at the exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for a `@Stable` field write, while only a `StoreStore` for a `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent of the Release barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in its initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors.
AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is an all-around good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, so I fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Use TestFramework bootclasspath instead of develop option > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers Awesome, thanks! Here it goes, then.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2291101221 From shade at openjdk.org Thu Aug 15 11:27:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 11:27:02 GMT Subject: Integrated: 8333791: Fix memory barriers for @Stable fields In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev wrote: > See bug for more discussion. > > Currently, C2 puts a `Release` barrier at the exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 > > A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for a `@Stable` field write, while only a `StoreStore` for a `final` field store would suffice: > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 > > AFAICS, the original intent of the Release barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in its initialized state and folds it, or it sees a default value for the component and leaves it alone. > > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors.
AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. > > The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is an all-around good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. > > C1 did not do anything special for `@Stable` fields at all, so I fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. > > Additional testing: > - [x] New IR tests > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstre... This pull request has now been integrated. Changeset: 74fdd686 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/74fdd6868d3f71d44ef9f71a0ca9506c04d39148 Stats: 1067 lines in 14 files changed: 1028 ins; 20 del; 19 mod 8333791: Fix memory barriers for @Stable fields Reviewed-by: liach, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/19635 From aboldtch at openjdk.org Thu Aug 15 12:00:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 15 Aug 2024 12:00:57 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6] In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 10:12:32 GMT, Roman Kennke wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update arguments.cpp > > Is there a plan to get rid of the UseObjectMonitorTable flag in a future release?
Ideally we would have one fast-locking implementation (LW locking) with one OM mapping (+UOMT), right? @rkennke Is there any issue/blocker you want resolved before integrating? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2291141036 From mgronlun at openjdk.org Thu Aug 15 12:10:49 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 15 Aug 2024 12:10:49 GMT Subject: RFR: 8338314: JFR: Split JFRCheckpoint VM operation In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 15:56:46 GMT, Aleksey Shipilev wrote: > Investigating JFR crashes is a bit tedious, as the Events section in `hs_err` shows just: > > > Event: 3.006 Executing VM operation: JFRCheckpoint > Event: 3.006 Executing VM operation: JFRCheckpoint done > > > What that `JFRCheckpoint` is doing is unclear, because it can do two separate things: clear or write. It would be good if we could disambiguate the two. Since there are only two flavors of checkpoint, I think we can just split the VMOp into two more precisely named ones, so it gives us e.g.: > > > Event: 2.462 Executing VM operation: JFRSafepointClear > Event: 2.463 Executing VM operation: JFRSafepointClear done Looks good. ------------- Marked as reviewed by mgronlun (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20570#pullrequestreview-2240242758 From coleenp at openjdk.org Thu Aug 15 13:07:57 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 15 Aug 2024 13:07:57 GMT Subject: RFR: 8338447: Remove InstanceKlass::_is_marked_dependent Message-ID: Please review this trivial change. I ran the SA tests to make sure the is_marked_dependent flag in vmStructs wasn't used there.
------------- Commit messages: - 8338447: Remove InstanceKlass::_is_marked_dependent Changes: https://git.openjdk.org/jdk/pull/20595/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20595&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338447 Stats: 5 lines in 2 files changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20595.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20595/head:pull/20595 PR: https://git.openjdk.org/jdk/pull/20595 From shade at openjdk.org Thu Aug 15 13:21:49 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 13:21:49 GMT Subject: RFR: 8338447: Remove InstanceKlass::_is_marked_dependent In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 13:04:03 GMT, Coleen Phillimore wrote: > Please review this trivial change. I ran the SA tests to make sure the is_marked_dependent flag in vmStructs wasn't used there. I wonder what removed the last use of it. But this is fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20595#pullrequestreview-2240379919 From coleenp at openjdk.org Thu Aug 15 13:27:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 15 Aug 2024 13:27:48 GMT Subject: RFR: 8338447: Remove InstanceKlass::_is_marked_dependent In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 13:04:03 GMT, Coleen Phillimore wrote: > Please review this trivial change. I ran the SA tests to make sure the is_marked_dependent flag in vmStructs wasn't used there. It was moved into InstanceKlassFlags::_status field which has atomic setters and getters. 
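As background for the remark above, here is a minimal C++ sketch of the general pattern of keeping boolean flags as bits in a single status field with atomic setters and getters. It is an illustration under assumptions, not HotSpot's `InstanceKlassFlags` code: the class name `StatusFlags`, the bit layout, the second flag and the memory orders are hypothetical (only the "is marked dependent" flag comes from this thread).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not HotSpot code: several boolean flags packed into
// one byte. Setters use atomic read-modify-write (fetch_or / fetch_and), so
// two threads setting different bits concurrently cannot lose each other's
// update, which a plain load-modify-store of the byte could.
class StatusFlags {
 public:
  enum Bit : uint8_t {
    kIsMarkedDependent = 1u << 0,  // flag name taken from the thread
    kOtherFlag         = 1u << 1,  // hypothetical second flag
  };

  void set(Bit bit, bool value) {
    if (value) {
      _status.fetch_or(bit, std::memory_order_acq_rel);
    } else {
      _status.fetch_and(static_cast<uint8_t>(~bit), std::memory_order_acq_rel);
    }
  }

  bool is_set(Bit bit) const {
    return (_status.load(std::memory_order_acquire) & bit) != 0;
  }

 private:
  std::atomic<uint8_t> _status{0};
};
```

Packing the flags this way keeps them in one word, so a separate boolean field for the same state becomes redundant and can be deleted, as in this change.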
------------- PR Comment: https://git.openjdk.org/jdk/pull/20595#issuecomment-2291265502 From yzheng at openjdk.org Thu Aug 15 13:47:58 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 15 Aug 2024 13:47:58 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v20] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:12:22 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and a `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held.
More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Remove newline src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 677: > 675: > 676: // Check for match. > 677: cmpptr(obj, Address(t)); `Address(t)` can be cached (in rax?) and reused in the subsequent `comptr` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1718426158 From aboldtch at openjdk.org Thu Aug 15 14:34:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 15 Aug 2024 14:34:58 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v20] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 13:45:11 GMT, Yudi Zheng wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove newline > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 677: > >> 675: >> 676: // Check for match. >> 677: cmpptr(obj, Address(t)); > > `Address(t)` can be cached (in rax?) 
and reused in the subsequent `comptr` It can. Just a quick test with `LockUnlock.testInflated` ?benchmarks shows a ~5% regression when using a mov of [T] into rax and then using rax for cmp. (On my current AMD machine). But worth pursuing in a more extensive followup. I'll note this in the followup RFEs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1718488820 From coleenp at openjdk.org Thu Aug 15 15:14:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 15 Aug 2024 15:14:51 GMT Subject: RFR: 8338447: Remove InstanceKlass::_is_marked_dependent In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 13:04:03 GMT, Coleen Phillimore wrote: > Please review this trivial change. I ran the SA tests to make sure the is_marked_dependent flag in vmStructs wasn't used there. Can I get a 'trival' once GHA finished? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20595#issuecomment-2291492259 From shade at openjdk.org Thu Aug 15 15:14:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 15:14:52 GMT Subject: RFR: 8338447: Remove InstanceKlass::_is_marked_dependent In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 13:04:03 GMT, Coleen Phillimore wrote: > Please review this trivial change. I ran the SA tests to make sure the is_marked_dependent flag in vmStructs wasn't used there. Yes, trivial. Feel free to integrate, once builds clear. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20595#issuecomment-2291495528 From rkennke at openjdk.org Thu Aug 15 15:28:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 15 Aug 2024 15:28:57 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v6] In-Reply-To: <6Y1uh2FMg3cWFZOl09XPnTBf3KyR81cMdwkwlReWA3A=.49b9fe0a-433d-4ae3-afba-5363e077a7db@github.com> References: <6Y1uh2FMg3cWFZOl09XPnTBf3KyR81cMdwkwlReWA3A=.49b9fe0a-433d-4ae3-afba-5363e077a7db@github.com> Message-ID: On Thu, 15 Aug 2024 10:00:23 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More review comments Looks good to me! ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20468#pullrequestreview-2240648276 From fgao at openjdk.org Thu Aug 15 15:32:28 2024 From: fgao at openjdk.org (Fei Gao) Date: Thu, 15 Aug 2024 15:32:28 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v3] In-Reply-To: References: Message-ID: <7JRzzIvH26CZPYCX76eWBbQSYUhMDnOqRufDtWaIXq8=.d3270022-4933-4fa7-828a-f57dbc5b8a46@github.com> > This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. > > However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: > > > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." 
Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: > > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S > src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S > > > Task-2: add `.note.gnu.property` section for these assembly files > > As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. > > In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update in flags-cflags.m4 and flags-other.m4), and add `.note.gnu.property` section at the end... Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Fix indentation - Merge branch 'master' into enable-bti-runtime - Clean up makefile - Merge branch 'master' into enable-bti-runtime - 8337536: AArch64: Enable BTI branch protection for runtime part This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. Motivation 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. 
For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: ``` GNU_PROPERTY_AARCH64_FEATURE_1_BTI GNU_PROPERTY_AARCH64_FEATURE_1_PAC ``` Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. Goal Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. Implementation Task-1: find out the problematic input objects From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: ``` src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S ``` Task-2: add `.note.gnu.property` section for these assembly files As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update in flags-cflags.m4 and flags-other.m4), and add `.note.gnu.property` section at the end of these assembler files. 
With this change, we can see PAC/BTI feature bits in the final libjvm.so. Task-3: add BTI landing pads for hand-written assembly In the local test on Fedora 40 with PAC/BTI-capable hardware, we got a `SIGILL` error, which is a typical BTI error (branch target exception). The root cause is that the hand-written assembly in hotspot lacks BTI landing pads, so we add the missing ones. File-1 copy_aarch64.hpp: It's a switch-case statement and we add `bti j` for these indirect jumps. File-2 atomic_linux_aarch64.S: We add landing pads `bti c` at the function entries. File-3 copy_linux_aarch64.S: There is no need to add `bti c` at the function entries since they are called via `bl`. And we should handle the indirect jumps. File-4 safefetch_linux_aarch64.S: Similar to file-3, there is no need to handle these function entries. File-5 threadLS_linux_aarch64.S: No need to handle the function entry because `paciasp` can act as the landing pad. Evaluation 1. jtreg test We ran tier 1-3 jtreg tests on Fedora 40 + GCC 14 + the following AArch64 hardware and all tests passed. ``` 1. w/o PAC and w/o BTI 2. w/ PAC and w/o BTI 3. w/ PAC and w/ BTI ``` We also ran the jtreg tests on Fedora 40 + Clang 18 + hardware w/ PAC and w/ BTI. The tests passed too. 2. code size We measured about a 2% code size increase when `--enable-branch-protection` is used. This code size change looks reasonable. See the evaluation on glibc [6].
[1] https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication [2] https://bugs.openjdk.org/browse/JDK-8277204 [3] https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story [4] https://reviews.llvm.org/D62609 [5] https://github.com/ARM-software/abi-aa/blob/2a70c42d62e9c3eb5887fa50b71257f20daca6f9/aaelf64/aaelf64.rst#program-property [6] https://developer.arm.com/documentation/102433/0100/Applying-these-techniques-to-real-code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20491/files - new: https://git.openjdk.org/jdk/pull/20491/files/114953da..06c6e234 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20491&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20491&range=01-02 Stats: 5391 lines in 320 files changed: 3562 ins; 739 del; 1090 mod Patch: https://git.openjdk.org/jdk/pull/20491.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20491/head:pull/20491 PR: https://git.openjdk.org/jdk/pull/20491 From fgao at openjdk.org Thu Aug 15 15:32:29 2024 From: fgao at openjdk.org (Fei Gao) Date: Thu, 15 Aug 2024 15:32:29 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v2] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 02:26:10 GMT, Eric Liu wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Clean up makefile >> - Merge branch 'master' into enable-bti-runtime >> - 8337536: AArch64: Enable BTI branch protection for runtime part >> >> This patch enables BTI branch protection for runtime part on >> Linux/aarch64 platform. >> >> Motivation >> >> 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. 
>> User-level packages can gain additional hardening by compiling with the >> GCC/Clang flag `-mbranch-protection=flag`. See [1]. >> >> 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as >> one VM configure flag, which would pass `-mbranch-protection=standard` >> compilation flags to all c/c++ files. Note that `standard` turns on both >> `pac-ret` and `bti` branch protections. For more details about code >> reuse attacks and hardware-assisted branch protections on AArch64, see >> [3]. >> >> However, we checked the `.note.gnu.property` section of all the shared >> libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so >> didn't set these two target feature bits: >> >> ``` >> GNU_PROPERTY_AARCH64_FEATURE_1_BTI >> GNU_PROPERTY_AARCH64_FEATURE_1_PAC >> ``` >> >> Note-1: BTI is an all or nothing property for a link unit [4]. That is, >> libjvm.so is not BTI-enabled. >> >> Note-2: PAC bit in `.note.gnu.property` section is used to protect >> `.got.plt` table. It's independent of whether the relocatable objects >> use PAC or not. >> >> Goal >> >> Hence, this patch aims to set PAC/BTI feature bits of the >> `.note.gnu.property` section for libjvm.so. >> >> Implementation >> >> Task-1: find out the problematic input objects >> >> From [5], "Static linkers processing ELF relocatable objects must set >> the feature bit in the output object or image only if all the input >> objects have the corresponding feature bit set." Hence we suspect that >> the root cause is probably that the PAC/BTI feature bits are not set >> only for some input objects of libjvm.so. >> >> In order to find out these inputs, we passed `--force-bti` linker flag >> [4] in my local test. This linker flag would warn if any input object >> does not have GNU_PROPERTY_AARCH64_FEATU... > > src/hotspot/cpu/aarch64/copy_aarch64.hpp line 67: > >> 65: " .align 5;\n" \ >> 66: "0:" \ >> 67: " hint #0x24; // bti j\n" \ > > LGTM. Only a few indent issues. @e1iu thanks for review! 
Updated it in the new commit :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20491#discussion_r1718569298 From ysr at openjdk.org Thu Aug 15 16:13:50 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 15 Aug 2024 16:13:50 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v5] In-Reply-To: References: <-VMDmWFhJnNdTmb5O9bb2l64i47hodl2ngHAjjlheMI=.5e6d83aa-c053-4456-914b-10b4903d685b@github.com> Message-ID: On Thu, 15 Aug 2024 09:56:26 GMT, Aleksey Shipilev wrote: >> Noob question: can one prove that CDS archived objects will never exceed the size of a region? Or does the archive writer refuse to create a dump in that case? I will assume the latter for the purposes of the following remark: >> >> When you walk over the objects allocated in the verifier loop further below at lines 2539-2543, I wonder if you should check that you always have a legitimate object starting and ending each alignment boundary, just to be super-paranoid. > > Typo fixed. > > Verifier checks objects are within the regions: > https://github.com/openjdk/jdk/blob/da7311bbe37c2b9632b117d52a77c659047820b7/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp#L143-L144 > > I think we reasoned that check is too expensive for the regular `shenandoah_assert_correct`. But there is a less used `shenandoah_assert_in_correct_region` that we can co-opt for this check. 
For testing, I disabled `MIN_ALIGNMENT` bailout, and caught the expected failure: > > > # > # Internal Error (/Users/shipilev/Work/shipilev-jdk/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:2541), pid=99715, tid=9987 > # Error: Shenandoah assert_in_correct_region failed; Object end should be within the active area of the region > > Referenced from: > no interior location recorded (probably a plain heap scan, or detached oop) > > Object: > 0x00000007c003ffe8 - safe print, no details > region: | 0|R |BTE 7c0000000, 7c0040000, 7c0040000|TAMS 7c0000000|UWM 7c0000000|U 256K|T 0B|G 0B|S 256K|L 0B|CP 0 > > Raw heap memory: > 0x00000007c003ffc8: 6f6e2074 61682074 61206576 7365206e t not have an es > 0x00000007c003ffd8: 616d6974 20646574 61727564 6e6f6974 timated duration > 0x00000007c003ffe8: 00000001 00000000 001b64c8 0000001b .........d...... > 0x00000007c003fff8: 74696e49 206c6169 Initial > > > Looking more at this, I think there might be an interaction with humongous threshold if it is smaller than region size. If CDS allocates the object that is larger than our humongous threshold, this new code would effectively allow a "humongous" object to reside in regular region. CDS would not know about this, because it thinks the largest possible object is `MIN_GC_REGION_ALIGNMENT`. I think we would just remove the tunable to avoid any corner cases: https://bugs.openjdk.org/browse/JDK-8338444. Sounds good, thanks for checking this and tightening these issues! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20468#discussion_r1718632237 From shade at openjdk.org Thu Aug 15 17:27:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 17:27:23 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v7] In-Reply-To: References: Message-ID: > This implements CDS Java heap loading for Shenandoah. 
There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. > > Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). > > Additional testing: > - [x] New test > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: - Merge branch 'master' into JDK-8293650-shenandoah-archives - More review comments - Wrap the whole thing in CDS define - Work around 32-bit build failure - Review comments - Merge branch 'master' into JDK-8293650-shenandoah-archives - Move constant to separate class to unbreak Windows builds - Touchups in test - Basic implementation, works well, passes tests ------------- Changes: https://git.openjdk.org/jdk/pull/20468/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20468&range=06 Stats: 201 lines in 7 files changed: 189 ins; 6 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20468/head:pull/20468 PR: https://git.openjdk.org/jdk/pull/20468 From kbarrett at openjdk.org Thu Aug 15 17:45:53 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Aug 2024 17:45:53 GMT Subject: RFR: 8338330: Fix -Wzero-as-null-pointer-constant warnings from THROW_XXX_0 In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:44:54 GMT, Dean Long wrote: >> Please review this change to add THROW_ARG_NULL and THROW_HANDLE_NULL macros, >> and use them instead of the corresponding THROW_XXX_0 macros in contexts where >> a pointer value is needed. This removes some -Wzero-as-null-pointer-constant >> warnings. >> >> There aren't many uses of either (only one of the HANDLE variant). An >> alternative would have been to change the callers to use the unsuffixed >> variant with a nullptr value argument. Adding the macros is consistent with >> other THROW variants, and seems a little bit more readable. >> >> Testing: mach5 tier1 > > Marked as reviewed by dlong (Reviewer). 
Thanks for reviews @dean-long , @dholmes-ora , and @shipilev ------------- PR Comment: https://git.openjdk.org/jdk/pull/20574#issuecomment-2291837493 From kbarrett at openjdk.org Thu Aug 15 17:45:54 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Aug 2024 17:45:54 GMT Subject: Integrated: 8338330: Fix -Wzero-as-null-pointer-constant warnings from THROW_XXX_0 In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:19:45 GMT, Kim Barrett wrote: > Please review this change to add THROW_ARG_NULL and THROW_HANDLE_NULL macros, > and use them instead of the corresponding THROW_XXX_0 macros in contexts where > a pointer value is needed. This removes some -Wzero-as-null-pointer-constant > warnings. > > There aren't many uses of either (only one of the HANDLE variant). An > alternative would have been to change the callers to use the unsuffixed > variant with a nullptr value argument. Adding the macros is consistent with > other THROW variants, and seems a little bit more readable. > > Testing: mach5 tier1 This pull request has now been integrated. Changeset: 96550827 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/965508270ecd092019f7bea3a1605c5d9f19d81e Stats: 14 lines in 3 files changed: 3 ins; 0 del; 11 mod 8338330: Fix -Wzero-as-null-pointer-constant warnings from THROW_XXX_0 Reviewed-by: dlong, dholmes, shade ------------- PR: https://git.openjdk.org/jdk/pull/20574 From kbarrett at openjdk.org Thu Aug 15 17:53:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Aug 2024 17:53:59 GMT Subject: RFR: 8338331: Fix -Wzero-as-null-pointer-constant warnings from CHECK_0 in jni.cpp In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:51:34 GMT, David Holmes wrote: >> Please review this change to some macros in jni.cpp. These macros were using >> CHECK_0 when calling functions that can "throw" exceptions. However, the >> return type involved is provided by a macro argument, and is a pointer type >> for some uses of the macros. 
This triggered -Wzero-as-null-pointer-constant >> warnings when enabled. >> >> To remove the warnings, these CHECK_0() uses are changed to CHECK_() with an >> argument expression that constructs and value-initializes a temporary of that >> return type, e.g. `ResultType{}`. Value-initialization of a scalar type is >> zero-initialization. Zero-initialization of a scalar type initializes it to >> the value obtained by converting a literal 0 to that type. That is a zero of the >> appropriate type for arithmetic types. For pointer types it's initialized to >> nullptr, without triggering the warning. >> >> Testing: mach5 tier1 > > Okay. Thanks Thanks for reviews @dholmes-ora and @shipilev ------------- PR Comment: https://git.openjdk.org/jdk/pull/20575#issuecomment-2291848592 From kbarrett at openjdk.org Thu Aug 15 17:54:00 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Aug 2024 17:54:00 GMT Subject: Integrated: 8338331: Fix -Wzero-as-null-pointer-constant warnings from CHECK_0 in jni.cpp In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 22:41:19 GMT, Kim Barrett wrote: > Please review this change to some macros in jni.cpp. These macros were using > CHECK_0 when calling functions that can "throw" exceptions. However, the > return type involved is provided by a macro argument, and is a pointer type > for some uses of the macros. This triggered -Wzero-as-null-pointer-constant > warnings when enabled. > > To remove the warnings, these CHECK_0() uses are changed to CHECK_() with an > argument expression that constructs and value-initializes a temporary of that > return type, e.g. `ResultType{}`. Value-initialization of a scalar type is > zero-initialization. Zero-initialization of a scalar type initializes it to > the value obtained by converting a literal 0 to that type. That is a zero of the > appropriate type for arithmetic types. For pointer types it's initialized to > nullptr, without triggering the warning.
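The value-initialization rule described above can be checked in isolation. The sketch below is a hypothetical generic helper, not the jni.cpp macros themselves (`error_result` is an invented name). It shows that `ResultType{}` gives zero for arithmetic types and a null pointer for pointer types, without writing a literal `0` that would be converted to a pointer.

```cpp
#include <cassert>

// Hypothetical helper: returns the "error" value for any scalar ResultType
// by value-initializing a temporary, as in the `ResultType{}` argument above.
// For arithmetic types this is zero; for pointer types it is a null pointer,
// and no literal 0 is converted to a pointer, so
// -Wzero-as-null-pointer-constant has nothing to report.
template <typename ResultType>
constexpr ResultType error_result() {
  return ResultType{};
}

static_assert(error_result<int>() == 0, "arithmetic types value-initialize to zero");
static_assert(error_result<void*>() == nullptr, "pointer types value-initialize to nullptr");
```

A single helper covers every return type the macro may be instantiated with, which is the same reason a typed `ResultType{}` suits macro-generated code better than a hard-coded `0`.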
> > Testing: mach5 tier1 This pull request has now been integrated. Changeset: 52d9d69d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/52d9d69db5c1853445a95794c5bf21243aefa852 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod 8338331: Fix -Wzero-as-null-pointer-constant warnings from CHECK_0 in jni.cpp Reviewed-by: dholmes, shade ------------- PR: https://git.openjdk.org/jdk/pull/20575 From rkennke at openjdk.org Thu Aug 15 18:08:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 15 Aug 2024 18:08:02 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v20] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:12:22 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Remove newline Looks good to me, thank you! Great work! ------------- Marked as reviewed by rkennke (Reviewer).
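The fast-locking transition quoted above (a CAS on the two lock bits, 0b01 unlocked to 0b00 fast-locked, with 0b10 meaning inflated) can be modeled in a few lines of C++. This is a simplified sketch under stated assumptions, not HotSpot's `markWord` or `LightweightSynchronizer` code: the constants and function names are hypothetical, the upper bits are treated as an opaque payload, and ownership tracking (HotSpot's lightweight locking records owners in a per-thread lock stack), spinning and inflation are all omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical constants for the two low lock bits described in the text.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kInflated   = 0b10;

// Try to fast-lock: CAS the lock bits from 0b01 to 0b00, keeping the other bits.
// Returns false if the word is already fast-locked or inflated (the caller
// would then spin or inflate), or if another thread changed it concurrently.
bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) {
    return false;
  }
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Reverse transition: 0b00 back to 0b01, with release ordering.
bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) {
    return false;
  }
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Because only the lock bits change, the rest of the word (in Lilliput, the klass) is never displaced, and that is the property the patch relies on.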
PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2241022506 From coleenp at openjdk.org Thu Aug 15 18:22:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 15 Aug 2024 18:22:52 GMT Subject: RFR: 8338447: Remove InstanceKlass::_is_marked_dependent In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 13:04:03 GMT, Coleen Phillimore wrote: > Please review this trivial change. I ran the SA tests to make sure the is_marked_dependent flag in vmStructs wasn't used there. One looks like it glitched. Thanks Aleksey. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20595#issuecomment-2291895131 From coleenp at openjdk.org Thu Aug 15 18:22:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 15 Aug 2024 18:22:53 GMT Subject: Integrated: 8338447: Remove InstanceKlass::_is_marked_dependent In-Reply-To: References: Message-ID: <4kKJbX39ZW7cdTt90WwV54qCz9ylYA9HfTYnxMoRUU4=.c8f9f762-dcf1-4d0f-bbe1-f6d07c2a233b@github.com> On Thu, 15 Aug 2024 13:04:03 GMT, Coleen Phillimore wrote: > Please review this trivial change. I ran the SA tests to make sure the is_marked_dependent flag in vmStructs wasn't used there. This pull request has now been integrated. 
Changeset: 1cd48843 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/1cd488436880b00c55fa91f44c115999cf686afd Stats: 5 lines in 2 files changed: 0 ins; 5 del; 0 mod 8338447: Remove InstanceKlass::_is_marked_dependent Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/20595 From kvn at openjdk.org Thu Aug 15 19:27:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Aug 2024 19:27:52 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v5] In-Reply-To: <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> Message-ID: On Wed, 14 Aug 2024 13:39:23 GMT, Andrew Dinn wrote: >> Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix up jvmci static field declarations Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20566#pullrequestreview-2241166926 From shade at openjdk.org Thu Aug 15 20:15:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 20:15:54 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v7] In-Reply-To: References: Message-ID: <0kjwFmSmNp3f6qR5fCXSRXl6hnAG1zfvTTVrux6tF_g=.e497cb6b-f16a-430d-bb4b-704184176c97@github.com> On Thu, 15 Aug 2024 17:27:23 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah.
There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'master' into JDK-8293650-shenandoah-archives > - More review comments > - Wrap the whole thing in CDS define > - Work around 32-bit build failure > - Review comments > - Merge branch 'master' into JDK-8293650-shenandoah-archives > - Move constant to separate class to unbreak Windows builds > - Touchups in test > - Basic implementation, works well, passes tests Had to remerge with current state of master, which unfortunately invalidated reviews. 
Need another ack, please :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20468#issuecomment-2292141755 From rkennke at openjdk.org Thu Aug 15 20:26:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 15 Aug 2024 20:26:53 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v7] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 17:27:23 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). >> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'master' into JDK-8293650-shenandoah-archives > - More review comments > - Wrap the whole thing in CDS define > - Work around 32-bit build failure > - Review comments > - Merge branch 'master' into JDK-8293650-shenandoah-archives > - Move constant to separate class to unbreak Windows builds > - Touchups in test > - Basic implementation, works well, passes tests Marked as reviewed by rkennke (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20468#pullrequestreview-2241264683 From coleenp at openjdk.org Thu Aug 15 20:47:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 15 Aug 2024 20:47:51 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int In-Reply-To: References: Message-ID: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> On Tue, 13 Aug 2024 02:20:41 GMT, David Holmes wrote: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings, though, this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level APIs can still use `int` if they know the strings (e.g. symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of `UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. > > Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`, use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. > > Testing: > - tiers 1-4 > - GHA One thing that I think needs changing and some comments, but your change strikes a good balance of promoting enough parameters to size_t but not too many.
Did you try to compile this with -Wconversion (not -Wsign-conversion) without -Werror? It doesn't look like GHA is configured for you here. src/hotspot/share/classfile/javaClasses.cpp line 586: > 584: ResourceMark rm; > 585: jbyte* position = (length == 0) ? nullptr : value->byte_at_addr(0); > 586: size_t utf8_len = length; These are sign conversions. This is too big to change here, and I'm not sure what the fan-out would be, but java_lang_String::length() should return unsigned, and so should the arrayOop.hpp length. They're never negative. We should probably assert that (below). src/hotspot/share/classfile/javaClasses.cpp line 639: > 637: if (length == 0) { > 638: return 0; > 639: } Maybe assert length > 0 here? src/hotspot/share/classfile/javaClasses.cpp line 702: > 700: } else { > 701: jbyte* position = (length == 0) ? nullptr : value->byte_at_addr(0); > 702: return UNICODE::as_utf8(position, static_cast(length), buf, buflen); Don't you want checked_cast here, not static_cast? src/hotspot/share/prims/jni.cpp line 2226: > 2224: HOTSPOT_JNI_GETSTRINGUTFLENGTH_ENTRY(env, string); > 2225: oop java_string = JNIHandles::resolve_non_null(string); > 2226: jsize ret = java_lang_String::utf8_length_as_int(java_string); So the spec says that this should be jsize (signed int), which is why this is, right? src/hotspot/share/utilities/utf8.cpp line 512: > 510: > 511: template > 512: int UNICODE::utf8_length_as_int(const T* base, int length) { Why was this parameter left as int and not size_t? src/hotspot/share/utilities/utf8.hpp line 47: > 45: > 46: In the code below if we have latin-1 content then we treat the String's data > 47: array as a jbyte[], else a jchar[]. The lengths of these arrays are specified jchar is 16 bits, right? So the max length if not latin-1 is INT_MAX/2? src/hotspot/share/utilities/utf8.hpp line 70: > 68: > 69: */ > 70: This is a good place for this comment. One could find it again if they need to understand this.
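As background for the `size_t` discussion above, here is a sketch of a modified UTF-8 length computation over UTF-16 code units that returns `size_t`. It follows the encoding rules (U+0000 takes two bytes, and each surrogate half of a supplementary character takes three) rather than HotSpot's actual `UNICODE::utf8_length`, and the function and constant names are hypothetical. The final constant shows why an `int` result is not enough: at three bytes per code unit, the 3*`Integer.MAX_VALUE` bound from the PR description comes to 6442450941 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: modified UTF-8 length of a UTF-16 sequence.
//   U+0001..U+007F -> 1 byte
//   U+0000 and U+0080..U+07FF -> 2 bytes (U+0000 is encoded as C0 80)
//   everything else, including each surrogate half -> 3 bytes
// A surrogate pair therefore costs 6 bytes for 2 code units, so no code
// unit ever needs more than 3 bytes.
std::size_t modified_utf8_length(const std::uint16_t* chars, std::size_t count) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; i++) {
    std::uint16_t c = chars[i];
    if (c >= 0x0001 && c <= 0x007F) {
      total += 1;
    } else if (c <= 0x07FF) {  // also catches U+0000
      total += 2;
    } else {
      total += 3;
    }
  }
  return total;
}

// Worst case quoted in the PR: 3 bytes for each of Integer.MAX_VALUE units.
constexpr std::uint64_t kJavaIntMax = 2147483647u;
constexpr std::uint64_t kWorstCaseUtf8 = 3 * kJavaIntMax;  // 6442450941 > INT_MAX
```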
------------- PR Review: https://git.openjdk.org/jdk/pull/20560#pullrequestreview-2241255691 PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2292205627 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718945451 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718948865 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718950816 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718954620 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718959373 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718964159 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718966422 From shade at openjdk.org Thu Aug 15 20:54:58 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 20:54:58 GMT Subject: RFR: 8293650: Shenandoah: Support archived heap objects [v7] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 17:27:23 GMT, Aleksey Shipilev wrote: >> This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. >> >> Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). 
>> >> Additional testing: >> - [x] New test >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` >> - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'master' into JDK-8293650-shenandoah-archives > - More review comments > - Wrap the whole thing in CDS define > - Work around 32-bit build failure > - Review comments > - Merge branch 'master' into JDK-8293650-shenandoah-archives > - Move constant to separate class to unbreak Windows builds > - Touchups in test > - Basic implementation, works well, passes tests Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20468#issuecomment-2292214646 From shade at openjdk.org Thu Aug 15 20:54:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 20:54:59 GMT Subject: Integrated: 8293650: Shenandoah: Support archived heap objects In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 16:32:09 GMT, Aleksey Shipilev wrote: > This implements CDS Java heap loading for Shenandoah. There are peculiarities with how CDS loads objects: it basically asks for a contiguous block of memory, fills it out, potentially relocating the objects. This gets interesting when a single Shenandoah region cannot contain the entirety of the load. See the implementation for gory details. > > Current implementation would work well only with Shenandoah heap regions >= 1M, in other words, with the heaps >=2G. It would be better if we trim down the min alignment, thus unblocking smaller heaps. It is not necessary to do so in this PR, so I track that work separately: [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828). 
> > Additional testing: > - [x] New test > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > - [x] Same as above, but `MIN_GC_REGION_ALIGNMENT` manually dropped to 256K (mimics [JDK-8337828](https://bugs.openjdk.org/browse/JDK-8337828)) This pull request has now been integrated. Changeset: d86e99c3 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/d86e99c3ca94ee8705e44fe2830edd3ceb0a7f64 Stats: 201 lines in 7 files changed: 189 ins; 6 del; 6 mod 8293650: Shenandoah: Support archived heap objects Reviewed-by: rkennke, wkemper, iklam ------------- PR: https://git.openjdk.org/jdk/pull/20468 From shade at openjdk.org Thu Aug 15 21:34:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 21:34:19 GMT Subject: RFR: 8337828: CDS: Trim down minimum GC region alignment [v2] In-Reply-To: References: Message-ID: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> > CDS currently follows G1's minimum region size to guess which alignment to use when dumping the heap. The comment near the constant rightfully recognizes it would be convenient for Shenandoah to trim the alignment down to 256K (Shenandoah's min region size). If we do this, we will expand the range of heap sizes that [JDK-8293650](https://bugs.openjdk.org/browse/JDK-8293650) can operate at. > > Unless I am missing something else, trimming down the min region alignment has an impact on the size of the objects we can store in the CDS archive. Conveniently, `-Xlog:cds+heap` prints the object size stats for us, and it looks like we are way under the 256K limit: > > > $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:-UseCompressedOops -Xshare:dump -Xlog:cds+heap > ...
> [0.921s][info][cds,heap] 0 objects are <= 8 bytes (total 0 bytes, avg 0.0 bytes) > [0.921s][info][cds,heap] 2550 objects are <= 16 bytes (total 40800 bytes, avg 16.0 bytes) > [0.921s][info][cds,heap] 14325 objects are <= 32 bytes (total 431896 bytes, avg 30.1 bytes) > [0.921s][info][cds,heap] 6572 objects are <= 64 bytes (total 301304 bytes, avg 45.8 bytes) > [0.921s][info][cds,heap] 1225 objects are <= 128 bytes (total 113112 bytes, avg 92.3 bytes) > [0.921s][info][cds,heap] 2173 objects are <= 256 bytes (total 384024 bytes, avg 176.7 bytes) > [0.921s][info][cds,heap] 143 objects are <= 512 bytes (total 47720 bytes, avg 333.7 bytes) > [0.921s][info][cds,heap] 40 objects are <= 1024 bytes (total 26872 bytes, avg 671.8 bytes) > [0.921s][info][cds,heap] 19 objects are <= 2048 bytes (total 29656 bytes, avg 1560.8 bytes) > [0.921s][info][cds,heap] 9 objects are <= 4096 bytes (total 20744 bytes, avg 2304.9 bytes) > [0.921s][info][cds,heap] 4 objects are <= 8192 bytes (total 20536 bytes, avg 5134.0 bytes) > [0.921s][info][cds,heap] 3 objects are <= 16384 bytes (total 30168 bytes, avg 10056.0 bytes) > [0.921s][info][cds,heap] 2 objects are <= 32768 bytes (total 32800 bytes, avg 16400.0 bytes) > [0.921s][info][cds,heap] 0 objects are <= 65536 bytes (total 0 bytes, avg 0.0 bytes) > [0.921s][info][cds,heap] 1 objects are <= 131072 bytes (total 66848 bytes, avg 66848.0 bytes) > [0.921s][info][cds,heap] 0 objects are <= 262144 bytes (total 0 bytes, avg 0.0 bytes) > [0.921s][info][cds,heap] 0 huge objects (tot... Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge branch 'master' into JDK-8337828-cds-min-alignment - Work ------------- Changes: https://git.openjdk.org/jdk/pull/20469/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20469&range=01 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20469.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20469/head:pull/20469 PR: https://git.openjdk.org/jdk/pull/20469 From fyang at openjdk.org Fri Aug 16 01:03:55 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 16 Aug 2024 01:03:55 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v5] In-Reply-To: <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> Message-ID: On Wed, 14 Aug 2024 13:39:23 GMT, Andrew Dinn wrote: >> Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix up jvmci static field declarations LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20566#pullrequestreview-2241588374 From qamai at openjdk.org Fri Aug 16 04:30:59 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 16 Aug 2024 04:30:59 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: Message-ID: <0UcwNid-jUKzMXi3nhkKr6e0RUdJs7jlouLlEOVxKo4=.6db98098-b034-42db-8f00-25bff0c764d4@github.com> On Mon, 12 Aug 2024 10:04:50 GMT, Aleksey Shipilev wrote: >> See bug for more discussion.
>> >> Currently, C2 puts a `Release` barrier at the exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193 >> >> A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for a `@Stable` field write, while only a `StoreStore` for a `final` field store would suffice: >> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160 >> >> AFAICS, the original intent of the Release barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of a stable subgraph in the initialized state and folds it, or it sees a default value for the component and leaves it alone. >> >> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields. >> >> The current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it.
>> I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit. >> >> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with the notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner. >> >> Additional testing: >> - [x] New IR tests >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Use TestFramework bootclasspath instead of develop option > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Merge branch 'master' into JDK-8333791-stable-field-barrier > - Variant 2: Only final-field like semantics for stable inits > - Variant 3: Handle everything, including reads by compilers src/hotspot/share/c1/c1_GraphBuilder.cpp line 1752: > 1750: scope()->set_wrote_final(); > 1751: } > 1752: if (field->is_stable()) { What if the `field` is a field of another object, not the one under construction? For final fields the verifier ensures that we only write to it in the containing object constructor, but for stable fields it is not guaranteed.
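The "final-field-like" publication guarantee this PR aims for can be illustrated with a C++ analogy. This is not HotSpot code and does not model `@Stable` constant folding; `Holder`, `publish` and `read_when_published` are hypothetical. Per the PR description, HotSpot only needs a `StoreStore` barrier at constructor exit for final-field semantics; portable C++ has no standalone equivalent, so the sketch uses the usual substitute: a release store of the pointer paired with an acquire load on the reader side.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical object whose field is written only in its constructor.
struct Holder {
  int stable_value;
  explicit Holder(int v) : stable_value(v) {}
};

std::atomic<Holder*> g_holder{nullptr};

// Writer: the constructor's field store is ordered before the publication
// of the pointer by the release store.
void publish(int v) {
  Holder* h = new Holder(v);
  g_holder.store(h, std::memory_order_release);
}

// Reader: once it sees the pointer (acquire), it also sees the field value
// written by the constructor, never an uninitialized one.
int read_when_published() {
  Holder* h;
  while ((h = g_holder.load(std::memory_order_acquire)) == nullptr) {
    std::this_thread::yield();
  }
  return h->stable_value;
}
```

In this analogy, the ordering point on the writer side covers every store made before it, not only the stores to the object being published.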
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1719282291 From liach at openjdk.org Fri Aug 16 04:42:01 2024 From: liach at openjdk.org (Chen Liang) Date: Fri, 16 Aug 2024 04:42:01 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: <0UcwNid-jUKzMXi3nhkKr6e0RUdJs7jlouLlEOVxKo4=.6db98098-b034-42db-8f00-25bff0c764d4@github.com> References: <0UcwNid-jUKzMXi3nhkKr6e0RUdJs7jlouLlEOVxKo4=.6db98098-b034-42db-8f00-25bff0c764d4@github.com> Message-ID: On Fri, 16 Aug 2024 04:28:41 GMT, Quan Anh Mai wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Use TestFramework bootclasspath instead of develop option >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Merge branch 'master' into JDK-8333791-stable-field-barrier >> - Variant 2: Only final-field like semantics for stable inits >> - Variant 3: Handle everything, including reads by compilers > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 1752: > >> 1750: scope()->set_wrote_final(); >> 1751: } >> 1752: if (field->is_stable()) { > > What if the `field` is a field of another object, not the one under construction? For final fields the verifier ensures that we only write to it in the containing object constructor, but for stable fields it is not guaranteed. Aren't membar additions already checked with `is_object_initializer()` or `method()->name() == ciSymbols::object_initializer_name()`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1719289420 From qamai at openjdk.org Fri Aug 16 04:50:58 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 16 Aug 2024 04:50:58 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: <0UcwNid-jUKzMXi3nhkKr6e0RUdJs7jlouLlEOVxKo4=.6db98098-b034-42db-8f00-25bff0c764d4@github.com> Message-ID: On Fri, 16 Aug 2024 04:38:44 GMT, Chen Liang wrote: >> src/hotspot/share/c1/c1_GraphBuilder.cpp line 1752: >> >>> 1750: scope()->set_wrote_final(); >>> 1751: } >>> 1752: if (field->is_stable()) { >> >> What if the `field` is a field of another object, not the one under construction? For final fields the verifier ensures that we only write to it in the containing object constructor, but for stable fields it is not guaranteed. > > Aren't membar additions already checked with `is_object_initializer()` or `method()->name() == ciSymbols::object_initializer_name()`? Yes, we only emit membars at the end of constructors, but we do not check whether the stable fields being written belong to the same object as the one being constructed there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1719296315 From liach at openjdk.org Fri Aug 16 05:23:57 2024 From: liach at openjdk.org (Chen Liang) Date: Fri, 16 Aug 2024 05:23:57 GMT Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v3] In-Reply-To: References: <0UcwNid-jUKzMXi3nhkKr6e0RUdJs7jlouLlEOVxKo4=.6db98098-b034-42db-8f00-25bff0c764d4@github.com> Message-ID: On Fri, 16 Aug 2024 04:47:42 GMT, Quan Anh Mai wrote: >> Aren't membar additions already checked with `is_object_initializer()` or `method()->name() == ciSymbols::object_initializer_name()`? > > Yes, we only emit membars at the end of constructors, but we do not check whether the stable fields being written belong to the same object as the one being constructed there.
That should still be fine, as this shouldn't affect performance-sensitive fast paths. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1719327139 From yzheng at openjdk.org Fri Aug 16 06:15:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 16 Aug 2024 06:15:51 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v5] In-Reply-To: <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> Message-ID: <0YVI0zz30rAQQpSFqfeZ5NTpfz79ucVK9FnoAPhod6s=.01bf6cde-55ba-4208-8490-2636bca0ab31@github.com> On Wed, 14 Aug 2024 13:39:23 GMT, Andrew Dinn wrote: >> Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix up jvmci static field declarations JVMCI changes look good to me. Thanks! ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/20566#pullrequestreview-2241967520 From aboldtch at openjdk.org Fri Aug 16 06:23:01 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 16 Aug 2024 06:23:01 GMT Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v20] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:12:22 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data.
In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. >> >> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not overwrite and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). >> >> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API.
There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Remove newline Thanks for all the reviews and contributions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2292891722 From aboldtch at openjdk.org Fri Aug 16 06:23:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 16 Aug 2024 06:23:04 GMT Subject: Integrated: 8315884: New Object to ObjectMonitor mapping In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 08:18:42 GMT, Axel Boldt-Christmas wrote: > When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere. > > This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not overwrite and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code. > > A diagnostic VM option is introduced called `UseObjectMonitorTable`.
It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). > > This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). > > Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work. > > # Cleanups > > Cleaned up displaced header usage for: > * BasicLock > * Contains some Zero changes > * Renames one exported JVMCI field > * ObjectMonitor > * Updates comments and tests consistencies > > # Refactoring > > `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contention reference counter is being held. More details are given below in the section about deflation. > > The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. > > _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted and which could be trivially abstracted and verified by the type system by using similar witness objects._ > > # LightweightSynchronizer > > Working on adapting and incorporating the following section as a comment in the source code > > ## Fast Locking > > CAS on locking bits in markWord. > 0b00 (Fast Locked) <--> 0b01 (Unlocked) > > When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit. > > If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf... This pull request has now been integrated.
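The fast-locking step in the quoted description (a CAS on the `markWord` lock bits, 0b01 unlocked to 0b00 fast locked, with 0b10 meaning inflated) can be sketched in C++ with `std::atomic`. This is a hypothetical, simplified model: the constants and functions below are illustrative names, not HotSpot's `markWord` API. It also leaves out how the owning thread is recorded, spinning, and inflation.

```cpp
#include <atomic>
#include <cassert>  // used by the usage checks
#include <cstdint>

// Illustrative encoding of the two low "lock bits" of a header word,
// following the quoted description (not HotSpot's markWord class).
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

bool is_inflated(std::uintptr_t mark) {
  return (mark & kLockMask) == kInflated;
}

// Fast lock: CAS the lock bits from 0b01 (unlocked) to 0b00 (fast locked),
// keeping every other bit of the word unchanged. Returns false if the word
// is not currently unlocked or the CAS loses a race. A caller would then
// spin or inflate.
bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) {
    return false;
  }
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Fast unlock: CAS the lock bits back from 0b00 to 0b01.
bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) {
    return false;
  }
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

In this model only the two low bits change. The remaining header bits pass through unchanged, which mirrors the PR's point that locking touches only the lock bits and does not displace the rest of the `markWord`.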
Changeset: bd4160ce Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/bd4160cea8b6b0fcf0507199ed76a12f5d0aaba9 Stats: 3612 lines in 68 files changed: 2691 ins; 318 del; 603 mod 8315884: New Object to ObjectMonitor mapping Co-authored-by: Erik Österlund Co-authored-by: Stefan Karlsson Co-authored-by: Coleen Phillimore Reviewed-by: rkennke, coleenp, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/20067 From shade at openjdk.org Fri Aug 16 09:12:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 16 Aug 2024 09:12:47 GMT Subject: RFR: 8338314: JFR: Split JFRCheckpoint VM operation In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 15:56:46 GMT, Aleksey Shipilev wrote: > Investigating JFR crashes is a bit tedious, as the Events section in `hs_err` shows just: > > > Event: 3.006 Executing VM operation: JFRCheckpoint > Event: 3.006 Executing VM operation: JFRCheckpoint done > > > What that `JFRCheckpoint` is doing is unclear, because it can do two separate things: clear or write. It would be good if we could disambiguate the two. Since there are only two flavors of checkpoint, I think we can just split the VMOp into two more precisely named ones, so it gives us e.g.: > > > Event: 2.462 Executing VM operation: JFRSafepointClear > Event: 2.463 Executing VM operation: JFRSafepointClear done Thanks! Maybe @egahlin wants to review too?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20570#issuecomment-2293136610 From egahlin at openjdk.org Fri Aug 16 11:12:48 2024 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 16 Aug 2024 11:12:48 GMT Subject: RFR: 8338314: JFR: Split JFRCheckpoint VM operation In-Reply-To: References: Message-ID: <41-FRDrvpKpcUOSXKszSuJvWkTlzoCkbEATQ9X1vi98=.afa26bda-a4ea-4e34-a184-ea39a2e823f2@github.com> On Tue, 13 Aug 2024 15:56:46 GMT, Aleksey Shipilev wrote: > Investigating JFR crashes is a bit tedious, as the Events section in `hs_err` shows just: > > > Event: 3.006 Executing VM operation: JFRCheckpoint > Event: 3.006 Executing VM operation: JFRCheckpoint done > > > What that `JFRCheckpoint` is doing is unclear, because it can do two separate things: clear or write. It would be good if we could disambiguate the two. Since there are only two flavors of checkpoint, I think we can just split the VMOp into two more precisely named ones, so it gives us e.g.: > > > Event: 2.462 Executing VM operation: JFRSafepointClear > Event: 2.463 Executing VM operation: JFRSafepointClear done Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20570#pullrequestreview-2242485832 From coleenp at openjdk.org Fri Aug 16 12:58:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 16 Aug 2024 12:58:50 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v2] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: <4L07gHAQMsFU2gzWpWm16TtzrY_e_nQp8YfklCPYiRc=.614132c5-c0ca-423f-af43-58fd3502a451@github.com> On Tue, 16 Jul 2024 17:12:08 GMT, Aleksey Shipilev wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). >> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor".
Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in the VM. The Java side of `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. >> >> I also inlined the `select_method` definition, which allows for a bit more straightforward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8336468-reflection-init-checks > - Remove unnecessary handle-izing > - Fix > - Fix This seems fine. ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20192#pullrequestreview-2242654575 From rcastanedalo at openjdk.org Fri Aug 16 13:06:55 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 16 Aug 2024 13:06:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: On Sun, 21 Jul 2024 08:27:52 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags > > src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123: > >> 121: if ((barrier_data() & G1C2BarrierPost) != 0) { >> 122: __ movl($tmp2$$Register, $src$$Register); >> 123: if ((barrier_data() & G1C2BarrierPostNotNull) == 0) { > > `decode_heap_oop` contains a null check in some cases which makes some of your code redundant. Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`. Also note that the latter supports specifying different src and dst registers which saves the extra move operation. Thanks for the suggestion, Martin!
I have prototyped the optimization [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `RemoveRedundantNullChecks`), and in my opinion its expected benefit does not justify the additional complexity, especially since the scope is limited (in my earlier experiments, most of the stores are implemented with `g1EncodePAndStoreN` rather than `g1StoreN`, plus the optimization only applies to a specific compressed OOPs mode). I have run a few general-purpose benchmarks using a non-zero base compressed oops mode and the optimization did not yield any statistically significant improvement, but please let me know if you have any specific benchmark/configuration in mind and I can re-check. > src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182: > >> 180: $tmp2$$Register /* pre_val */, >> 181: $tmp3$$Register /* tmp */, >> 182: RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */); > > The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required. Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`). It should also apply to `g1CompareAndSwapP`, right? > src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301: > >> 299: RegSet::of($mem$$Register, $newval$$Register) /* preserve */); >> 300: __ movq($tmp1$$Register, $newval$$Register); >> 301: __ xchgq($newval$$Register, Address($mem$$Register, 0)); > > Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This allows avoiding loading the old value twice. 
Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs which can be difficult to catch by regular testing. Because of this, I feel hesitant to introduce these kinds of optimizations for atomic operation barriers. But I am happy to reconsider, if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719811953 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719812882 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719814312 From sgehwolf at openjdk.org Fri Aug 16 13:28:53 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 16 Aug 2024 13:28:53 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 05:51:53 GMT, Matthias Baesken wrote: > If the added apt-get call causes issues for some people running the tests, we can easily add an ubuntu distro check in a follow up. @MBaesken This breaks container tests on non-Debian distros. Please add some form of property that needs to be set to install `libubsan1` on the test containers if you need that. It should have no impact on users not needing/using it. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2293509335 From stuefe at openjdk.org Fri Aug 16 14:10:56 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 16 Aug 2024 14:10:56 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Fri, 16 Aug 2024 13:26:23 GMT, Severin Gehwolf wrote: > > If the added apt-get call causes issues for some people running the tests, we can easily add an ubuntu distro check in a follow up. > > > > @MBaesken This breaks container tests on non-Debian distros. Please add some form of property that needs to be set to install `libubsan1` on the test containers if you need that. It should have no impact on users not needing/using it. Thanks! Weird that Oracle reported no errors, since I assume they test on what is essentially RHEL. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2293580070 From sgehwolf at openjdk.org Fri Aug 16 15:02:54 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 16 Aug 2024 15:02:54 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Fri, 16 Aug 2024 14:08:07 GMT, Thomas Stuefe wrote: > > > If the added apt-get call causes issues for some people running the tests, we can easily add an ubuntu distro check in a follow up. > > > > > > @MBaesken This breaks container tests on non-Debian distros. Please add some form of property that needs to be set to install `libubsan1` on the test containers if you need that. It should have no impact on users not needing/using it. Thanks! > > Weird that Oracle reported no errors, since I assume they test on what is essentially RHEL. It would depend on whether skipped tests are flagged. The container tests get silently skipped if the container build of the image to test fails. That might be a reason.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2293672603 From aph at openjdk.org Fri Aug 16 15:55:26 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 16 Aug 2024 15:55:26 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v20] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. 
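The quoted description relies on hashed secondary-super tables and on popcount, including a software fallback on x86-64 builds without a popcount instruction. The following is a simplified, hypothetical C++ model of a popcount-indexed table. It assumes that each class has a 6-bit hash slot, that a 64-bit bitmap marks the occupied slots, and that entries are packed in slot order, so an entry's index is the number of occupied slots below it. The names are illustrative, not HotSpot's. Collisions, which a real table must handle, are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable software popcount (SWAR): counts set bits in a 64-bit word
// without relying on a hardware popcount instruction.
int popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical packed table: bit s of `bitmap` is set when some entry
// occupies hash slot s (0..63); `packed` holds those entries in slot order.
struct PackedSupersTable {
  std::uint64_t bitmap = 0;
  std::vector<int> packed;  // illustrative class ids
};

// Index of slot s in `packed` = number of occupied slots below s.
int packed_index(const PackedSupersTable& t, unsigned slot) {
  std::uint64_t below = (std::uint64_t{1} << slot) - 1;
  return popcount64(t.bitmap & below);
}

bool table_contains(const PackedSupersTable& t, int klass_id, unsigned slot) {
  std::uint64_t bit = std::uint64_t{1} << slot;
  if ((t.bitmap & bit) == 0) {
    return false;  // empty slot: definite miss without touching `packed`
  }
  return t.packed[packed_index(t, slot)] == klass_id;
}

void table_insert(PackedSupersTable& t, int klass_id, unsigned slot) {
  std::uint64_t bit = std::uint64_t{1} << slot;
  assert((t.bitmap & bit) == 0 && "collision handling omitted in this sketch");
  t.packed.insert(t.packed.begin() + packed_index(t, slot), klass_id);
  t.bitmap |= bit;
}
```

In this model, a lookup that lands on an empty slot is answered from the bitmap alone. Only lookups that hit an occupied slot pay for a popcount.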
> > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 58 commits: - Merge branch 'clean' into JDK-8331658-work - Fix merge - Merge branch 'clean' into JDK-8331658-work - Merge from JDK head. - Cleanup - Fix shared code - Fix shared code - use assert rather than guarantee - Untabify - Reorganize x86 - ...
and 48 more: https://git.openjdk.org/jdk/compare/07352c67...dd42fe93 ------------- Changes: https://git.openjdk.org/jdk/pull/19989/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=19 Stats: 1047 lines in 20 files changed: 774 ins; 140 del; 133 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From egahlin at openjdk.org Fri Aug 16 19:07:57 2024 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 16 Aug 2024 19:07:57 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v18] In-Reply-To: References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: On Tue, 30 Jul 2024 14:33:10 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory.
(STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Fixing invocation outside of jcmd This change should have had a CSR. The change from "STRING" to "FILE" broke JMC. It's no longer possible to run "Thread.dump_to_file" with a filename in the Diagnostic Command tab because the type is now "FILE" instead of "STRING", so the value can't be edited. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2294014583 From cjplummer at openjdk.org Fri Aug 16 19:30:55 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 16 Aug 2024 19:30:55 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v18] In-Reply-To: References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: On Tue, 30 Jul 2024 14:33:10 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID.
>> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped. If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Fixing invocation outside of jcmd I think the only reason for the change from STRING to FILE was to support shared code (not per dcmd code) doing the %p expansion.
See the following change: https://github.com/openjdk/jdk/pull/20198/files#diff-4f9f273d22b9cf7af6e58a5770b95cc758e8c8e9c257b56b9ac95bfa359c16f2 I wonder if it is possible to do this without the addition of FILE. Maybe most of the code can be shared, but we need a bit of dcmd-specific code to invoke this filename expansion code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2294055791 From mdoerr at openjdk.org Fri Aug 16 20:58:53 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 16 Aug 2024 20:58:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> On Fri, 16 Aug 2024 13:01:28 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123: >> >>> 121: if ((barrier_data() & G1C2BarrierPost) != 0) { >>> 122: __ movl($tmp2$$Register, $src$$Register); >>> 123: if ((barrier_data() & G1C2BarrierPostNotNull) == 0) { >> >> `decode_heap_oop` contains a null check in some cases which makes some of your code redundant. Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`. Also note that the latter supports specifying different src and dst registers which saves the extra move operation. > > Thanks for the suggestion, Martin!
I have prototyped the optimization [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `RemoveRedundantNullChecks`), and in my opinion its expected benefit does not justify the additional complexity, especially since the scope is limited (in my earlier experiments, most of the stores are implemented with `g1EncodePAndStoreN` rather than `g1StoreN`, plus the optimization only applies to a specific compressed OOPs mode). I have run a few general-purpose benchmarks using a non-zero base compressed oops mode and the optimization did not yield any statistically significant improvement, but please let me know if you have any specific benchmark/configuration in mind and I can re-check. Thanks for trying! I think I should try it on PPC64. The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. It could be that x86 is less sensitive to such optimizations. >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182: >> >>> 180: $tmp2$$Register /* pre_val */, >>> 181: $tmp3$$Register /* tmp */, >>> 182: RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */); >> >> The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required. > > Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`). It should also apply to `g1CompareAndSwapP`, right? Exactly. 
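Two barrier ideas appear in this thread. For a CAS, the SATB pre-barrier can be fed `oldval`. For an xchg, it can be fed the value the exchange returns. Both can be modeled in portable C++. This is a single-threaded, hypothetical sketch: `Obj`, `marking_active`, `satb_queue`, and the helper functions are illustrative stand-ins, not G1's `.ad` rules or `MacroAssembler` code. The only point is that neither variant re-loads the field to find the previous value.

```cpp
#include <atomic>
#include <cassert>  // used by the usage checks
#include <vector>

struct Obj { int id; };

// Hypothetical stand-ins for concurrent-marking state and the SATB queue.
bool marking_active = true;
std::vector<Obj*> satb_queue;

// SATB pre-barrier: while marking is active, record the value that is
// being overwritten (null values are skipped).
void satb_enqueue(Obj* previous) {
  if (marking_active && previous != nullptr) {
    satb_queue.push_back(previous);
  }
}

// CAS variant: feed the pre-barrier with `expected` instead of loading the
// field first. On success, `expected` is exactly the overwritten value.
bool cas_with_pre_barrier(std::atomic<Obj*>& field, Obj* expected, Obj* desired) {
  satb_enqueue(expected);
  return field.compare_exchange_strong(expected, desired);
}

// Exchange variant: exchange() returns the previous value, so the
// pre-barrier can run after the swap without a separate load.
Obj* xchg_with_pre_barrier(std::atomic<Obj*>& field, Obj* desired) {
  Obj* previous = field.exchange(desired);
  satb_enqueue(previous);
  return previous;
}
```

In the CAS variant, a failed CAS still enqueues `expected`, even though that value was not overwritten. In this model the only effect is an extra queue entry.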
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720338672 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720340149 From mdoerr at openjdk.org Fri Aug 16 21:05:51 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 16 Aug 2024 21:05:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> On Fri, 16 Aug 2024 13:03:51 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301: >> >>> 299: RegSet::of($mem$$Register, $newval$$Register) /* preserve */); >>> 300: __ movq($tmp1$$Register, $newval$$Register); >>> 301: __ xchgq($newval$$Register, Address($mem$$Register, 0)); >> >> Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This allows avoiding loading the old value twice. > > Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs which can be difficult to catch by regular testing. Because of this, I feel hesitant to introduce these kinds of optimizations for atomic operation barriers. But I am happy to reconsider, if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost.
Note that we had such an optimization already in C2: https://github.com/openjdk/jdk8u-dev/blob/4106121e0ae42d644e45c6eab9037874110ed670/hotspot/src/share/vm/opto/library_call.cpp#L3114 But it's probably not a big deal. Maybe I can try it on PPC64, which may be more sensitive to accesses on contended memory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720347473 From egahlin at openjdk.org Fri Aug 16 23:09:01 2024 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 16 Aug 2024 23:09:01 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v18] In-Reply-To: References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: On Fri, 16 Aug 2024 19:27:47 GMT, Chris Plummer wrote: > I wonder if it is possible to do this without the addition of FILE. Maybe most of the code can be shared, but we need a bit of dcmd-specific code to invoke this filename expansion code. In JFR and JMC we have made the distinction between data type and content type. In this case, the data type would be STRING, but the content type would be FILE. Maybe something similar could be added to the dcmd framework? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2294425946 From john.r.rose at oracle.com Fri Aug 16 23:38:45 2024 From: john.r.rose at oracle.com (John Rose) Date: Fri, 16 Aug 2024 19:38:45 -0400 Subject: RFR: 8338023: Support two vector selectFrom API In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: (Better late than never, although I wish I'd been more explicit about this on panama-dev.) I think we should be moving away from throwing exceptions on all reorder/shuffle/permute vector ops, and moving toward wrapping.
These ops all operate on vectors (small arrays) of vector lane indexes (small array indexes in a fixed domain, always a power of two). The throwing behavior checks an input for bad indexes and throws a (scalar) exception if there are any at all. The wrapping behavior reduces bad indexes to good ones by an unsigned modulo operation (which is at worst a mask for powers of two). If I'm right, then new API points should start out with wrap semantics, not throw semantics. And old API points should be migrated ASAP. There's no loss of functionality in such a move. Instead the defaults are moved around. Before, throwing was the default and wrapping was an explicit operation. After, wrapping would be the default and throwing would be explicit. Both wrapping and throwing checks are available through explicit calls to VectorShuffle methods checkIndexes and wrapIndexes. OK, so why is wrapping better than throwing? And first, why did we start with throwing as the default? Well, we chose throwing as the default to make the vector operations more Java-like. Java scalar operations don't try to reduce bad array indexes into the array domain; they throw. Since a shuffle op is like an array reference, it makes sense to emulate the checks built into Java array references. Or it did make sense. I think there is a technical debt here which is turning out to be hard to pay off. The tech debt is to suppress or hoist or strength-reduce the vector instructions that perform the check for invalid indexes (in parallel), then ask "did any of those checks fail?" (a mask reduction), then do a conditional branch to failure code. I think I was over-confident that our scalar tactics for reducing array range checks would apply to vectors as well. On second thought, vectorizing our key optimization, loop range splitting (pre/main/post loops), is kind of a nightmare. Instead, consider the alternative of wrapping. First, you use vpand or the like to mask the indexes down to the valid range.
Then you run the shuffle/permute instruction. That's it. There is no scalar query or branch. And, there are probably some circumstances where you can omit the vpand operation: Perhaps the hardware already masks the inputs (as with shift instructions). Or, perhaps C2 can do bitwise inference of the vectors and figure out that the vpand is a nop. (I am agitating for bitwise types in C2; this is a use case for them.) In the worst case, the vpand op is fast and pipelines well. This is why I think we should switch, ASAP, to masking instead of throwing, on bad indexes. I think some of our reports from customers have shown that the extra checks necessary for throwing on bad indexes are giving their code surprising slowdowns, relative to C-based vector code. Did I miss a point? -- John On 14 Aug 2024, at 3:43, Jatin Bhateja wrote: > On Mon, 12 Aug 2024 22:03:44 GMT, Paul Sandoz wrote: > >> The results look promising. I can provide guidance on the specification, e.g., we can specify the behavior in terms of rearrange, with the addition of throwing on out-of-bounds indexes. >> >> Regarding the throwing of exceptions, some wider context will help to know where we are heading before we finalize the specification. I believe we are considering changing the default throwing behavior for index out of bounds to wrapping, so that we can avoid bounds checks. If that is the case we should wait until that is done then update rather than submitting a CSR just yet? >> >> I see you created a specific intrinsic, which will avoid the cost of shuffle creation. Should we apply the same approach (in a subsequent PR) to the single argument shuffle? Or perhaps if we manage to optimize shuffles and change the default to wrapping we don't require a specific intrinsic and can just defer to rearrange? > > Hi @PaulSandoz , > Thanks for your comments.
With this new API we intend to enforce a stricter specification with respect to index values, so that we can emit a lean instruction sequence that spends no cycles massaging inputs into a consumable form, thus avoiding redundant wrapping and unwrapping operations. > > The existing [two vector rearrange API](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#rearrange(jdk.incubator.vector.VectorShuffle,jdk.incubator.vector.Vector)) has a flexible specification which allows wrapping out-of-bounds shuffle indexes into exceptional indexes with a negative value. > > Even if we optimize the existing two vector rearrange implementation, we will still need to emit additional instructions to generate indexes which lie within the two vector range [0, 2*VLEN). I see this as a specialized API, like vector compress/expand, which caters to targets like x86-AVX512+ and aarch64-SVE that offer direct instructions for two vector lookups. > > Maybe the API nomenclature can be refined to better reflect its semantics, i.e. from selectFrom to twoVectorLookup? > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2288062038 From adinn at openjdk.org Sat Aug 17 15:48:20 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Sat, 17 Aug 2024 15:48:20 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v6] In-Reply-To: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: > Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. Andrew Dinn has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 11 commits: - merge - fix up jvmci static field declarations - fix riscv port issues - fix accidental paste - copy macro across - typo - fix frame layouts for x86_32 - fix throw exception stub generation on zero - fix some header includes and defintions - fix issues with ports - ... and 1 more: https://git.openjdk.org/jdk/compare/8635642d...6307695a ------------- Changes: https://git.openjdk.org/jdk/pull/20566/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20566&range=05 Stats: 2886 lines in 43 files changed: 1341 ins; 1494 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/20566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20566/head:pull/20566 PR: https://git.openjdk.org/jdk/pull/20566 From adinn at openjdk.org Sat Aug 17 15:48:20 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Sat, 17 Aug 2024 15:48:20 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v5] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> Message-ID: On Thu, 15 Aug 2024 19:24:55 GMT, Vladimir Kozlov wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> fix up jvmci static field declarations > > Update is good. @vnkozlov @RealFYang @mur47x111 Thank you for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2294895525 From aturbanov at openjdk.org Sat Aug 17 17:26:56 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Sat, 17 Aug 2024 17:26:56 GMT Subject: RFR: 8338023: Support two vector selectFrom API In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 8 Aug 2024 06:57:28 GMT, Jatin Bhateja wrote: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two vector permutation API. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table, whose elements are selected based on the index value vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of the new selectFrom API. > - C2 compiler IR and inline expander changes. > - In the absence of a direct two vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy targets. > - Functional tests covering the new API.
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2  5398.2...
test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 331: > 329: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 330: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 331: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Double64VectorTests.java line 348: > 346: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 347: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 348: Assert.assertEquals(r[idx], (is_exceptional_idx ?
b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java line 353: > 351: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 352: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 353: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Float128VectorTests.java line 348: > 346: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 347: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 348: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Float256VectorTests.java line 348: > 346: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 347: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 348: Assert.assertEquals(r[idx], (is_exceptional_idx ? 
b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Float512VectorTests.java line 348: > 346: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 347: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 348: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java line 353: > 351: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 352: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 353: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Int512VectorTests.java line 331: > 329: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 330: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 331: Assert.assertEquals(r[idx], (is_exceptional_idx ? 
b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/IntMaxVectorTests.java line 336: > 334: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 335: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 336: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Long256VectorTests.java line 288: > 286: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 287: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 288: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Long64VectorTests.java line 288: > 286: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 287: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 288: Assert.assertEquals(r[idx], (is_exceptional_idx ? 
b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); test/jdk/jdk/incubator/vector/Short256VectorTests.java line 331: > 329: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 330: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 331: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); Suggestion: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx]), "at index #" + idx + ", order = " + (int)order[idx] + ", a = " + a[i + oidx] + ", b = " + b[i + oidx]); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807165 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807191 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807216 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807254 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807143 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807202 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807129 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807262 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807098 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807239 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807206 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1720807231 From aturbanov at openjdk.org Sat Aug 17 17:28:56 2024 From: aturbanov at 
openjdk.org (Andrey Turbanov) Date: Sat, 17 Aug 2024 17:28:56 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v3] In-Reply-To: References: Message-ID: <9inKZjq3czAlh1fgRHhzGPxABxYlC6FEVpg7nloQYok=.9cd4a3f6-6d87-40c1-b9ee-63927bd7391f@github.com> On Wed, 14 Aug 2024 04:59:23 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max. >> . UMIN : Unsigned min. >> >> >> The new vector operators are applicable only to integral types, since integral values wrap around in overflow/underflow scenarios (after setting the appropriate status flags). For floating-point types, as per the IEEE 754 spec there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value into the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflow or overflow scenarios. >> >> Summary of changes: >> - Java side implementation of the new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for the new vector operators and their predicated counterparts. >> - Extends the existing VectorAPI jtreg test suite to cover the new operations. >> >> Kindly review and share your feedback.
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions src/java.base/share/classes/java/lang/Byte.java line 81: > 79: * A constant holding polarity(sign) mask used by saturating operations. > 80: */ > 81: public static final byte POLARITY_MASK_BYTE = (byte)(1 << 7); Suggestion: public static final byte POLARITY_MASK_BYTE = (byte)(1 << 7); src/java.base/share/classes/java/lang/Byte.java line 672: > 670: byte res = (byte)(a + b); > 671: boolean overflow = Byte.compareUnsigned(res, (byte)(a | b)) < 0; > 672: if (overflow) { Suggestion: if (overflow) { src/java.base/share/classes/java/lang/Long.java line 93: > 91: * A constant holding polarity(sign) mask used by saturating operations. > 92: */ > 93: public static final long POLARITY_MASK_LONG = 1L << 63; Suggestion: public static final long POLARITY_MASK_LONG = 1L << 63; src/java.base/share/classes/java/lang/Long.java line 2033: > 2031: long res = a + b; > 2032: boolean overflow = Long.compareUnsigned(res, (a | b)) < 0; > 2033: if (overflow) { Suggestion: if (overflow) { src/java.base/share/classes/java/lang/Short.java line 707: > 705: short res = (short)(a + b); > 706: boolean overflow = Short.compareUnsigned(res, (short)(a | b)) < 0; > 707: if (overflow) { Suggestion: if (overflow) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1720807587 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1720807612 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1720807574 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1720807513 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1720807466 From gcao at 
openjdk.org Sun Aug 18 13:52:19 2024 From: gcao at openjdk.org (Gui Cao) Date: Sun, 18 Aug 2024 13:52:19 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation Message-ID: The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping ### Testing: - [x] tier1-3 & hotspot:tier4 tests (release) - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) ------------- Commit messages: - 8338539: New Object to ObjectMonitor mapping: riscv64 implementation Changes: https://git.openjdk.org/jdk/pull/20621/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20621&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338539 Stats: 147 lines in 9 files changed: 66 ins; 11 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/20621.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20621/head:pull/20621 PR: https://git.openjdk.org/jdk/pull/20621 From fyang at openjdk.org Mon Aug 19 02:12:48 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 19 Aug 2024 02:12:48 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation In-Reply-To: References: Message-ID: On Sun, 18 Aug 2024 13:47:29 GMT, Gui Cao wrote: > The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping > > ### Testing: > - [x] tier1-3 & hotspot:tier4 tests (release) > - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) Looks good to me. Thanks! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20621#pullrequestreview-2244506555 From jbhateja at openjdk.org Mon Aug 19 06:47:50 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Aug 2024 06:47:50 GMT Subject: RFR: 8338023: Support two vector selectFrom API In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 8 Aug 2024 06:57:28 GMT, Jatin Bhateja wrote: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two vector permutation API. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table, whose elements are selected based on the index value vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of the new selectFrom API. > - C2 compiler IR and inline expander changes. > - In the absence of a direct two vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy targets. > - Functional tests covering the new API.
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2  5398.2...
> _Mailing list message from [John Rose](mailto:john.r.rose at oracle.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at mail.openjdk.org):_ > > (Better late than never, although I wish I'd been more explicit about this on panama-dev.) > > I think we should be moving away from throwing exceptions on all reorder/shuffle/permute vector ops, and moving toward wrapping. These ops all operate on vectors (small arrays) of vector lane indexes (small array indexes in a fixed domain, always a power of two). The throwing behavior checks an input for bad indexes and throws a (scalar) exception if there are any at all. The wrapping behavior reduces bad indexes to good ones by an unsigned modulo operation (which is at worst a mask for powers of two). > > If I'm right, then new API points should start out with wrap semantics, not throw semantics. And old API points should be migrated ASAP.
> > There's no loss of functionality in such a move. Instead the defaults are moved around. Before, throwing was the default and wrapping was an explicit operation. After, wrapping would be the default and throwing would be explicit. Both wrapping and throwing checks are available through explicit calls to VectorShuffle methods checkIndexes and wrapIndexes. > > OK, so why is wrapping better than throwing? And first, why did we start with throwing as the default? Well, we chose throwing as the default to make the vector operations more Java-like. Java scalar operations don't try to reduce bad array indexes into the array domain; they throw. Since a shuffle op is like an array reference, it makes sense to emulate the checks built into Java array references. > > Or it did make sense. I think there is a technical debt here which is turning out to be hard to pay off. The tech debt is to suppress or hoist or strength-reduce the vector instructions that perform the check for invalid indexes (in parallel), then ask "did any of those checks fail?" (a mask reduction), then do a conditional branch to failure code. I think I was over-confident that our scalar tactics for reducing array range checks would apply to vectors as well. On second thought, vectorizing our key optimization, of loop range splitting (pre/main/post loops) is kind of a nightmare. > > Instead, consider the alternative of wrapping. First, you use vpand or the like to mask the indexes down to the valid range. Then you run the shuffle/permute instruction. That's it. There is no scalar query or branch. And, there are probably some circumstances where you can omit the vpand operation: Perhaps the hardware already masks the inputs (as with shift instructions). Or, perhaps C2 can do bitwise inference of the vectors and figure out that the vpand is a nop. (I am agitating for bitwise types in C2; this is a use case for them.) In the worst case, the vpand op is fast and pipelines well.
> > This is why I think we should switch, ASAP, to masking instead of throwing, on bad indexes. > > I think some of our reports from customers have shown that the extra checks necessary for throwing on bad indexes are giving their code surprising slowdowns, relative to C-based vector code. > > Did I miss a point? > > -- John > > On 14 Aug 2024, at 3:43, Jatin Bhateja wrote: Hi @rose00, I agree that wrapping should be the default behaviour if indices are passed through shuffles. The idea was to pick exception-throwing semantics for out-of-bounds indexes *only* for the selectFrom flavour of APIs, which accept indexes through the vector interface; this will save redundant partial wrapping and unwrapping for the cross-vector permutation API, which has direct mappings in the x86 and AArch64 ISAs. As @PaulSandoz [suggested](https://github.com/openjdk/jdk/pull/20508#pullrequestreview-2234095541), we can also tune the existing single-argument 'selectFrom' API to adopt default exception-throwing semantics if any of the indices lies beyond the valid index range. While we will keep the default wrapping semantics for APIs accepting shuffles, this small deviation in semantics for the selectFrom family of APIs will enable generating efficient code, and will let users choose between the rearrange and selectFrom APIs based on a convenience vs. efficient-code trade-off. Since the API interfaces were crafted with long-term flexibility in view, having multiple permutation interfaces (selectFrom / rearrange) accepting indexes through a vector or a shuffle enables the compiler to emit efficient code.
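To make the wrap-versus-throw trade-off in this exchange concrete, here is a minimal scalar sketch on plain `int[]` arrays. It is not the `jdk.incubator.vector` API (the real `VectorShuffle` exposes `wrapIndexes()` and `checkIndexes()`); the class and method names below are hypothetical. It assumes the lane count `vlen` is a power of two, so wrapping is one AND with `vlen - 1` (the scalar analogue of the `vpand` John describes), while throwing needs a compare per lane, an OR-reduction across lanes, and a branch.

```java
import java.util.Arrays;

// Hypothetical scalar model of shuffle index handling; not the Vector API itself.
final class ShuffleIndexSketch {

    // Wrap semantics: reduce every lane index into [0, vlen) with one mask.
    // For a power-of-two vlen this equals Math.floorMod(idx, vlen), even for negative indexes.
    static int[] wrapIndexes(int[] idx, int vlen) {
        int[] out = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = idx[i] & (vlen - 1);
        }
        return out;
    }

    // Throw semantics: compare every lane, OR-reduce the results, then branch.
    static int[] checkIndexes(int[] idx, int vlen) {
        boolean anyBad = false;
        for (int v : idx) {
            anyBad |= (v < 0 || v >= vlen);    // per-lane check folded into one flag
        }
        if (anyBad) {                          // the scalar branch that wrapping avoids
            throw new IndexOutOfBoundsException("shuffle index out of range: " + Arrays.toString(idx));
        }
        return idx.clone();
    }

    // Single-vector rearrange: lane i of the result takes src[idx[i]].
    static int[] rearrange(int[] src, int[] idx) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[idx[i]];
        }
        return out;
    }
}
```

With wrapping as the default, a caller who wants Java-style failure can still opt in by checking first, which is the "defaults are moved around" point in John's message.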
Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2295785781 From mbaesken at openjdk.org Mon Aug 19 06:57:53 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 19 Aug 2024 06:57:53 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: <-NTmrxbTC75OYtRPYCvVAZIvL5XW5AEO5e8ugXub11s=.db0fac76-7fda-4614-bab9-73c8a640781e@github.com> On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken wrote: >> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. >> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case. > > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - remove method from WhiteBox.java > - remove WB_isUbsanEnabled, fix test I created https://bugs.openjdk.org/browse/JDK-8338550 8338550: Make libubsan1 installation in test container optional ------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2295799272 From jbhateja at openjdk.org Mon Aug 19 07:19:30 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Aug 2024 07:19:30 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max. > . UMIN : Unsigned min.
> > > The new vector operators are applicable only to integral types, since integral values wrap around in overflow/underflow scenarios (after setting the appropriate status flags). For floating-point types, as per the IEEE 754 spec there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value into the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflow or overflow scenarios. > > Summary of changes: > - Java side implementation of the new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for the new vector operators and their predicated counterparts. > - Extends the existing VectorAPI jtreg test suite to cover the new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions.
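As a reading aid for the six operators listed in this PR, here is a minimal scalar sketch for `byte` lanes, assuming only the saturating semantics described above: results are clamped to the type's range instead of wrapping. The class and method names are hypothetical and are not the scalar APIs the patch adds to the box classes. The unsigned add reuses the overflow test visible in the `Byte.java` review hunk earlier in this thread: an unsigned sum has wrapped exactly when it compares below `a | b`, because a non-wrapped sum equals `(a | b) + (a & b)`.

```java
// Hypothetical scalar reference for saturating byte operations (not the Vector API).
final class SaturatingByteSketch {

    // SADD: signed add computed exactly in int, then clamped to [-128, 127].
    static byte sadd(byte a, byte b) {
        return clampSigned(a + b);
    }

    // SSUB: signed subtract, clamped the same way.
    static byte ssub(byte a, byte b) {
        return clampSigned(a - b);
    }

    // SUADD: unsigned add; on overflow saturate to 0xFF (unsigned 255).
    static byte suadd(byte a, byte b) {
        byte res = (byte) (a + b);
        // If the unsigned sum wrapped, it is unsigned-smaller than a | b.
        boolean overflow = Byte.compareUnsigned(res, (byte) (a | b)) < 0;
        return overflow ? (byte) 0xFF : res;
    }

    // SUSUB: unsigned subtract; if b > a (unsigned) the result saturates to 0.
    static byte susub(byte a, byte b) {
        return Byte.compareUnsigned(a, b) < 0 ? 0 : (byte) (a - b);
    }

    // UMAX / UMIN: compare the operands as unsigned values.
    static byte umax(byte a, byte b) {
        return Byte.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static byte umin(byte a, byte b) {
        return Byte.compareUnsigned(a, b) <= 0 ? a : b;
    }

    private static byte clampSigned(int v) {
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, v));
    }
}
```

The same shape should carry over to `short`, `int` and `long` by swapping in the matching `compareUnsigned` and, for the signed forms, a wider intermediate or an overflow check.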
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/8c9bfeca..c42b4afa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=02-03 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Mon Aug 19 07:36:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Aug 2024 07:36:15 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v2] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two vector permutation API. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table, whose elements are selected based on the index value vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of the new selectFrom API. > - C2 compiler IR and inline expander changes. > - In the absence of a direct two vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy targets. > - Functional tests covering the new API. > > The JMH micro included with this patch shows around a 10-15x gain over the existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2  5398.2...
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorporated.
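The two-vector selectFrom semantics described in the PR can be modeled with plain arrays standing in for vector lanes. This sketch rests on stated assumptions: `selectFrom2` is a hypothetical name, `int[]` stands in for a Vector species, and the bounds check follows the description in the PR text (indexes outside [0, 2*VLEN) throw). It is not the jdk.incubator.vector implementation.

```java
import java.util.Arrays;

// Scalar model of two-vector selectFrom: arrays stand in for vector lanes.
// selectFrom2 is a hypothetical name, not the Vector API implementation.
public final class SelectFromModel {
    static int[] selectFrom2(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen || indexes.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = indexes[lane];
            // Valid indexes lie in [0, 2*VLEN): [0, VLEN) selects from v1,
            // [VLEN, 2*VLEN) selects from v2.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 7, 4, 3};
        System.out.println(Arrays.toString(selectFrom2(idx, v1, v2))); // [10, 23, 20, 13]
    }
}
```

Viewed this way, v1 and v2 form one logical table of 2*VLEN entries. This is why a target without a native two-table permute has to lower the operation into several single-vector permutes plus a blend.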
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/82c0b0a2..055fb22f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=00-01 Stats: 31 lines in 31 files changed: 0 ins; 0 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From adinn at openjdk.org Mon Aug 19 08:32:51 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 19 Aug 2024 08:32:51 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v5] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <4YGUzK1zrvn1DEADwS8jCCaoGGJZOmKVwx0opVd74ZQ=.739fe436-a46e-4e0f-ac88-8ba7dd9c2f9b@github.com> Message-ID: On Thu, 15 Aug 2024 19:24:55 GMT, Vladimir Kozlov wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> fix up jvmci static field declarations > > Update is good. @vnkozlov @RealFYang Could one of you please re-review. I had to merge with master to allow for a one-line include file change in sharedRuntime.cpp. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2295978345 From shade at openjdk.org Mon Aug 19 08:37:49 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 08:37:49 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` was not on a performance-sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into the C2 Access API; this work models it on the existing code for the `refersTo0` intrinsics. C2 Access also needs support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteeing it works under different compilation regimes Not now, bot. Still waiting for reviews.
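For context, here are the Java-level `java.lang.ref.Reference` calls involved. `clear()` is backed by the native method the PR intrinsifies in C2, and `refersTo` is backed by the `refersTo0` intrinsics the work is modeled on. Intrinsification should not change the observable semantics, so this sketch only shows what callers see. It says nothing about the generated code.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public final class ReferenceClearDemo {
    public static void main(String[] args) {
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent);

        // refersTo (JDK 16+) tests the referent without creating a new strong reference to it.
        System.out.println(ref.refersTo(referent)); // true

        // Reference.clear(): backed by a native method; this is the call the PR intrinsifies in C2.
        ref.clear();
        System.out.println(ref.get() == null);      // true
        System.out.println(ref.refersTo(null));     // true

        // Keep the referent strongly reachable up to this point.
        Reference.reachabilityFence(referent);
    }
}
```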
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2295989638 From fyang at openjdk.org Mon Aug 19 08:40:53 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 19 Aug 2024 08:40:53 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v6] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Sat, 17 Aug 2024 15:48:20 GMT, Andrew Dinn wrote: >> Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. > > Andrew Dinn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - merge > - fix up jvmci static field declarations > - fix riscv port issues > - fix accidental paste > - copy macro across > - typo > - fix frame layouts for x86_32 > - fix throw exception stub generation on zero > - fix some header includes and definitions > - fix issues with ports > - ... and 1 more: https://git.openjdk.org/jdk/compare/8635642d...6307695a Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20566#pullrequestreview-2244946505 From rcastanedalo at openjdk.org Mon Aug 19 08:53:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 Aug 2024 08:53:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/554de779..92112802 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=07-08 Stats: 28 lines in 3 files changed: 12 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Aug 19 08:53:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 Aug 2024 08:53:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: On Fri, 16 Aug 2024 20:56:08 GMT, Martin Doerr wrote: >> Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`).
It should also apply to `g1CompareAndSwapP`, right? > > Exactly. Done (commit 9211280). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721433361 From adinn at openjdk.org Mon Aug 19 09:03:56 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 19 Aug 2024 09:03:56 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> Message-ID: On Wed, 14 Aug 2024 06:58:09 GMT, Fei Yang wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> fix accidental paste > > Hi Andrew, I find that we need the following add-on change for riscv:
>
> diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
> index dc89e489b24..bed24e442e8 100644
> --- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
> +++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
> @@ -66,6 +66,12 @@
>  
>  #define __ masm->
>  
> +#ifdef PRODUCT
> +#define BLOCK_COMMENT(str) /* nothing */
> +#else
> +#define BLOCK_COMMENT(str) __ block_comment(str)
> +#endif
> +
>  const int StackAlignmentInSlots = StackAlignmentInBytes / VMRegImpl::stack_slot_size;
>  
>  class RegisterSaver {
> @@ -2742,7 +2748,7 @@ static void jfr_epilogue(MacroAssembler* masm) {
>  // For c2: c_rarg0 is junk, call to runtime to write a checkpoint.
>  // It returns a jobject handle to the event writer.
>  // The handle is dereferenced and the return value is the event writer oop.
> -static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() {
> +RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() {
>    enum layout {
>      fp_off,
>      fp_off2,
> @@ -2780,7 +2786,7 @@ static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() {
>  }
>  
>  // For c2: call to return a leased buffer.
> -static RuntimeStub* SharedRuntime::generate_jfr_return_lease() {
> +RuntimeStub* SharedRuntime::generate_jfr_return_lease() {
>    enum layout {
>      fp_off,
>      fp_off2,

@RealFYang Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2296041503 From adinn at openjdk.org Mon Aug 19 09:03:57 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 19 Aug 2024 09:03:57 GMT Subject: Integrated: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime In-Reply-To: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> Message-ID: On Tue, 13 Aug 2024 11:35:59 GMT, Andrew Dinn wrote: > Store the throw_exception and jfr stub code as blobs in class SharedRuntime, move the generation code to the arch-specific generator classes and update client code to access them from their new location. This pull request has now been integrated.
Changeset: f0374a0b Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/f0374a0bc181d0f2a8c0aa9aa032b07998ffaf60 Stats: 2886 lines in 43 files changed: 1341 ins; 1494 del; 51 mod 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime Reviewed-by: fyang, kvn, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/20566 From shade at openjdk.org Mon Aug 19 09:11:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 09:11:54 GMT Subject: RFR: 8338314: JFR: Split JFRCheckpoint VM operation In-Reply-To: References: Message-ID: <2zs7F7-VYeBINLWVq1DPndHhcGqTYM9giaSJK50xP9Q=.5b471238-a6c3-456e-8b4b-5506cfda016b@github.com> On Tue, 13 Aug 2024 15:56:46 GMT, Aleksey Shipilev wrote: > Investigating JFR crashes is a bit tedious, as the Events section in `hs_err` shows just: > > > Event: 3.006 Executing VM operation: JFRCheckpoint > Event: 3.006 Executing VM operation: JFRCheckpoint done > > > What that `JFRCheckpoint` is doing is unclear, because it can do two separate things: clear or write. It would be good if we could disambiguate the two. Since there are only two flavors of checkpoint, I think we can just split the VMOp into two more precisely named ones, so it gives us e.g.: > > > Event: 2.462 Executing VM operation: JFRSafepointClear > Event: 2.463 Executing VM operation: JFRSafepointClear done Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/20570#issuecomment-2296061659 From shade at openjdk.org Mon Aug 19 09:11:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 09:11:54 GMT Subject: Integrated: 8338314: JFR: Split JFRCheckpoint VM operation In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 15:56:46 GMT, Aleksey Shipilev wrote: > Investigating JFR crashes is a bit tedious, as the Events section in `hs_err` shows just: > > > Event: 3.006 Executing VM operation: JFRCheckpoint > Event: 3.006 Executing VM operation: JFRCheckpoint done > > > What that `JFRCheckpoint` is doing is unclear, because it can do two separate things: clear or write. It would be good if we could disambiguate the two. Since there are only two flavors of checkpoint, I think we can just split the VMOp into two more precisely named ones, so it gives us e.g.: > > > Event: 2.462 Executing VM operation: JFRSafepointClear > Event: 2.463 Executing VM operation: JFRSafepointClear done This pull request has now been integrated. Changeset: 6d430f24 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/6d430f24df9d599fe1e12c6b65117c02773ae5d8 Stats: 23 lines in 3 files changed: 12 ins; 1 del; 10 mod 8338314: JFR: Split JFRCheckpoint VM operation Reviewed-by: mgronlun, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/20570 From shade at openjdk.org Mon Aug 19 09:17:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 09:17:50 GMT Subject: RFR: 8337828: CDS: Trim down minimum GC region alignment [v2] In-Reply-To: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> References: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> Message-ID: On Thu, 15 Aug 2024 21:34:19 GMT, Aleksey Shipilev wrote: >> CDS currently follows G1's minimum region size to guess which alignment to use when dumping the heap.
The comment near the constant rightfully recognizes it would be convenient for Shenandoah to trim the alignment down to 256K (Shenandoah's min region size). If we do this, we will widen the range of heap sizes that [JDK-8293650](https://bugs.openjdk.org/browse/JDK-8293650) can operate at. >> >> Unless I am missing something else, the only impact of trimming down the min region alignment is on the size of the objects we can store in the CDS archive. Conveniently, `-Xlog:cds+heap` prints the object size stats for us, and it looks like we are well under the 256K limit: >> >> >> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:-UseCompressedOops -Xshare:dump -Xlog:cds+heap >> ... >> [0.921s][info][cds,heap] 0 objects are <= 8 bytes (total 0 bytes, avg 0.0 bytes) >> [0.921s][info][cds,heap] 2550 objects are <= 16 bytes (total 40800 bytes, avg 16.0 bytes) >> [0.921s][info][cds,heap] 14325 objects are <= 32 bytes (total 431896 bytes, avg 30.1 bytes) >> [0.921s][info][cds,heap] 6572 objects are <= 64 bytes (total 301304 bytes, avg 45.8 bytes) >> [0.921s][info][cds,heap] 1225 objects are <= 128 bytes (total 113112 bytes, avg 92.3 bytes) >> [0.921s][info][cds,heap] 2173 objects are <= 256 bytes (total 384024 bytes, avg 176.7 bytes) >> [0.921s][info][cds,heap] 143 objects are <= 512 bytes (total 47720 bytes, avg 333.7 bytes) >> [0.921s][info][cds,heap] 40 objects are <= 1024 bytes (total 26872 bytes, avg 671.8 bytes) >> [0.921s][info][cds,heap] 19 objects are <= 2048 bytes (total 29656 bytes, avg 1560.8 bytes) >> [0.921s][info][cds,heap] 9 objects are <= 4096 bytes (total 20744 bytes, avg 2304.9 bytes) >> [0.921s][info][cds,heap] 4 objects are <= 8192 bytes (total 20536 bytes, avg 5134.0 bytes) >> [0.921s][info][cds,heap] 3 objects are <= 16384 bytes (total 30168 bytes, avg 10056.0 bytes) >> [0.921s][info][cds,heap] 2 objects are <= 32768 bytes (total 32800 bytes, avg 16400.0 bytes) >> [0.921s][info][cds,heap] 0 objects are <= 65536 bytes (total 0 bytes, avg 0.0 bytes) >>
[0.921s][info][cds,heap] 1 objects are <= 131072 bytes (total 66848 bytes, avg 66848.0 bytes) >> [0.921s][info][cds,heap] 0 objects are <= 262144 bytes (total 0 bytes, avg 0.0 bytes) >> [0.921s][info][cds... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8337828-cds-min-alignment > - Work Need a re-review after merge :) Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20469#issuecomment-2296075672 From shade at openjdk.org Mon Aug 19 09:34:27 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 09:34:27 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v3] In-Reply-To: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: > This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). > > There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in the VM. The Java side of `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. > > I also inlined the `select_method` definition, which allows for a bit more straightforward local code, and obviates the need for wrapping things with `methodHandle`. > > @mlchung, you probably want to look at this more closely.
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Whitespace and comments - Merge branch 'master' into JDK-8336468-reflection-init-checks - Merge branch 'master' into JDK-8336468-reflection-init-checks - Remove unnecessary handle-izing - Fix - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20192/files - new: https://git.openjdk.org/jdk/pull/20192/files/6e35634b..969cbb9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=01-02 Stats: 47349 lines in 1374 files changed: 26449 ins; 14376 del; 6524 mod Patch: https://git.openjdk.org/jdk/pull/20192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20192/head:pull/20192 PR: https://git.openjdk.org/jdk/pull/20192 From shade at openjdk.org Mon Aug 19 09:34:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 09:34:29 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v2] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: On Tue, 16 Jul 2024 22:13:55 GMT, Chen Liang wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8336468-reflection-init-checks >> - Remove unnecessary handle-izing >> - Fix >> - Fix > > src/hotspot/share/runtime/reflection.cpp line 769: > >> 767: >> 768: oop Reflection::new_method(const methodHandle& method, bool for_constant_pool_access, TRAPS) { >> 769: // Allow sun.reflect.ConstantPool to refer to methods as java.lang.reflect.Methods. > > Not quite related, but it's jdk.internal.reflect.ConstantPool now :) Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20192#discussion_r1721485545 From sgehwolf at openjdk.org Mon Aug 19 10:08:54 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 19 Aug 2024 10:08:54 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured [v4] In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken wrote: >> Currently, when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. >> >> We find this in the test output >> >> [STDOUT] >> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory >> >> The container where the test is executed does not contain the ubsan package; we might skip the test in this case. > > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - remove method from WhiteBox.java > - remove WB_isUbsanEnabled, fix test Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2296182319 From rcastanedalo at openjdk.org Mon Aug 19 12:19:51 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 Aug 2024 12:19:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: On Fri, 16 Aug 2024 20:54:25 GMT, Martin Doerr wrote: > The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. But integrating the null check into the zero-based OOP decoding operation would require adding a conditional branch to OOP decoding (as prototyped [here](https://github.com/robcasloz/jdk/blob/ac71b1a02c8c1bd4989f762d27092fd3bf19ccd7/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5859-L5875)), right? This would effectively mean moving the post-barrier null check above the post-barrier inter-region check, which I am not sure is beneficial.
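To make the ordering question concrete, here is a simplified, single-threaded Java model of the filters in a G1 post-write barrier. The inter-region check runs first, then the null check, then card-state checks before a card is dirtied and enqueued. The inter-region-then-null order is the one implied in the thread. The card-state steps and all constants (`REGION_SHIFT`, `CARD_SHIFT`, the card encodings) are assumptions for illustration, not HotSpot's actual values or code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-threaded model of the filtering steps in a G1 post-write barrier.
// Addresses are plain longs, 0 stands for null, and all constants are illustrative.
public final class G1PostBarrierModel {
    static final int REGION_SHIFT = 20; // illustrative: 1 MiB heap regions
    static final int CARD_SHIFT = 9;    // illustrative: 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1, YOUNG = 2; // illustrative card encodings

    final byte[] cards;
    final List<Long> dirtyCardQueue = new ArrayList<>();

    G1PostBarrierModel(long heapBytes) {
        cards = new byte[(int) (heapBytes >>> CARD_SHIFT)]; // all cards start CLEAN
    }

    /** Post-barrier for the store "*fieldAddr = newVal". */
    void postBarrier(long fieldAddr, long newVal) {
        // 1. Inter-region check: a store within one region needs no remembered-set work.
        if (((fieldAddr ^ newVal) >>> REGION_SHIFT) == 0) return;
        // 2. Null check: storing null cannot create a cross-region reference.
        if (newVal == 0) return;
        int card = (int) (fieldAddr >>> CARD_SHIFT);
        // 3. Young and already-dirty cards need no further work. (A real barrier also
        //    needs a StoreLoad fence before re-reading the card; omitted in this model.)
        if (cards[card] == YOUNG || cards[card] == DIRTY) return;
        // 4. Dirty the card and enqueue it for later refinement.
        cards[card] = DIRTY;
        dirtyCardQueue.add((long) card);
    }

    public static void main(String[] args) {
        G1PostBarrierModel m = new G1PostBarrierModel(8L << 20); // 8 MiB toy heap
        m.postBarrier(0x100000L, 0x100040L); // same region: filtered by step 1
        m.postBarrier(0x100000L, 0L);        // null store: filtered by step 2
        m.postBarrier(0x100000L, 0x300000L); // cross-region, non-null: card dirtied
        m.postBarrier(0x100008L, 0x300000L); // same card, already dirty: filtered by step 3
        System.out.println(m.dirtyCardQueue); // [2048]
    }
}
```

Because each filter only returns early, swapping steps 1 and 2 would not change which stores reach step 4. It changes only how many checks a given store executes, which is why the discussion is about performance rather than correctness.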
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721697447 From rcastanedalo at openjdk.org Mon Aug 19 12:22:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 Aug 2024 12:22:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> Message-ID: On Fri, 16 Aug 2024 21:03:14 GMT, Martin Doerr wrote: >> Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple, based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs which can be difficult to catch by regular testing. Because of this, I feel hesitant to introduce this kind of optimization for atomic operation barriers. But I am happy to reconsider, if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost. > > Note that we had such an optimization already in C2: https://github.com/openjdk/jdk8u-dev/blob/4106121e0ae42d644e45c6eab9037874110ed670/hotspot/src/share/vm/opto/library_call.cpp#L3114 > But, it's probably not a big deal. Maybe I can try it on PPC64, which may be more sensitive to accesses on contended memory. Thanks for the reference, I would still prefer to keep this part as is for simplicity. We can always optimize the atomic barriers in follow-up work, if a need arises.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721701794 From mdoerr at openjdk.org Mon Aug 19 13:45:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 19 Aug 2024 13:45:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: On Mon, 19 Aug 2024 12:16:44 GMT, Roberto Castañeda Lozano wrote: >> Thanks for trying! I think I should try it on PPC64. The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. It could be that x86 is less sensitive to such optimizations. > >> The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. > > But integrating the null check into the zero-based OOP decoding operation would require adding a conditional branch to OOP decoding (as prototyped [here](https://github.com/robcasloz/jdk/blob/ac71b1a02c8c1bd4989f762d27092fd3bf19ccd7/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5859-L5875)), right? This would effectively mean moving the post-barrier null check above the post-barrier inter-region check, which I am not sure is beneficial. In case of heap base != null, a branch already exists, which makes the other null check redundant. So, we have a null check, a region-crossing check, and another null check. Maybe this compressed oop mode is not important enough. For the other compressed oop modes, yes, this means moving the null check above the region-crossing check. On PPC64, the null check can be combined with the shift instruction, so we save one compare instruction. Technically, it would even be possible to use only one branch instruction for both checks, but I'm not sure if it's worth the complexity. I'll think about it.
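The compressed-oops point above can be illustrated with a hypothetical Java model of narrow-oop decoding. In zero-based mode, decoding is a plain shift that maps 0 to 0 without branching. A non-null heap base forces a special case for null, and that existing branch is what makes a separate post-barrier null check redundant. `SHIFT`, the heap base value, and the method names are illustrative assumptions, not HotSpot code.

```java
// Hypothetical model of narrow-oop (compressed pointer) decoding in two modes.
// A narrow oop is an unsigned 32-bit value; 0 encodes null.
public final class NarrowOopDecodeModel {
    static final int SHIFT = 3; // illustrative: 8-byte object alignment

    // Zero-based mode: decoding is a plain shift, so 0 (null) decodes to 0 without a branch.
    static long decodeZeroBased(int narrow) {
        return Integer.toUnsignedLong(narrow) << SHIFT;
    }

    // Heap-based mode: base + (narrow << shift) would map 0 to the heap base, so null
    // must be special-cased. That branch already acts as a null check.
    static long decodeHeapBased(int narrow, long heapBase) {
        if (narrow == 0) {
            return 0L;
        }
        return heapBase + (Integer.toUnsignedLong(narrow) << SHIFT);
    }

    public static void main(String[] args) {
        long base = 0x8_0000_0000L; // illustrative heap base (32 GiB)
        System.out.println(decodeZeroBased(0));                         // 0
        System.out.println(decodeHeapBased(0, base));                   // 0
        System.out.println(Long.toHexString(decodeHeapBased(1, base))); // 800000008
    }
}
```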
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721813878 From kevinw at openjdk.org Mon Aug 19 14:14:57 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 19 Aug 2024 14:14:57 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v18] In-Reply-To: References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: On Fri, 16 Aug 2024 23:05:16 GMT, Erik Gahlin wrote: >> I think the only reason for the change from STRING to FILE was to support shared code (not per-dcmd code) doing the %p expansion. See the following change: >> >> https://github.com/openjdk/jdk/pull/20198/files#diff-4f9f273d22b9cf7af6e58a5770b95cc758e8c8e9c257b56b9ac95bfa359c16f2 >> >> I wonder if it is possible to do this without the addition of FILE. Maybe most of the code can be shared, but we need a bit of dcmd-specific code to invoke this filename expansion code. > >> I wonder if it is possible to do this without the addition of FILE. Maybe most of the code can be shared, but we need a bit of dcmd-specific code to invoke this filename expansion code. > > In JFR/JMC we have made the distinction between data type and content type [1]. In this case, the data type would be STRING, but the content type would be FILE. Maybe something similar could be added to the dcmd framework? > > [1] https://docs.oracle.com/en/java/javase/22/docs/api/jdk.jfr/jdk/jfr/ContentType.html Thanks @egahlin, I did some investigating; I will log a JBS entry for this and try to summarise the issues.
8-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2296681845 From rcastanedalo at openjdk.org Mon Aug 19 14:27:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 Aug 2024 14:27:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> On Mon, 19 Aug 2024 13:43:04 GMT, Martin Doerr wrote: >>> The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. >> >> But integrating the null check into the zero-based OOP decoding operation would require adding a conditional branch to OOP decoding (as prototyped [here](https://github.com/robcasloz/jdk/blob/ac71b1a02c8c1bd4989f762d27092fd3bf19ccd7/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5859-L5875)), right? This would effectively mean moving the post-barrier null check above the post-barrier inter-region check, which I am not sure is beneficial. > > In case of heap base != null, a branch already exists, which makes the other null check redundant. So, we have a null check, a region-crossing check, and another null check. Maybe this compressed oop mode is not important enough. > > For the other compressed oop modes, yes, this means moving the null check above the region-crossing check. On PPC64, the null check can be combined with the shift instruction, so we save one compare instruction. Technically, it would even be possible to use only one branch instruction for both checks, but I'm not sure if it's worth the complexity. I'll think about it. OK, thanks.
I just ran some benchmarks with zero-based OOP compression ([prototype here](https://github.com/robcasloz/jdk/tree/JDK-8334060-g1-late-barrier-expansion-x64-optimizations)) and could not observe any significant performance effect on three different x64 implementations. I think I will keep the `g1StoreN` implementation as-is in the x64 and aarch64 backends, for simplicity. Again, we can revisit this in follow-up work if need be. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721881065 From gcao at openjdk.org Mon Aug 19 14:28:23 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 19 Aug 2024 14:28:23 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: References: Message-ID: > The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping > > ### Testing: > - [x] tier1-3 & hotspot:tier4 tests (release) > - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8338539 - 8338539: New Object to ObjectMonitor mapping: riscv64 implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20621/files - new: https://git.openjdk.org/jdk/pull/20621/files/b636586e..4ad30729 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20621&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20621&range=00-01 Stats: 3392 lines in 65 files changed: 1657 ins; 1558 del; 177 mod Patch: https://git.openjdk.org/jdk/pull/20621.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20621/head:pull/20621 PR: https://git.openjdk.org/jdk/pull/20621 From kevinw at openjdk.org Mon Aug 19 18:48:57 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 19 Aug 2024 18:48:57 GMT Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID [v18] In-Reply-To: References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com> Message-ID: On Tue, 30 Jul 2024 14:33:10 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) by enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. >> >> This PR addresses the following diagnostic commands: >> - [x] Compiler.perfmap >> - [x] GC.heap_dump >> - [x] System.dump_map >> - [x] Thread.dump_to_file >> - [x] VM.cds >> >> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). >> >> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, >> >> >> filename (Optional) Name of the file to which the flight recording data is >> written when the recording is stopped.
If no filename is given, a >> filename is generated from the PID and the current date and is >> placed in the directory where the process was started. The >> filename may also be a directory in which case, the filename is >> generated from the PID and the current date in the specified >> directory. (STRING, no default value) >> >> Note: If a filename is given, '%p' in the filename will be >> replaced by the PID, and '%t' will be replaced by the time in >> 'yyyy_MM_dd_HH_mm_ss' format. >> >> >> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. >> >> Testing: >> >> - [x] Added test case passes. >> - [x] Modified existing VM.cds tests to also check for `%p` filenames. >> >> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Fixing invocation outside of jcmd The follow-on JBS entry for dealing with the type names is: https://bugs.openjdk.org/browse/JDK-8338603 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2297214314 From sviswanathan at openjdk.org Mon Aug 19 22:05:52 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 19 Aug 2024 22:05:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v2] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 19 Aug 2024 07:36:15 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation APIs.
>> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using the index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table whose elements are selected by the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), otherwise an IndexOutOfBoundsException is thrown. >> >> Summary of changes: >> - Java side implementation of the new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platform. >> - Optimized x86 backend implementation for AVX512 and legacy targets. >> - Functional tests covering the new API.
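The two-vector selectFrom semantics quoted above can be modelled lane by lane. The following is a minimal scalar sketch in C++, assuming `std::vector` stands in for a vector of VLEN lanes; `select_from_two` is a hypothetical name (it is not part of the Vector API, nor of the C2 or x86 implementation in this PR), and `std::out_of_range` stands in for the Java IndexOutOfBoundsException described at this revision.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical scalar model of the two-vector selectFrom semantics.
// Lane i of the result is v1[index[i]] when index[i] < VLEN, and
// v2[index[i] - VLEN] when VLEN <= index[i] < 2*VLEN.
template <typename T>
std::vector<T> select_from_two(const std::vector<int>& index,
                               const std::vector<T>& v1,
                               const std::vector<T>& v2) {
  // All three "vectors" must have the same number of lanes (VLEN).
  assert(v1.size() == v2.size() && index.size() == v1.size());
  const int vlen = static_cast<int>(v1.size());
  std::vector<T> result(v1.size());
  for (std::size_t i = 0; i < index.size(); ++i) {
    const int idx = index[i];
    if (idx < 0 || idx >= 2 * vlen) {
      // Stands in for the IndexOutOfBoundsException of the Java API.
      throw std::out_of_range("index outside [0, 2*VLEN)");
    }
    result[i] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
  }
  return result;
}
```

For example, with VLEN = 4, the indices {0, 5, 2, 7} select v1[0], v2[1], v1[2] and v2[3]. The sketch only pins down which lane comes from which table; it says nothing about how the backend lowers the operation.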
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. @rose00 @PaulSandoz Please see the work in progress (https://github.com/openjdk/jdk/pull/20634) to make wrapping indices the default for the rearrange and selectFrom APIs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2297535442 From david.holmes at oracle.com Mon Aug 19 22:34:41 2024 From: david.holmes at oracle.com (David Holmes) Date: Tue, 20 Aug 2024 08:34:41 +1000 Subject: RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths In-Reply-To: References: Message-ID: Broadening the audience to hotspot-dev as there was no response on hotspot-runtime-dev.
David On 13/08/2024 4:12 pm, David Holmes wrote: > > Comment is sought on this proposed update to the JNI Specification > > https://bugs.openjdk.org/browse/JDK-8328877 > > The modified UTF-8 format used by the VM can lead to UTF-8 sequences > that exceed the maximum value of an int, due to multi-byte encoding, > but the JNI GetStringUTFLength returns a jsize, which is (perhaps > incorrectly) a jint, i.e. an int. As a result the current implementation > will return a truncated version of the length of the sequence. To > address this we propose to do two things in the JNI spec: > > 1. We deprecate GetStringUTFLength > > +### GetStringUTFLength (Deprecated) > > `jsize GetStringUTFLength(JNIEnv *env, jstring string);` > > Returns the length in bytes of the modified UTF-8 representation of a string. > > +As the capacity of a `jsize` variable is not sufficient to hold the length of > +all possible modified UTF-8 string representations (due to multi-byte encodings) > +this function is deprecated in favor of [`GetLargeStringUTFLength()`](#getlargestringutflength). > +If the modified UTF-8 representation of `string` has a length that exceeds the capacity > +of a `jsize` variable, then the length as of the last character that could be fully > +encoded without exceeding that capacity, is returned. > > 2. We add a new function GetLargeStringUTFLength > > +### GetLargeStringUTFLength > + > +`jlong GetLargeStringUTFLength(JNIEnv *env, jstring string);` > + > +Returns the complete length in bytes of the modified UTF-8 representation of a string.
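To illustrate why a jsize cannot hold every length, here is a minimal C++ sketch that counts modified UTF-8 bytes for a sequence of UTF-16 code units using `size_t`, assuming jsize is a 32-bit signed int as noted above. `modified_utf8_length` and `exceeds_jsize` are hypothetical names; this is not HotSpot's `UNICODE::utf8_length`. Each code unit, including each half of a surrogate pair, costs at most 3 bytes, and U+0000 is encoded as 2 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed to encode n UTF-16 code units in
// modified UTF-8. Surrogate halves are encoded individually, so no unit
// ever costs more than 3 bytes.
std::size_t modified_utf8_length(const std::uint16_t* chars, std::size_t n) {
  assert(chars != nullptr || n == 0);
  std::size_t bytes = 0;  // size_t, so the sum does not wrap at INT32_MAX
  for (std::size_t i = 0; i < n; ++i) {
    const std::uint16_t c = chars[i];
    if (c != 0 && c <= 0x7F) {
      bytes += 1;  // ASCII except NUL
    } else if (c <= 0x7FF) {
      bytes += 2;  // U+0000 (as 0xC0 0x80) and U+0080..U+07FF
    } else {
      bytes += 3;  // U+0800..U+FFFF, including each surrogate half
    }
  }
  return bytes;
}

// True if a byte count cannot be represented in a 32-bit signed jsize.
bool exceeds_jsize(std::size_t bytes) {
  return bytes > static_cast<std::size_t>(INT32_MAX);
}
```

For example, a string whose characters all need 3 bytes overflows a jsize once it has more than INT_MAX / 3 (about 715 million) characters, which is well within the length of a Java String.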
> > In addition we tweak the wording of GetStringUTFChars so that it: > > a) refers to a byte sequence instead of a byte array (to avoid > suggesting the returned sequence is limited by the capacity of a Java > array); and > > b) references the new GetLargeStringUTFLength function instead of the > Deprecated GetStringUTFLength > > Note that GetStringUTFRegion is still using an int length so can't be > used to obtain a giant region, but we don't expect this to be a > practical concern. > > The JNI version will also be bumped for this API addition. > > Thanks, > David From dholmes at openjdk.org Mon Aug 19 22:59:47 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Aug 2024 22:59:47 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int In-Reply-To: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> References: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> Message-ID: On Thu, 15 Aug 2024 20:44:52 GMT, Coleen Phillimore wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > It doesn't look like GHA is configured for you here. Thanks for looking at this @coleenp ! > It doesn't look like GHA is configured for you here. I run it manually via the "actions" page. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2297656882 From dholmes at openjdk.org Mon Aug 19 23:19:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Aug 2024 23:19:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v2] In-Reply-To: References: Message-ID: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. 
> > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. > > Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. > > Testing: > - tiers 1-4 > - GHA David Holmes has updated the pull request incrementally with one additional commit since the last revision: Add missing cast for signed-to-unsigned converion. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20560/files - new: https://git.openjdk.org/jdk/pull/20560/files/61459c68..5f38de2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20560.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560 PR: https://git.openjdk.org/jdk/pull/20560 From dholmes at openjdk.org Mon Aug 19 23:19:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Aug 2024 23:19:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v2] In-Reply-To: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> References: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> Message-ID: <5ned4M2iUF1GfIM3E5uMRhYsM3f8trrPaX1yuXVY__g=.b2811553-0d1f-48b5-a0ed-61c3b118af1b@github.com> On Thu, 15 Aug 2024 20:43:51 GMT, Coleen Phillimore wrote: > Did you try to compile this with -Wconversion (not -Wsign-conversion) without -Werror? I hadn't but will do so ... > src/hotspot/share/classfile/javaClasses.cpp line 586: > >> 584: ResourceMark rm; >> 585: jbyte* position = (length == 0) ? 
nullptr : value->byte_at_addr(0); >> 586: size_t utf8_len = length; > > These are sign conversions. This is too big to change here and not sure what the fan out would be but java_lang_String::length() should return unsigned and also arrayOop.hpp length. They're never negative. We should probably assert that (below). Sorry I don't follow what you are asking for here. A Java String length is an int even though it can't be negative in length - just as Java array indices are int even though they can't be negative either. Anyway this conversion should have a cast, though none of the usual complaining compilers complained about it. > src/hotspot/share/classfile/javaClasses.cpp line 639: > >> 637: if (length == 0) { >> 638: return 0; >> 639: } > > Maybe assert length > 0 here? Why "> 0" ? > src/hotspot/share/classfile/javaClasses.cpp line 702: > >> 700: } else { >> 701: jbyte* position = (length == 0) ? nullptr : value->byte_at_addr(0); >> 702: return UNICODE::as_utf8(position, static_cast(length), buf, buflen); > > Don't you want checked_cast here not static casts. These lengths are the lengths of Java array so an int >= 0. > src/hotspot/share/prims/jni.cpp line 2226: > >> 2224: HOTSPOT_JNI_GETSTRINGUTFLENGTH_ENTRY(env, string); >> 2225: oop java_string = JNIHandles::resolve_non_null(string); >> 2226: jsize ret = java_lang_String::utf8_length_as_int(java_string); > > So the spec says that this should be jsize (signed int), which is why this is, right? Yes. Hence the other change to add a new JNI API. > src/hotspot/share/utilities/utf8.cpp line 512: > >> 510: >> 511: template >> 512: int UNICODE::utf8_length_as_int(const T* base, int length) { > > Why was this parameter left as int and not size_t ? It is the length of a Java array - so int >= 0 > src/hotspot/share/utilities/utf8.hpp line 47: > >> 45: >> 46: In the code below if we have latin-1 content then we treat the String's data >> 47: array as a jbyte[], else a jchar[]. 
The lengths of these arrays are specified > > jchar is 16-bits right? So the max length if not latin-1 is INT_MAX/2 ? The maximum number of jchar characters is INT_MAX/2 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2297676721 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722493414 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722494136 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722494680 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722495327 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722495875 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722496709 From darcy at openjdk.org Tue Aug 20 00:34:49 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 20 Aug 2024 00:34:49 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jul 2024 03:03:23 GMT, Chen Liang wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. >> >> Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Redundant transient; Update the comments to be more accurate Catching up on reviews, core libs changes look fine. Since Executable is sealed with Constructor and Executable on its permits lists, moving methods up and down the hierarchy (as long as there are concrete methods on Constructor and Method) is fine. ------------- Marked as reviewed by darcy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20188#pullrequestreview-2246761996 From dlong at openjdk.org Tue Aug 20 03:18:54 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 20 Aug 2024 03:18:54 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v2] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 23:19:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Add missing cast for signed-to-unsigned converion. 
src/hotspot/share/classfile/javaClasses.cpp line 695: > 693: // `length` is used as the incoming number of characters to > 694: // convert, and then set as the number of bytes in the UTF8 sequence. > 695: size_t length = java_lang_String::length(java_string, value); Above comment looks wrong. `length` is not an in/out reference below, so why not leave it as `int` at line 695? Assigning `int` to `size_t` will introduce a -Wsign-conversion warning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722633109 From dholmes at openjdk.org Tue Aug 20 03:56:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Aug 2024 03:56:51 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 03:16:29 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing cast for signed-to-unsigned converion. > > src/hotspot/share/classfile/javaClasses.cpp line 695: > >> 693: // `length` is used as the incoming number of characters to >> 694: // convert, and then set as the number of bytes in the UTF8 sequence. >> 695: size_t length = java_lang_String::length(java_string, value); > > Above comment looks wrong. `length` is not an in/out reference below, so why not leave it as `int` at line 695? Assigning `int` to `size_t` will introduce a -Wsign-conversion warning. Thanks for looking at this @dean-long ! Good catch. Two places in the `String` code had leftovers from an earlier incarnation of the fix. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722651664 From dholmes at openjdk.org Tue Aug 20 03:59:26 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Aug 2024 03:59:26 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v3] In-Reply-To: References: Message-ID: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. > > Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. 
> > Testing: > - tiers 1-4 > - GHA David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fix incorrect comments and size_t use per Dean's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20560/files - new: https://git.openjdk.org/jdk/pull/20560/files/5f38de2c..4d41ea31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=01-02 Stats: 10 lines in 1 file changed: 0 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20560.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560 PR: https://git.openjdk.org/jdk/pull/20560 From dholmes at openjdk.org Tue Aug 20 04:05:23 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Aug 2024 04:05:23 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v4] In-Reply-To: References: Message-ID: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. 
> > Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. > > Testing: > - tiers 1-4 > - GHA David Holmes has updated the pull request incrementally with two additional commits since the last revision: - fix cast - missing cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20560/files - new: https://git.openjdk.org/jdk/pull/20560/files/4d41ea31..8b651323 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20560.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560 PR: https://git.openjdk.org/jdk/pull/20560 From amitkumar at openjdk.org Tue Aug 20 04:07:30 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 20 Aug 2024 04:07:30 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: References: Message-ID: <7-KM8rc-jN8FbeCI-yy2ew07rsjfRJhxA6hwHjNwdd0=.973bb860-a6c9-4988-8c5f-6591193e992d@github.com> > Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/s390/macroAssembler_s390.cpp Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20578/files - new: https://git.openjdk.org/jdk/pull/20578/files/7edd7326..55ee78e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20578&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20578&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 
del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20578.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20578/head:pull/20578 PR: https://git.openjdk.org/jdk/pull/20578 From dholmes at openjdk.org Tue Aug 20 04:09:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Aug 2024 04:09:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. > > Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. 
> > Testing: > - tiers 1-4 > - GHA David Holmes has updated the pull request incrementally with one additional commit since the last revision: more missing casts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20560/files - new: https://git.openjdk.org/jdk/pull/20560/files/8b651323..0c332e9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20560.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560 PR: https://git.openjdk.org/jdk/pull/20560 From liach at openjdk.org Tue Aug 20 04:14:49 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 20 Aug 2024 04:14:49 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jul 2024 03:03:23 GMT, Chen Liang wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. >> >> Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Redundant transient; Update the comments to be more accurate Can anyone review the associated CSR?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20188#issuecomment-2297928223 From dholmes at openjdk.org Tue Aug 20 05:27:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Aug 2024 05:27:17 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message Message-ID: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> This simple enhancement allows `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed). Testing: - tiers 1-3 Thanks ------------- Commit messages: - Fix whitespace - 8328880: Events::log_exception should limit the size of the logging message Changes: https://git.openjdk.org/jdk/pull/20638/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20638&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328880 Stats: 18 lines in 3 files changed: 10 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20638.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20638/head:pull/20638 PR: https://git.openjdk.org/jdk/pull/20638 From vitaly.provodin at jetbrains.com Tue Aug 20 05:39:10 2024 From: vitaly.provodin at jetbrains.com (Vitaly Provodin) Date: Tue, 20 Aug 2024 09:39:10 +0400 Subject: 8315884: New Object to ObjectMonitor mapping causes linux-aarch64 musl to fail In-Reply-To: References: Message-ID: <8E87A8D2-8147-4F66-92EF-F688F7ADDE07@jetbrains.com> Hi all, Recently we at JetBrains started getting the following build failure on Linux aarch64 musl: ===============================8<------------------------------- . . .
Compiling up to 55 files for jdk.jpackage
/mnt/agent/work/f25b6e4d8156543c/src/hotspot/share/runtime/synchronizer.cpp: In static member function 'static intptr_t ObjectSynchronizer::FastHashCode(Thread*, oop)':
/mnt/agent/work/f25b6e4d8156543c/src/hotspot/share/runtime/synchronizer.cpp:1116:1: error: unable to generate reloads for:
 1116 | }
      | ^
(insn 565 564 566 28 (set (reg/v:DI 2 x2 [ reg2 ]) (ior:DI (and:DI (ashift:DI (reg/v:DI 188 [ ]) (const_int 8 [0x8])) (const_int 549755813632 [0x7fffffff00])) (and:DI (reg/v:DI 1 x1 [ reg1 ]) (const_int -549755813633 [0xffffff80000000ff])))) "/mnt/agent/work/f25b6e4d8156543c/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp":49:21 792 {*aarch64_bfidi5_shift_alt} (nil))
during RTL pass: reload
/mnt/agent/work/f25b6e4d8156543c/src/hotspot/share/runtime/synchronizer.cpp:1116:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:3962
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
make[3]: *** [lib/CompileJvm.gmk:168: /mnt/agent/work/f25b6e4d8156543c/build/linux-aarch64-server-release/hotspot/variant-server/libjvm/objs/synchronizer.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [make/Main.gmk:245: hotspot-server-libs] Error 2
. . .
===============================8
On 16 Aug 2024, at 10:23, Axel Boldt-Christmas wrote:
>
> On Thu, 15 Aug 2024 06:12:22 GMT, Axel Boldt-Christmas wrote:
>
>>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>>>
>>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not overwrite and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
>>>
>>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default).
>>>
>>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default).
>>>
>>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>>>
>>> # Cleanups
>>>
>>> Cleaned up displaced header usage for:
>>> * BasicLock
>>>   * Contains some Zero changes
>>>   * Renames one exported JVMCI field
>>> * ObjectMonitor
>>>   * Updates comments and tests consistencies
>>>
>>> # Refactoring
>>>
>>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contention reference counter is being held. More details are given below in the section about deflation.
>>>
>>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code.
>>>
>>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>>>
>>> # LightweightSynchronizer
>>>
>>> Working on adapting and incorporating the following section as a comment in the source code
>>>
>>> ## Fast Locking
>>>
>>> CAS on locking bits in markWord.
>>> 0b00 (Fast Locked) <--> 0b01 (Unlocked)
>>>
>>> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>>>
>>> If 0b10 (Inflated) is observed or there is to...
>>
>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove newline
>
> Thanks for all the reviews and contributions.
>
> -------------
>
> PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2292891722
From rehn at openjdk.org Tue Aug 20 07:56:48 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 20 Aug 2024 07:56:48 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 14:28:23 GMT, Gui Cao wrote: >> The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping >> >> ### Testing: >> - [x] tier1-3 & hotspot:tier4 tests (release) >> - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8338539 > - 8338539: New Object to ObjectMonitor mapping: riscv64 implementation Hey, not functional review yet. But this code has been patched too many times with "let me just add this". We shadow tmp1 register with tmp1_mark, then we shadow tmp1 with tmp1_monitor. And similar for other tmp registers. If we create two methods, one for "{ // Lightweight locking" and one for "{ // Handle inflated monitor." this code will be so much better. If you are not up for the task I can do it.
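For readers following the fast-locking part of the JDK-8315884 description quoted above (a CAS on the `markWord` lock bits, 0b01 unlocked <--> 0b00 fast locked, with 0b10 meaning inflated), here is a minimal sketch of those two transitions. It is a simplified model written for illustration, not HotSpot code: the constants, `LockResult`, `try_fast_lock` and `try_fast_unlock` are hypothetical names, and ownership tracking (the per-thread lock stack used by LM_LIGHTWEIGHT), spinning, and inflation itself are deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the two lock bits at the bottom of a mark word,
// using the encoding quoted above. Not HotSpot's markWord API.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

enum class LockResult { Locked, Contended, Inflated };

// Attempt the 0b01 -> 0b00 transition with a single CAS. Only the lock bits
// change; the upper bits (hash, age, ...) stay in place, so nothing is displaced.
LockResult try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old = mark.load(std::memory_order_relaxed);
  if ((old & kLockMask) == kInflated) return LockResult::Inflated;
  if ((old & kLockMask) != kUnlocked) return LockResult::Contended;
  const std::uintptr_t locked = (old & ~kLockMask) | kFastLocked;
  if (mark.compare_exchange_strong(old, locked, std::memory_order_acquire,
                                   std::memory_order_relaxed)) {
    return LockResult::Locked;
  }
  // On failure 'old' now holds the value actually observed. A caller could
  // spin and retry here before falling back to inflation.
  return (old & kLockMask) == kInflated ? LockResult::Inflated
                                        : LockResult::Contended;
}

// Reverse transition 0b00 -> 0b01. Fails if the lock bits are no longer 0b00,
// e.g. because the lock was inflated in the meantime.
bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old = mark.load(std::memory_order_relaxed);
  if ((old & kLockMask) != kFastLocked) return false;
  const std::uintptr_t unlocked = (old & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old, unlocked, std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Because only the two low bits are swapped, whatever sits in the upper bits survives a lock/unlock round trip unchanged; that property is what the quoted patch relies on when it keeps the `ObjectMonitor*` association in a separate table instead of in the mark word.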
------------- PR Comment: https://git.openjdk.org/jdk/pull/20621#issuecomment-2298200469 From aph-open at littlepinkcloud.com Tue Aug 20 08:07:37 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Tue, 20 Aug 2024 09:07:37 +0100 Subject: 8315884: New Object to ObjectMonitor mapping causes linux-aarch64 musl to fail In-Reply-To: <8E87A8D2-8147-4F66-92EF-F688F7ADDE07@jetbrains.com> References: <8E87A8D2-8147-4F66-92EF-F688F7ADDE07@jetbrains.com> Message-ID: On 8/20/24 06:39, Vitaly Provodin wrote: > Not sure if a ticket should be submitted against this issue into JBS because I could not find any info about supporting build platform for at Linux aarch64 musl at https://wiki.openjdk.org/display/Build/Supported+Build+Platforms . Hopefully the list of supported platforms was outdated and aarch64 is still supported... Musl isn't involved here. This looks to me to be a GCC bug, and this line is the clue: during RTL pass: reload /mnt/agent/work/f25b6e4d8156543c/src/hotspot/share/runtime/synchronizer.cpp:1116:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:3962 Please submit a full bug report, with preprocessed source if appropriate. See for instructions. I'd be interested to see the preprocessed C++. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From mdoerr at openjdk.org Tue Aug 20 08:38:48 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 20 Aug 2024 08:38:48 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: <7-KM8rc-jN8FbeCI-yy2ew07rsjfRJhxA6hwHjNwdd0=.973bb860-a6c9-4988-8c5f-6591193e992d@github.com> References: <7-KM8rc-jN8FbeCI-yy2ew07rsjfRJhxA6hwHjNwdd0=.973bb860-a6c9-4988-8c5f-6591193e992d@github.com> Message-ID: On Tue, 20 Aug 2024 04:07:30 GMT, Amit Kumar wrote: >> Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > > Co-authored-by: Andrew Haley Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20578#pullrequestreview-2247335924 From amitkumar at openjdk.org Tue Aug 20 09:17:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 20 Aug 2024 09:17:49 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: <7-KM8rc-jN8FbeCI-yy2ew07rsjfRJhxA6hwHjNwdd0=.973bb860-a6c9-4988-8c5f-6591193e992d@github.com> References: <7-KM8rc-jN8FbeCI-yy2ew07rsjfRJhxA6hwHjNwdd0=.973bb860-a6c9-4988-8c5f-6591193e992d@github.com> Message-ID: On Tue, 20 Aug 2024 04:07:30 GMT, Amit Kumar wrote: >> Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > > Co-authored-by: Andrew Haley @theRealAph if it's fine for you, should I integrate it ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20578#issuecomment-2298375615 From aph at openjdk.org Tue Aug 20 09:24:50 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 09:24:50 GMT Subject: RFR: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache [v2] In-Reply-To: <7-KM8rc-jN8FbeCI-yy2ew07rsjfRJhxA6hwHjNwdd0=.973bb860-a6c9-4988-8c5f-6591193e992d@github.com> References: <7-KM8rc-jN8FbeCI-yy2ew07rsjfRJhxA6hwHjNwdd0=.973bb860-a6c9-4988-8c5f-6591193e992d@github.com> Message-ID: On Tue, 20 Aug 2024 04:07:30 GMT, Amit Kumar wrote: >> Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > > Co-authored-by: Andrew Haley Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20578#pullrequestreview-2247457019 From vitaly.provodin at jetbrains.com Tue Aug 20 09:38:57 2024 From: vitaly.provodin at jetbrains.com (Vitaly Provodin) Date: Tue, 20 Aug 2024 13:38:57 +0400 Subject: 8315884: New Object to ObjectMonitor mapping causes linux-aarch64 musl to fail In-Reply-To: References: <8E87A8D2-8147-4F66-92EF-F688F7ADDE07@jetbrains.com> Message-ID: <7F626C6C-D7E0-4893-A699-A5A9A4AA33AC@jetbrains.com> Andrew, bug report may be found here - https://bugs.openjdk.org/browse/JDK-8338660 Please let me know if more info required. Thanks, Vitaly > On 20 Aug 2024, at 12:07, Andrew Haley wrote: > > On 8/20/24 06:39, Vitaly Provodin wrote: >> Not sure if a ticket should be submitted against this issue into JBS because I could not find any info about supporting build platform for at Linux aarch64 musl at https://wiki.openjdk.org/display/Build/Supported+Build+Platforms . 
Hopefully the list of supported platforms was outdated and aarch64 is still supported... > > Musl isn't involved here. This looks to me to be a GCC bug, and this line > is the clue: > > during RTL pass: reload > /mnt/agent/work/f25b6e4d8156543c/src/hotspot/share/runtime/synchronizer.cpp:1116:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:3962 > Please submit a full bug report, > with preprocessed source if appropriate. > See for instructions. > > I'd be interested to see the preprocessed C++. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >
From amitkumar at openjdk.org Tue Aug 20 09:56:52 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 20 Aug 2024 09:56:52 GMT Subject: Integrated: 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 08:58:20 GMT, Amit Kumar wrote: > Port for s390x and PPC for the bug: [JDK-8337958](https://bugs.openjdk.org/browse/JDK-8337958), Out-of-bounds array access in secondary_super_cache This pull request has now been integrated.
Changeset: 89ca5b6f Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/89ca5b6fbd82f00375b4f96b2f3526078088d3f9 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod 8338365: [PPC64, s390] Out-of-bounds array access in secondary_super_cache Reviewed-by: mdoerr, aph, rrich ------------- PR: https://git.openjdk.org/jdk/pull/20578 From duke at openjdk.org Tue Aug 20 12:21:05 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 20 Aug 2024 12:21:05 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > > > -------------------------------------------------------------------------------------------- > Version Baseline This patch > -------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt Score Error Score Error Units > -------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ±
0.062 ns/op
> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op
> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op
> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op
> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op
> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op
> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op
> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op
> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op
> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op
> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op
> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op
> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ns/op
> ArraysHashCode.multibytes 10 avgt 15 5.4...

Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:

8322770: AArch64: C2: Implement VectorizedHashCode

The code to calculate a hash code consists of two parts: a stub method that implements a vectorized loop, using Neon instructions, that processes 16 or 32 elements per iteration depending on the data type; and an unrolled, inlined scalar loop that processes the remaining tail elements.

[Performance]

[[Neoverse V2]]

```
                                                        | 328a053 (master)    | dc2909f (this)      |
------------------------------------------------------------------------------------------------------------
Benchmark                              (size)  Mode  Cnt |     Score     Error |     Score     Error | Units
------------------------------------------------------------------------------------------------------------
ArraysHashCode.bytes                        1  avgt   15 |     0.805 ±   0.206 |     0.815 ±   0.141 | ns/op
ArraysHashCode.bytes                       10  avgt   15 |     4.362 ±   0.013 |     3.522 ±   0.124 | ns/op
ArraysHashCode.bytes                      100  avgt   15 |    78.374 ±   0.136 |    12.935 ±   0.016 | ns/op
ArraysHashCode.bytes                    10000  avgt   15 |  9247.335 ±  13.691 |  1344.770 ±   1.898 | ns/op
ArraysHashCode.chars                        1  avgt   15 |     0.731 ±   0.035 |     0.723 ±   0.046 | ns/op
ArraysHashCode.chars                       10  avgt   15 |     4.359 ±   0.007 |     3.385 ±   0.004 | ns/op
ArraysHashCode.chars                      100  avgt   15 |    78.374 ±   0.117 |    11.903 ±   0.023 | ns/op
ArraysHashCode.chars                    10000  avgt   15 |  9248.328 ±  13.644 |  1344.007 ±   1.795 | ns/op
ArraysHashCode.ints                         1  avgt   15 |     0.746 ±   0.083 |     0.631 ±   0.020 | ns/op
ArraysHashCode.ints                        10  avgt   15 |     4.357 ±   0.009 |     3.387 ±   0.005 | ns/op
ArraysHashCode.ints                       100  avgt   15 |    78.391 ±   0.103 |    10.934 ±   0.015 | ns/op
ArraysHashCode.ints                     10000  avgt   15 |  9248.125 ±  12.583 |  1340.644 ±   1.869 | ns/op
ArraysHashCode.multibytes                   1  avgt   15 |     0.555 ±   0.020 |     0.559 ±   0.020 | ns/op
ArraysHashCode.multibytes                  10  avgt   15 |     2.681 ±   0.020 |     2.175 ±   0.045 | ns/op
ArraysHashCode.multibytes                 100  avgt   15 |    36.954 ±   0.051 |    12.870 ±   0.021 | ns/op
ArraysHashCode.multibytes               10000  avgt   15 |  4862.703 ±   6.909 |   720.774 ±   3.487 | ns/op
ArraysHashCode.multichars                   1  avgt   15 |     0.551 ±   0.017 |     0.552 ±   0.018 | ns/op
ArraysHashCode.multichars                  10  avgt   15 |     2.683 ±   0.018 |     2.182 ±   0.086 | ns/op
ArraysHashCode.multichars                 100  avgt   15 |    36.988 ±   0.054 |     8.830 ±   0.013 | ns/op
ArraysHashCode.multichars               10000  avgt   15 |  4862.279 ±   6.839 |   756.074 ±   6.754 | ns/op
ArraysHashCode.multiints                    1  avgt   15 |     0.555 ±   0.018 |     0.557 ±   0.019 | ns/op
ArraysHashCode.multiints                   10  avgt   15 |     2.689 ±   0.029 |     2.184 ±   0.074 | ns/op
ArraysHashCode.multiints                  100  avgt   15 |    36.992 ±   0.044 |     8.098 ±   0.012 | ns/op
ArraysHashCode.multiints                10000  avgt   15 |  4873.863 ±   6.689 |   783.540 ±   9.151 | ns/op
ArraysHashCode.multishorts                  1  avgt   15 |     0.563 ±   0.021 |     0.561 ±   0.021 | ns/op
ArraysHashCode.multishorts                 10  avgt   15 |     2.679 ±   0.020 |     2.164 ±   0.054 | ns/op
ArraysHashCode.multishorts                100  avgt   15 |    36.976 ±   0.053 |     8.828 ±   0.013 | ns/op
ArraysHashCode.multishorts              10000  avgt   15 |  4861.118 ±   7.057 |   748.952 ±   6.040 | ns/op
ArraysHashCode.shorts                       1  avgt   15 |     0.631 ±   0.020 |     0.643 ±   0.033 | ns/op
ArraysHashCode.shorts                      10  avgt   15 |     4.362 ±   0.005 |     3.400 ±   0.025 | ns/op
ArraysHashCode.shorts                     100  avgt   15 |    78.324 ±   0.151 |    11.892 ±   0.017 | ns/op
ArraysHashCode.shorts                   10000  avgt   15 |  9246.323 ±  13.126 |  1344.304 ±   1.906 | ns/op
StringHashCode.Algorithm.defaultLatin1      1  avgt   15 |     0.946 ±   0.061 |     0.924 ±   0.001 | ns/op
StringHashCode.Algorithm.defaultLatin1     10  avgt   15 |     4.334 ±   0.046 |     3.447 ±   0.051 | ns/op
StringHashCode.Algorithm.defaultLatin1    100  avgt   15 |    78.136 ±   0.105 |    12.950 ±   0.048 | ns/op
StringHashCode.Algorithm.defaultLatin1  10000  avgt   15 |  9266.117 ±  13.184 |  1345.097 ±   1.963 | ns/op
StringHashCode.Algorithm.defaultUTF16       1  avgt   15 |     0.692 ±   0.035 |     0.687 ±   0.034 | ns/op
StringHashCode.Algorithm.defaultUTF16      10  avgt   15 |     4.323 ±   0.023 |     3.394 ±   0.015 | ns/op
StringHashCode.Algorithm.defaultUTF16     100  avgt   15 |    78.317 ±   0.109 |    11.911 ±   0.017 | ns/op
StringHashCode.Algorithm.defaultUTF16   10000  avgt   15 |  9249.620 ±  14.594 |  1344.533 ±   1.908 | ns/op
StringHashCode.cached                     N/A  avgt   15 |     0.518 ±   0.017 |     0.530 ±   0.031 | ns/op
StringHashCode.empty                      N/A  avgt   15 |     0.733 ±   0.086 |     0.849 ±   0.168 | ns/op
StringHashCode.notCached                  N/A  avgt   15 |     0.687 ±   0.084 |     0.630 ±   0.018 | ns/op
```

[Test]

jtreg::tier1 passed on AArch64 and x86.
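The split described in the commit message above (a vectorized main loop followed by a scalar tail) works because the polynomial hash `h = 31*h + a[i]` can be computed in independent lanes and recombined with powers of 31: lane `j` accumulates every fourth element with a step factor of 31^4, and at the end lane `j` is weighted by 31^(3-j) while the incoming `h` is scaled by 31 raised to the number of elements consumed. The following is a minimal sketch of that idea, assuming four plain C++ accumulators in place of Neon registers; it is not the AArch64 stub from the PR, and `hash_scalar` and `hash_lanes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference scalar loop: h = 31 * h + a[i]. uint32_t arithmetic wraps modulo
// 2^32, matching Java int overflow without signed-overflow UB in C++.
std::uint32_t hash_scalar(const std::int32_t* a, std::size_t n, std::uint32_t h) {
  for (std::size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}

// Lane-split version: kLanes independent accumulators stand in for SIMD lanes.
std::uint32_t hash_lanes(const std::int32_t* a, std::size_t n, std::uint32_t h) {
  assert(a != nullptr || n == 0);
  constexpr std::size_t kLanes = 4;
  constexpr std::uint32_t kPow = 31u * 31u * 31u * 31u;  // 31^kLanes
  std::uint32_t lanes[kLanes] = {0, 0, 0, 0};
  std::uint32_t h_scale = 1;  // 31^(elements consumed by the main loop)
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {  // "vectorized" main loop
    for (std::size_t j = 0; j < kLanes; j++) {
      lanes[j] = lanes[j] * kPow + static_cast<std::uint32_t>(a[i + j]);
    }
    h_scale *= kPow;
  }
  // Horizontal reduction: lane j carries weight 31^(kLanes - 1 - j).
  std::uint32_t result = h * h_scale;
  std::uint32_t weight = 1;
  for (std::size_t j = kLanes; j-- > 0;) {
    result += lanes[j] * weight;
    weight *= 31u;
  }
  // Scalar tail for the remaining n % kLanes elements.
  return hash_scalar(a + i, n - i, result);
}
```

With `h` starting at 1 both functions follow the `Arrays.hashCode(int[])` recurrence; for example, `{1, 2, 3}` hashes to 30817. A real stub would keep the lanes in vector registers and process more elements per iteration (the commit message mentions 16 or 32).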
------------- Changes: https://git.openjdk.org/jdk/pull/18487/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=01 Stats: 484 lines in 9 files changed: 481 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From gcao at openjdk.org Tue Aug 20 12:40:50 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 20 Aug 2024 12:40:50 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: References: Message-ID: <41Gz085AoCNBu48scw1l0qoN7aHibe1e0u1mxtBpTZY=.6231d6a0-5b92-42f9-a835-65d247c1b7fb@github.com> On Tue, 20 Aug 2024 07:50:43 GMT, Robbin Ehn wrote: > Hey, not functional review yet. > > But this code have been patched to many times "let me just add this". We shadow tmp1 register with tmp1_mark, then we shadow tmp1 with tmp1_monitor. And similar for other tmp registers. > > If we create two methods, one for "{ // Lightweight locking" and one for "{ // Handle inflated monitor." this code will be so much better. > > If you are not up for the task I can do it. > > (your patch actually slightly improves this, so I'm not saying it's your doing) Hi, Thanks for having a look! Your suggestion makes sense to me. But I feel that it's better to go with another PR. 
Will leave it for you :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20621#issuecomment-2298757697 From duke at openjdk.org Tue Aug 20 12:56:52 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 20 Aug 2024 12:56:52 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Sat, 3 Aug 2024 12:36:31 GMT, Andrew Haley wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 
0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... > >> > What is going on with C2_MacroAssembler::arrays_hashcode_elsize? It's a change that should not be in this patch. >> >> It was a part of a cleanup. `C2_MacroAssembler::arrays_hashcode_elsize` was added by this PR, it's not present in [openjdk:master](https://github.com/openjdk/jdk/tree/master). JIC, please note that [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) includes two commits: [mikabl-arm at f192030](https://github.com/mikabl-arm/jdk/commit/f19203015fb69e50636bdfa597c7aa48176a56cc) presented in the PR and [mikabl-arm at 3a52c7f](https://github.com/mikabl-arm/jdk/commit/3a52c7f89c293b79559201149f3159d5a8c831b6) / `HEAD` that develops further on the current state of the PR. >> >> > Please take out every change that either does nothing or duplicates functionality that already exists. See type2aelembytes. >> >> May I suggest to remove all _clenaup_ changes from [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) and merge it to [mikabl-arm:8322770](https://github.com/mikabl-arm/jdk/tree/8322770) (this PR's branch) first? I can address any comments including `type2aelembytes` afterwards. This should make it easier to keep track of changes as now there are two different branches and I feel that this might get confusing. 
> > I'm thinking that "clean > >> > What is going on with C2_MacroAssembler::arrays_hashcode_elsize? It's a change that should not be in this patch. >> >> It was a part of a cleanup. `C2_MacroAssembler::arrays_hashcode_elsize` was added by this PR, it's not present in [openjdk:master](https://github.com/openjdk/jdk/tree/master). > > Oh, I see. I was assuming that this was a diff from master. I was in a hurry at the time... > >> > Please take out every change that either does nothing or duplicates functionality that already exists. See type2aelembytes. >> >> May I suggest to remove all _clenaup_ changes from [mikabl-arm:285826-vmul-squashed](https://github.com/mikabl-arm/jdk/tree/285826-vmul-squashed) and merge it to [mikabl-arm:8322770](https://github.com/mikabl-arm/jdk/tree/8322770) (this PR's branch) first? I can address any comments including `type2aelembytes` afterwards. This should make it easier to keep track of changes as now there are two different branches and I feel that this might get confusing. > > It certainly is. If you can tell me the hashes of the code you want me to look at and the master commit it's based on it'll be much easier to see. > > Looking past th... Hello @theRealAph , this revision addresses most of the comments from https://github.com/mikabl-arm/jdk/commit/3a52c7f89c293b79559201149f3159d5a8c831b6 and merges changes from that branch to the PR branch. Thank you for taking a look! Sorry for the force-push. Luckily, there were not so many review comments referencing the source code, so I hope this won't make the review process more difficult. You may find current performance data in the commit message. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2298790678 From shade at openjdk.org Tue Aug 20 12:56:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Aug 2024 12:56:52 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> Message-ID: On Tue, 20 Aug 2024 05:22:40 GMT, David Holmes wrote: > This simple enhancement allows for `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed). > > Testing: > - tiers 1-3 > > Thanks Looks reasonable. (First time I see `%.*s`, this is a cute trick I did not know about before.) ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20638#pullrequestreview-2247915371 From rehn at openjdk.org Tue Aug 20 13:01:49 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 20 Aug 2024 13:01:49 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 14:28:23 GMT, Gui Cao wrote: >> The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping >> >> ### Testing: >> - [x] tier1-3 & hotspot:tier4 tests (release) >> - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8338539 > - 8338539: New Object to ObjectMonitor mapping: riscv64 implementation Marked as reviewed by rehn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20621#pullrequestreview-2247919900 From rehn at openjdk.org Tue Aug 20 13:01:50 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 20 Aug 2024 13:01:50 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: <41Gz085AoCNBu48scw1l0qoN7aHibe1e0u1mxtBpTZY=.6231d6a0-5b92-42f9-a835-65d247c1b7fb@github.com> References: <41Gz085AoCNBu48scw1l0qoN7aHibe1e0u1mxtBpTZY=.6231d6a0-5b92-42f9-a835-65d247c1b7fb@github.com> Message-ID: <4SNl_nhyPkYFDXRztC2HTmvk9CXGWzzq3hLCydbDoHM=.70c059be-6727-4965-b965-c3352b5df156@github.com> On Tue, 20 Aug 2024 12:37:58 GMT, Gui Cao wrote: > Hi, Thanks for having a look! Your suggestion makes sense to me. But I feel that it's better to go with another PR. Will leave it for you :-) Yes, thanks! Looks good! Note that you are fixing https://bugs.openjdk.org/browse/JDK-8338638 with this patch also. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20621#issuecomment-2298797085 From aph at openjdk.org Tue Aug 20 13:13:50 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 13:13:50 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 12:21:05 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
>> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: > > 8322770: AArch64: C2: Implement VectorizedHashCode > > The code to calculate a hash code consists of two parts: a stub method that > implements a vectorized loop using Neon instructions, processing 16 or 32 > elements per iteration depending on the data type; and an unrolled, inlined > scalar loop that processes the remaining tail elements. > > [Performance] > > [[Neoverse V2]] > ``` > | 328a053 (master) | dc2909f (this) | > ---------------------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt | Score Error | Score Error | Units > ---------------------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 | 0.805 ± 0.206 | 0.815 ± 0.141 | ns/op > ArraysHashCode.bytes 10 avgt 15 | 4.362 ± 0.013 | 3.522 ± 0.124 | ns/op > ArraysHashCode.bytes 100 avgt 15 | 78.374 ± 0.136 | 12.935 ± 0.016 | ns/op > ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ± 13.691 | 1344.770 ± 1.898 | ns/op > ArraysHashCode.chars 1 avgt 15 | 0.731 ± 0.035 | 0.723 ± 0.046 | ns/op > ArraysHashCode.chars 10 avgt 15 | 4.359 ± 0.007 | 3.385 ± 0.004 | ns/op > ArraysHashCode.chars 100 avgt 15 | 78.374 ± 0.117 | 11.903 ± 0.023 | ns/op > ArraysHashCode.chars 10000 avgt 15 | 9248.328 ± 13.644 | 1344.007 ± 1.795 | ns/op > ArraysHashCode.ints 1 avgt 15 | 0.746 ± 0.083 | 0.631 ± 0.020 | ns/op > ArraysHashCode.ints 10 avgt 15 | 4.357 ± 0.009 | 3.387 ± 0.005 | ns/op > ArraysHashCode.ints 100 avgt 15 | 78.391 ± 0.103 | 10.934 ± 0.015 | ns/op > ArraysHashCode.ints 10000 avgt 15 | 9248.125 ± 12.583 | 1340.644 ± 1.869 | ns/op > ArraysHashCode.multibytes 1 avgt 15 | 0.555 ± 0.020 | 0.559 ± 0.020 | ns/op > ArraysHashCode.multibytes 10 avgt 15 | 2.681 ± 0.020 | 2.175 ± 0.045 | ns/op > ArraysHas...
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5342: > 5340: __ align(CodeEntryAlignment); > 5341: > 5342: auto unreachable_mark_name = [this](BasicType eltype) -> const char * { What's going on here? `eltype` is unused. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723289396 From duke at openjdk.org Tue Aug 20 13:18:57 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 20 Aug 2024 13:18:57 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 13:11:17 GMT, Andrew Haley wrote: >> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: >> >> 8322770: AArch64: C2: Implement VectorizedHashCode >> >> The code to calculate a hash code consists of two parts: a stub method that >> implements a vectorized loop using Neon instructions, processing 16 or 32 >> elements per iteration depending on the data type; and an unrolled, inlined >> scalar loop that processes the remaining tail elements. >> >> [Performance] >> >> [[Neoverse V2]] >> ``` >> | 328a053 (master) | dc2909f (this) | >> ---------------------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt | Score Error | Score Error | Units >> ---------------------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 | 0.805 ± 0.206 | 0.815 ± 0.141 | ns/op >> ArraysHashCode.bytes 10 avgt 15 | 4.362 ± 0.013 | 3.522 ± 0.124 | ns/op >> ArraysHashCode.bytes 100 avgt 15 | 78.374 ± 0.136 | 12.935 ± 0.016 | ns/op >> ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ± 13.691 | 1344.770 ±
1.898 | ns/op >> ArraysHashCode.chars 1 avgt 15 | 0.731 ± 0.035 | 0.723 ± 0.046 | ns/op >> ArraysHashCode.chars 10 avgt 15 | 4.359 ± 0.007 | 3.385 ± 0.004 | ns/op >> ArraysHashCode.chars 100 avgt 15 | 78.374 ± 0.117 | 11.903 ± 0.023 | ns/op >> ArraysHashCode.chars 10000 avgt 15 | 9248.328 ± 13.644 | 1344.007 ± 1.795 | ns/op >> ArraysHashCode.ints 1 avgt 15 | 0.746 ± 0.083 | 0.631 ± 0.020 | ns/op >> ArraysHashCode.ints 10 avgt 15 | 4.357 ± 0.009 | 3.387 ± 0.005 | ns/op >> ArraysHashCode.ints 100 avgt 15 | 78.391 ± 0.103 | 10.934 ± 0.015 | ns/op >> ArraysHashCode.ints 10000 avgt 15 | 9248.125 ± 12.583 | 1340.644 ± 1.869 | ns/op >> ArraysHashCode.multibytes 1 av... > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5342: > >> 5340: __ align(CodeEntryAlignment); >> 5341: >> 5342: auto unreachable_mark_name = [this](BasicType eltype) -> const char * { > > What's going on here? `eltype` is unused. Yeah, `eltype` looks like something I forgot to remove; will do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723296582 From matsaave at openjdk.org Tue Aug 20 14:23:14 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 20 Aug 2024 14:23:14 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds Message-ID: Although JSR bytecodes can no longer be generated by javac, a class file generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, the behavior is undefined, since the bytecode refers to the address of the next instruction (its return address), which lies past the end of the code. In the case of HotSpot, this leads to a crash when generating oop maps. This patch fixes the calculation of basic blocks. The early exploration of this issue was done by @eme64, who also generated a reproducer.
------------- Commit messages: - Merge branch 'master' into jsr_8335664 - 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds Changes: https://git.openjdk.org/jdk/pull/20645/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20645&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335664 Stats: 14 lines in 1 file changed: 6 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20645/head:pull/20645 PR: https://git.openjdk.org/jdk/pull/20645 From bkilambi at openjdk.org Tue Aug 20 14:26:50 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 20 Aug 2024 14:26:50 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: <_AcGpxU2tXImdvN3I65WLEFa5bDnLe6sdlHACNyxRUI=.7a79c223-254e-474a-bc40-fa861b9c1520@github.com> On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> The new vector operators apply only to integral types, since their values wrap around in overflowing/underflowing scenarios after setting the appropriate status flags. For floating-point types, as per the IEEE 754 specification there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>> >> Summary of changes: >> - Java side implementation of the new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for the new vector operators and their predicated counterparts. >> - Extend the existing VectorAPI jtreg test suite to cover the new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of the new core-lib APIs will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/share/opto/addnode.hpp line 404: > 402: //------------------------------UMaxINode--------------------------------------- > 403: // Maximum of 2 unsigned integers. > 404: class UMaxINode : public Node { Would it be better to define `max_opcode()` and `min_opcode()` for `UMaxINode` and `UMinINode`? These are used to find commutative patterns in `AddNode::Ideal()` and `MulNode::Ideal()` and optimize them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1723414044 From bkilambi at openjdk.org Tue Aug 20 14:55:52 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 20 Aug 2024 14:55:52 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> .
SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> The new vector operators apply only to integral types, since their values wrap around in overflowing/underflowing scenarios after setting the appropriate status flags. For floating-point types, as per the IEEE 754 specification there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of the new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for the new vector operators and their predicated counterparts. >> - Extend the existing VectorAPI jtreg test suite to cover the new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of the new core-lib APIs will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/share/opto/vectornode.hpp line 150: > 148: class SaturatingVectorNode : public VectorNode { > 149: private: > 150: bool _is_unsigned; Would it be better to make it a `const bool`?
src/hotspot/share/opto/vectornode.hpp line 172: > 170: class SaturatingAddVBNode : public SaturatingVectorNode { > 171: public: > 172: SaturatingAddVBNode(Node* in1, Node* in2, const TypeVect* vt, bool is_unsigned) : SaturatingVectorNode(in1,in2,vt,is_unsigned) {} Style: spaces after the commas in `SaturatingVectorNode(in1,in2,vt,is_unsigned)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1723459735 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1723463554 From lucy at openjdk.org Tue Aug 20 15:02:56 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 20 Aug 2024 15:02:56 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v7] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 15:10:37 GMT, Sidraya Jayagond wrote: >> This PR adds SIMD support on s390x. > > Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: > > PopCountVI supported by z14 onwards. I agree with disabling the vector part in the above-mentioned emitters. Please do so by enclosing the code in a #if 0 . . . #endif block, and add a comment explaining why the code was disabled. Add the changed file to this PR. I will then review and approve. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2299076003 From duke at openjdk.org Tue Aug 20 15:18:08 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 20 Aug 2024 15:18:08 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770).
It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > > > -------------------------------------------------------------------------------------------- > Version Baseline This patch > -------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt Score Error Score Error Units > -------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op > ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op > ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op > ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op > ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op > ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op > ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op > ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op > ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op > ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op > ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op > ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op > ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ns/op > ArraysHashCode.multibytes 10 avgt 15 5.4...
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: remove a redundant parameter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/4c6812f6..8e9f8d0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From aph at openjdk.org Tue Aug 20 15:27:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 15:27:54 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <3iO7COBTn_HscQ0fGX27kHIk6zJNa8PzLMalLlnZ5Ak=.03b96849-a910-4fd6-a625-078f58ec60fb@github.com> On Tue, 20 Aug 2024 12:21:05 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
>> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8322770: AArch64: C2: Implement VectorizedHashCode > > The code to calculate a hash code consists of two parts: a stub method that > implements a vectorized loop using Neon instruction which processes 16 or 32 > elements per iteration depending on the data type; and an unrolled inlined > scalar loop that processes remaining tail elements. 
> > [Performance] > > [[Neoverse V2]] > ``` > | 328a053 (master) | dc2909f (this) | > ---------------------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt | Score Error | Score Error | Units > ---------------------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 | 0.805 ± 0.206 | 0.815 ± 0.141 | ns/op > ArraysHashCode.bytes 10 avgt 15 | 4.362 ± 0.013 | 3.522 ± 0.124 | ns/op > ArraysHashCode.bytes 100 avgt 15 | 78.374 ± 0.136 | 12.935 ± 0.016 | ns/op > ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ± 13.691 | 1344.770 ± 1.898 | ns/op > ArraysHashCode.chars 1 avgt 15 | 0.731 ± 0.035 | 0.723 ± 0.046 | ns/op > ArraysHashCode.chars 10 avgt 15 | 4.359 ± 0.007 | 3.385 ± 0.004 | ns/op > ArraysHashCode.chars 100 avgt 15 | 78.374 ± 0.117 | 11.903 ± 0.023 | ns/op > ArraysHashCode.chars 10000 avgt 15 | 9248.328 ± 13.644 | 1344.007 ± 1.795 | ns/op > ArraysHashCode.ints 1 avgt 15 | 0.746 ± 0.083 | 0.631 ± 0.020 | ns/op > ArraysHashCode.ints 10 avgt 15 | 4.357 ± 0.009 | 3.387 ± 0.005 | ns/op > ArraysHashCode.ints 100 avgt 15 | 78.391 ± 0.103 | 10.934 ± 0.015 | ns/op > ArraysHashCode.ints 10000 avgt 15 | 9248.125 ± 12.583 | 1340.644 ± 1.869 | ns/op > ArraysHashCode.multibytes 1 avgt 15 | 0.555 ± 0.020 | 0.559 ± 0.020 | ns/op > A... Yep, looks good. It compares reasonably well with the x86 implementation. ------------- Marked as reviewed by aph (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18487#pullrequestreview-2248338427 From aph at openjdk.org Tue Aug 20 15:35:51 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 15:35:51 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 15:18:08 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ±
0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: remove a redundant parameter src/hotspot/cpu/aarch64/aarch64.ad line 16613: > 16611: vRegD_V12 vtmp8, vRegD_V13 vtmp9, vRegD_V14 vtmp10, > 16612: vRegD_V15 vtmp11, vRegD_V16 vtmp12, vRegD_V17 vtmp13, > 16613: rFlagsReg cr) Using fixed registers here is rather odd. Is there some reason not simply to use `vReg` here, rather than named specific vector registers? You could just pass them down to `arrays_hashcode`. This issue isn't drop-dead-critical, but it would simplify this patch, which is otherwise fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723524435 From aph at openjdk.org Tue Aug 20 15:35:52 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 15:35:52 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <8e4e1ZzE5scPDGAZYMJP2jvd9L3tn2MYHF6QqjuLRC0=.50d76243-8d59-4aaf-abe4-1c0c80ff5988@github.com> On Tue, 20 Aug 2024 12:21:05 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770).
It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ...
> > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8322770: AArch64: C2: Implement VectorizedHashCode > > The code to calculate a hash code consists of two parts: a stub method that > implements a vectorized loop using Neon instructions, processing 16 or 32 > elements per iteration depending on the data type; and an unrolled, inlined > scalar loop that processes the remaining tail elements. > > [Performance] > > [[Neoverse V2]] > ``` > | 328a053 (master) | dc2909f (this) | > ---------------------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt | Score Error | Score Error | Units > ---------------------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 | 0.805 ± 0.206 | 0.815 ± 0.141 | ns/op > ArraysHashCode.bytes 10 avgt 15 | 4.362 ± 0.013 | 3.522 ± 0.124 | ns/op > ArraysHashCode.bytes 100 avgt 15 | 78.374 ± 0.136 | 12.935 ± 0.016 | ns/op > ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ± 13.691 | 1344.770 ± 1.898 | ns/op > ArraysHashCode.chars 1 avgt 15 | 0.731 ± 0.035 | 0.723 ± 0.046 | ns/op > ArraysHashCode.chars 10 avgt 15 | 4.359 ± 0.007 | 3.385 ± 0.004 | ns/op > ArraysHashCode.chars 100 avgt 15 | 78.374 ± 0.117 | 11.903 ± 0.023 | ns/op > ArraysHashCode.chars 10000 avgt 15 | 9248.328 ± 13.644 | 1344.007 ± 1.795 | ns/op > ArraysHashCode.ints 1 avgt 15 | 0.746 ± 0.083 | 0.631 ± 0.020 | ns/op > ArraysHashCode.ints 10 avgt 15 | 4.357 ± 0.009 | 3.387 ± 0.005 | ns/op > ArraysHashCode.ints 100 avgt 15 | 78.391 ± 0.103 | 10.934 ± 0.015 | ns/op > ArraysHashCode.ints 10000 avgt 15 | 9248.125 ± 12.583 | 1340.644 ± 1.869 | ns/op > ArraysHashCode.multibytes 1 avgt 15 | 0.555 ± 0.020 | 0.559 ± 0.020 | ns/op > ArraysHashCode.multibytes 10 avgt 15 | 2.681 ± 0.020 | 2.175 ± 0.045 | ns/op > ArraysHas...
src/hotspot/share/utilities/intpow.hpp line 29: > 27: #define SHARE_UTILITIES_INTPOW_HPP > 28: > 29: #include "metaprogramming/enableIf.hpp" There's no need for any of this metaprogramming. A constexpr function would be better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723527790 From aph at openjdk.org Tue Aug 20 16:00:51 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 16:00:51 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 15:30:56 GMT, Andrew Haley wrote: > Using fixed registers here is rather odd. My mistake. I see that you're calling a stub, unlike x86 which expands inline. It could go either way, whichever you choose to do is OK. Inline might be a bit more performant, but I suspect it's marginal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723566254 From aph at openjdk.org Tue Aug 20 16:07:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 16:07:54 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <0xc78J0-54rUP-0qceeo6ookOz1rxaYR5nxAP_MCgRE=.dc0f0a52-196d-4d50-85a5-f4e7e8abe15d@github.com> On Tue, 20 Aug 2024 15:18:08 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
>> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ...
> > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: remove a redundant parameter src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5348: > 5346: }; > 5347: StubCodeMark mark(this, "StubRoutines", > 5348: eltype == T_BOOLEAN ? "_large_arrays_hashcode_boolean" Use a `switch(eltype)` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723579120 From aph at openjdk.org Tue Aug 20 16:14:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Aug 2024 16:14:54 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 15:18:08 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
>> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: remove a redundant parameter

General tip for working on HotSpot: prefer simple to complex. So if there's a simple way to do something, so simple that a first-time C++ programmer would get it, and there's a fancy way to do it, do it the simple way. Keep stuff like lambdas for stuff that really is easier to understand that way.
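Two of Andrew Haley's concrete requests in this review follow that advice: replace the template metaprogramming in `intpow.hpp` with a constexpr function, and choose the stub name with a `switch(eltype)` rather than a chained conditional expression. A minimal sketch of each in plain C++14 follows. It is not the PR's code: the `intpow` signature, the `BasicType` stand-in enum, and every stub name other than `_large_arrays_hashcode_boolean` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 1) A plain constexpr function instead of enable_if-style metaprogramming.
//    Hypothetical signature; the PR's intpow.hpp may differ.
//    Unsigned arithmetic wraps modulo 2^32, like Java int multiplication.
//    The loop and mutation require C++14 relaxed constexpr.
constexpr uint32_t intpow(uint32_t base, unsigned exp) {
  uint32_t result = 1;
  while (exp > 0) {            // square-and-multiply
    if (exp & 1u) result *= base;
    base *= base;
    exp >>= 1;
  }
  return result;
}
static_assert(intpow(31, 4) == 923521u, "evaluated at compile time");

// 2) A switch on the element type instead of nested ?: expressions.
//    BasicType is a stand-in for HotSpot's enum, with only the cases used here.
enum BasicType { T_BOOLEAN, T_CHAR, T_BYTE, T_SHORT, T_INT };

const char* hashcode_stub_name(BasicType eltype) {
  switch (eltype) {
    case T_BOOLEAN: return "_large_arrays_hashcode_boolean";
    case T_CHAR:    return "_large_arrays_hashcode_char";   // hypothetical name
    case T_BYTE:    return "_large_arrays_hashcode_byte";   // hypothetical name
    case T_SHORT:   return "_large_arrays_hashcode_short";  // hypothetical name
    case T_INT:     return "_large_arrays_hashcode_int";    // hypothetical name
  }
  assert(false && "unexpected element type");  // HotSpot code would use ShouldNotReachHere()
  return nullptr;
}
```

Because every enumerator has its own `case` and there is no `default`, a compiler warning such as `-Wswitch` flags any element type added later that the switch does not handle.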
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2299232815 From duke at openjdk.org Tue Aug 20 16:31:26 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 20 Aug 2024 16:31:26 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 15:58:31 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 16613: >> >>> 16611: vRegD_V12 vtmp8, vRegD_V13 vtmp9, vRegD_V14 vtmp10, >>> 16612: vRegD_V15 vtmp11, vRegD_V16 vtmp12, vRegD_V17 vtmp13, >>> 16613: rFlagsReg cr) >> >> Using fixed registers here is rather odd. >> >> Is there some reason not simply to use `vReg` here, rather than named specific vector registers? You could just pass them down to `arrays_hashcode`. This issue isn't drop-dead-critical, but it would simplify this patch, which is otherwise fine. > >> Using fixed registers here is rather odd. > > My mistake. I see that you're calling a stub, unlike x86 which expands inline. It could go either way, whichever you choose to do is OK. Inline might be a bit more performant, but I suspect it's marginal.

The implementation is split into two parts: an unrolled scalar loop that handles up to 16 or 32 tail elements, depending on the data type, and a vectorized Neon loop. Inlining both parts takes more than 300 bytes of code cache, most of which is the Neon loop, so the Neon loop was moved into a stub. The scalar loop is expanded inline, and the Neon loop is implemented as a stub that is called from the inlined part. This had no statistically significant effect on the performance of the `ArraysHashCode` and `StringHashCode` benchmarks compared with the fully inlined (no stub) version.
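The arithmetic behind this split (a wide loop over whole blocks plus a scalar loop for the tail) can be illustrated with a small sketch. It is plain, portable C++, not the AArch64 stub: it assumes Java's `Arrays.hashCode` recurrence `h = 31*h + a[i]` with 32-bit wrap-around, and uses a block width of 4 for readability instead of the 16 or 32 elements the Neon loop handles. The function names `scalar_hash` and `blocked_hash` are hypothetical. Each block is folded in as `h*31^4 + a0*31^3 + a1*31^2 + a2*31 + a3`; the four products are independent of one another, which is what lets a SIMD loop compute them in parallel lanes.

```cpp
#include <cstddef>
#include <cstdint>

// Reference recurrence: h = 31*h + a[i], wrapping modulo 2^32 like Java int.
// uint32_t is used because signed overflow is undefined behaviour in C++.
constexpr uint32_t scalar_hash(const int32_t* a, std::size_t n, uint32_t h) {
  for (std::size_t i = 0; i < n; ++i) h = 31u * h + static_cast<uint32_t>(a[i]);
  return h;
}

// Blocked variant: 4 elements per iteration using precomputed powers of 31,
// then a plain scalar loop for the remaining (< 4) tail elements.
constexpr uint32_t blocked_hash(const int32_t* a, std::size_t n, uint32_t h) {
  const uint32_t coef[4] = {31u * 31u * 31u, 31u * 31u, 31u, 1u};
  const uint32_t p4 = 31u * 31u * 31u * 31u;  // 31^4
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    uint32_t dot = 0;
    for (std::size_t j = 0; j < 4; ++j)       // independent "lane" products
      dot += coef[j] * static_cast<uint32_t>(a[i + j]);
    h = h * p4 + dot;
  }
  for (; i < n; ++i)                          // scalar tail
    h = 31u * h + static_cast<uint32_t>(a[i]);
  return h;
}
```

Both functions are `constexpr`, so their equivalence can be checked at compile time. `Arrays.hashCode` starts the recurrence from 1, while `String.hashCode` starts from 0; for example, `scalar_hash` over `{1, 2, 3}` with an initial value of 1 gives 30817, the value `Arrays.hashCode(new int[]{1, 2, 3})` returns.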
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723610944 From amitkumar at openjdk.org Tue Aug 20 16:31:46 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 20 Aug 2024 16:31:46 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v7] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 15:10:37 GMT, Sidraya Jayagond wrote: >> This PR Adds SIMD support on s390x. > > Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: > > PopCountVI supported by z14 onwards.

There is another problem: the build is broken with these changes. If I just pull the changes from this PR and build, this is the error I get:

/home/amit/jdk/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java warning: [dangling-doc-comments] documentation comment is not attached to any declaration

Then I thought a dependency might be causing the issue and that I should pick up the latest changes, so I rebased onto upstream and was welcomed by this error:

* For target hotspot_variant-server_libjvm_objs_ad_s390.o:
/home/amit/jdk/src/hotspot/cpu/s390/s390.ad: In member function 'uint MachSpillCopyNode::implementation(C2_MacroAssembler*, PhaseRegAlloc*, bool, outputStream*) const':
/home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1153:11: error: 'cbuf' was not declared in this scope
1153 | if (cbuf) { | ^~~~
/home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1160:11: error: 'cbuf' was not declared in this scope
1160 | if (cbuf) { | ^~~~
/home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1167:11: error: 'cbuf' was not declared in this scope
1167 | if (cbuf) { | ^~~~
/home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1175:11: error: 'cbuf' was not declared in this scope
1175 | if (cbuf) { | ^~~~

The weird thing is that it initially built fine with all of the build configurations (fastdebug, slowdebug, release).
------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2299265908 From dlong at openjdk.org Tue Aug 20 17:06:07 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 20 Aug 2024 17:06:07 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 14:12:55 GMT, Matias Saavedra Silva wrote: > Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. > > The early exploration of this issue was done by @eme64 who also generated a reproducer. Please add the regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20645#issuecomment-2299336364 From dlong at openjdk.org Tue Aug 20 17:24:04 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 20 Aug 2024 17:24:04 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 14:12:55 GMT, Matias Saavedra Silva wrote: > Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. > > The early exploration of this issue was done by @eme64 who also generated a reproducer. What happens if the `jsr` is followed by illegal bytecodes that are unreachable, because the `jsr` has no `ret`? 
Does GenerateOopMap still try to process the unreachable illegal bytecodes? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20645#issuecomment-2299369679 From sgehwolf at openjdk.org Tue Aug 20 17:34:46 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 20 Aug 2024 17:34:46 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Add Whitebox check for host cpu - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comments - 8333446: Add tests for hierarchical container support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/139a9069..eda249b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=03-04 Stats: 54052 lines in 1621 files changed: 30101 ins; 16055 del; 7896 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From lucy at openjdk.org Tue Aug 20 17:35:08 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 20 Aug 2024 17:35:08 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v7] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 15:10:37 GMT, Sidraya Jayagond wrote: >> This PR Adds SIMD support on s390x. > > Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: > > PopCountVI supported by z14 onwards.

Hi Amit, the function prototype now reads

uint MachSpillCopyNode::implementation(C2_MacroAssembler *masm, PhaseRegAlloc *ra_, bool do_size, outputStream *os) const {

There is no cbuf argument anymore. Side note: I would prefer to see `if (masm != nullptr)` instead of `if (masm)`. Thanks, Lutz From: Amit Kumar ***@***.***> Date: Tuesday, 20.
August 2024 at 18:26 To: openjdk/jdk ***@***.***> Cc: Schmidt, Lutz ***@***.***>, Mention ***@***.***> Subject: Re: [openjdk/jdk] 8327652: S390x: Implements SLP support (PR #18162) There is another problem, Build is broken with these changes. If we just pull the changes from this PR and do a build, this is the error I was getting: /home/amit/jdk/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java warning: [dangling-doc-comments] documentation comment is not attached to any declaration Then I thought maybe any dependency is causing issue and might have pick the latest changes; So I rebased it with head stream and was welcomed by this error: * For target hotspot_variant-server_libjvm_objs_ad_s390.o: /home/amit/jdk/src/hotspot/cpu/s390/s390.ad: In member function 'uint MachSpillCopyNode::implementation(C2_MacroAssembler*, PhaseRegAlloc*, bool, outputStream*) const': /home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1153:11: error: 'cbuf' was not declared in this scope 1153 | if (cbuf) { | ^~~~ /home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1160:11: error: 'cbuf' was not declared in this scope 1160 | if (cbuf) { | ^~~~ /home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1167:11: error: 'cbuf' was not declared in this scope 1167 | if (cbuf) { | ^~~~ /home/amit/jdk/src/hotspot/cpu/s390/s390.ad:1175:11: error: 'cbuf' was not declared in this scope 1175 | if (cbuf) { | ^~~~ Weird thing is that initially it was building fine with all of the VMs (fastdebug, slowdebug, release).
------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2299389036 From mli at openjdk.org Tue Aug 20 18:13:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 20 Aug 2024 18:13:04 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 14:28:23 GMT, Gui Cao wrote: >> The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping >> >> ### Testing: >> - [x] tier1-3 & hotspot:tier4 tests (release) >> - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8338539 > - 8338539: New Object to ObjectMonitor mapping: riscv64 implementation Marked as reviewed by mli (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/20621#pullrequestreview-2248704293 From duke at openjdk.org Tue Aug 20 18:31:12 2024 From: duke at openjdk.org (Aksh Desai) Date: Tue, 20 Aug 2024 18:31:12 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding In-Reply-To: References: Message-ID: On Thu, 4 Jul 2024 10:09:41 GMT, Hamlin Li wrote: > ## Performance > benchmarks run on CanVM-K230 > > data > > Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 > Base64Decode.testBase64MIMEDecode | 
0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547 > Base64Decode.testBase64MIMEDecode | 0 | ... LGTM ------------- Marked as reviewed by AkshDesai04 at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/20026#pullrequestreview-2160205834 From mli at openjdk.org Tue Aug 20 18:31:12 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 20 Aug 2024 18:31:12 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding Message-ID: ## Performance benchmarks run on CanVM-K230 data Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 Base64Decode.testBase64Decode | 0 | 144 | 4 | 
112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 96 | avgt | 10 | 11429.643 | 5225.869 | 57.408 | ns/op | 0.457 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 112 | avgt | 10 | 14586.534 | 6528.511 | 86.167 | ns/op | 0.448 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 512 | avgt | 10 | 82884.484 | 29764.031 | 1538.338 | ns/op | 0.359 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 166154.047 | 56193.313 | 934.885 | ns/op | 0.338 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 3178903.467 | 912297.825 | 10766.282 | ns/op | 0.287 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 1 | avgt | 10 | 104.456 | 97.654 | 1.042 | ns/op | 0.935 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 3 | avgt | 10 | 117.342 | 116.513 | 0.757 | ns/op | 0.993 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 7 | avgt | 10 | 176.452 | 172.904 | 2.236 | ns/op | 0.98 Base64Decode.testBase64URLDecode | 0 | 144 
| 4 | 32 | avgt | 10 | 289.951 | 321.197 | 13.252 | ns/op | 1.108 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 64 | avgt | 10 | 341.196 | 504.073 | 10.689 | ns/op | 1.477 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 80 | avgt | 10 | 425.068 | 635.353 | 2.713 | ns/op | 1.495 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 96 | avgt | 10 | 355.75 | 712.835 | 30.237 | ns/op | 2.004 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 112 | avgt | 10 | 492.822 | 867.697 | 5.785 | ns/op | 1.761 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 512 | avgt | 10 | 1468.55 | 3318.803 | 2.294 | ns/op | 2.26 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 2319.816 | 5961.423 | 12.996 | ns/op | 2.57 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 39549.088 | 121430.165 | 69.5 | ns/op | 3.07 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 1 | avgt | 10 | 22485.102 | 21945.412 | 137.514 | ns/op | 0.976 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 3 | avgt | 10 | 22083.094 | 22710.36 | 141.745 | ns/op | 1.028 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 7 | avgt | 10 | 22330.146 | 22280.193 | 187.924 | ns/op | 0.998 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 32 | avgt | 10 | 22398.4 | 22701.438 | 166.468 | ns/op | 1.014 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 64 | avgt | 10 | 22754.157 | 22274.207 | 166.477 | ns/op | 0.979 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 80 | avgt | 10 | 21927.062 | 22913.011 | 1134.089 | ns/op | 1.045 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 96 | avgt | 10 | 22250.999 | 21675.835 | 144.776 | ns/op | 0.974 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 112 | avgt | 10 | 22470.258 | 21932.519 | 139.712 | ns/op | 0.976 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 512 | avgt | 10 | 22317.662 | 22568.852 | 151.706 | ns/op | 1.011 
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 22934.285 | 21969.27 | 52.458 | ns/op | 0.958 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 23931.765 | 23779.238 | 657.466 | ns/op | 0.994 ------------- Commit messages: - fix length issue - minor - merge master - fix misc - fix MIME perf issue; misc - minor - merge master - minor - Initial commit Changes: https://git.openjdk.org/jdk/pull/20026/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314124 Stats: 284 lines in 3 files changed: 282 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20026/head:pull/20026 PR: https://git.openjdk.org/jdk/pull/20026 From mli at openjdk.org Tue Aug 20 18:31:12 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 20 Aug 2024 18:31:12 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding In-Reply-To: References: Message-ID: On Thu, 4 Jul 2024 10:09:41 GMT, Hamlin Li wrote: > ## Performance > benchmarks run on CanVM-K230 > > data > > Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 
7.21 | ns/op | 1.493 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547 > Base64Decode.testBase64MIMEDecode | 0 | ...

To continue the discussion at https://github.com/openjdk/jdk/pull/19973#issuecomment-2210907011: the vrgroup implementation brings some regression on large inputs compared with the implementation in this PR. (The vrgroup version also regresses on small inputs, but that can be ignored: it is expected, because the current implementation uses the scalar version when the input is small.)
An implementation with vrgroup is at https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:baes64-decode-vrgroup?expand=1. Comparison between this implementation and vrgroup: Benchmark +/- vrgroup | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv+vrgroup | Score +intrinsic+rvv-vrgroup | Error | Units | Improvement of vrgroup -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 101.993 | 99.2 | 0.781 | ns/op | 0.973 Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.832 | 117.596 | 2.431 | ns/op | 0.998 Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 429.577 | 174.873 | 4.125 | ns/op | 0.407 Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 1760.438 | 286.046 | 3.946 | ns/op | 0.162 Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 1060.156 | 339.35 | 1.789 | ns/op | 0.32 Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 1929.515 | 422.906 | 48.816 | ns/op | 0.219 Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 398.397 | 340.595 | 1.805 | ns/op | 0.855 Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 1257.429 | 495.14 | 1.849 | ns/op | 0.394 Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 3115.738 | 1451.795 | 17.349 | ns/op | 0.466 Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 4719.422 | 2321.598 | 582.276 | ns/op | 0.492 Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 48630.78 | 40487.502 | 370.749 | ns/op | 0.833 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 252.071 | 187.793 | 12.937 | ns/op | 0.745 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 316.001 | 209.721 | 18.705 | ns/op | 0.664 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 1162.103 | 561.51 | 2.002 | ns/op | 0.483 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 
4870.108 | 2145.144 | 28.822 | ns/op | 0.44 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 10383.563 | 6138.464 | 65.675 | ns/op | 0.591 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 13784.27 | 8764.186 | 176.608 | ns/op | 0.636 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 96 | avgt | 10 | 16233.788 | 11421.009 | 109.045 | ns/op | 0.704 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 112 | avgt | 10 | 18013.584 | 14380.185 | 106.091 | ns/op | 0.798 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 512 | avgt | 10 | 80484.884 | 82614.343 | 113.118 | ns/op | 1.026 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 157590.94 | 165972.524 | 877.18 | ns/op | 1.053 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 2927722.669 | 3177495.202 | 12088.306 | ns/op | 1.085 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 1 | avgt | 10 | 97.234 | 97.971 | 0.155 | ns/op | 1.008 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 3 | avgt | 10 | 116.975 | 116.443 | 0.92 | ns/op | 0.995 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 7 | avgt | 10 | 428.975 | 177.21 | 3.4 | ns/op | 0.413 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 32 | avgt | 10 | 1759.346 | 293.573 | 10.449 | ns/op | 0.167 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 64 | avgt | 10 | 3036.901 | 340.794 | 6.915 | ns/op | 0.112 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 80 | avgt | 10 | 3682.705 | 425.593 | 6.004 | ns/op | 0.116 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 96 | avgt | 10 | 3891.699 | 349.889 | 9.875 | ns/op | 0.09 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 112 | avgt | 10 | 5135.364 | 494.459 | 32.21 | ns/op | 0.096 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 512 | avgt | 10 | 21152.095 | 1465.85 | 123.962 | ns/op | 0.069 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 40731.606 | 2258.455 | 253.43 | ns/op | 0.055 Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 
20000 | avgt | 10 | 800260.537 | 39655.109 | 3438.808 | ns/op | 0.05 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 1 | avgt | 10 | 22709.004 | 22146.988 | 449.864 | ns/op | 0.975 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 3 | avgt | 10 | 22852.835 | 23008.575 | 142.386 | ns/op | 1.007 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 7 | avgt | 10 | 22954.637 | 21762.891 | 29.84 | ns/op | 0.948 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 32 | avgt | 10 | 22279.986 | 21683.46 | 145.879 | ns/op | 0.973 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 64 | avgt | 10 | 22512.975 | 22018.745 | 131.94 | ns/op | 0.978 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 80 | avgt | 10 | 23507.467 | 22171.746 | 130.631 | ns/op | 0.943 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 96 | avgt | 10 | 22264.421 | 22109.353 | 32.412 | ns/op | 0.993 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 112 | avgt | 10 | 22295.383 | 21843.31 | 128.373 | ns/op | 0.98 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 512 | avgt | 10 | 23068.531 | 22249.809 | 53.561 | ns/op | 0.965 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 22287.685 | 22598.346 | 59.69 | ns/op | 1.014 Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 23788.214 | 23140.676 | 523.307 | ns/op | 0.973 With latset patch, MIME case performance as below: Benchmark | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | Improvement -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 240.501 | 201.761 | 3.126 | ns/op | 0.839 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 236.175 | 227.85 | 7.486 | ns/op | 0.965 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 584.142 | 
541.063 | 0.98 | ns/op | 0.926 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2030.001 | 1901.634 | 3.404 | ns/op | 0.937 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 4300.895 | 3949.644 | 6.415 | ns/op | 0.918 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 5377.374 | 5122.923 | 32.501 | ns/op | 0.953 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 96 | avgt | 10 | 6086.2 | 5546.335 | 8.686 | ns/op | 0.911 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 112 | avgt | 10 | 7506.78 | 6969.159 | 5.112 | ns/op | 0.928 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 512 | avgt | 10 | 32669.495 | 31921.418 | 4.913 | ns/op | 0.977 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 62497.135 | 57552.972 | 40.188 | ns/op | 0.921 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 20000 | avgt | 10 9 | 91544.935 | 914449.121 | 91.182 | ns/op | 9.989 Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 50000 | avgt | 10 22 | 78953.76 | 206748.186 | 61.744 | ns/op | 2.619 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 1 | avgt | 10 | 154.333 | 161.999 | 7.97 | ns/op | 1.05 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 3 | avgt | 10 | 197.941 | 195.536 | 0.466 | ns/op | 0.988 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 7 | avgt | 10 | 301.185 | 308.205 | 1.772 | ns/op | 1.023 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 32 | avgt | 10 | 855.663 | 894.838 | 1.361 | ns/op | 1.046 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 64 | avgt | 10 | 1599.578 | 1702.096 | 2.229 | ns/op | 1.064 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 80 | avgt | 10 | 2161.773 | 2256.243 | 15.275 | ns/op | 1.044 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 96 | avgt | 10 | 2410.724 | 2580.8 | 1.4 | ns/op | 1.071 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 112 | avgt | 10 | 3025.063 | 3212.42 | 1.392 | ns/op | 1.062 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 512 | avgt | 
10 | 12836.04 | 13714.194 | 4.74 | ns/op | 1.068 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 1000 | avgt | 10 | 23009.573 | 24648.995 | 2.358 | ns/op | 1.071 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 20000 | avgt | 10 2 | 87745.171 | 324781.646 | 96.118 | ns/op | 3.701 Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 50000 | avgt | 10 6 | 88805.99 | 800777.988 | 17.202 | ns/op | 9.017 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 1 | avgt | 10 | 162.18 | 151.062 | 1.984 | ns/op | 0.931 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 3 | avgt | 10 | 197.894 | 195.335 | 1.261 | ns/op | 0.987 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 7 | avgt | 10 | 301.012 | 318.607 | 2.875 | ns/op | 1.058 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 32 | avgt | 10 | 743.716 | 770.01 | 1.095 | ns/op | 1.035 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 64 | avgt | 10 | 1443.015 | 1549.228 | 2.714 | ns/op | 1.074 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 80 | avgt | 10 | 1841.23 | 2008.152 | 2.681 | ns/op | 1.091 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 96 | avgt | 10 | 2085.889 | 2334.91 | 0.696 | ns/op | 1.119 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 112 | avgt | 10 | 2581.392 | 2825.756 | 2.019 | ns/op | 1.095 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 512 | avgt | 10 | 11093.438 | 12072.401 | 49.43 | ns/op | 1.088 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 1000 | avgt | 10 | 19899.375 | 21965.728 | 2.75 | ns/op | 1.104 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 20000 | avgt | 10 3 | 32801.005 | 353076.979 | 82.059 | ns/op | 10.764 Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 50000 | avgt | 10 5 | 56850.177 | 664287.226 | 45.683 | ns/op | 11.685 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2210912450 PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2298764766 From duke at openjdk.org Tue Aug 20 18:31:12 2024 From: duke at 
openjdk.org (Camel Coder) Date: Tue, 20 Aug 2024 18:31:12 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding In-Reply-To: References: Message-ID:

On Fri, 5 Jul 2024 13:48:24 GMT, Hamlin Li wrote:

>> ## Performance
>> benchmarks run on CanVM-K230
>> ...
>
> To continue the discussion at https://github.com/openjdk/jdk/pull/19973#issuecomment-2210907011.
> ...

@Hamlin-Li Yeah, you are right, yours is faster on the C908, and also on the X60. I measured that yours took 0.93x (C908) and 0.85x (X60) of the time mine took.

(Note: I modified my code that was linked in the base64simd issue, because I thought I had found an easy optimization, and added that code to my first post here. It turns out that change made it slightly slower, so I used the original variant: https://godbolt.org/z/hrs61x9aP.)

I think I have an idea of what's going on. First, look at these 4-bit LUT benchmarks, specifically `rvv_gathers_m1` and `rvv_vluxei8_m2`.

On the [C908](https://camel-cdr.github.io/rvv-bench-results/canmv_k230/LUT4.html), using LMUL=1 vrgather for lookup tables is roughly twice as fast as using `rvv_vluxei8_m2`. However, your code uses four vluxei8, while mine uses twelve LMUL=1 vrgather, so yours ends up faster.

On the [X60](https://camel-cdr.github.io/rvv-bench-results/bpi_f3/LUT4.html), LMUL=1 vrgather is about four times faster for smaller input sizes (presumably when the data fits in cache), and somehow slower than vluxei8 for larger input sizes (presumably when it has to go to memory). I mentioned that yours is faster on the X60, but after looking at this graph I tried restricting the input to under 200KB, and, as predicted, mine is now 1.3x faster.

On the [C920](https://camel-cdr.github.io/rvv-bench-results/milkv_pioneer/LUT4.html), LMUL=1 vrgather ~~is about 4.5 times faster than using vluxei8 for smaller inputs, but somehow it's up to 8.5x faster for large inputs??~~ Sorry, I accidentally looked at the `rvv_m1_gather_m2` graph; according to the `rvv_gather_m1` graph it is 15x faster for small inputs and 8x faster for large ones.

That all seems very weird, but I think I know what's going on, and what distinguishes current out-of-order (OoO) implementations from the in-order ones: vector chaining support.

I think it's safe to say that the X60 uses vector chaining for its loads and stores, but not for vrgather. That would explain why it ended up slower than the vluxei8 variant: vrgather isn't chained and needs all elements to be ready, while vluxei8 chains with the other loads/stores when both have to access memory directly. If you compare the measurements in the graphs, you'll see that vluxei8 takes about 0.88x the time of vrgather for large inputs, which is quite close to the 0.85x I measured for the base64 decode, although that is probably partly a coincidence.

I don't know if OoO makes vector chaining considerably harder, but I'll predict that most first-generation OoO processors won't implement it, because the people working on those cores have a lot of experience doing fixed-width SIMD without chaining on Arm or x86 cores.

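For readers less familiar with the two lookup styles compared above, here is a minimal scalar C++ model of what each instruction computes per element, assuming the RVV 1.0 definitions: `vrgather.vv` reads table entries out of a vector register (indices at or beyond VLMAX yield 0), while `vluxei8.v` loads each element from memory at a base address plus a zero-extended 8-bit byte offset. The function names and the fixed VLMAX of 16 (e.g. VLEN=128, SEW=8, LMUL=1) are assumptions for illustration; the model says nothing about throughput or chaining, only about semantics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of RVV element semantics (not HotSpot code).
// Assumes SEW=8, LMUL=1, VLEN=128, so VLMAX = 16, and vl == VLMAX.
constexpr std::size_t N = 16;
using Vec = std::array<std::uint8_t, N>;

// vrgather.vv vd, vs2, vs1: the table lives in a vector register (vs2) and
// vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]].
Vec model_vrgather(const Vec& table, const Vec& idx) {
  Vec out{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t j = static_cast<std::size_t>(idx[i]);
    out[i] = (j < N) ? table[j] : 0;
  }
  return out;
}

// vluxei8.v vd, (rs1), vs2: the table lives in memory at rs1 and each element
// is loaded from rs1 + zero-extended 8-bit byte offset vs2[i].
Vec model_vluxei8(const std::uint8_t* base, const Vec& offsets) {
  assert(base != nullptr);
  Vec out{};
  for (std::size_t i = 0; i < N; ++i) {
    out[i] = base[offsets[i]];
  }
  return out;
}
```

With 4-bit indices (0..15) both forms return the same values; they differ in where the table lives and, on real hardware, in how the lookups are scheduled, which is what the LUT4 benchmarks measure.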
Chaining is also more useful for DLEN

References: Message-ID:

On Thu, 4 Jul 2024 10:09:41 GMT, Hamlin Li wrote:

> ## Performance
> benchmarks run on CanVM-K230
> ...

I also modified the microbenchmark a bit. For the MIME case, 76 is the line-size limit, so a typical line size is 76 or less, whereas the current value of lineSize is just 4, which is not a common case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2299497846

From mli at openjdk.org Tue Aug 20 18:48:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 20 Aug 2024 18:48:03 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding In-Reply-To: References: Message-ID:

On Thu, 4 Jul 2024 10:09:41 GMT, Hamlin Li wrote:

> ## Performance
> benchmarks run on CanVM-K230
> ...

Hi @RealFYang, please have a look when you're available. Sorry for the delayed push. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2299520010

From iklam at openjdk.org Tue Aug 20 18:55:07 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 20 Aug 2024 18:55:07 GMT Subject: RFR: 8337828: CDS: Trim down minimum GC region alignment [v2] In-Reply-To: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> References: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> Message-ID:

On Thu, 15 Aug 2024 21:34:19 GMT, Aleksey Shipilev wrote:

>> CDS currently follows G1's minimum region size to guess which alignment to use when dumping the heap. The comment near the constant rightfully recognizes it would be convenient for Shenandoah to trim the alignment down to 256K (Shenandoah's min region size).
>> If we do this, we will improve the heap sizes [JDK-8293650](https://bugs.openjdk.org/browse/JDK-8293650) can operate at.
>>
>> Unless I am missing something else, trimming down the min region alignment has impact on the size of the objects we can store in CDS archive. Conveniently, `-Xlog:cds+heap` prints the object size stats for us, and it looks we are way under the 256K limit:
>>
>>
>> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:-UseCompressedOops -Xshare:dump -Xlog:cds+heap
>> ...
>> [0.921s][info][cds,heap] 0 objects are <= 8 bytes (total 0 bytes, avg 0.0 bytes)
>> [0.921s][info][cds,heap] 2550 objects are <= 16 bytes (total 40800 bytes, avg 16.0 bytes)
>> [0.921s][info][cds,heap] 14325 objects are <= 32 bytes (total 431896 bytes, avg 30.1 bytes)
>> [0.921s][info][cds,heap] 6572 objects are <= 64 bytes (total 301304 bytes, avg 45.8 bytes)
>> [0.921s][info][cds,heap] 1225 objects are <= 128 bytes (total 113112 bytes, avg 92.3 bytes)
>> [0.921s][info][cds,heap] 2173 objects are <= 256 bytes (total 384024 bytes, avg 176.7 bytes)
>> [0.921s][info][cds,heap] 143 objects are <= 512 bytes (total 47720 bytes, avg 333.7 bytes)
>> [0.921s][info][cds,heap] 40 objects are <= 1024 bytes (total 26872 bytes, avg 671.8 bytes)
>> [0.921s][info][cds,heap] 19 objects are <= 2048 bytes (total 29656 bytes, avg 1560.8 bytes)
>> [0.921s][info][cds,heap] 9 objects are <= 4096 bytes (total 20744 bytes, avg 2304.9 bytes)
>> [0.921s][info][cds,heap] 4 objects are <= 8192 bytes (total 20536 bytes, avg 5134.0 bytes)
>> [0.921s][info][cds,heap] 3 objects are <= 16384 bytes (total 30168 bytes, avg 10056.0 bytes)
>> [0.921s][info][cds,heap] 2 objects are <= 32768 bytes (total 32800 bytes, avg 16400.0 bytes)
>> [0.921s][info][cds,heap] 0 objects are <= 65536 bytes (total 0 bytes, avg 0.0 bytes)
>> [0.921s][info][cds,heap] 1 objects are <= 131072 bytes (total 66848 bytes, avg 66848.0 bytes)
>> [0.921s][info][cds,heap] 0 objects are <= 262144 bytes (total 0 bytes, avg 0.0 bytes)
>> [0.921s][info][cds...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge branch 'master' into JDK-8337828-cds-min-alignment
> - Work

Marked as reviewed by iklam (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20469#pullrequestreview-2248815955

From dholmes at openjdk.org Tue Aug 20 21:51:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Aug 2024 21:51:03 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> Message-ID:

On Tue, 20 Aug 2024 12:54:26 GMT, Aleksey Shipilev wrote:

>> This simple enhancement allows for `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed).
>>
>> Testing:
>> - tiers 1-3
>>
>> Thanks
>
> Looks reasonable. (First time I see `%.*s`, this is a cute trick I did not know about before.)

Thanks for the review @shipilev!
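The `%.*s` trick mentioned above is standard C/C++ `printf` behaviour: the `*` takes the precision from an `int` argument passed just before the string, and for `%s` the precision caps how many bytes are printed. A minimal sketch, assuming (as the PR describes) that a limit of 0 means "no limit"; `format_exception_message` is a hypothetical helper, not the HotSpot `Events::log_exception` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not HotSpot code): formats `msg`, printing at most
// `limit` bytes of it; limit == 0 means no limit.
std::string format_exception_message(const char* msg, std::size_t limit) {
  assert(msg != nullptr);
  std::size_t len = std::strlen(msg);
  if (limit != 0 && limit < len) {
    len = limit;
  }
  char buf[512];  // output longer than 511 bytes is truncated by snprintf
  // "%.*s": the int argument before `msg` is the precision, i.e. the maximum
  // number of bytes of `msg` that are printed.
  std::snprintf(buf, sizeof(buf), "Exception <%.*s>", static_cast<int>(len), msg);
  return std::string(buf);
}
```

For example, `format_exception_message("java.lang.RuntimeException: boom", 9)` yields `Exception <java.lang>`, while a limit of 0 prints the whole message.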
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20638#issuecomment-2299822509

From dholmes at openjdk.org Tue Aug 20 22:47:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Aug 2024 22:47:03 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int In-Reply-To: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> References: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> Message-ID:

On Thu, 15 Aug 2024 20:44:52 GMT, Coleen Phillimore wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding.
>>
>> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use.
>>
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts.
>>
>> Testing:
>> - tiers 1-4
>> - GHA
>
> It doesn't look like GHA is configured for you here.
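The length bound quoted above can be checked against a small model of modified UTF-8: U+0001..U+007F take 1 byte, U+0000 and U+0080..U+07FF take 2 bytes, and U+0800..U+FFFF take 3 bytes, with supplementary characters stored as two UTF-16 surrogates of 3 bytes each (6 bytes per character, but never more than 3 bytes per `jchar`). Latin-1 (compact string) elements need at most 2 bytes. The sketch below is hypothetical, not HotSpot's `UNICODE::utf8_length`; it only illustrates why the result can exceed `INT_MAX` and therefore needs `size_t`.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical model (not HotSpot code): byte length of the modified UTF-8
// encoding of a UTF-16 (jchar) sequence.
std::size_t modified_utf8_length(const std::uint16_t* chars, std::size_t n) {
  assert(chars != nullptr || n == 0);
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::uint16_t c = chars[i];
    if (c >= 0x0001 && c <= 0x007F) {
      total += 1;  // ASCII except NUL
    } else if (c <= 0x07FF) {
      total += 2;  // includes U+0000, encoded as 0xC0 0x80
    } else {
      total += 3;  // includes each half of a surrogate pair
    }
  }
  return total;
}

// Same rule for a Latin-1 (compact string) sequence: at most 2 bytes each.
std::size_t modified_utf8_length_latin1(const std::uint8_t* bytes, std::size_t n) {
  assert(bytes != nullptr || n == 0);
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += (bytes[i] >= 0x01 && bytes[i] <= 0x7F) ? 1 : 2;
  }
  return total;
}

// Worst cases for INT_MAX elements (assumes a 64-bit size_t): both exceed
// INT_MAX, so an int-typed length would overflow.
constexpr std::size_t kMaxElements = static_cast<std::size_t>(INT_MAX);
constexpr std::size_t kWorstCaseUtf16 = 3 * kMaxElements;
constexpr std::size_t kWorstCaseLatin1 = 2 * kMaxElements;
```

For instance, the UTF-16 sequence `{0x0041, 0x0000, 0x00E9, 0x20AC, 0xD83D, 0xDE00}` encodes to 1 + 2 + 2 + 3 + 3 + 3 = 14 bytes in this model.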
@coleenp I compiled with `-Wconversion` on Linux and there were no warnings in relation to the code I have changed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2299884096

From phh at openjdk.org Tue Aug 20 22:50:04 2024 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 20 Aug 2024 22:50:04 GMT Subject: RFR: 8337828: CDS: Trim down minimum GC region alignment [v2] In-Reply-To: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> References: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> Message-ID:

On Thu, 15 Aug 2024 21:34:19 GMT, Aleksey Shipilev wrote:

>> CDS currently follows G1's minimum region size to guess which alignment to use when dumping the heap. ...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge branch 'master' into JDK-8337828-cds-min-alignment
> - Work

Marked as reviewed by phh (Reviewer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/20469#pullrequestreview-2249233619

From shade at openjdk.org Wed Aug 21 08:20:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Aug 2024 08:20:20 GMT Subject: Integrated: 8337828: CDS: Trim down minimum GC region alignment In-Reply-To: References: Message-ID:

On Mon, 5 Aug 2024 17:41:41 GMT, Aleksey Shipilev wrote:

> CDS currently follows G1's minimum region size to guess which alignment to use when dumping the heap. ...

This pull request has now been integrated.
Changeset: 59816975 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/598169756c903bb1f77e35ea32717043bc166e3c Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod 8337828: CDS: Trim down minimum GC region alignment Reviewed-by: iklam, phh ------------- PR: https://git.openjdk.org/jdk/pull/20469 From shade at openjdk.org Wed Aug 21 08:17:07 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Aug 2024 08:17:07 GMT Subject: RFR: 8337828: CDS: Trim down minimum GC region alignment [v2] In-Reply-To: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> References: <9eVYBElFs0NFXIuILx2Iu_NDuTM8wX8biuSkjxoUDNU=.5200d483-fa83-4ed6-9bbd-daa152201b90@github.com> Message-ID: On Thu, 15 Aug 2024 21:34:19 GMT, Aleksey Shipilev wrote: >> CDS currently follows G1's minimum region size to guess which alignment to use when dumping the heap. The comment near the constant rightfully recognizes it would be convenient for Shenandoah to trim the alignment down to 256K (Shenandoah's min region size). If we do this, we will improve the heap sizes [JDK-8293650](https://bugs.openjdk.org/browse/JDK-8293650) can operate at. >> >> Unless I am missing something else, trimming down the min region alignment has impact on the size of the objects we can store in CDS archive. Conveniently, `-Xlog:cds+heap` prints the object size stats for us, and it looks we are way under the 256K limit: >> >> >> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:-UseCompressedOops -Xshare:dump -Xlog:cds+heap >> ... 
>> [0.921s][info][cds,heap] 0 objects are <= 8 bytes (total 0 bytes, avg 0.0 bytes) >> [0.921s][info][cds,heap] 2550 objects are <= 16 bytes (total 40800 bytes, avg 16.0 bytes) >> [0.921s][info][cds,heap] 14325 objects are <= 32 bytes (total 431896 bytes, avg 30.1 bytes) >> [0.921s][info][cds,heap] 6572 objects are <= 64 bytes (total 301304 bytes, avg 45.8 bytes) >> [0.921s][info][cds,heap] 1225 objects are <= 128 bytes (total 113112 bytes, avg 92.3 bytes) >> [0.921s][info][cds,heap] 2173 objects are <= 256 bytes (total 384024 bytes, avg 176.7 bytes) >> [0.921s][info][cds,heap] 143 objects are <= 512 bytes (total 47720 bytes, avg 333.7 bytes) >> [0.921s][info][cds,heap] 40 objects are <= 1024 bytes (total 26872 bytes, avg 671.8 bytes) >> [0.921s][info][cds,heap] 19 objects are <= 2048 bytes (total 29656 bytes, avg 1560.8 bytes) >> [0.921s][info][cds,heap] 9 objects are <= 4096 bytes (total 20744 bytes, avg 2304.9 bytes) >> [0.921s][info][cds,heap] 4 objects are <= 8192 bytes (total 20536 bytes, avg 5134.0 bytes) >> [0.921s][info][cds,heap] 3 objects are <= 16384 bytes (total 30168 bytes, avg 10056.0 bytes) >> [0.921s][info][cds,heap] 2 objects are <= 32768 bytes (total 32800 bytes, avg 16400.0 bytes) >> [0.921s][info][cds,heap] 0 objects are <= 65536 bytes (total 0 bytes, avg 0.0 bytes) >> [0.921s][info][cds,heap] 1 objects are <= 131072 bytes (total 66848 bytes, avg 66848.0 bytes) >> [0.921s][info][cds,heap] 0 objects are <= 262144 bytes (total 0 bytes, avg 0.0 bytes) >> [0.921s][info][cds... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8337828-cds-min-alignment > - Work Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20469#issuecomment-2301435517 From aph at openjdk.org Wed Aug 21 08:53:14 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 08:53:14 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 15:18:08 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ±
0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: remove a redundant parameter src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 99: > 97: bind(TAIL); > 98: > 99: assert(is_power_of_2(unroll_factor), "cant use this value to calculate the jump target PC"); Suggestion: assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC"); src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp line 166: > 164: return _large_arrays_hashcode_int; > 165: default: > 166: assert(0, "unsupported eltype"); Suggestion: ShouldNotReachHere(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1724684905 PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1724682669 From sgehwolf at openjdk.org Wed Aug 21 08:56:07 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 21 Aug 2024 08:56:07 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: <2WWkA0hsiI5IEhVfh67kj3650YjI5n5PYmg1wakBFzI=.3e3c50b9-81a8-49ca-a032-a33fe36adf45@github.com> On Tue, 20 Aug 2024 17:34:46 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified.
The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 due to the way [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back when JDK 13 was current. It is therefore problem-listed immediately. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests so that we don't regress again. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support GHA failure on macos-aarch64 is unrelated: [runtime/cds/DeterministicDump](https://github.com/jerboaa/jdk/actions/runs/10476366525#user-content-runtime_cds_deterministicdump) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2301519742 From gcao at openjdk.org Wed Aug 21 08:57:07 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 21 Aug 2024 08:57:07 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 14:28:23 GMT, Gui Cao wrote: >> The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping >> >> ### Testing: >> - [x] tier1-3 & hotspot:tier4 tests
(release) >> - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8338539 > - 8338539: New Object to ObjectMonitor mapping: riscv64 implementation Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20621#issuecomment-2301516692 From duke at openjdk.org Wed Aug 21 08:57:07 2024 From: duke at openjdk.org (duke) Date: Wed, 21 Aug 2024 08:57:07 GMT Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 14:28:23 GMT, Gui Cao wrote: >> The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping >> >> ### Testing: >> - [x] tier1-3 & hotspot:tier4 tests (release) >> - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8338539 > - 8338539: New Object to ObjectMonitor mapping: riscv64 implementation @zifeihan Your change (at version 4ad30729c3cce2ed084abf8efa2fe89d4a833da6) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20621#issuecomment-2301520458 From gcao at openjdk.org Wed Aug 21 09:01:09 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 21 Aug 2024 09:01:09 GMT Subject: Integrated: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation In-Reply-To: References: Message-ID: On Sun, 18 Aug 2024 13:47:29 GMT, Gui Cao wrote: > The riscv64 implementation of JDK-8315884 New Object to ObjectMonitor mapping > > ### Testing: > - [x] tier1-3 & hotspot:tier4 tests (release) > - [x] test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java (release & fastdebug) This pull request has now been integrated. Changeset: c4cf1e93 Author: Gui Cao Committer: Hamlin Li URL: https://git.openjdk.org/jdk/commit/c4cf1e93bb22bf7c65ce1943fff91f74839434df Stats: 147 lines in 9 files changed: 66 ins; 11 del; 70 mod 8338539: New Object to ObjectMonitor mapping: riscv64 implementation Reviewed-by: fyang, rehn, mli ------------- PR: https://git.openjdk.org/jdk/pull/20621 From stuefe at openjdk.org Wed Aug 21 09:58:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 21 Aug 2024 09:58:01 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: References: Message-ID: <3Z33v_k5LPdJndHtRPY6JnKHWsJWilQRyYxa7DFUftM=.2954c811-04db-49a8-8316-b21cdab28558@github.com> On Tue, 13 Aug 2024 15:07:17 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681), enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. > > As mentioned in the comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492) where we enabled support for `%p` for filenames specified in jcmd.
> > With this patch, I propose: > - Expanding the utility function `Arguments::copy_expand_pid` to `Arguments::copy_expand_arguments` to deal with `%p` expansions for pid and `%t` expansions for timestamps. > - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. > - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly support filename expansion, covering `%t` for jcmd as well. > > Testing: > - [x] Added test cases pass on all platforms (verified with a GHA job). > - [x] Tier 1 passes with GHA. > > Looking forward to hearing your thoughts! > > Thanks, > Sonia I think this could be very useful, but it needs more preparation and decisions. Possibly a CSR. - copy_expand_xxx is used in many places. While I think all of these places would benefit from more expansions than just %p, there is a potential backward compatibility issue if clients use %t for whatever reason today. - Do we want the time of the dump or the JVM start? If the JVM runs for a week, then produces a JFR file, should the file be named by the JVM start date? I think in most cases the *current* time makes more sense. - Do we want the printout as a human-readable date or as a numeric timestamp? Both make sense depending on the post-processing clients want to do. - Do we want to improve this function further, potentially adding more replacement options? One possible way to solve this: - use different characters for timestamp (number) and datetime (human-readable date) - always use the current time - If we want to add further replacements: - come up with a new replacement character that does not clash with libc sprintf (IMHO using percent was not a good idea in the first place), e.g. `$` - Add a new switch to guard this new replacement logic. By default off.
If on, the contract is that any character following a `$` may, now or in the future, be replaced with something different. Clients must not use `$` as a normal character. - We probably should remove all non-matching `$` from the input. - The first replacements could be: `$p` for pid, `$t` for timestamp (numeric), `$d` for datetime - further replacements can be added later. Since we guard the new feature with a switch and forbid the use of `$`, we are then free to do so without breaking backward compatibility. I would like to hear @dholmes-ora's take on this. We had a similar system at SAP in our proprietary JVM, which was really useful, so I like this idea in general. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20568#issuecomment-2301641288 From rehn at openjdk.org Wed Aug 21 10:07:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 21 Aug 2024 10:07:37 GMT Subject: RFR: 8338727: RISCV: Avoid synthetic data dependency in nmethod barrier on Ztso Message-ID: Hi, please consider: on TSO we don't need the synthetic data dependency between the loads. I also added a comment about this.
Sanity tested ------------- Commit messages: - 8338727: RISCV: Avoid synthetic data dependency in nmethod barrier on Ztso Changes: https://git.openjdk.org/jdk/pull/20661/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20661&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338727 Stats: 12 lines in 1 file changed: 7 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20661/head:pull/20661 PR: https://git.openjdk.org/jdk/pull/20661 From mli at openjdk.org Wed Aug 21 10:23:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 21 Aug 2024 10:23:33 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v2] In-Reply-To: References: Message-ID: > ## Performance > benchmarks run on CanVM-K230 > > data > > Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547 > Base64Decode.testBase64MIMEDecode | 0 | ... Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - merge master - fix length issue - minor - merge master - fix misc - fix MIME perf issue; misc - minor - merge master - minor - Initial commit ------------- Changes: https://git.openjdk.org/jdk/pull/20026/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=01 Stats: 283 lines in 3 files changed: 282 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20026/head:pull/20026 PR: https://git.openjdk.org/jdk/pull/20026 From mli at openjdk.org Wed Aug 21 10:26:18 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 21 Aug 2024 10:26:18 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: > ## Performance > benchmarks run on CanVM-K230 > > data > > Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 
3530.048 | 3.685 | ns/op | 2.432 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547 > Base64Decode.testBase64MIMEDecode | 0 | ... Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: revert misc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20026/files - new: https://git.openjdk.org/jdk/pull/20026/files/5bcabbc6..b29e927f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20026/head:pull/20026 PR: https://git.openjdk.org/jdk/pull/20026 From rkennke at openjdk.org Wed Aug 21 11:48:26 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Aug 2024 11:48:26 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) Message-ID: This is the main body of the JEP 450: Compact Object Headers (Experimental). Main changes: - Introduction of the (experimental) flag UseCompactObjectHeaders. 
All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further shrink compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). - Arrays can now store their length at offset 8. - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore.
The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (or at least I could not find it), and also I fear that doing so could mess with optimizations. This may be useful to revisit. OTOH, the approach that I have taken works and is similar to DecodeNKlass and similar instructions. Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests.) The below testing has been run many times, but not with this exact base version of the JDK. I want to hold off the full testing until we also have the Tiny Class-Pointers PR lined-up, and test with that. - [x] tier1 (x86_64) - [ ] tier2 (x86_64) - [ ] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [ ] tier2 (aarch64) - [ ] tier3 (aarch64) - [ ] tier4 (aarch64) - [x] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [x] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders - [x] Running as a backport in production since >1 year. 
------------- Depends on: https://git.openjdk.org/jdk/pull/20605 Commit messages: - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) Changes: https://git.openjdk.org/jdk/pull/20640/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20640&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305895 Stats: 1668 lines in 104 files changed: 1232 ins; 206 del; 230 mod Patch: https://git.openjdk.org/jdk/pull/20640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20640/head:pull/20640 PR: https://git.openjdk.org/jdk/pull/20640 From rehn at openjdk.org Wed Aug 21 11:56:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 21 Aug 2024 11:56:04 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: <0CsRTA7DLHnrYQuBL4LBO-z_Z0Fysx-Chyw7w57yyYU=.411be3a8-1587-4bef-94e8-4ffa8d48ea2c@github.com> On Wed, 21 Aug 2024 10:26:18 GMT, Hamlin Li wrote: >> ## Performance >> benchmarks run on CanVM-K230 >> >> data >> >> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 >> 
Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert misc Seems good to me, thanks. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20026#pullrequestreview-2250588168 From aph at openjdk.org Wed Aug 21 12:27:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 12:27:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. 
The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 184: > 182: } else { > 183: // This assumes that all prototype bits fit in an int32_t > 184: mov(t1, (int32_t)(intptr_t)markWord::prototype().value()); Suggestion: mov(t1, checked_cast<int32_t>((intptr_t)markWord::prototype().value())); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1724960170 From aph at openjdk.org Wed Aug 21 13:11:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 13:11:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o...
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2575: > 2573: } else { > 2574: lea(dst, Address(obj, index, Address::lsl(scale))); > 2575: ldr(dst, Address(dst, offset)); Suggestion: ldr(dst, Address(dst, index, Address::lsl(scale))); Will this work? Or is dst unaligned? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725025276 From rkennke at openjdk.org Wed Aug 21 13:21:04 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Aug 2024 13:21:04 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> References: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> Message-ID: On Wed, 21 Aug 2024 13:08:23 GMT, Andrew Haley wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. >> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2575: > >> 2573: } else { >> 2574: lea(dst, Address(obj, index, Address::lsl(scale))); >> 2575: ldr(dst, Address(dst, offset)); > > Suggestion: > > ldr(dst, Address(dst, index, Address::lsl(scale))); > > Will this work? Or is dst unaligned? It ignores the offset, right? Or are you saying that offset must be 0 on that path?
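The offset question can be made concrete with a small C++ model of the effective addresses involved. This is a sketch, not HotSpot's `Address` class: the helper names `ea_reg`, `ea_imm`, `original_ea` and `single_load_ea` are hypothetical. It assumes, as the exchange implies, that the register-offset form computes `base + (index << scale)` and has no room for an immediate displacement, so collapsing the `lea`/`ldr` pair into one register-offset `ldr` drops `offset` unless it is zero.

```cpp
#include <cstdint>

// Hypothetical model of AArch64 register-offset addressing: [base, index, LSL #scale].
// No immediate displacement is possible in this form.
constexpr uint64_t ea_reg(uint64_t base, uint64_t index, unsigned scale) {
  return base + (index << scale);
}

// Hypothetical model of immediate-offset addressing: [base, #offset].
constexpr uint64_t ea_imm(uint64_t base, int64_t offset) {
  return base + static_cast<uint64_t>(offset);  // wraps like two's-complement addition
}

// Original sequence: lea(dst, [obj, index, LSL #scale]); ldr(dst, [dst, #offset]).
constexpr uint64_t original_ea(uint64_t obj, uint64_t index, unsigned scale, int64_t offset) {
  return ea_imm(ea_reg(obj, index, scale), offset);
}

// A single register-offset load from obj: the immediate offset has nowhere to go.
constexpr uint64_t single_load_ea(uint64_t obj, uint64_t index, unsigned scale) {
  return ea_reg(obj, index, scale);
}
```

The two forms agree only when `offset == 0`. The model also ignores a real encoding constraint: for a 64-bit `ldr` with a register offset, the shift amount must be 0 or 3.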
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725040269 From duke at openjdk.org Wed Aug 21 14:07:40 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 21 Aug 2024 14:07:40 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v4] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > > > -------------------------------------------------------------------------------------------- > Version Baseline This patch > -------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt Score Error Score Error Units > -------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op > ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op > ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op > ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op > ArraysHashCode.chars 1 avgt 15 1.286 ±
0.062 1.246 ± 0.054 ns/op > ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op > ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op > ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op > ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op > ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op > ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op > ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op > ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ns/op > ArraysHashCode.multibytes 10 avgt 15 5.4... Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/8e9f8d0c..7ddae523 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=02-03 Stats: 30 lines in 3 files changed: 18 ins; 7 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From alan.bateman at oracle.com Wed Aug 21 14:19:50 2024 From: alan.bateman at oracle.com (Alan Bateman) Date: Wed, 21 Aug 2024 15:19:50 +0100 Subject: concurrency-discuss mailing list Message-ID: For many years, concurrency-interest was the mailing list to discuss j.u.concurrent and related topics. The mailing list was hosted on a server maintained by Prof. Doug Lea. Sadly, the mailing list didn't survive an OS upgrade and has been unavailable for some time. A new mailing list concurrency-discuss at openjdk.org [1] has been created to replace the old list. It's a completely new list; the archives and subscribers from the previous mailing list have not carried over.
As the name suggests, the mailing list is for discussion of the j.u.concurrent API/implementation and related topics. It's not the place for support questions. -Alan [1] https://mail.openjdk.org/mailman/listinfo/concurrency-discuss From yzheng at openjdk.org Wed Aug 21 14:34:04 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 Aug 2024 14:34:04 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. 
As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... src/hotspot/share/opto/library_call.cpp line 4631: > 4629: // vm: see markWord.hpp. > 4630: Node *hash_mask = _gvn.intcon(UseCompactObjectHeaders ? markWord::hash_mask_compact : markWord::hash_mask); > 4631: Node *hash_shift = _gvn.intcon(UseCompactObjectHeaders ? markWord::hash_shift_compact : markWord::hash_shift); Could you please export these two symbols to JVMCI? Thanks!
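The intrinsic selects a mask/shift pair based on `UseCompactObjectHeaders`, so a JVMCI compiler emitting the same fast path needs both pairs, which is why the `*_compact` constants have to be exported. A minimal sketch of the extraction itself, assuming the usual `(mark >> shift) & mask` pattern with an unshifted mask: `HashLayout`, `extract_hash` and `select_layout` are hypothetical names, and any concrete shift/width values used with them are illustrative, not the actual `markWord` constants.

```cpp
#include <cstdint>

// Illustrative layout of the identity hash inside a 64-bit mark word.
// The real values come from markWord.hpp and differ between layouts.
struct HashLayout {
  unsigned shift;  // bit position of the lowest hash bit
  unsigned bits;   // width of the hash field (25 vs. 31 in the PR description)
};

// Unshifted mask with the low `bits` bits set; valid for bits < 64.
constexpr uint64_t mask_for(unsigned bits) {
  return (uint64_t{1} << bits) - 1;
}

// Shift the field down, then mask it, like hash_shift followed by hash_mask.
constexpr uint32_t extract_hash(uint64_t mark, HashLayout layout) {
  return static_cast<uint32_t>((mark >> layout.shift) & mask_for(layout.bits));
}

// Mirrors the UseCompactObjectHeaders ? *_compact : legacy selection.
constexpr HashLayout select_layout(bool use_compact, HashLayout legacy, HashLayout compact) {
  return use_compact ? compact : legacy;
}
```

Mixing a mask from one layout with a shift from the other would silently return garbage, so both constants must come from the same layout.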
diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 688691fb976..d97fdcb3f44 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -792,11 +792,13 @@ declare_constant(InvocationCounter::count_shift) \ \ declare_constant(markWord::hash_shift) \ + declare_constant(markWord::hash_shift_compact) \ declare_constant(markWord::monitor_value) \ \ declare_constant(markWord::lock_mask_in_place) \ declare_constant(markWord::age_mask_in_place) \ declare_constant(markWord::hash_mask) \ + declare_constant(markWord::hash_mask_compact) \ declare_constant(markWord::hash_mask_in_place) \ \ declare_constant(markWord::unlocked_value) \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725162361 From aph at openjdk.org Wed Aug 21 14:43:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 14:43:06 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> Message-ID: On Wed, 21 Aug 2024 13:18:03 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. >> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t...
> > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2575: > >> 2573: } else { >> 2574: lea(dst, Address(obj, index, Address::lsl(scale))); >> 2575: ldr(dst, Address(dst, offset)); > > It ignores the offset, right? Or are you saying that offset must be 0 on that path? Sorry, brain fart. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725176978 From ihse at openjdk.org Wed Aug 21 14:50:07 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 21 Aug 2024 14:50:07 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... make/autoconf/jdk-options.m4 line 696: > 694: AVAILABLE=false > 695: else > 696: AC_MSG_RESULT([yes]) You should set `AVAILABLE=true` in this case. Apparently it works anyway, but it will increase clarity of the code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725190726 From duke at openjdk.org Wed Aug 21 14:53:09 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 21 Aug 2024 14:53:09 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: <8e4e1ZzE5scPDGAZYMJP2jvd9L3tn2MYHF6QqjuLRC0=.50d76243-8d59-4aaf-abe4-1c0c80ff5988@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <8e4e1ZzE5scPDGAZYMJP2jvd9L3tn2MYHF6QqjuLRC0=.50d76243-8d59-4aaf-abe4-1c0c80ff5988@github.com> Message-ID: On Tue, 20 Aug 2024 15:33:16 GMT, Andrew Haley wrote: >> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8322770: AArch64: C2: Implement VectorizedHashCode >> >> The code to calculate a hash code consists of two parts: a stub method that >> implements a vectorized loop using Neon instruction which processes 16 or 32 >> elements per iteration depending on the data type; and an unrolled inlined >> scalar loop that processes remaining tail elements. >> >> [Performance] >> >> [[Neoverse V2]] >> ``` >> | 328a053 (master) | dc2909f (this) | >> ---------------------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt | Score Error | Score Error | Units >> ---------------------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 | 0.805 ± 0.206 | 0.815 ± 0.141 | ns/op >> ArraysHashCode.bytes 10 avgt 15 | 4.362 ± 0.013 | 3.522 ± 0.124 | ns/op >> ArraysHashCode.bytes 100 avgt 15 | 78.374 ± 0.136 | 12.935 ± 0.016 | ns/op >> ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ± 13.691 | 1344.770 ± 1.898 | ns/op >> ArraysHashCode.chars 1 avgt 15 | 0.731 ± 0.035 | 0.723 ± 0.046 | ns/op >> ArraysHashCode.chars 10 avgt 15 | 4.359 ± 0.007 | 3.385 ±
0.004 | ns/op >> ArraysHashCode.chars 100 avgt 15 | 78.374 ± 0.117 | 11.903 ± 0.023 | ns/op >> ArraysHashCode.chars 10000 avgt 15 | 9248.328 ± 13.644 | 1344.007 ± 1.795 | ns/op >> ArraysHashCode.ints 1 avgt 15 | 0.746 ± 0.083 | 0.631 ± 0.020 | ns/op >> ArraysHashCode.ints 10 avgt 15 | 4.357 ± 0.009 | 3.387 ± 0.005 | ns/op >> ArraysHashCode.ints 100 avgt 15 | 78.391 ± 0.103 | 10.934 ± 0.015 | ns/op >> ArraysHashCode.ints 10000 avgt 15 | 9248.125 ± 12.583 | 1340.644 ± 1.869 | ns/op >> ArraysHashCode.multibytes 1 avgt 15 | 0.555 ± 0.020 | 0.559 ± 0.020 | ns/op >> ArraysHashCode.multibytes 10 avgt 1... > > src/hotspot/share/utilities/intpow.hpp line 29: > >> 27: #define SHARE_UTILITIES_INTPOW_HPP >> 28: >> 29: #include "metaprogramming/enableIf.hpp" > > There's no need for any of this metaprogramming. A constexpr function would be better. Replied in another thread: https://github.com/openjdk/jdk/pull/18487/files/4c6812f63bf9a6d5cf17c7899fe4a77e390c1645#r1725193686 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1725196617 From duke at openjdk.org Wed Aug 21 14:53:10 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 21 Aug 2024 14:53:10 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 20 Aug 2024 12:21:05 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: > > 8322770: AArch64: C2: Implement VectorizedHashCode > > The code to calculate a hash code consists of two parts: a stub method that > implements a vectorized loop using Neon instruction which processes 16 or 32 > elements per iteration depending on the data type; and an unrolled inlined > scalar loop that processes remaining tail elements. > > [Performance] > > [[Neoverse V2]] > ``` > | 328a053 (master) | dc2909f (this) | > ---------------------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt | Score Error | Score Error | Units > ---------------------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 | 0.805 ± 0.206 | 0.815 ± 0.141 | ns/op > ArraysHashCode.bytes 10 avgt 15 | 4.362 ± 0.013 | 3.522 ± 0.124 | ns/op > ArraysHashCode.bytes 100 avgt 15 | 78.374 ± 0.136 | 12.935 ± 0.016 | ns/op > ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ± 13.691 | 1344.770 ± 1.898 | ns/op > ArraysHashCode.chars 1 avgt 15 | 0.731 ± 0.035 | 0.723 ± 0.046 | ns/op > ArraysHashCode.chars 10 avgt 15 | 4.359 ± 0.007 | 3.385 ± 0.004 | ns/op > ArraysHashCode.chars 100 avgt 15 | 78.374 ± 0.117 | 11.903 ± 0.023 | ns/op > ArraysHashCode.chars 10000 avgt 15 | 9248.328 ± 13.644 | 1344.007 ± 1.795 | ns/op > ArraysHashCode.ints 1 avgt 15 | 0.746 ± 0.083 | 0.631 ± 0.020 | ns/op > ArraysHashCode.ints 10 avgt 15 | 4.357 ± 0.009 | 3.387 ± 0.005 | ns/op > ArraysHashCode.ints 100 avgt 15 | 78.391 ± 0.103 | 10.934 ± 0.015 | ns/op > ArraysHashCode.ints 10000 avgt 15 | 9248.125 ± 12.583 | 1340.644 ± 1.869 | ns/op > ArraysHashCode.multibytes 1 avgt 15 | 0.555 ± 0.020 | 0.559 ± 0.020 | ns/op > ArraysHashCode.multibytes 10 avgt 15 | 2.681 ± 0.020 | 2.175 ± 0.045 | ns/op > ArraysHas...
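The split described in this PR, a vector loop over several lanes followed by a scalar tail, depends on rewriting the recurrence `h = 31 * h + a[i]`. For `n` elements and initial value `init`, the recurrence expands to `init * 31^n + sum(a[i] * 31^(n-1-i))`. One common way to vectorize this is to give each lane its own accumulator, scale every lane by `31^4` per block of four elements, and weight lane `j` by `31^(3-j)` at the end. The sketch below models that arithmetic in plain C++ with four lanes. It is not the Neon stub from the PR, and `hash_scalar` and `hash_4lanes` are hypothetical names. It uses `uint32_t` so that overflow wraps like Java's `int`; `Arrays.hashCode(int[])` corresponds to `init == 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference recurrence h = 31 * h + a[i]. uint32_t wraps modulo 2^32,
// matching Java int overflow without C++ signed-overflow UB.
int32_t hash_scalar(const int32_t* a, size_t n, int32_t init) {
  assert(a != nullptr || n == 0);
  uint32_t h = static_cast<uint32_t>(init);
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return static_cast<int32_t>(h);
}

// Four-lane model: lane j accumulates a[4k + j], and each lane is multiplied
// by 31^4 per block. Afterwards, lane j is weighted by 31^(3 - j) and init
// by 31^(4 * blocks). The n % 4 leftover elements use the scalar recurrence.
int32_t hash_4lanes(const int32_t* a, size_t n, int32_t init) {
  assert(a != nullptr || n == 0);
  const uint32_t p1 = 31u, p2 = p1 * p1, p3 = p2 * p1, p4 = p3 * p1;
  uint32_t acc[4] = {0, 0, 0, 0};
  uint32_t init_scale = 1;  // ends up as 31^(4 * blocks) mod 2^32
  const size_t blocks = n / 4;
  for (size_t b = 0; b < blocks; b++) {
    for (size_t j = 0; j < 4; j++) {
      acc[j] = acc[j] * p4 + static_cast<uint32_t>(a[4 * b + j]);
    }
    init_scale *= p4;
  }
  uint32_t h = static_cast<uint32_t>(init) * init_scale +
               acc[0] * p3 + acc[1] * p2 + acc[2] * p1 + acc[3];
  for (size_t i = 4 * blocks; i < n; i++) {  // scalar tail
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return static_cast<int32_t>(h);
}
```

Every step is arithmetic modulo 2^32, so the lane-split version returns bit-identical results to the scalar loop even when intermediate values overflow. That property is what allows an intrinsic to reorder the work.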
src/hotspot/share/utilities/intpow.hpp line 58: > 56: struct intpow { > 57: static const T value = v; > 58: }; > There's no need for any of this metaprogramming. A constexpr function would be better. The main advantage of this implementation over one using a `constexpr` function is the ability to verify input parameter values and detect overflows at compile time using `static_assert`. This feature helps prevent user errors that could otherwise be difficult to debug. I chose this approach for its functional benefits over a `constexpr` function. If you have any suggestions on how to implement this functionality in a `constexpr` function, they would be highly welcome, as I might be missing something here. Below is a similar `constexpr` implementation. **Please don't accept it through the UI**. I kept a `typedef` template parameter to ensure the return value has the desired width, if needed, without applying a mask at the call site. If you'd still prefer a `constexpr` function, please let me know if you have any comments on the implementation below (I'll rename it, remove `no_overflow` parameter and the comments). Suggestion: #include <limits> template <typename T> static constexpr T intpow_c(T v, unsigned p, bool no_overflow = false) { // Can't be used in constexpr function // static_assert(v || p, "0^0 is not defined"); if (p == 0) { return 1; } T a = intpow_c(v, p / 2, no_overflow); T b = (p % 2) ?
v : 1; // Can't be used in constexpr function // static_assert(!no_overflow || a <= std::numeric_limits<T>::max() / a, "Integer overflow"); // static_assert(!no_overflow || a * a <= std::numeric_limits<T>::max() / b, "Integer overflow"); return a * a * b; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1725193686 From aph at openjdk.org Wed Aug 21 15:00:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 15:00:08 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: <9LLOY_mJWmquBrFwkt954r5mBKyJ-TzuIkXgCjZKvs4=.c2bf26cc-09fb-48fb-a731-c06d64382738@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <9LLOY_mJWmquBrFwkt954r5mBKyJ-TzuIkXgCjZKvs4=.c2bf26cc-09fb-48fb-a731-c06d64382738@github.com> On Wed, 21 Aug 2024 14:49:05 GMT, Mikhail Ablakatov wrote: >> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8322770: AArch64: C2: Implement VectorizedHashCode >> >> The code to calculate a hash code consists of two parts: a stub method that >> implements a vectorized loop using Neon instruction which processes 16 or 32 >> elements per iteration depending on the data type; and an unrolled inlined >> scalar loop that processes remaining tail elements. >> >> [Performance] >> >> [[Neoverse V2]] >> ``` >> | 328a053 (master) | dc2909f (this) | >> ---------------------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt | Score Error | Score Error | Units >> ---------------------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 | 0.805 ± 0.206 | 0.815 ± 0.141 | ns/op >> ArraysHashCode.bytes 10 avgt 15 | 4.362 ± 0.013 | 3.522 ± 0.124 | ns/op >> ArraysHashCode.bytes 100 avgt 15 | 78.374 ± 0.136 | 12.935 ± 0.016 | ns/op >> ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ± 13.691 | 1344.770 ±
1.898 | ns/op >> ArraysHashCode.chars 1 avgt 15 | 0.731 ? 0.035 | 0.723 ? 0.046 | ns/op >> ArraysHashCode.chars 10 avgt 15 | 4.359 ? 0.007 | 3.385 ? 0.004 | ns/op >> ArraysHashCode.chars 100 avgt 15 | 78.374 ? 0.117 | 11.903 ? 0.023 | ns/op >> ArraysHashCode.chars 10000 avgt 15 | 9248.328 ? 13.644 | 1344.007 ? 1.795 | ns/op >> ArraysHashCode.ints 1 avgt 15 | 0.746 ? 0.083 | 0.631 ? 0.020 | ns/op >> ArraysHashCode.ints 10 avgt 15 | 4.357 ? 0.009 | 3.387 ? 0.005 | ns/op >> ArraysHashCode.ints 100 avgt 15 | 78.391 ? 0.103 | 10.934 ? 0.015 | ns/op >> ArraysHashCode.ints 10000 avgt 15 | 9248.125 ? 12.583 | 1340.644 ? 1.869 | ns/op >> ArraysHashCode.multibytes 1 avgt 15 | 0.555 ? 0.020 | 0.559 ? 0.020 | ns/op >> ArraysHashCode.multibytes 10 avgt 1... > > src/hotspot/share/utilities/intpow.hpp line 58: > >> 56: struct intpow { >> 57: static const T value = v; >> 58: }; > >> There's no need for any of this metaprogramming. A constexpr function would be better. > > The main advantage of this implementation over one using a `constexpr` function is the ability to verify input parameter values and detect overflows at compile time using `static_assert`. This feature helps prevent user errors that could otherwise be difficult to debug. I chose this approach for its functional benefits over a `constexpr` function. If you have any suggestions on how to implement this functionality in a `constexpr` function, they would be highly welcome, as I might be missing something here. ? > > Below is a similar `constexpr` implementation. **Please don't accept it through the UI**. I kept a `typedef` template parameter to ensure the return value has the desired width, if needed, without applying a mask at the call site. > > If you'd still prefer a `constexpr` function, please let me know if you have any comments on the implementation below (I'll rename it, remove `no_overflow` parameter and the comments). 
>
> Suggestion:
>
> #include <limits>
>
> template <typename T>
> static constexpr T intpow_c(T v, unsigned p, bool no_overflow = false) {
>   // Can't be used in constexpr function
>   // static_assert(v || p, "0^0 is not defined");
>
>   if (p == 0) {
>     return 1;
>   }
>
>   T a = intpow_c(v, p / 2, no_overflow);
>   T b = (p % 2) ? v : 1;
>
>   // Can't be used in constexpr function
>   // static_assert(!no_overflow || a <= std::numeric_limits<T>::max() / a, "Integer overflow");
>   // static_assert(!no_overflow || a * a <= std::numeric_limits<T>::max() / b, "Integer overflow");
>
>   return a * a * b;
> }

You're assuming a requirement that isn't really there. HotSpot calculates a great many things at startup time, and given that the arguments to intpow here are constants, there's no possibility of any surprises at runtime. Just assert what needs to be asserted, and please keep utility functions like this as simple as possible, leaving template metaprogramming for places that need it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1725219259

From aph at openjdk.org Wed Aug 21 15:22:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 15:22:06 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: <9LLOY_mJWmquBrFwkt954r5mBKyJ-TzuIkXgCjZKvs4=.c2bf26cc-09fb-48fb-a731-c06d64382738@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <9LLOY_mJWmquBrFwkt954r5mBKyJ-TzuIkXgCjZKvs4=.c2bf26cc-09fb-48fb-a731-c06d64382738@github.com> Message-ID:

On Wed, 21 Aug 2024 14:57:24 GMT, Andrew Haley wrote:

>> src/hotspot/share/utilities/intpow.hpp line 58:
>>
>>> 56: struct intpow {
>>> 57:   static const T value = v;
>>> 58: };
>>
>>> There's no need for any of this metaprogramming. A constexpr function would be better.
>>
>> The main advantage of this implementation over one using a `constexpr` function is the ability to verify input parameter values and detect overflows at compile time using `static_assert`. This feature helps prevent user errors that could otherwise be difficult to debug. I chose this approach for its functional benefits over a `constexpr` function. If you have any suggestions on how to implement this functionality in a `constexpr` function, they would be highly welcome, as I might be missing something here.
>>
>> Below is a similar `constexpr` implementation. **Please don't accept it through the UI**. I kept a `typedef` template parameter to ensure the return value has the desired width, if needed, without applying a mask at the call site.
>>
>> If you'd still prefer a `constexpr` function, please let me know if you have any comments on the implementation below (I'll rename it, remove `no_overflow` parameter and the comments).
>>
>> Suggestion:
>>
>> #include <limits>
>>
>> template <typename T>
>> static constexpr T intpow_c(T v, unsigned p, bool no_overflow = false) {
>>   // Can't be used in constexpr function
>>   // static_assert(v || p, "0^0 is not defined");
>>
>>   if (p == 0) {
>>     return 1;
>>   }
>>
>>   T a = intpow_c(v, p / 2, no_overflow);
>>   T b = (p % 2) ? v : 1;
>>
>>   // Can't be used in constexpr function
>>   // static_assert(!no_overflow || a <= std::numeric_limits<T>::max() / a, "Integer overflow");
>>   // static_assert(!no_overflow || a * a <= std::numeric_limits<T>::max() / b, "Integer overflow");
>>
>>   return a * a * b;
>> }
>
> You're assuming a requirement that isn't really there. HotSpot calculates a great many things at startup time, and given that the arguments to intpow here are constants, there's no possibility of any surprises at runtime. Just assert what needs to be asserted, and please keep utility functions like this as simple as possible, leaving template metaprogramming for places that need it.
So yes, this looks OK, but of course the "Can't use" comments are unnecessary if you just use `assert`s.

P.S. If T is unsigned, there is no overflow. The desired result is mod 2**n, by the definition of the unsigned integer types.

P.P.S. 0**0 is 1, for all purposes for which integer pow() is going to be used in HotSpot: this is combinatorics rather than calculus. It's also what `Math.pow()` does, I think, so we're good.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1725284490

From duke at openjdk.org Wed Aug 21 15:42:06 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 21 Aug 2024 15:42:06 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <9LLOY_mJWmquBrFwkt954r5mBKyJ-TzuIkXgCjZKvs4=.c2bf26cc-09fb-48fb-a731-c06d64382738@github.com> Message-ID:

On Wed, 21 Aug 2024 15:19:39 GMT, Andrew Haley wrote:

> P.S. If T is unsigned, there is no overflow. The desired result is mod 2**n, by the definition of the unsigned integer types.

The idea behind `no_overflow` was to allow the user to verify that the result is the exact representation of `v**p` and not just `mod 2**n`. Since this is not required, I'll remove the parameter and the assertions and add a `std::is_unsigned<T>::value` constraint instead.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1725317718

From liach at openjdk.org Wed Aug 21 15:42:18 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 Aug 2024 15:42:18 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v3] In-Reply-To: References: Message-ID:

> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.
> > Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Fix after merge - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline - Redundant transient; Update the comments to be more accurate - Inline some common ctor + method fields to executable ------------- Changes: https://git.openjdk.org/jdk/pull/20188/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20188&range=02 Stats: 448 lines in 11 files changed: 77 ins; 238 del; 133 mod Patch: https://git.openjdk.org/jdk/pull/20188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20188/head:pull/20188 PR: https://git.openjdk.org/jdk/pull/20188 From duke at openjdk.org Wed Aug 21 16:11:25 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 21 Aug 2024 16:11:25 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. 
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version Baseline This patch
> --------------------------------------------------------------------------------------------
> Benchmark (size) Mode Cnt Score Error Score Error Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op
> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op
> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op
> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op
> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op
> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op
> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op
> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op
> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op
> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op
> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op
> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op
> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ns/op
> ArraysHashCode.multibytes 10 avgt 15 5.4...
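For readers following the intpow discussion, the sketch below (C++14; every name in it is hypothetical) shows the simpler `constexpr` form the thread converges on: unsigned types only, results that simply wrap mod 2^n, and 0^0 == 1. It also shows one plausible use, precomputing powers of 31 so that a polynomial hash can consume a block of K elements at once before a scalar tail loop, which is the two-part structure the PR description outlines. That intpow.hpp exists for exactly this purpose in the PR is an assumption, and the stub's real register layout is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical constexpr integer power in the spirit of the review thread:
// unsigned types only, so results wrap mod 2^n by definition, and 0^0 == 1.
template <typename T>
constexpr T intpow(T v, unsigned p) {
  static_assert(std::is_unsigned<T>::value, "intpow: unsigned types only");
  // Multiply in at least 'unsigned int' so narrow types are not promoted to
  // signed int, where overflow would be undefined behavior.
  using W = typename std::common_type<T, unsigned>::type;
  if (p == 0) {
    return T(1);
  }
  W half = intpow(v, p / 2);
  W odd = (p % 2) ? W(v) : W(1);
  return T(half * half * odd);  // truncation to T keeps the result mod 2^n
}

// Scalar reference: h = 31 * h + a[i] with 32-bit wraparound, the recurrence
// Arrays.hashCode(int[]) uses (starting from h = 1).
uint32_t hash_scalar(const int32_t* a, size_t n, uint32_t h) {
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}

// Blocked form: a block of K elements advances the hash as
//   h' = h * 31^K + sum_{j<K} a[i+j] * 31^(K-1-j)
// so the K products are independent (vectorizable); tail elements fall back
// to the scalar recurrence.
template <unsigned K>
uint32_t hash_blocked(const int32_t* a, size_t n, uint32_t h) {
  assert(a != nullptr || n == 0);
  constexpr uint32_t pow_k = intpow<uint32_t>(31u, K);
  size_t i = 0;
  for (; i + K <= n; i += K) {
    uint32_t acc = 0;
    for (unsigned j = 0; j < K; j++) {
      acc += static_cast<uint32_t>(a[i + j]) * intpow<uint32_t>(31u, K - 1 - j);
    }
    h = h * pow_k + acc;
  }
  return hash_scalar(a + i, n - i, h);
}
```

`hash_blocked<4>` and `hash_blocked<8>` loosely mirror the "4 or 8 elements per iteration" wording; both must agree with `hash_scalar` for any input length, since the block formula is just the scalar recurrence unrolled K times.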
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: use a constexpr function for intpow instead of a templated class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/7ddae523..eb9708c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=03-04 Stats: 32 lines in 2 files changed: 1 ins; 14 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From coleenp at openjdk.org Wed Aug 21 16:38:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 21 Aug 2024 16:38:16 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace Message-ID: This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. Tested with tier1-8. 
------------- Commit messages: - 8338526: Don't store abstract and interface Klasses in class metaspace Changes: https://git.openjdk.org/jdk/pull/19157/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338526 Stats: 71 lines in 17 files changed: 31 ins; 6 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/19157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19157/head:pull/19157 PR: https://git.openjdk.org/jdk/pull/19157 From sgehwolf at openjdk.org Wed Aug 21 16:40:09 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 21 Aug 2024 16:40:09 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 17:34:46 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support @dholmes-ora @iklam @MBaesken Could somebody of you help review this, please? Increases test coverage in the container detection area. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2302519288 From jbhateja at openjdk.org Wed Aug 21 16:42:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Aug 2024 16:42:44 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. 
> - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Pass explicit wrap argument to selectFrom API with default value set to true. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/055fb22f..e24632cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=01-02 Stats: 491 lines in 40 files changed: 430 ins; 1 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From coleenp at openjdk.org Wed Aug 21 16:44:08 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 21 Aug 2024 16:44:08 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v3] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 15:42:18 GMT, Chen Liang wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. >> >> Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fix after merge > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline > - Redundant transient; Update the comments to be more accurate > - Inline some common ctor + method fields to executable src/hotspot/share/classfile/javaClasses.cpp line 3308: > 3306: macro(_name_offset, k, vmSymbols::name_name(), string_signature, false); \ > 3307: macro(_returnType_offset, k, vmSymbols::returnType_name(), class_signature, false); \ > 3308: macro(_annotation_default_offset, k, vmSymbols::annotation_default_name(), byte_array_signature, false); Can you re-align these since you modified them? 
src/hotspot/share/classfile/javaClasses.cpp line 3323: > 3321: Handle java_lang_reflect_Method::create(TRAPS) { > 3322: assert(Universe::is_fully_initialized(), "Need to find another solution to the reflection problem"); > 3323: Klass* klass = vmClasses::reflect_Method_klass(); This also does not need a cast, vmClasses::reflect_Method_klass() returns an InstanceKlass. src/hotspot/share/classfile/javaClasses.cpp line 3351: > 3349: assert(Universe::is_fully_initialized(), "Need to find another solution to the reflection problem"); > 3350: Klass* k = vmClasses::reflect_Constructor_klass(); > 3351: InstanceKlass* ik = InstanceKlass::cast(k); This doesn't need a cast because vmClasses returns InstanceKlass. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1725402105 PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1725403135 PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1725399269 From coleenp at openjdk.org Wed Aug 21 16:47:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 21 Aug 2024 16:47:05 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v3] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 15:42:18 GMT, Chen Liang wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. >> >> Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Fix after merge > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline > - Redundant transient; Update the comments to be more accurate > - Inline some common ctor + method fields to executable The hotspot code looks okay, but I have a couple of small cleanup requests in the changes. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20188#pullrequestreview-2251378680 From jbhateja at openjdk.org Wed Aug 21 16:52:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Aug 2024 16:52:06 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 21 Aug 2024 16:42:44 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. 
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Pass explicit wrap argument to selectFrom API with default value set to true. Hi @rose00 , @sviswa7 , @PaulSandoz , As suggested, now passing explicit 'wrap' argument to new selectFrom API. Following are the performance number of modified JMH micro included with the patch. 
Baseline:- Benchmark (size) Mode Cnt Score Error Units SelectFromBenchmark.rearrangeFromByteVector 4096 thrpt 2 5849.771 ops/ms SelectFromBenchmark.rearrangeFromDoubleVector 4096 thrpt 2 430.712 ops/ms SelectFromBenchmark.rearrangeFromFloatVector 4096 thrpt 2 942.737 ops/ms SelectFromBenchmark.rearrangeFromIntVector 4096 thrpt 2 1057.695 ops/ms SelectFromBenchmark.rearrangeFromLongVector 4096 thrpt 2 616.360 ops/ms SelectFromBenchmark.rearrangeFromShortVector 4096 thrpt 2 2146.465 ops/ms With Patch:- Benchmark (size) Mode Cnt Score Error Units SelectFromBenchmark.selectFromByteVector 4096 thrpt 2 9543.775 ops/ms SelectFromBenchmark.selectFromDoubleVector 4096 thrpt 2 558.195 ops/ms SelectFromBenchmark.selectFromFloatVector 4096 thrpt 2 1325.059 ops/ms SelectFromBenchmark.selectFromIntVector 4096 thrpt 2 1418.748 ops/ms SelectFromBenchmark.selectFromLongVector 4096 thrpt 2 687.231 ops/ms SelectFromBenchmark.selectFromShortVector 4096 thrpt 2 4782.395 ops/ms With WIP wrap index acceleration PR#20634: Benchmark (size) Mode Cnt Score Error Units SelectFromBenchmark.rearrangeFromByteVector 4096 thrpt 2 7602.645 ops/ms SelectFromBenchmark.rearrangeFromDoubleVector 4096 thrpt 2 441.684 ops/ms SelectFromBenchmark.rearrangeFromFloatVector 4096 thrpt 2 926.112 ops/ms SelectFromBenchmark.rearrangeFromIntVector 4096 thrpt 2 1061.695 ops/ms SelectFromBenchmark.rearrangeFromLongVector 4096 thrpt 2 644.058 ops/ms SelectFromBenchmark.rearrangeFromShortVector 4096 thrpt 2 2777.735 ops/ms ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2302541724 From liach at openjdk.org Wed Aug 21 17:34:02 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 Aug 2024 17:34:02 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore wrote: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since 
class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. java.lang.invoke changes look good. ------------- PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2251483857 From sviswanathan at openjdk.org Wed Aug 21 17:51:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 Aug 2024 17:51:06 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 21 Aug 2024 16:49:40 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Pass explicit wrap argument to selectFrom API with default value set to true. > > Hi @rose00 , @sviswa7 , @PaulSandoz , > As suggested, now passing explicit 'wrap' argument to new selectFrom API. > > Following are the performance number of modified JMH micro included with the patch. 
>
> Baseline:-
> Benchmark (size) Mode Cnt Score Error Units
> SelectFromBenchmark.rearrangeFromByteVector 4096 thrpt 2 5849.771 ops/ms
> SelectFromBenchmark.rearrangeFromDoubleVector 4096 thrpt 2 430.712 ops/ms
> SelectFromBenchmark.rearrangeFromFloatVector 4096 thrpt 2 942.737 ops/ms
> SelectFromBenchmark.rearrangeFromIntVector 4096 thrpt 2 1057.695 ops/ms
> SelectFromBenchmark.rearrangeFromLongVector 4096 thrpt 2 616.360 ops/ms
> SelectFromBenchmark.rearrangeFromShortVector 4096 thrpt 2 2146.465 ops/ms
>
> With Patch:-
> Benchmark (size) Mode Cnt Score Error Units
> SelectFromBenchmark.selectFromByteVector 4096 thrpt 2 9543.775 ops/ms
> SelectFromBenchmark.selectFromDoubleVector 4096 thrpt 2 558.195 ops/ms
> SelectFromBenchmark.selectFromFloatVector 4096 thrpt 2 1325.059 ops/ms
> SelectFromBenchmark.selectFromIntVector 4096 thrpt 2 1418.748 ops/ms
> SelectFromBenchmark.selectFromLongVector 4096 thrpt 2 687.231 ops/ms
> SelectFromBenchmark.selectFromShortVector 4096 thrpt 2 4782.395 ops/ms
>
>
> With WIP wrap index acceleration PR#20634:
> Benchmark (size) Mode Cnt Score Error Units
> SelectFromBenchmark.rearrangeFromByteVector 4096 thrpt 2 7602.645 ops/ms
> SelectFromBenchmark.rearrangeFromDoubleVector 4096 thrpt 2 441.684 ops/ms
> SelectFromBenchmark.rearrangeFromFloatVector 4096 thrpt 2 926.112 ops/ms
> SelectFromBenchmark.rearrangeFromIntVector 4096 thrpt 2 1061.695 ops/ms
> SelectFromBenchmark.rearrangeFromLongVector 4096 thrpt 2 644.058 ops/ms
> SelectFromBenchmark.rearrangeFromShortVector 4096 thrpt 2 2777.735 ops/ms

@jatin-bhateja Thanks, the PR (https://github.com/openjdk/jdk/pull/20634) is still work in progress and can be simplified much further. The changes I am currently working on are: do wrap by default for rearrange and selectFrom as suggested by John and Paul, no additional API with a boolean wrap parameter, and no changes to shuffle constructors.
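A scalar model of the wrap-by-default two-input selectFrom may help when reading the intrinsic changes. This is a sketch under stated assumptions, not the Vector API implementation: lane i of the result takes element (idx[i] mod 2*VLEN) of the concatenation of v1 and v2, so wrapped indexes in [0, VLEN) read v1 and [VLEN, 2*VLEN) read v2, consistent with the `v1.rearrange(this.toShuffle(), v2)` description in the PR. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Floor modulus: maps any x (including negatives) into [0, m).
int32_t floor_mod(int32_t x, int32_t m) {
  assert(m > 0);
  int32_t r = x % m;
  return r < 0 ? r + m : r;
}

// For a power-of-two range the same wrap is a mask of the low bits
// (computed on the unsigned bit pattern to avoid signed-bitwise pitfalls).
uint32_t wrap_index_pow2(int32_t x, uint32_t range) {
  assert(range != 0 && (range & (range - 1)) == 0);
  return static_cast<uint32_t>(x) & (range - 1);
}

// Hypothetical scalar model of a two-table select with wrapping:
// indexes are reduced mod 2*VLEN, then looked up in v1 ++ v2.
template <typename E, size_t VLEN>
std::array<E, VLEN> select_from_wrapped(const std::array<int32_t, VLEN>& idx,
                                        const std::array<E, VLEN>& v1,
                                        const std::array<E, VLEN>& v2) {
  const int32_t range = static_cast<int32_t>(2 * VLEN);
  std::array<E, VLEN> out{};
  for (size_t i = 0; i < VLEN; i++) {
    size_t j = static_cast<size_t>(floor_mod(idx[i], range));
    out[i] = (j < VLEN) ? v1[j] : v2[j - VLEN];
  }
  return out;
}
```

With VLEN = 4, index 5 selects v2[1], -1 wraps to 7 (v2[3]) and 9 wraps to 1 (v1[1]). Because 2*VLEN is a power of two for real vector shapes, the wrap reduces to masking the low index bits, which is why a permute instruction that only reads those bits would wrap implicitly.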
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2302641840 From mli at openjdk.org Wed Aug 21 18:11:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 21 Aug 2024 18:11:13 GMT Subject: RFR: 8338760: Adjust the comment after UseObjectMonitorTable Message-ID: Hi, Can you help to review this simple comment change? After reading description and code of https://github.com/openjdk/jdk/pull/20067, seems to me this comment should be changed accordingly. But I could be wrong. Thanks ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/20663/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20663&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338760 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20663/head:pull/20663 PR: https://git.openjdk.org/jdk/pull/20663 From psandoz at openjdk.org Wed Aug 21 18:30:06 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 21 Aug 2024 18:30:06 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 21 Aug 2024 16:42:44 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Pass explicit wrap argument to selectFrom API with default value set to true. Is it possible for the intrinsic to be responsible for wrapping, if needed?
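As a reading aid for the quoted semantics, here is a minimal scalar sketch of a two-vector selectFrom, modelling each vector as a plain `int[]` of length VLEN. The class and method names are hypothetical and this is not the Vector API implementation. It shows the checked behavior described in the PR (an index outside [0, 2*VLEN) throws) next to the wrapping alternative under discussion, which, assuming VLEN is a power of two, reduces each index mod 2*VLEN with a single AND so that only the low index bits matter.

```java
import java.util.Arrays;

public class TwoVectorSelectSketch {
    /** Checked variant: every index must lie in [0, 2*VLEN), otherwise an exception is thrown. */
    static int[] selectFromChecked(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = indexes[lane];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // Indexes 0..VLEN-1 read from v1; VLEN..2*VLEN-1 read from v2.
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    /** Wrapping variant: assumes VLEN is a power of two, so idx mod 2*VLEN is a single AND. */
    static int[] selectFromWrapped(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int mask = 2 * vlen - 1;
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = indexes[lane] & mask; // also maps negative indexes into [0, 2*VLEN)
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFromChecked(new int[] {0, 5, 7, 2}, v1, v2)));   // [10, 21, 23, 12]
        System.out.println(Arrays.toString(selectFromWrapped(new int[] {8, 13, -1, 2}, v1, v2))); // [10, 21, 23, 12]
    }
}
```

In the wrapped call, 8, 13 and -1 are reduced to 0, 5 and 7, so both calls select the same lanes; the checked variant would throw for any of those three indexes.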
I was looking at [`vpermi2b`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpermi2b&ig_expand=4917,4982,5004,5010,5014&techs=AVX_512) and AFAICT it implicitly wraps, operating on the lower N bits. Is that correct? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2302707611 From john.r.rose at oracle.com Wed Aug 21 18:40:15 2024 From: john.r.rose at oracle.com (John Rose) Date: Wed, 21 Aug 2024 11:40:15 -0700 Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On 21 Aug 2024, at 10:51, Sandhya Viswanathan wrote: > @jatin-bhateja Thanks, the PR ((https://github.com/openjdk/jdk/pull/20634) is still work in progress and can be simplified much further. The changes I am currently working on are do wrap by default for rearrange and selectFrom as suggested by John and Paul, no additional api with boolean wrap as parameter, and no changes to shuffle constructors. Yes, thank you Sandhya; this is the destination I hope to arrive at. Not necessarily 100% in this PR, but this PR should be consistent with it. To review: Shuffles store their indexes "partially wrapped" so as to preserve information about which indexes were out of bounds, but they also preserve all index values mod VLEN. It's always an option, though not a requirement, to fully wrap, removing the OOB info and reducing every index down to 0..VLEN-1. When using a vector instead of a shuffle for steering, we think of this as creating a temp shuffle first, then doing the appropriate operation(s). But for best instruction selection, we have found that it's fastest to force everything down to 0..VLEN-1 immediately, at least in the vector case, and to a doubled dynamic range, mod 2VLEN, for the two-input case.
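The "partially wrapped" versus "fully wrapped" distinction can be made concrete with a small scalar sketch. The helper names are hypothetical, and the encoding is an assumption based on the description in this message, not code copied from the JDK. A valid index is kept as-is. An out-of-range index is reduced mod VLEN and then offset by -VLEN, so its sign records that it was out of bounds while its value mod VLEN is preserved.

```java
public class ShuffleWrapSketch {
    /** Full wrap: reduce any index to 0..VLEN-1, discarding the out-of-bounds information. */
    static int fullyWrap(int index, int vlen) {
        return Math.floorMod(index, vlen);
    }

    /**
     * Partial wrap (assumed encoding): valid indexes are unchanged; an out-of-range index is
     * reduced mod VLEN and then offset by -VLEN, so it lands in [-VLEN, 0) and is recognizably
     * exceptional while still carrying its value mod VLEN.
     */
    static int partiallyWrap(int index, int vlen) {
        int wrapped = Math.floorMod(index, vlen);
        return (index == wrapped) ? index : wrapped - vlen;
    }

    /** A partially wrapped index can always be turned into the fully wrapped one. */
    static int fullyWrapFromPartial(int partial, int vlen) {
        return partial < 0 ? partial + vlen : partial;
    }

    public static void main(String[] args) {
        int vlen = 8;
        for (int i : new int[] {3, 9, -1, 17}) {
            System.out.printf("index %3d -> partial %3d, full %d%n",
                    i, partiallyWrap(i, vlen), fullyWrap(i, vlen));
        }
    }
}
```

For the two-input case, the same helpers would be applied with `2 * vlen` in place of `vlen`.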
There's always an equivalent expression which uses an explicit shuffle to carry either VLEN (fully wrapped) or 2VLEN (partially wrapped) indexes. For the vector-steered version we implement only the most favorable pattern of shuffle usage, one which never throws. And of course we don't build a temp shuffle either. From john.r.rose at oracle.com Wed Aug 21 18:51:25 2024 From: john.r.rose at oracle.com (John Rose) Date: Wed, 21 Aug 2024 11:51:25 -0700 Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On 21 Aug 2024, at 11:30, Paul Sandoz wrote: > Is it possible for the intrinsic to be responsible for wrapping, if needed? If was looking at [`vpermi2b`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpermi2b&ig_expand=4917,4982,5004,5010,5014&techs=AVX_512) and AFAICT it implicitly wraps, operating on the lower N bits. Is that correct? That's not a bad idea. But it is also possible (and routine) for the JIT to take an expression like (i >> (j&31)) down to (i >> j) if the hardware takes care of the (j&31) inside its >> operation. I think that some hardware permutation operations do something similar to >> in that they simply ignore irrelevant bits in the steering indexes. (Other operations do exotic things with irrelevant bits, such as interpreting the sign bit as a command to "force this one to zero".) If the wrapping operation for steering indexes is just a vpand against a simple constant, then maybe (maybe!) the JIT can easily drop that vpand, when the input is passed to a friendly auto-masking instruction, just like with (i >> (j&31)). On the other hand, Paul's idea might be more robust. It would require that the permutation intrinsics would apply vpand at the right places, and omit vpand when possible.
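The (i >> (j&31)) example relies on a rule that Java itself specifies: for `int` shifts, only the low five bits of the shift distance are used (JLS §15.19). The sketch below checks that rule. It also shows the analogous identity for a hypothetical "auto-masking" lookup that reads only the low bits of its index: an explicit AND with the same mask in front of it is redundant, because masking is idempotent. The lookup helper is illustrative only and does not model a specific instruction.

```java
public class AutoMaskSketch {
    /** Hypothetical lookup that, like some hardware permutes, ignores index bits above log2(table.length). */
    static int autoMaskingLookup(int[] table, int index) {
        // Assumes table.length is a power of two.
        return table[index & (table.length - 1)];
    }

    public static void main(String[] args) {
        int[] table = {100, 101, 102, 103, 104, 105, 106, 107};
        int mask = table.length - 1;
        int i = 0x7654_3210;
        for (int j = -64; j <= 64; j++) {
            // JLS 15.19: int shifts use only the low 5 bits of the distance.
            if ((i >> (j & 31)) != (i >> j)) throw new AssertionError("shift " + j);
            // Masking is idempotent, so a user-written "& mask" before the lookup changes nothing.
            if (autoMaskingLookup(table, j & mask) != autoMaskingLookup(table, j)) {
                throw new AssertionError("lookup " + j);
            }
        }
        System.out.println("explicit masks were redundant for all tested inputs");
    }
}
```

This is the property that would let a JIT drop the explicit mask when the consuming operation already masks its input.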
On the other other hand (the first hand) the classic way of doing it doesn't introduce vpand inside of intrinsics, which has a routine advantage: The vpands introduced outside of the intrinsic can be user-introduced or framework-introduced or both. In all cases, the JIT treats them uniformly and can collapse them together. Putting magic fixup instructions inside of intrinsic expansion risks making them invisible to the routine optimizations of the JIT. So, assuming the vpand gets good optimization, putting it outside of the intrinsic is the most robust option, as long as "good optimization" includes the >>(j&31) trick for auto-masking instructions. So the intrinsic should look for a vpand in its steering input, and pop off the IR node if the hardware masking is found to produce the same result. From matsaave at openjdk.org Wed Aug 21 19:33:46 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 21 Aug 2024 19:33:46 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: > Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. > > The early exploration of this issue was done by @eme64 who also generated a reproducer.
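To illustrate the failure mode: a jsr pushes the bci of the following instruction as its return address. Block-building code that unconditionally marks that bci as a block start will therefore index one past the end of the method when the jsr is the last instruction. The sketch below uses a hypothetical, much-simplified instruction model (not HotSpot's data structures) and guards the next bci against the code size. The actual fix in the PR may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

public class JsrLeaderSketch {
    /** Hypothetical simplified instruction: mnemonic, bci, encoded length, branch target (-1 if none). */
    record Insn(String op, int bci, int length, int target) {}

    /** Marks bcis that start a basic block; the array has exactly codeSize entries. */
    static boolean[] blockStarts(List<Insn> code, int codeSize) {
        boolean[] starts = new boolean[codeSize];
        starts[0] = true;
        for (Insn in : code) {
            int next = in.bci() + in.length();
            if (in.target() >= 0) {
                starts[in.target()] = true;
            }
            boolean endsBlock = in.target() >= 0 || in.op().equals("return") || in.op().equals("ret");
            // For jsr, "next" is the return address. If the jsr is the last instruction,
            // next == codeSize and marking it without this guard would index out of bounds.
            if (endsBlock && next < codeSize) {
                starts[next] = true;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // bci 0: return; bci 1: jsr 0 (3 bytes) is the last instruction, so codeSize == 4.
        List<Insn> code = List.of(new Insn("return", 0, 1, -1), new Insn("jsr", 1, 3, 0));
        System.out.println(Arrays.toString(blockStarts(code, 4))); // [true, true, false, false]
    }
}
```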
Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: - Removed incorrect comment and added copyright - Added regression test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20645/files - new: https://git.openjdk.org/jdk/pull/20645/files/dd8de52d..622b77b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20645&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20645&range=00-01 Stats: 113 lines in 4 files changed: 112 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20645/head:pull/20645 PR: https://git.openjdk.org/jdk/pull/20645 From sviswanathan at openjdk.org Wed Aug 21 19:34:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 Aug 2024 19:34:06 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 21 Aug 2024 18:27:09 GMT, Paul Sandoz wrote: > Is it possible for the intrinsic to be responsible for wrapping, if needed? If was looking at [`vpermi2b`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpermi2b&ig_expand=4917,4982,5004,5010,5014&techs=AVX_512) and AFAICT it implicitly wraps, operating on the lower N bits. Is that correct? It is good to keep wrapping separate. Two reasons: 1) Not all permute instructions do wrapping, e.g. pshufb behaves differently if the MSB of an index is set. 2) By keeping wrapping separate, it can be moved out of the loop if the shuffle is loop invariant.
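The first point can be seen from the documented per-byte behavior of SSSE3 PSHUFB (128-bit form): if bit 7 of an index byte is set, the result byte is zeroed; otherwise only the low four bits select a source byte. The scalar model below (hypothetical class and helper names) shows that an index with the MSB set is therefore zeroed rather than wrapped. Wrapping explicitly first, here with an AND against 15, restores wrap semantics. Because that AND depends only on the indexes, it is a separate step that could be computed once outside a loop when the shuffle is loop invariant.

```java
import java.util.Arrays;

public class PshufbSketch {
    /** Scalar model of 128-bit PSHUFB over 16 byte lanes. */
    static byte[] pshufb(byte[] src, byte[] idx) {
        byte[] dst = new byte[16];
        for (int i = 0; i < 16; i++) {
            if ((idx[i] & 0x80) != 0) {
                dst[i] = 0;                  // MSB set: lane is zeroed, not wrapped
            } else {
                dst[i] = src[idx[i] & 0x0F]; // otherwise only the low 4 bits are used
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = new byte[16];
        for (int i = 0; i < 16; i++) src[i] = (byte) (i + 1);
        byte[] idx = new byte[16];
        idx[0] = (byte) 0x81; // MSB set
        idx[1] = 17;          // out of range but MSB clear: 17 & 15 = 1

        // Explicit wrapping as a separate step; for a loop-invariant shuffle this could be hoisted.
        byte[] wrapped = new byte[16];
        for (int i = 0; i < 16; i++) wrapped[i] = (byte) (idx[i] & 0x0F);

        System.out.println(Arrays.toString(pshufb(src, idx)));     // lane 0 is 0
        System.out.println(Arrays.toString(pshufb(src, wrapped))); // lane 0 is src[1] = 2
    }
}
```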
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2302865908 From dlong at openjdk.org Wed Aug 21 19:40:03 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Aug 2024 19:40:03 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 19:33:46 GMT, Matias Saavedra Silva wrote: >> Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. >> >> The early exploration of this issue was done by @eme64 who also generated a reproducer. > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Removed incorrect comment and added copyright > - Added regression test test/hotspot/jtreg/runtime/interpreter/LastJsr.jasm line 32: > 30: return; > 31: LABEL: > 32: nop; Are these NOP instructions in both tests necessary? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20645#discussion_r1725661836 From matsaave at openjdk.org Wed Aug 21 19:43:03 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 21 Aug 2024 19:43:03 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 19:37:53 GMT, Dean Long wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - Removed incorrect comment and added copyright >> - Added regression test > > test/hotspot/jtreg/runtime/interpreter/LastJsr.jasm line 32: > >> 30: return; >> 31: LABEL: >> 32: nop; > > Are these NOP instructions in both tests necessary? I don't believe so, no. They were included as part of @eme64's reproducer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20645#discussion_r1725663966 From dlong at openjdk.org Wed Aug 21 19:56:03 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Aug 2024 19:56:03 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 19:33:46 GMT, Matias Saavedra Silva wrote: >> Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. >> >> The early exploration of this issue was done by @eme64 who also generated a reproducer. 
> > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Removed incorrect comment and added copyright > - Added regression test Let's say the JSR is not the last instruction, but it doesn't return because there is no RET instruction. What is the effect of creating a basic block that won't be used -- is it harmless? Do we need a test for that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20645#issuecomment-2302901409 From mgronlun at openjdk.org Wed Aug 21 20:15:13 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 21 Aug 2024 20:15:13 GMT Subject: RFR: 8338745: Intrinsify Continuation.pin() and Continuation.unpin() Message-ID: Greetings, Please help review this change set that implements C2 intrinsics for jdk.internal.vm.Continuation.pin() and jdk.internal.vm.Continuation.unpin(). This work is a consequence of [JDK-8338417](https://bugs.openjdk.org/browse/JDK-8338417), which required us to introduce explicit pin constructs for VirtualThreads in a relatively performance-sensitive path. Testing: jdk_jfr, loom testing Comment: I changed the type of the ContinuationEntry::_pin_count field from uint to uint32_t to make the size explicit and to access it uniformly from the intrinsic code using T_INT.
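Accessing a `uint32_t` field as `T_INT` works because both are 32 bits wide. The intrinsic sees the same bit pattern, and only the signed versus unsigned interpretation differs. The sketch below models such a pin counter in Java, which has no unsigned `int`, by storing it in an `int` and using unsigned helpers. The class, its methods and the overflow/underflow policy are hypothetical and are not the JDK's `Continuation` implementation.

```java
public class PinCountSketch {
    // Holds a 32-bit unsigned count in a Java int (same bits as a C++ uint32_t).
    private int pinCount;

    /** Increments the count; returns false instead of wrapping past 2^32 - 1. */
    boolean pin() {
        if (pinCount == -1) { // 0xFFFFFFFF, the unsigned maximum
            return false;
        }
        pinCount++;
        return true;
    }

    /** Decrements the count; returns false if it is already zero. */
    boolean unpin() {
        if (pinCount == 0) {
            return false;
        }
        pinCount--;
        return true;
    }

    /** The count as an unsigned value. */
    long unsignedCount() {
        return Integer.toUnsignedLong(pinCount);
    }

    public static void main(String[] args) {
        PinCountSketch c = new PinCountSketch();
        c.pin();
        c.pin();
        c.unpin();
        System.out.println(c.unsignedCount()); // 1
        c.pinCount = -1;                       // simulate a count of 2^32 - 1
        System.out.println(c.unsignedCount() + " " + c.pin()); // 4294967295 false
    }
}
```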
Thanks Markus ------------- Commit messages: - 8338745 Changes: https://git.openjdk.org/jdk/pull/20664/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20664&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338745 Stats: 117 lines in 8 files changed: 113 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20664/head:pull/20664 PR: https://git.openjdk.org/jdk/pull/20664 From dlong at openjdk.org Wed Aug 21 20:34:03 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Aug 2024 20:34:03 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: <0UeIWfhsJRNahI3I8r8wKfGlvWuY-0crKHqGxbPXk5o=.6b9ff3f1-f297-40f0-9c81-99507c2b2880@github.com> On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore wrote: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 279: > 277: clb.withMethod(invokerName, invokerDesc, ACC_STATIC, config); > 278: } > 279: There's probably not much value in using ACC_FINAL here anyway. We are only using these classes to create static methods, right? I think ACC_INTERFACE would work here too. 
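For reference on the ACC_INTERFACE suggestion, JVMS §4.1 constrains class access flags. An interface must also be ACC_ABSTRACT and must not be ACC_FINAL, ACC_SUPER, ACC_ENUM or ACC_MODULE. A non-interface must not set ACC_ANNOTATION or ACC_MODULE and must not be both ACC_FINAL and ACC_ABSTRACT. JVMS §4.5 requires interface fields to be public, static and final. JVMS §4.6 (class file version 52.0 and later) requires each interface method to have exactly one of ACC_PUBLIC and ACC_PRIVATE. The checker below encodes only those rules, using the flag values from the JVMS tables. The class and method names are hypothetical, and this is not a full class-file verifier.

```java
public class AccessFlagSketch {
    // Flag values from JVMS tables 4.1-B, 4.5-A and 4.6-A.
    static final int ACC_PUBLIC = 0x0001, ACC_PRIVATE = 0x0002, ACC_PROTECTED = 0x0004;
    static final int ACC_STATIC = 0x0008, ACC_FINAL = 0x0010, ACC_SUPER = 0x0020;
    static final int ACC_SYNCHRONIZED = 0x0020, ACC_NATIVE = 0x0100;
    static final int ACC_INTERFACE = 0x0200, ACC_ABSTRACT = 0x0400;
    static final int ACC_ANNOTATION = 0x2000, ACC_ENUM = 0x4000, ACC_MODULE = 0x8000;

    /** JVMS 4.1 constraints on a class's access_flags. */
    static boolean classFlagsOk(int f) {
        if ((f & ACC_INTERFACE) != 0) {
            return (f & ACC_ABSTRACT) != 0
                && (f & (ACC_FINAL | ACC_SUPER | ACC_ENUM | ACC_MODULE)) == 0;
        }
        return (f & (ACC_ANNOTATION | ACC_MODULE)) == 0
            && (f & (ACC_FINAL | ACC_ABSTRACT)) != (ACC_FINAL | ACC_ABSTRACT);
    }

    /** JVMS 4.5: interface fields must be public, static and final (other restrictions not checked). */
    static boolean interfaceFieldFlagsOk(int f) {
        int required = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
        return (f & required) == required;
    }

    /** JVMS 4.6 (version 52.0+): exactly one of public/private; no protected/final/synchronized/native. */
    static boolean interfaceMethodFlagsOk(int f) {
        boolean isPublic = (f & ACC_PUBLIC) != 0;
        boolean isPrivate = (f & ACC_PRIVATE) != 0;
        return (isPublic ^ isPrivate)
            && (f & (ACC_PROTECTED | ACC_FINAL | ACC_SYNCHRONIZED | ACC_NATIVE)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(classFlagsOk(ACC_FINAL | ACC_SUPER));           // true
        System.out.println(classFlagsOk(ACC_ABSTRACT | ACC_SUPER));        // true
        System.out.println(classFlagsOk(ACC_INTERFACE));                   // false: needs ACC_ABSTRACT
        System.out.println(classFlagsOk(ACC_INTERFACE | ACC_ABSTRACT));    // true
        System.out.println(interfaceMethodFlagsOk(ACC_STATIC));            // false: neither public nor private
        System.out.println(interfaceFieldFlagsOk(ACC_STATIC | ACC_FINAL)); // false: not public
    }
}
```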
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1725721069 From coleenp at openjdk.org Wed Aug 21 21:02:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 21 Aug 2024 21:02:04 GMT Subject: RFR: 8338760: Adjust the comment after UseObjectMonitorTable In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 18:07:01 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple comment change? > After reading description and code of https://github.com/openjdk/jdk/pull/20067, seems to me this comment should be changed accordingly. But I could be wrong. > Thanks Yes, this is correct. Thank you for noticing this. Also, it can be checked in as "trivial". ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20663#pullrequestreview-2251935600 From liach at openjdk.org Wed Aug 21 21:05:04 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 Aug 2024 21:05:04 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: <0UeIWfhsJRNahI3I8r8wKfGlvWuY-0crKHqGxbPXk5o=.6b9ff3f1-f297-40f0-9c81-99507c2b2880@github.com> References: <0UeIWfhsJRNahI3I8r8wKfGlvWuY-0crKHqGxbPXk5o=.6b9ff3f1-f297-40f0-9c81-99507c2b2880@github.com> Message-ID: <77NWcTx23rX8UnhRcnOqS36y4Y-7-zDEf3hyYD1bcbw=.6f02be24-3cd2-44d8-8483-934295aab9fd@github.com> On Wed, 21 Aug 2024 20:31:09 GMT, Dean Long wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. 
Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. > > src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 279: > >> 277: clb.withMethod(invokerName, invokerDesc, ACC_STATIC, config); >> 278: } >> 279: > > There's probably not much value in using ACC_FINAL here anyway. We are only using these classes to create static methods, right? I think ACC_INTERFACE would work here too. Note that JVMS 4.1 requires `ACC_ABSTRACT` to be also set when `ACC_INTERFACE` is set. Also note that some classes capture class data to refer to hidden classes and method handles or lambda forms, so those fields' generation need to add `ACC_PUBLIC` flag to be usable in interfaces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1725751901 From coleenp at openjdk.org Wed Aug 21 21:20:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 21 Aug 2024 21:20:06 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: <77NWcTx23rX8UnhRcnOqS36y4Y-7-zDEf3hyYD1bcbw=.6f02be24-3cd2-44d8-8483-934295aab9fd@github.com> References: <0UeIWfhsJRNahI3I8r8wKfGlvWuY-0crKHqGxbPXk5o=.6b9ff3f1-f297-40f0-9c81-99507c2b2880@github.com> <77NWcTx23rX8UnhRcnOqS36y4Y-7-zDEf3hyYD1bcbw=.6f02be24-3cd2-44d8-8483-934295aab9fd@github.com> Message-ID: On Wed, 21 Aug 2024 21:02:50 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 279: >> >>> 277: clb.withMethod(invokerName, invokerDesc, ACC_STATIC, config); >>> 278: } >>> 279: >> >> There's probably not much value in using ACC_FINAL here anyway. We are only using these classes to create static methods, right? I think ACC_INTERFACE would work here too. > > Note that JVMS 4.1 requires `ACC_ABSTRACT` to be also set when `ACC_INTERFACE` is set. 
Also note that some classes capture class data to refer to hidden classes and method handles or lambda forms, so those fields' generation need to add `ACC_PUBLIC` flag to be usable in interfaces. I feel like making it ACC_INTERFACE might cause some error if there are no public nonstatic methods, which is the case with this class. I don't know what @liach your comment means, but this code got more complicated than it was with the first version of this change. When I talked to @rose00 he thought ACC_ABSTRACT would be okay for this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1725764671 From matsaave at openjdk.org Wed Aug 21 21:23:03 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 21 Aug 2024 21:23:03 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 19:33:46 GMT, Matias Saavedra Silva wrote: >> Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. >> >> The early exploration of this issue was done by @eme64 who also generated a reproducer. > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Removed incorrect comment and added copyright > - Added regression test Basic blocks have a property called `is_reachable` which seems to deal with unused basic blocks. The changes in my patch stop the new basic block from being marked or set to reachable. I think the current change is safe and unreachable basic blocks are already handled. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20645#issuecomment-2303032949 From matsaave at openjdk.org Wed Aug 21 21:28:06 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 21 Aug 2024 21:28:06 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 19:33:46 GMT, Matias Saavedra Silva wrote: >> Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. >> >> The early exploration of this issue was done by @eme64 who also generated a reproducer. > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Removed incorrect comment and added copyright > - Added regression test To answer your question, I don't think that case needs to be tested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20645#issuecomment-2303040437 From liach at openjdk.org Wed Aug 21 21:35:04 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 Aug 2024 21:35:04 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: <0UeIWfhsJRNahI3I8r8wKfGlvWuY-0crKHqGxbPXk5o=.6b9ff3f1-f297-40f0-9c81-99507c2b2880@github.com> <77NWcTx23rX8UnhRcnOqS36y4Y-7-zDEf3hyYD1bcbw=.6f02be24-3cd2-44d8-8483-934295aab9fd@github.com> Message-ID: On Wed, 21 Aug 2024 21:17:14 GMT, Coleen Phillimore wrote: >> Note that JVMS 4.1 requires `ACC_ABSTRACT` to be also set when `ACC_INTERFACE` is set. 
Also note that some classes capture class data to refer to hidden classes and method handles or lambda forms, so those fields' generation need to add `ACC_PUBLIC` flag to be usable in interfaces. > > I feel like making it ACC_INTERFACE might cause some error if there are no public nonstatic methods, which is the case with this class. I don't know what @liach your comment means, but this code got more complicated than it was with the first version of this change. When I talked to @rose00 he thought ACC_ABSTRACT would be okay for this. Yes, you are right that we currently don't add ACC_PUBLIC flags on methods, which will fail if we add ACC_INTERFACE. Same for the fields; these LambdaForm classes use fields to store class data that's usually stored as condy, because LambdaForm is the infrastructure that condy uses (just as LambdaMetafactory cannot use lambdas). Those fields are my concerns for the interface migration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1725777710 From ihse at openjdk.org Wed Aug 21 21:58:34 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 21 Aug 2024 21:58:34 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds Message-ID: As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out.
------------- Commit messages: - 8338768: Introduce runtime lookup to check for static builds Changes: https://git.openjdk.org/jdk/pull/20666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20666&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338768 Stats: 203 lines in 11 files changed: 109 ins; 21 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/20666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20666/head:pull/20666 PR: https://git.openjdk.org/jdk/pull/20666 From ihse at openjdk.org Wed Aug 21 22:14:40 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 21 Aug 2024 22:14:40 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: Message-ID: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> > As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. > > This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. > > This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Also update build to link properly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20666/files - new: https://git.openjdk.org/jdk/pull/20666/files/e917f6a2..072a910d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20666&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20666&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20666/head:pull/20666 PR: https://git.openjdk.org/jdk/pull/20666 From dcubed at openjdk.org Wed Aug 21 22:21:03 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 21 Aug 2024 22:21:03 GMT Subject: RFR: 8338760: Adjust the comment after UseObjectMonitorTable In-Reply-To: References: Message-ID: <7uhrh2WhMEhBdTkjdVZKlVDVuWmuempS4nu-pVEzy4U=.3fca4ac3-e03d-4d09-ab27-b6edec817355@github.com> On Wed, 21 Aug 2024 18:07:01 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple comment change? > After reading description and code of https://github.com/openjdk/jdk/pull/20067, seems to me this comment should be changed accordingly. But I could be wrong. > Thanks Thumbs up. I also agree that this is trivial. ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20663#pullrequestreview-2252055567 From prr at openjdk.org Wed Aug 21 23:54:05 2024 From: prr at openjdk.org (Phil Race) Date: Wed, 21 Aug 2024 23:54:05 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly AWT changes look fine. ------------- Marked as reviewed by prr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20666#pullrequestreview-2252384240 From dlong at openjdk.org Wed Aug 21 23:54:03 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Aug 2024 23:54:03 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 19:33:46 GMT, Matias Saavedra Silva wrote: >> Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. 
Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. >> >> The early exploration of this issue was done by @eme64 who also generated a reproducer. > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Removed incorrect comment and added copyright > - Added regression test Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20645#pullrequestreview-2252385772 From jiangli at openjdk.org Thu Aug 22 00:33:02 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 22 Aug 2024 00:33:02 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: <5_BKiz0spEIxGN2mZJHiAoaSOWOdnH8kf5POgG9sQ9g=.9339d838-9f04-4d28-93b8-647ad90e805a@github.com> On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. 
> > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly I compared the extracted changes in this PR with the related parts in https://github.com/openjdk/jdk/pull/19478. They look ok. My concern (as discussed in https://github.com/openjdk/jdk/pull/19478#issuecomment-2278421931) is that these runtime changes for static JDK can't be tested, even though they are relatively simple, without the actual linking change. Any timeline for the static linking changes? ------------- PR Review: https://git.openjdk.org/jdk/pull/20666#pullrequestreview-2252486767 From sviswanathan at openjdk.org Thu Aug 22 01:20:16 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 22 Aug 2024 01:20:16 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v3] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 21 Aug 2024 16:42:44 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes.
>> - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platform.
>> - Optimized x86 backend implementation for AVX512 and legacy targets.
>> - Functional tests covering the new API.
>>
>> The JMH micro included with this patch shows around a 10-15x gain over the existing rearrange API:
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>>
>>
>> Benchmark                                    (size)   Mode  Cnt      Score  Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector    1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector    2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector     1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector     2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector    1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector    2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector   1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector   2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector       1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector       2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector        1024  thrpt    2  10756.804         ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Pass explicit wrap argument to selectFrom API with default value set to true.

@rose00 @PaulSandoz I have updated https://github.com/openjdk/jdk/pull/20634. Please take a look and check whether it meets your expectations for the existing rearrange/selectFrom APIs. Jatin can then base the new two-vector selectFrom API in this PR on similar lines.
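To make the two-vector selection semantics quoted in this thread concrete, here is a minimal scalar model in plain Java. It is a sketch under stated assumptions, not the Vector API: `int[]` arrays stand in for vectors of VLEN lanes, and the class and method names (`TwoVectorSelectModel`, `selectFrom`) are hypothetical. It follows the originally described behavior (indices must lie in [0, 2*VLEN), otherwise `IndexOutOfBoundsException`); the later commit adds a wrap argument defaulting to true, whose exact wrapping rule is not spelled out here, so wrapping is deliberately not modelled.

```java
import java.util.Arrays;

/**
 * Hypothetical scalar model of two-vector selection. int[] arrays stand in
 * for vectors; this is NOT the jdk.incubator.vector API.
 */
final class TwoVectorSelectModel {
    private TwoVectorSelectModel() {}

    /**
     * For each lane, treats v1 ++ v2 as one 2*VLEN-entry table and picks
     * table[indexes[lane]]. Out-of-range indices throw, following the
     * originally described (non-wrapping) behavior.
     */
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = indexes[lane];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + lane + ": index " + idx);
            }
            // Indices below VLEN read from v1, the remaining ones from v2.
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {7, 0, 4, 3};
        // prints [23, 10, 20, 13]
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```

Index 7 lands in the second half of the combined table (v2[3]), while index 3 stays in the first half (v1[3]); this mirrors the stated equivalence with `v1.rearrange(this.toShuffle(), v2)`, where out-of-range shuffle indices select from the second vector.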
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2303383784 From dholmes at openjdk.org Thu Aug 22 02:49:09 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 22 Aug 2024 02:49:09 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly Sorry but I don't understand the point of changing build-time constructs using `ifdef STATIC_BUILD` into what appear to be runtime checks, but the result of which is already determined at build time. These are not really runtime checks. 
-------------

PR Review: https://git.openjdk.org/jdk/pull/20666#pullrequestreview-2252973892

From jwaters at openjdk.org Thu Aug 22 05:30:02 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Thu, 22 Aug 2024 05:30:02 GMT
Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2]
In-Reply-To: 
References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> 
Message-ID: <6Qu9wbsWf_YxFFvfA6KveCLOheMOsbo4z--gZzQh9xY=.b1d013f6-eb34-4d50-b19b-64c684cdea13@github.com>

On Thu, 22 Aug 2024 02:46:34 GMT, David Holmes wrote:

> Sorry but I don't understand the point of changing build-time constructs using `ifdef STATIC_BUILD` into what appear to be runtime checks, but the result of which is already determined at build time. These are not really runtime checks.

I believe the new methods are for checking whether the JDK was statically linked or not, but I half agree with your point. Switching to the new methods instead of the preprocessor checks doesn't seem to really do anything. Maybe the new methods can be kept alongside the preprocessor checks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2303813593

From thartmann at openjdk.org Thu Aug 22 05:44:09 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 22 Aug 2024 05:44:09 GMT
Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 21 Aug 2024 19:33:46 GMT, Matias Saavedra Silva wrote:

>> Although JSR bytecodes can no longer be generated by javac, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, the behavior is undefined, since JSR uses the address of the next instruction as its return address. In the case of HotSpot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks.
>>
>> The early exploration of this issue was done by @eme64, who also generated a reproducer.
>
> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision:
>
> - Removed incorrect comment and added copyright
> - Added regression test

Looks good to me. You might want to add @eme64 as co-contributor to this PR since he contributed the test.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20645#pullrequestreview-2253542147

From aboldtch at openjdk.org Thu Aug 22 06:27:18 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 22 Aug 2024 06:27:18 GMT
Subject: RFR: 8338810: PPC, s390x: LightweightSynchronizer::exit asserts, missing lock
Message-ID: 

[JDK-8338638](https://bugs.openjdk.org/browse/JDK-8338638) made me realise that PPC and s390x have the same issue.

The issue is that the C2 unlock path checks whether the monitor is inflated after popping off the last entry on the lock stack. With UseObjectMonitorTable (without the cache lookup implemented), the slow path is incorrectly taken without restoring the popped oop. Currently, the runtime expects the lock stack to be consistent (to have an entry) in exit if the monitor is anonymously inflated.

I'll provide a band-aid fix which pushes the oop back before calling into the runtime.

A future enhancement for all platforms would be to allow the C2 entry point to redo the push when it takes the slow path and realises that the monitor is anonymously inflated, or that it is fast-locked and the lock stack does not contain the oop. (This would remove all the push-back logic from the emitted C2 unlock nodes.)

I am unable to test ppc and s390x, so I have not verified that the issue is reproduced nor that this fixes it.
Hopefully @TheRealMDoerr and @offamitkumar can assist me here by running `test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java` with and without the patch on PPC and s390x respectively.
Thanks in advance. (And sorry for integrating without better testing of your respective platforms) ------------- Commit messages: - 8338810: PPC, s390x: LightweightSynchronizer::exit asserts, missing lock Changes: https://git.openjdk.org/jdk/pull/20672/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20672&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338810 Stats: 12 lines in 2 files changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20672/head:pull/20672 PR: https://git.openjdk.org/jdk/pull/20672 From mli at openjdk.org Thu Aug 22 07:26:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 Aug 2024 07:26:09 GMT Subject: RFR: 8338760: Adjust the comment after UseObjectMonitorTable In-Reply-To: References: Message-ID: <_Uocc56GCMY8NJPryg3SUaiR0DRJMCc_sV4x6616nZU=.2adf1105-8690-48f8-b8c6-476329c0052f@github.com> On Wed, 21 Aug 2024 20:59:50 GMT, Coleen Phillimore wrote: >> Hi, >> Can you help to review this simple comment change? >> After reading description and code of https://github.com/openjdk/jdk/pull/20067, seems to me this comment should be changed accordingly. But I could be wrong. >> Thanks > > Yes, this is correct. Thank you for noticing this. Also, it can be checked in as "trivial". @coleenp @dcubed-ojdk Thank you for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20663#issuecomment-2303961759 From mli at openjdk.org Thu Aug 22 07:26:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 Aug 2024 07:26:09 GMT Subject: Integrated: 8338760: Adjust the comment after UseObjectMonitorTable In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 18:07:01 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple comment change? > After reading description and code of https://github.com/openjdk/jdk/pull/20067, seems to me this comment should be changed accordingly. But I could be wrong. 
> Thanks This pull request has now been integrated. Changeset: 6644dd33 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/6644dd33f6f4b440105d84ef187a0ff6b1d60827 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8338760: Adjust the comment after UseObjectMonitorTable Reviewed-by: coleenp, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/20663 From mdoerr at openjdk.org Thu Aug 22 07:33:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 Aug 2024 07:33:02 GMT Subject: RFR: 8338810: PPC, s390x: LightweightSynchronizer::exit asserts, missing lock In-Reply-To: References: Message-ID: <9d5V0toFm2U-u4IpOgVa7EDn-3ELXeV4BIcy_tlAJ_A=.6c94054c-5cb7-4ef1-921e-489ed6d9f57c@github.com> On Thu, 22 Aug 2024 06:20:23 GMT, Axel Boldt-Christmas wrote: > [JDK-8338638](https://bugs.openjdk.org/browse/JDK-8338638) made me realise that PPC and s390x have the same issue. > > The issue is that the C2 unlock path will check if the monitor is inflated after popping of the last entry on the lock stack. With UseObjectMonitorTable (without the the cache lookup implemented), the slow path is incorrectly taken without resting the popped oop. Currently the runtime expects the the lock stack to be consistent (have an entry) in exit if a the monitor is anonymously inflated. > > I'll provide a bandaid fix which pushes back the oop before the calling to the runtime. > > A future enhancement for all platform would be to allow the C2 entry point to redo the push when taking the slow path and it realises that the monitor is anonymously inflated or it is fast locked and the lock stack does not contain the oop. (Removing all the push back logic from the emitted C2 unlock nodes) > > I am unable to test ppc and s390x, so I have not verified that the issue is reproduced nor that this fixes it. 
> Hopefully @TheRealMDoerr and @offamitkumar can assist me here with running `test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java` with and without the patch on PPC and s390x respectively. > Thanks in advance. (And sorry for integrating without better testing of your respective platforms) Thanks for fixing it! The PPC64 part looks good and fixes the test failures. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20672#pullrequestreview-2253712572 From fyang at openjdk.org Thu Aug 22 07:49:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 22 Aug 2024 07:49:08 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: <1VHlhME2c2sX0XIU9jrKgYMWjWfsZWVcA8-ND2nT-3c=.a273218b-0482-4d3b-8e67-a83752065e97@github.com> On Wed, 21 Aug 2024 10:26:18 GMT, Hamlin Li wrote: >> ## Performance >> benchmarks run on CanVM-K230 >> >> data >> >> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | 
avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert misc Hi, was this ever tested on boards from other vendors like BPI-F3? Seems the JMH data is no there. I can take a look tomorrow. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2304002615 From rkennke at openjdk.org Thu Aug 22 07:57:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 07:57:11 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore wrote: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. 
Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. If I understand correctly, we already have limits on how many classes we can represent as compressed class-pointers. While this is nice for Lilliput, this is equally useful for non-Lilliput CCP, because addressable class-space doesn't get polluted by classes that never need to be encoded as CCP, and thus effectively increases the number of classes that we can address without resorting to -UseCompressedClassPointers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2304017272 From rkennke at openjdk.org Thu Aug 22 08:00:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 08:00:54 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8.
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant.
> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop, loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o...

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits:

 - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3
 - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3
 - Explicitely make AVAILABLE=true
 - Export new hash constants to JVMCI
 - Improve asserts
 - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental)

-------------

Changes: https://git.openjdk.org/jdk/pull/20640/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20640&range=01
  Stats: 1670 lines in 105 files changed: 1234 ins; 208 del; 228 mod
  Patch: https://git.openjdk.org/jdk/pull/20640.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20640/head:pull/20640

PR: https://git.openjdk.org/jdk/pull/20640

From mli at openjdk.org Thu Aug 22 08:10:04 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 22 Aug 2024 08:10:04 GMT
Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3]
In-Reply-To: <1VHlhME2c2sX0XIU9jrKgYMWjWfsZWVcA8-ND2nT-3c=.a273218b-0482-4d3b-8e67-a83752065e97@github.com>
References: <1VHlhME2c2sX0XIU9jrKgYMWjWfsZWVcA8-ND2nT-3c=.a273218b-0482-4d3b-8e67-a83752065e97@github.com>
Message-ID: 

On Thu, 22 Aug 2024 07:45:58 GMT, Fei Yang wrote:

> Hi, was this ever tested on boards from other vendors like BPI-F3? Seems the JMH data is not there. I can take a look tomorrow. Thanks.

Yes. On bananapi, normal cases get much better as the size grows; I think the reason is that it has a wider vector register length, which is reasonable. For example, for size 20000 the improvement is about 3.082 times on K230 and about 3.856 times on bananapi. Please check the data below:

| Benchmark (bananapi, vlenb == 32) | (addSpecial) | (errorIndex) | (maxNumBytes) | Mode | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | Improvement |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Base64Decode.testBase64Decode | 0 | 144 | 1 | avgt | 10 | 127.479 | 128.436 | 0.35 | ns/op | 1.008 |
| Base64Decode.testBase64Decode | 0 | 144 | 3 | avgt | 10 | 153.771 | 149.556 | 0.734 | ns/op | 0.973 |
| Base64Decode.testBase64Decode | 0 | 144 | 7 | avgt | 10 | 205.214 | 220.49 | 3.627 | ns/op | 1.074 |
| Base64Decode.testBase64Decode | 0 | 144 | 32 | avgt | 10 | 312.845 | 352.138 | 4.029 | ns/op | 1.126 |
| Base64Decode.testBase64Decode | 0 | 144 | 64 | avgt | 10 | 432.29 | 522.126 | 1.681 | ns/op | 1.208 |
| Base64Decode.testBase64Decode | 0 | 144 | 80 | avgt | 10 | 528.625 | 651.837 | 1.605 | ns/op | 1.233 |
| Base64Decode.testBase64Decode | 0 | 144 | 96 | avgt | 10 | 358.15 | 781.689 | 1.535 | ns/op | 2.183 |
| Base64Decode.testBase64Decode | 0 | 144 | 112 | avgt | 10 | 504.604 | 941.241 | 2.038 | ns/op | 1.865 |
| Base64Decode.testBase64Decode | 0 | 144 | 512 | avgt | 10 | 1949.818 | 4085.463 | 1.364 | ns/op | 2.095 |
| Base64Decode.testBase64Decode | 0 | 144 | 1000 | avgt | 10 | 2551.405 | 6719.044 | 21.669 | ns/op | 2.633 |
| Base64Decode.testBase64Decode | 0 | 144 | 20000 | avgt | 10 | 29470 | 113639.583 | 4.189 | ns/op | 3.856 |
| Base64Decode.testBase64Decode | 0 | 144 | 50000 | avgt | 10 | 70658.042 | 282494.033 | 12.559 | ns/op | 3.998 |

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2304042266

From ihse at openjdk.org Thu Aug 22 08:46:03 2024
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Thu, 22 Aug 2024 08:46:03 GMT
Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2]
In-Reply-To: <5_BKiz0spEIxGN2mZJHiAoaSOWOdnH8kf5POgG9sQ9g=.9339d838-9f04-4d28-93b8-647ad90e805a@github.com>
References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com>
<5_BKiz0spEIxGN2mZJHiAoaSOWOdnH8kf5POgG9sQ9g=.9339d838-9f04-4d28-93b8-647ad90e805a@github.com> Message-ID: On Thu, 22 Aug 2024 00:30:07 GMT, Jiangli Zhou wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Also update build to link properly > > I compared the extracted changes in this PR with the related parts in https://github.com/openjdk/jdk/pull/19478. They look ok. My concern (as discussed in https://github.com/openjdk/jdk/pull/19478#issuecomment-2278421931) is that these runtime changes for static JDK can't be tested even they are relatively simple, without the the actual linking change. Any timeline for the static linking changes? @jianglizhou > [...] these runtime changes for static JDK can't be tested [...] Yes, they can. This is just a pure refactoring of existing code. I have deliberately kept out addition of the new places where static linking exceptions are needed in the code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2304110701 From amitkumar at openjdk.org Thu Aug 22 08:50:06 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 22 Aug 2024 08:50:06 GMT Subject: RFR: 8338810: PPC, s390x: LightweightSynchronizer::exit asserts, missing lock In-Reply-To: References: Message-ID: <3CfG_ppB6dSqplSaZCddubMLqh-UlBM65GgK-wDAMBU=.57481888-44ec-4275-929a-22b5da26a001@github.com> On Thu, 22 Aug 2024 06:20:23 GMT, Axel Boldt-Christmas wrote: > [JDK-8338638](https://bugs.openjdk.org/browse/JDK-8338638) made me realise that PPC and s390x have the same issue. > > The issue is that the C2 unlock path will check if the monitor is inflated after popping of the last entry on the lock stack. With UseObjectMonitorTable (without the the cache lookup implemented), the slow path is incorrectly taken without resting the popped oop. 
Currently the runtime expects the the lock stack to be consistent (have an entry) in exit if a the monitor is anonymously inflated. > > I'll provide a bandaid fix which pushes back the oop before the calling to the runtime. > > A future enhancement for all platform would be to allow the C2 entry point to redo the push when taking the slow path and it realises that the monitor is anonymously inflated or it is fast locked and the lock stack does not contain the oop. (Removing all the push back logic from the emitted C2 unlock nodes) > > I am unable to test ppc and s390x, so I have not verified that the issue is reproduced nor that this fixes it. > Hopefully @TheRealMDoerr and @offamitkumar can assist me here with running `test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java` with and without the patch on PPC and s390x respectively. > Thanks in advance. (And sorry for integrating without better testing of your respective platforms) s390x Part looks good as well. Test also passes; ./summary.txt:1889:runtime/Monitor/UseObjectMonitorTableTest.java#ExtremeDeflation Passed. Execution successful ./summary.txt:1890:runtime/Monitor/UseObjectMonitorTableTest.java#NormalDeflation Passed. Execution successful ------------- Marked as reviewed by amitkumar (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20672#pullrequestreview-2253884335

From ihse at openjdk.org Thu Aug 22 08:59:03 2024
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Thu, 22 Aug 2024 08:59:03 GMT
Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2]
In-Reply-To: 
References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> 
Message-ID: 

On Thu, 22 Aug 2024 02:46:34 GMT, David Holmes wrote:

>> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Also update build to link properly
>
> Sorry but I don't understand the point of changing build-time constructs using `ifdef STATIC_BUILD` into what appear to be runtime checks, but the result of which is already determined at build time. These are not really runtime checks.

@dholmes-ora

> Sorry but I don't understand the point of changing build-time constructs using ifdef STATIC_BUILD into what appear to be runtime checks, but the result of which is already determined at build time.

I apologize. I did not express the intent of this change clearly enough.

The background is that we want to build and test statically linked native libraries. Currently, building a statically linked version requires recompiling all native source code into a completely new set of .o files, which are then linked into a static library. This is extremely wasteful, since most of the code is completely identical for static and dynamic libraries.

To fix this, Jiangli, her team, and I have been working on a way around it. By moving the ifdef check to a new file that contains just a single function, we only need to compile this single file twice -- once for the static library, and once for the dynamic library. All other .o files are compiled just once, and then you link "all other files" + "the one special file for your kind of library" to get what you want.
Unfortunately, there is one more blocker before this can be achieved, which is why the corresponding change in the build system is not included in this patch. (So this is a preparation for these future changes, not the complete solution.) The missing part is that the `[JNI|Agent]_On[Un]Load` functions need to be able to use the statically linked naming scheme, even for dynamically linked libraries. This is trivial per se, but requires a spec change, which has not yet happened.

The reason I want to get this partial solution into mainline right now, instead of waiting for the spec change and the complete build system fix, is that these new functions for checking static/dynamic linking are needed by additional changes that Jiangli has created upstream, and that I am trying to help her get integrated. (The goal of these changes is not just to produce static libraries, but to link these static libraries with the java launcher into a statically linked launcher, which is a prerequisite for the rest of the Hermetic Java story.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2304133213

From aph-open at littlepinkcloud.com Thu Aug 22 09:08:23 2024
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Thu, 22 Aug 2024 10:08:23 +0100
Subject: 8315884: New Object to ObjectMonitor mapping causes linux-aarch64 musl to fail
In-Reply-To: <7F626C6C-D7E0-4893-A699-A5A9A4AA33AC@jetbrains.com>
References: <8E87A8D2-8147-4F66-92EF-F688F7ADDE07@jetbrains.com> <7F626C6C-D7E0-4893-A699-A5A9A4AA33AC@jetbrains.com>
Message-ID: <6361f29b-8cf7-4803-ba6b-bdcca41fa6fc@littlepinkcloud.com>

On 8/20/24 10:38, Vitaly Provodin wrote:
> bug report may be found here - https://bugs.openjdk.org/browse/JDK-8338660
>
> Please let me know if more info is required.

Which version of GCC is this?

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From mli at openjdk.org Thu Aug 22 09:20:02 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 22 Aug 2024 09:20:02 GMT
Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso
In-Reply-To: 
References: 
Message-ID: 

On Wed, 21 Aug 2024 10:01:21 GMT, Robbin Ehn wrote:

> Hi please consider,
>
> On TSO we don't need the synthetic data dependency in between the loads.
> Also added some comment about this.
>
> Sanity tested

Makes sense to me. Thanks!

-------------

Marked as reviewed by mli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20661#pullrequestreview-2253955587

From stuefe at openjdk.org Thu Aug 22 09:16:07 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 22 Aug 2024 09:16:07 GMT
Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace
In-Reply-To: 
References: 
Message-ID: <4FCg8K0oXo8mO47HSpZJ5Y4qS6oxspsM4Juqdj0mLkg=.9f0ef8c9-8f2d-44d5-ba56-348e8159c9a5@github.com>

On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore wrote:

> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic.
>
> Tested with tier1-8.

I am surprised that this patch is so small. I would have assumed that a lot of code exists that unconditionally assumes we can always encode/decode Klass* <-> narrowKlass. I looked through the typical cases (e.g. Klass validation) and all of them seem to be okay. I will keep looking.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2304169936 From aph at openjdk.org Thu Aug 22 09:36:09 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 22 Aug 2024 09:36:09 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 21 Aug 2024 16:11:25 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 
0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: use a constexpr function for intpow instead of a templated class One thing that's odd, but not really wrong. Why do you process byte arrays 32-wide instead of 16-wide like everything else? It makes the code more complex than doing everything 8-wide and doesn't seem to increase performance, either with my measurements or yours. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2304210284 From aph at openjdk.org Thu Aug 22 09:45:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 22 Aug 2024 09:45:07 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 21 Aug 2024 16:11:25 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ...
> > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: use a constexpr function for intpow instead of a templated class src/hotspot/share/utilities/intpow.hpp line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. `Copyright (c) 2024, Oracle`? Is there a co-author here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1726722636 From ihse at openjdk.org Thu Aug 22 10:33:05 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 22 Aug 2024 10:33:05 GMT Subject: RFR: Implement JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 08:00:54 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. >> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Explicitely make AVAILABLE=true > - Export new hash constants to JVMCI > - Improve asserts > - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) Build changes look good. I have not looked at any other changes. @rkennke Note that the Skara bot removed the RFR label when you changed the title to no longer match a JBS issue. 
This means that no emails will be sent to the corresponding lists. I am not sure if this was intentional on your part. ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20640#pullrequestreview-2254136529 PR Comment: https://git.openjdk.org/jdk/pull/20640#issuecomment-2304321468 From kevinw at openjdk.org Thu Aug 22 11:03:02 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 22 Aug 2024 11:03:02 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 15:07:17 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681) enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. > > As mentioned in this comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492) where we enabled support for `%p` for filenames specified in jcmd. > > With this patch, I propose: > - Expanding the utility function `Arguments::copy_expand_pid` to `Arguments::copy_expand_arguments` to deal with `%p` expansions for pid and `%t` expansions for timestamps. > - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. > - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly support filename expansion to support `%t` for jcmd as well. > > Testing: > - [x] Added test cases pass with all platforms (verified with a GHA job). > - [x] Tier 1 passes with GHA. > > Looking forward to hearing your thoughts! > > Thanks, > Sonia Hi, Yes, agreed with notes above that current time is the timestamp that is useful. (Multiple files from the same VM are created with the same filename pattern, and file created is different over time. Good. And if the pid is the same, timestamp keeps things "more" unique.) 
That means we should not need any changes relating to start_time = create_vm_timer...? ( src/hotspot/share/runtime/threads.cpp ) I'd imagined a raw numerical timestamp, to keep things simple, the example in JDK-8204681 was like that. However src/hotspot/share/utilities/ostream.cpp has make_log_name() which says "%t => YYYY-MM-DD_HH-MM-SS" so maybe we could follow that pattern. As long as it's clear, and we'll be doing help/man page updates also. JDK-8204681 was logged originally about -XX:HeapDumpPath but yes all of our output filenames from diagnostic commands would ideally support this. FYI, in JDK-8338603 I will try and clear up the "FILE" type confusion. It is not easy for a JMX client to know that implementation-dependent types may exist. I can come back for a proper review soon. ------------- PR Review: https://git.openjdk.org/jdk/pull/20568#pullrequestreview-2254201847 From rkennke at openjdk.org Thu Aug 22 11:05:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 11:05:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Explicitely make AVAILABLE=true - Export new hash constants to JVMCI - Improve asserts - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) ------------- Changes: https://git.openjdk.org/jdk/pull/20640/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20640&range=02 Stats: 1670 lines in 105 files changed: 1234 ins; 208 del; 228 mod Patch: https://git.openjdk.org/jdk/pull/20640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20640/head:pull/20640 PR: https://git.openjdk.org/jdk/pull/20640 From rkennke at openjdk.org Thu Aug 22 11:05:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 11:05:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 10:30:00 GMT, Magnus Ihse Bursie wrote: > @rkennke Note that the Skara bot removed the RFR label when you changed the title to no longer match a JBS issue. This means that no emails will be sent to the corresponding lists. I am not sure if this was intentional on your part. Thanks for pointing that out! No it was not intentional. Mark changed the title in the JBS issue, and I copied that over, but forgot the actual issue number. Should be fixed now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20640#issuecomment-2304384577 From duke at openjdk.org Thu Aug 22 11:08:06 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 22 Aug 2024 11:08:06 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <4qN_vR7mek2GtJvYt8bbi-ZiBfQNxQ4415WhHe_p4Dg=.cc07e069-9bdd-41c4-8043-8aefb87a5c51@github.com> On Fri, 5 Jul 2024 17:23:04 GMT, Mikhail Ablakatov wrote: > * [x] For arrays shorter than the number of elements processed by a single iteration of the Neon loop performance is not optimal, though still better than the baseline's. Previously I noticed that unrolling the scalar loop by 8 instead of 4 might result in better performance for shorter arrays/strings. After running benchmarks on Neoverse N1, Neoverse V1 and Neoverse V2 I can say that the results are not consistent across the range of CPUs. While increasing the unroll factor does slightly improve the performance on Neoverse N1, the same doesn't hold true for Neoverse V1/V2. Thus I think it doesn't worth the increased code size. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2304391438 From duke at openjdk.org Thu Aug 22 12:28:08 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 22 Aug 2024 12:28:08 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Thu, 22 Aug 2024 09:33:07 GMT, Andrew Haley wrote: > One thing that's odd, but not really wrong. Why do you process byte arrays 32-wide instead of 16-wide like everything else? It makes the code more complex than doing everything 8-wide ... 
There's no arrangement specifier for `LD1 (multiple structures)` that loads 4 single-byte elements into a SIMD&FP register; the smallest one is `8B`. So while we can process 4 elements per SIMD&FP register for `T_INT`/`T_CHAR`/`T_SHORT` arrays, we have to do it twice for `T_BOOLEAN`/`T_BYTE` arrays and [switch two halves of the registers places in between](https://github.com/openjdk/jdk/pull/18487/files#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R5451) using `SSHLL2`/`USHLL2`. > ... and doesn't seem to increase performance, either with my measurements or yours. What measurements are you referring to here? Could they have been taken before the change to load 4 registers with a single `LD1` instruction? > src/hotspot/share/utilities/intpow.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > > `Copyright (c) 2024, Oracle`? Is there a co-author here? There isn't, thanks, I'll remove it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2304537584 PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1726958327 From coleenp at openjdk.org Thu Aug 22 12:30:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 22 Aug 2024 12:30:06 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: <4FCg8K0oXo8mO47HSpZJ5Y4qS6oxspsM4Juqdj0mLkg=.9f0ef8c9-8f2d-44d5-ba56-348e8159c9a5@github.com> References: <4FCg8K0oXo8mO47HSpZJ5Y4qS6oxspsM4Juqdj0mLkg=.9f0ef8c9-8f2d-44d5-ba56-348e8159c9a5@github.com> Message-ID: On Thu, 22 Aug 2024 09:12:44 GMT, Thomas Stuefe wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers.
The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. > > I am surprised that this patch is so small. I would have assumed a lot of code exists that unconditionally assumes we always can encode decode Klass*<->narrowKlass. > > I looked through the typical cases (eg Klass validation) and all of them seem to be okay. I will keep looking. Thanks for looking at this @tstuefe. I was pleased that the change was small but once we identify the classes as AbstractClass instead of Class in metaspace, it just falls out. I thought CDS would have more changes, but CDS is all in one space and doesn't differentiate. It's good that the only time we compress and uncompress klass pointers is when we get them out of an object and this should keep it that way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2304546143 From adinn at openjdk.org Thu Aug 22 13:15:12 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 22 Aug 2024 13:15:12 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <98PGDBTutglXqoo6q2UOq8xpnBJUZCxrqww5AKdqxg0=.765f6952-3c43-4684-a6b8-c7b51d8d9839@github.com> On Wed, 21 Aug 2024 16:11:25 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
>> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ...
> > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: use a constexpr function for intpow instead of a templated class src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 100: > 98: > 99: assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC"); > 100: orr(tmp2, cnt, 0x1fff ^ (unroll_factor - 1)); This is rather cryptic, especially the slightly arbitrary choice of 0x1fff -- any all 1s value greater than (loop_factor - 1) would do. I think a comment as to what is being done here might help maintainers understand what is happening here. // At this point cnt holds (r - l) where r is the number of remaining elements, // l is loop_count and 0 <= r < l. The constant operand to the orr 'rounds' this // negative value into range [-u, -1] where u = unroll_factor, by clearing bits // below bit k = log2(unroll_factor) and setting any higher bits that might be clear. // The subtract shifted by 3 offsets past ((u - r) % u) load + madd insns i.e. it only // executes r % u load + madds. Iteration eats up the remainder, u elements at a time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1727033707 From adinn at openjdk.org Thu Aug 22 13:52:07 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 22 Aug 2024 13:52:07 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 21 Aug 2024 16:11:25 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
>> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ...
> > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: use a constexpr function for intpow instead of a templated class The code here looks very good. Apart from the one bit I highlighted as seriously in need of a comment it is all quite readable. I think it would be worth adding another comment in the code to document why loop_factor is doubled for bools and bytes. It was not immediately obvious that this was to to do with the available vector load options. A brief comment would stop a maintainer getting sidestepped by that difference and having to do some archaeology to resolve the confusion. You might also choose to document the reason for choosing 4 for the loop_unroll factor but we have the rationale here on the PR for anyone who needs to know. Oh, and I should have said: very nice work! ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18487#pullrequestreview-2254580682 PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2304723784 From vitaly.provodin at jetbrains.com Thu Aug 22 13:52:54 2024 From: vitaly.provodin at jetbrains.com (Vitaly Provodin) Date: Thu, 22 Aug 2024 17:52:54 +0400 Subject: 8315884: New Object to ObjectMonitor mapping causes linux-aarch64 musl to fail In-Reply-To: <7F626C6C-D7E0-4893-A699-A5A9A4AA33AC@jetbrains.com> References: <8E87A8D2-8147-4F66-92EF-F688F7ADDE07@jetbrains.com> <7F626C6C-D7E0-4893-A699-A5A9A4AA33AC@jetbrains.com> Message-ID: The issue was observed with gcc 9.3.0 After upgrading docker image up to alpine:3.14 with gcc 10.3.1 build became successful Sorry for disturbing you Thanks, Vitaly > On 20 Aug 2024, at 13:38, Vitaly Provodin wrote: > > Andrew, > > bug report may be found here - https://bugs.openjdk.org/browse/JDK-8338660 > > Please let me know if more info required. 
> > Thanks, > Vitaly > >> On 20 Aug 2024, at 12:07, Andrew Haley wrote: >> >> On 8/20/24 06:39, Vitaly Provodin wrote: >>> Not sure if a ticket should be submitted against this issue into JBS because I could not find any info about supporting build platform for at Linux aarch64 musl at https://wiki.openjdk.org/display/Build/Supported+Build+Platforms . Hopefully the list of supported platforms was outdated and aarch64 is still supported... >> >> Musl isn't involved here. This looks to me to be a GCC bug, and this line >> is the clue: >> >> during RTL pass: reload >> /mnt/agent/work/f25b6e4d8156543c/src/hotspot/share/runtime/synchronizer.cpp:1116:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:3962 >> Please submit a full bug report, >> with preprocessed source if appropriate. >> See for instructions. >> >> I'd be interested to see the preprocessed C++. >> >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> >
From duke at openjdk.org Thu Aug 22 14:45:06 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 22 Aug 2024 14:45:06 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: <98PGDBTutglXqoo6q2UOq8xpnBJUZCxrqww5AKdqxg0=.765f6952-3c43-4684-a6b8-c7b51d8d9839@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <98PGDBTutglXqoo6q2UOq8xpnBJUZCxrqww5AKdqxg0=.765f6952-3c43-4684-a6b8-c7b51d8d9839@github.com> Message-ID: <4rVk-qWpazgP9uRaOCnjINR2NzFqrC5UNbOcSR7ZfQw=.8864f54e-ce4f-49c2-bde1-f569736df0f9@github.com> On Thu, 22 Aug 2024 13:12:23 GMT, Andrew Dinn wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: use a constexpr function for intpow instead of a templated class > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 100: > >> 98: >> 99: assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC"); >> 100: orr(tmp2, cnt, 0x1fff ^ (unroll_factor - 1)); > > This is rather cryptic, especially the slightly arbitrary choice of 0x1fff -- any all 1s value greater than (loop_factor - 1) would do. I think a comment as to what is being done here might help maintainers understand what is happening here. > > // At this point cnt holds (r - l) where r is the number of remaining elements, > // l is loop_count and 0 <= r < l. The constant operand to the orr 'rounds' this > // negative value into range [-u, -1] where u = unroll_factor, by clearing bits > // below bit k = log2(unroll_factor) and setting any higher bits that might be clear. > // The subtract shifted by 3 offsets past ((u - r) % u) load + madd insns i.e. it only > // executes r % u load + madds. Iteration eats up the remainder, u elements at a time. Thank you for providing a detailed comment! I can't remember why I needed the `0x1fff` literal.
Probably this is an artifact left from WIP code. I believe this is not required anymore and the `orr` can be simplified to `orr(tmp2, cnt, -unroll_factor)`. Given that, I'd suggest to shorten the comment to: ```c++ // At this point cnt holds (r - l) where r is the number of remaining elements, l is loop_count // and 0 <= r < l. The orr performs (r - l) % u where u = unroll_factor. The subtract shifted by // 3 offsets past |(r - l) % u| load + madd insns i.e. it only executes r % u load + madds. // Iteration eats up the remainder, u elements at a time. ``` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1727195330 From rkennke at openjdk.org Thu Aug 22 14:51:15 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:51:15 GMT Subject: Withdrawn: 8305895: Implement JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... This pull request has been closed without being integrated.
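To make the base-offset numbers above concrete, here is a back-of-the-envelope sketch. It assumes a 64-bit VM, the default 8-byte object alignment, and fields packed densely right after the base offset; real field layout, array headers and any minimum object size are not modelled, and the helper name `instance_size` is hypothetical.

```cpp
#include <cstddef>

// Header pieces on a 64-bit VM, in bytes.
constexpr std::size_t kMarkWordBytes = 8;
constexpr std::size_t kNarrowKlassBytes = 4;  // with UseCompressedClassPointers
constexpr std::size_t kFullKlassBytes = 8;    // without compressed class pointers

// Offset where the field layouter may start placing fields.
constexpr std::size_t kBaseCompressedKlass = kMarkWordBytes + kNarrowKlassBytes;  // 12
constexpr std::size_t kBaseFullKlass = kMarkWordBytes + kFullKlassBytes;          // 16
constexpr std::size_t kBaseCompact = kMarkWordBytes;  // 8: the Klass* lives in the mark word

// Hypothetical helper: base offset plus densely packed field bytes,
// rounded up to the object alignment.
constexpr std::size_t instance_size(std::size_t base, std::size_t field_bytes,
                                    std::size_t alignment = 8) {
  return (base + field_bytes + alignment - 1) / alignment * alignment;
}

// Two int fields: 12 + 8 = 20 rounds up to 24 bytes, versus 8 + 8 = 16 bytes compact.
static_assert(instance_size(kBaseCompressedKlass, 8) == 24);
static_assert(instance_size(kBaseCompact, 8) == 16);
```

Note that the saving depends on where the alignment boundary falls: with a single `int` field, both 12 + 4 and 8 + 4 round up to 16 bytes.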
------------- PR: https://git.openjdk.org/jdk/pull/20640 From rkennke at openjdk.org Thu Aug 22 14:51:15 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:51:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: <_dUMNfQKbcgEP_r6avLEDuPprVLFitHPaIWTxJ7_ZcU=.c711b62a-6bc1-4e69-85f2-52e38ccfeb87@github.com> On Thu, 22 Aug 2024 11:05:18 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. >> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Explicitely make AVAILABLE=true > - Export new hash constants to JVMCI > - Improve asserts > - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) Superseded by #20677 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20640#issuecomment-2304864714 From rkennke at openjdk.org Thu Aug 22 14:53:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:53:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) Message-ID: This is the main body of the JEP 450: Compact Object Headers (Experimental).
It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. Main changes: - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
- Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). - Arrays will now store their length at offset 8. - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (or at least I could not find it), and also I fear that doing so could mess with optimizations. This may be useful to revisit. OTOH, the approach that I have taken works and is similar to DecodeNKlass and similar instructions. Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests.) The below testing has been run many times, but not with this exact base version of the JDK. I want to hold off the full testing until we also have the Tiny Class-Pointers PR lined-up, and test with that. 
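The bit budgets in the change list above can be sanity-checked with a small sketch. It assumes only what the description states: the narrow Klass* occupies the upper 22 bits of a 64-bit mark word, and full-GC forwarding stores a 40-bit, compressed-oops-style offset. Treating that offset as a count of 8-byte heap words from the heap base is an assumption (it is what makes 40 bits reach 8TB); all helper names are hypothetical, and the layout of the lower mark-word bits is not modelled.

```cpp
#include <cstdint>

// Narrow Klass* in the upper 22 bits of the 64-bit mark word.
constexpr int kNarrowKlassBits = 22;
constexpr int kNarrowKlassShift = 64 - kNarrowKlassBits;  // 42
constexpr std::uint64_t kLowBitsMask = (std::uint64_t{1} << kNarrowKlassShift) - 1;

constexpr std::uint32_t narrow_klass_of(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kNarrowKlassShift);
}

// Rewrites the lower 42 bits (e.g. to install forwarding information)
// while leaving the Klass* bits untouched.
constexpr std::uint64_t with_low_bits(std::uint64_t mark, std::uint64_t low) {
  return (mark & ~kLowBitsMask) | (low & kLowBitsMask);
}

// Assumed scheme: 40-bit forwarding offsets counted in 8-byte heap words.
constexpr int kForwardingBits = 40;
constexpr int kLogHeapWordSize = 3;
constexpr std::uint64_t kMaxEncodableHeapBytes =
    (std::uint64_t{1} << kForwardingBits) << kLogHeapWordSize;  // 2^43 bytes = 8 TiB

constexpr std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t addr) {
  return (addr - heap_base) >> kLogHeapWordSize;
}

constexpr std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t offset) {
  return heap_base + (offset << kLogHeapWordSize);
}
```

The 42 bits below the Klass* field leave room for a 40-bit offset, which is why forwarding can coexist with the protected Klass* bits in this model.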
- [x] tier1 (x86_64) - [ ] tier2 (x86_64) - [ ] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [ ] tier2 (aarch64) - [ ] tier3 (aarch64) - [ ] tier4 (aarch64) - [x] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [x] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders - [x] Running as a backport in production since >1 year. ------------- Commit messages: - 8305895: Implement JEP 450: Compact Object Headers (Experimental) Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305895 Stats: 4526 lines in 187 files changed: 3238 ins; 671 del; 617 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Aug 22 15:00:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 15:00:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add missing newline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/ed032173..18e08c1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From tonyp at openjdk.org Thu Aug 22 15:20:04 2024 From: tonyp at openjdk.org (Antonios Printezis) Date: Thu, 22 Aug 2024 15:20:04 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 10:26:18 GMT, Hamlin Li wrote: >> ## Performance >> benchmarks run on CanVM-K230 >> >> data >> >> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | 
avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert misc Quick comments on the scalar version. Looking at the vector version next. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5462: > 5460: __ sub(length, send, soff); > 5461: // it's not guaranteed by java level, so do it explicitly > 5462: __ andi(length, length, -4); Is it possible for `length == 0` here? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5470: > 5468: > 5469: // load the codec base address > 5470: __ la(codec, ExternalAddress((address) fromBase64)); any way to avoid the double `la` when `isURL == true`? I guess you'd need one more label and an extra branch? Not sure it's worth it. 
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5497: > 5495: > 5496: // load 4 bytes encoded src data > 5497: __ lbu(byte0, Address(src, 0)); Is it faster to issue four 8-bit loads instead of one 32-bit load and getting the four values with shifting and masking? ------------- PR Review: https://git.openjdk.org/jdk/pull/20026#pullrequestreview-2254684381 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727202339 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727178004 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727163010 From adinn at openjdk.org Thu Aug 22 15:21:06 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 22 Aug 2024 15:21:06 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: <4rVk-qWpazgP9uRaOCnjINR2NzFqrC5UNbOcSR7ZfQw=.8864f54e-ce4f-49c2-bde1-f569736df0f9@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <98PGDBTutglXqoo6q2UOq8xpnBJUZCxrqww5AKdqxg0=.765f6952-3c43-4684-a6b8-c7b51d8d9839@github.com> <4rVk-qWpazgP9uRaOCnjINR2NzFqrC5UNbOcSR7ZfQw=.8864f54e-ce4f-49c2-bde1-f569736df0f9@github.com> Message-ID: On Thu, 22 Aug 2024 14:42:20 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 100: >> >>> 98: >>> 99: assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC"); >>> 100: orr(tmp2, cnt, 0x1fff ^ (unroll_factor - 1)); >> >> This is rather cryptic, especially the slightly arbitrary choice of 0x1fff -- any all 1s value greater than (loop_factor - 1) would do. I think a comment as to what is being done here might help maintainers understand what is happening here. >> >> // At this point cnt holds (r - l) where r is the number of remaining elements, >> // l is loop_count and 0 <= r < l. 
The constant operand to the orr 'rounds' this >> // negative value into range [-u, -1] where u = unroll_factor, by clearing bits >> // below bit k = log2(unroll_factor) and setting any higher bits that might be clear. >> // The subtract shifted by 3 offsets past ((u - r) % u) load + madd insns i.e. it only >> // executes r % u load + madds. Iteration eats up the remainder, u elements at a time. > > Thank you for providing a detailed comment! > > I can't remember why I needed the `0x1fff` literal. Probably this is an artifact left from WIP code. I believe this is not required anymore and the `orr` can be simplified to `orr(tmp2, cnt, -unroll_factor)`. > > Given that, I'd suggest to shorten the comment to: > > ```c++ > // At this point cnt holds (r - l) where r is the number of remaining elements, l is loop_count > // and 0 <= r < l. The orr performs (r - l) % u where u = unroll_factor. The subtract shifted by > // 3 offsets past |(r - l) % u| load + madd insns i.e. it only executes r % u load + madds. > // Iteration eats up the remainder, u elements at a time. Yes, just fixing the constant helps a lot. However, the comment is still worth having. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1727313087 From mli at openjdk.org Thu Aug 22 15:53:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 Aug 2024 15:53:06 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 14:46:26 GMT, Antonios Printezis wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> revert misc > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5462: > >> 5460: __ sub(length, send, soff); >> 5461: // it's not guaranteed by java level, so do it explicitly >> 5462: __ andi(length, length, -4); > > Is it possible for `length == 0` here? No, this is guaranteed at java level. 
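The scalar path being reviewed here can be modelled in plain C++ as a rough sketch. It assumes a 256-entry reverse table that maps each ASCII byte to its 6-bit value or -1 (the stub's `fromBase64` table is referenced in the review but its contents are not shown), ignores `=` padding and MIME line breaks, and stops at the first invalid character. `decode_bytewise` mirrors the four separate `lbu` loads; `decode_wordwise` is the alternative raised in review, one 32-bit load followed by shifts and masks, assuming a little-endian host. All names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

using DecodeTable = std::array<std::int8_t, 256>;

// Reverse lookup: ASCII byte -> 6-bit value, or -1 if the byte is not in the alphabet.
DecodeTable make_decode_table(std::string_view alphabet) {
  DecodeTable table;
  table.fill(-1);
  for (std::size_t i = 0; i < alphabet.size(); ++i) {
    table[static_cast<unsigned char>(alphabet[i])] = static_cast<std::int8_t>(i);
  }
  return table;
}

// Combines four 6-bit values into three bytes; returns false if any value is invalid (-1).
bool decode_quad(int b0, int b1, int b2, int b3, std::vector<std::uint8_t>& dst) {
  if ((b0 | b1 | b2 | b3) < 0) return false;  // any -1 makes the OR negative
  const std::uint32_t bits = (std::uint32_t(b0) << 18) | (std::uint32_t(b1) << 12) |
                             (std::uint32_t(b2) << 6) | std::uint32_t(b3);
  dst.push_back(static_cast<std::uint8_t>(bits >> 16));
  dst.push_back(static_cast<std::uint8_t>(bits >> 8));
  dst.push_back(static_cast<std::uint8_t>(bits));
  return true;
}

// Four byte loads per group, like the four lbu instructions in the stub.
// Returns the number of source characters consumed.
std::size_t decode_bytewise(std::string_view src, const DecodeTable& t,
                            std::vector<std::uint8_t>& dst) {
  // Same effect as andi(length, length, -4): round down to a multiple of 4.
  const std::size_t length = src.size() & ~std::size_t{3};
  std::size_t i = 0;
  for (; i < length; i += 4) {
    auto at = [&](std::size_t k) { return t[static_cast<unsigned char>(src[i + k])]; };
    if (!decode_quad(at(0), at(1), at(2), at(3), dst)) break;
  }
  return i;
}

// One 32-bit load per group, then shifts and masks (little-endian host assumed).
std::size_t decode_wordwise(std::string_view src, const DecodeTable& t,
                            std::vector<std::uint8_t>& dst) {
  const std::size_t length = src.size() & ~std::size_t{3};
  std::size_t i = 0;
  for (; i < length; i += 4) {
    std::uint32_t w;
    std::memcpy(&w, src.data() + i, sizeof w);
    if (!decode_quad(t[w & 0xff], t[(w >> 8) & 0xff], t[(w >> 16) & 0xff], t[w >> 24], dst)) break;
  }
  return i;
}
```

Both variants perform the same table lookups; the word load only trades three loads for extra shift and mask instructions, which is consistent with the small `lw` regression reported in this thread.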
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5470: > >> 5468: >> 5469: // load the codec base address >> 5470: __ la(codec, ExternalAddress((address) fromBase64)); > > any way to avoid the double `la` when `isURL == true`? I guess you'd need one more label and an extra branch? Not sure it's worth it. It doesn't seem necessary; other places, including code for other CPUs, do similar things. > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5497: > >> 5495: >> 5496: // load 4 bytes encoded src data >> 5497: __ lbu(byte0, Address(src, 0)); > > Is it faster to issue four 8-bit loads instead of one 32-bit load and getting the four values with shifting and masking? Yeah, it could be; I will test it later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727403705 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727401431 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727399419 From aph at openjdk.org Thu Aug 22 15:56:09 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 22 Aug 2024 15:56:09 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Thu, 22 Aug 2024 12:23:04 GMT, Mikhail Ablakatov wrote: > > One thing that's odd, but not really wrong. Why do you process byte arrays 32-wide instead of 16-wide like everything else? It makes the code more complex than doing everything 8-wide ... > > There's no arrangement specifier for `LD1 (multiple structures)` which instructs to load 4 single byte sized elements per a SIMD&FP register. Isn't that `ld1 V1.s, V2.s, V3.s, v4.s, [x1]`? > > ... and doesn't seem to increase performance, either with my measurements or yours. > > What measurements are you referring to here? Your performance figures, and mine, as quoted in this PR. It's really not important, though.
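Returning to the `orr` simplification discussed earlier in this PR, the arithmetic can be checked in isolation. The sketch below (function names are hypothetical) models the old operand `0x1fff ^ (unroll_factor - 1)` and the proposed `-unroll_factor` on a 64-bit value, assuming `unroll_factor` is a power of two and `cnt` is negative, as the quoted comment states (`cnt = r - l` with `0 <= r < l`). OR-ing with `-u` keeps the low log2(u) bits of `cnt` and sets every bit above them, so the result lies in [-u, -1] and equals `(cnt & (u - 1)) - u`. The old constant gives the same result whenever `cnt >= -8192`, because bits 13 and up are then already set.

```cpp
#include <cstdint>

// Proposed form: orr(tmp2, cnt, -unroll_factor)
constexpr std::int64_t orr_proposed(std::int64_t cnt, std::int64_t unroll_factor) {
  return cnt | -unroll_factor;
}

// Original form: orr(tmp2, cnt, 0x1fff ^ (unroll_factor - 1))
constexpr std::int64_t orr_original(std::int64_t cnt, std::int64_t unroll_factor) {
  return cnt | (0x1fff ^ (unroll_factor - 1));
}

// Exhaustively checks the claims above for cnt in [-8192, -1] and u = 1, 2, 4, ..., 4096.
bool orr_forms_agree() {
  for (std::int64_t u = 1; u <= 4096; u <<= 1) {
    for (std::int64_t cnt = -8192; cnt < 0; ++cnt) {
      const std::int64_t r = orr_proposed(cnt, u);
      if (r < -u || r > -1) return false;          // result lies in [-u, -1]
      if (r != (cnt & (u - 1)) - u) return false;  // low bits kept, high bits set
      if (orr_original(cnt, u) != r) return false; // old constant agrees in this range
    }
  }
  return true;
}
```

For example, with `u = 8`, both `-3` and `-11` map to `-3`, and `-8` stays `-8`.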
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2305108265 From matsaave at openjdk.org Thu Aug 22 15:58:10 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 22 Aug 2024 15:58:10 GMT Subject: RFR: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds [v2] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 19:53:16 GMT, Dean Long wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - Removed incorrect comment and added copyright >> - Added regression test > > Let's say the JSR is not the last instruction, but it doesn't return because there is no RET instruction. What is the effect of creating a basic block that won't be used -- is it harmless? Do we need a test for that? Thanks for the reviews @dean-long and @TobiHartmann! And thank you @eme64 for the reproducer! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20645#issuecomment-2305109628 From matsaave at openjdk.org Thu Aug 22 15:58:11 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 22 Aug 2024 15:58:11 GMT Subject: Integrated: 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 14:12:55 GMT, Matias Saavedra Silva wrote: > Although JSR bytecodes cannot be generated by javac anymore, a classfile generated with a tool like JASM can still contain this bytecode. Should a program end with a JSR, there will be undefined behavior since the bytecode reads the address of the next instruction. In the case of Hotspot, this leads to a crash when generating oop maps. This fixes the calculation of basic blocks. > > The early exploration of this issue was done by @eme64 who also generated a reproducer. This pull request has now been integrated. 
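The failure mode described above, a `jsr` as the last instruction whose return address points one past the end of the bytecode, can be illustrated with a small leader-marking sketch. This is a hypothetical model, not the integrated change (the patch itself is not shown in this thread). It records the jump target and the return address of a `jsr`/`jsr_w` as basic-block leaders, but only when they fall inside the method, which is the invariant the failing assert checks (`bci >= 0 && bci < code_size`). Opcode values and instruction lengths follow the JVM specification.

```cpp
#include <set>

// Opcodes from the JVM specification.
constexpr int kOpJsr = 0xa8;   // jsr: opcode + 2-byte branch offset (3 bytes total)
constexpr int kOpJsrW = 0xc9;  // jsr_w: opcode + 4-byte branch offset (5 bytes total)

// Hypothetical: record the basic-block leaders implied by a jsr at `bci`.
void add_jsr_leaders(int opcode, int bci, int target, int code_size,
                     std::set<int>& leaders) {
  const int length = (opcode == kOpJsrW) ? 5 : 3;
  if (target >= 0 && target < code_size) leaders.insert(target);
  // A matching ret would resume at the instruction after the jsr. If the jsr is
  // the last instruction, that bci is out of bounds, so no block starts there.
  const int return_bci = bci + length;
  if (return_bci < code_size) leaders.insert(return_bci);
}
```

Dean Long's question about a `jsr` that is not last but never returns concerns the opposite case (an in-bounds block that is never reached) and is not addressed by this sketch.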
Changeset: 6041c936 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/6041c936d6dd39c5b3a89ed2823b25a8aef42b9f Stats: 127 lines in 4 files changed: 118 ins; 6 del; 3 mod 8335664: Parsing jsr broken: assert(bci>= 0 && bci < c->method()->code_size()) failed: index out of bounds Co-authored-by: Emanuel Peter Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/20645 From rkennke at openjdk.org Thu Aug 22 16:23:48 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 16:23:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove hashcode leftovers from SA ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/18e08c1e..1578ffae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From aph-open at littlepinkcloud.com Thu Aug 22 16:30:30 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Thu, 22 Aug 2024 17:30:30 +0100 Subject: 8315884: New Object to ObjectMonitor mapping causes linux-aarch64 musl to fail In-Reply-To: References: <8E87A8D2-8147-4F66-92EF-F688F7ADDE07@jetbrains.com> <7F626C6C-D7E0-4893-A699-A5A9A4AA33AC@jetbrains.com> Message-ID: <196a2aea-557a-4ff1-8798-3935f9873b11@littlepinkcloud.com> On 8/22/24 14:52, Vitaly Provodin wrote: > The issue was observed with gcc 9.3.0 > After upgrading docker image up to alpine:3.14 with gcc 10.3.1 build became successful Excellent! Thanks for letting me know. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tonyp at openjdk.org Thu Aug 22 17:04:05 2024 From: tonyp at openjdk.org (Antonios Printezis) Date: Thu, 22 Aug 2024 17:04:05 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 15:50:05 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5462: >> >>> 5460: __ sub(length, send, soff); >>> 5461: // it's not guaranteed by java level, so do it explicitly >>> 5462: __ andi(length, length, -4); >> >> Is it possible for `length == 0` here? > > No, this is guaranteed at java level. Great. Maybe add a comment to that effect here? >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5470: >> >>> 5468: >>> 5469: // load the codec base address >>> 5470: __ la(codec, ExternalAddress((address) fromBase64)); >> >> any way to avoid the double `la` when `isURL == true`? I guess you'd need one more label and an extra branch? Not sure it's worth it. > > Seems not necessary, and I see other places, also on other cpus do the similar things. OK! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727509181 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727509434 From dcubed at openjdk.org Thu Aug 22 17:25:02 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 22 Aug 2024 17:25:02 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 10:59:58 GMT, Kevin Walls wrote: > which says "%t => YYYY-MM-DD_HH-MM-SS" so maybe we could follow that pattern Another vote for the above time stamp pattern. 
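For the hprof file-name discussion, the quoted `%t => YYYY-MM-DD_HH-MM-SS` pattern maps directly onto a `strftime` format string. A minimal sketch, assuming `%t` is expanded wherever it appears in the requested dump file name; the helper names and example file names are hypothetical, and this is not the code in the PR.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Formats a broken-down time as YYYY-MM-DD_HH-MM-SS.
std::string format_timestamp(const std::tm& t) {
  char buf[32];
  const std::size_t n = std::strftime(buf, sizeof buf, "%Y-%m-%d_%H-%M-%S", &t);
  return std::string(buf, n);
}

// Hypothetical: replaces every "%t" in a requested dump file name with the timestamp.
std::string expand_timestamp(const std::string& pattern, const std::tm& t) {
  const std::string stamp = format_timestamp(t);
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 't') {
      out += stamp;
      ++i;  // skip the 't'
    } else {
      out += pattern[i];
    }
  }
  return out;
}
```

Using `-` and `_` rather than `:` or spaces keeps the resulting name valid on Windows file systems as well.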
------------- PR Comment: https://git.openjdk.org/jdk/pull/20568#issuecomment-2305273185 From mli at openjdk.org Thu Aug 22 17:43:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 Aug 2024 17:43:04 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 17:01:46 GMT, Antonios Printezis wrote: >> No, this is guaranteed at java level. > > Great. Maybe add a comment to that effect here? Sure, I can do it later when addressing other comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727561015 From mli at openjdk.org Thu Aug 22 17:43:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 Aug 2024 17:43:05 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 15:47:32 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5497: >> >>> 5495: >>> 5496: // load 4 bytes encoded src data >>> 5497: __ lbu(byte0, Address(src, 0)); >> >> Is it faster to issue four 8-bit loads instead of one 32-bit load and getting the four values with shifting and masking? > > Yeh, it could be, I will test it later. There is a slight regression with `lw` instead of `lb`. I think the rationale is that the four `lb` loads are not really 4 loads, as they are contiguous in address. But with `lw`, extra operations are needed to split the word into its 4 bytes. Benchmark | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score -lb | Score -lw | Units | Improvement -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | ? | 123.03 | 124.96 | ns/op | 1.016 Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | ? | 146.828 | 145.344 | ns/op | 0.99 Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | ? | 197.021 | 202.769 | ns/op | 1.029 Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | ?
| 310.964 | 328.367 | ns/op | 1.056 Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | ? | 432.836 | 464.795 | ns/op | 1.074 Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | ? | 543.394 | 570.661 | ns/op | 1.05 Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | ? | 599.538 | 659.938 | ns/op | 1.101 Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | ? | 714.922 | 793.329 | ns/op | 1.11 Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | ? | 3054.931 | 3356.059 | ns/op | 1.099 Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | ? | 4921.95 | 5413.909 | ns/op | 1.1 Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | ? | 78169.374 | 89504.671 | ns/op | 1.145 Base64Decode.testBase64Decode | 0 | 144 | 4 | 50000 | avgt | ? | 188385.163 | 218491.724 | ns/op | 1.16 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1727560148 From rkennke at openjdk.org Thu Aug 22 17:59:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 17:59:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v4] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan. Specifically, we may want to further shrink compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. To be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size), so that it can be fetched from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
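The mark-word change described above can be illustrated with plain bit manipulation. This is a simplified model and not HotSpot's `markWord` code. It assumes a 64-bit mark word whose upper 22 bits hold the compressed class pointer, as the description states, and it treats the remaining 42 bits as opaque. The model shows why the forwarding and self-forwarding schemes must leave the upper bits alone.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants derived from the PR text ("upper 22 bits" of the
// 64-bit mark word hold the compressed Klass*); not copied from markWord.hpp.
constexpr int kMarkWordBits = 64;
constexpr int kNarrowKlassBits = 22;
constexpr int kNarrowKlassShift = kMarkWordBits - kNarrowKlassBits;  // 42
constexpr std::uint64_t kLowBitsMask = (std::uint64_t{1} << kNarrowKlassShift) - 1;

// Read the compressed class pointer out of the top 22 bits.
std::uint32_t narrow_klass(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kNarrowKlassShift);
}

// Install a compressed class pointer, leaving the lower 42 bits untouched.
std::uint64_t with_narrow_klass(std::uint64_t mark, std::uint32_t nk) {
  assert(nk < (std::uint32_t{1} << kNarrowKlassBits));
  return (mark & kLowBitsMask) | (std::uint64_t{nk} << kNarrowKlassShift);
}

// Anything that rewrites only the low bits (such as a GC forwarding
// encoding) preserves the class information.
std::uint64_t replace_low_bits(std::uint64_t mark, std::uint64_t low) {
  assert((low & ~kLowBitsMask) == 0);
  return (mark & ~kLowBitsMask) | low;
}
```

For example, `narrow_klass(with_narrow_klass(0x1, 12345))` returns 12345, and the low bits of that mark word are still `0x1`.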
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix hash_mask_in_place in ClhsdbLongConstant test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/1578ffae..7009e147 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Aug 22 18:18:01 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 18:18:01 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > ...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix hash shift for 32 bit builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/7009e147..5ffc582f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From szaldana at openjdk.org Thu Aug 22 19:21:02 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 22 Aug 2024 19:21:02 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 15:07:17 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681) by enabling support for timestamp expansion, using `%t`, in filenames specified with `-XX:HeapDumpPath`. > > As mentioned in the comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492), where we enabled support for `%p` in filenames specified in jcmd. > > With this patch, I propose: > - Generalizing the utility function `Arguments::copy_expand_pid` into `Arguments::copy_expand_arguments`, which handles `%p` expansion for the pid and `%t` expansion for timestamps. > - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. > - Although the linked JBS issue only covers heap dumps generated on OOM, I think we can broaden it to also support `%t` expansion for jcmd. > > Testing: > - [x] Added test cases pass on all platforms (verified with a GHA job). > - [x] Tier 1 passes with GHA.
> > Looking forward to hearing your thoughts! > > Thanks, > Sonia Hi folks, thanks for the initial comments. Just noting that I will be away on holiday until September 9th, so I will not be able to address these until then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20568#issuecomment-2305466115 From ihse at openjdk.org Thu Aug 22 19:29:03 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 22 Aug 2024 19:29:03 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 18:18:01 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> ... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix hash shift for 32 bit builds Build changes look good. I have not looked at any other code. ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2255474321 From rkennke at openjdk.org Thu Aug 22 20:08:43 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 20:08:43 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental).
> ... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix bit counts in GCForwarding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/5ffc582f..eaec1117 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From ayang at openjdk.org Thu Aug 22 20:16:07 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 Aug 2024 20:16:07 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> On Thu, 22 Aug 2024 18:18:01 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> ... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix hash shift for 32 bit builds src/hotspot/share/gc/shared/gcForwarding.cpp line 37: > 35: size_t max_narrow_heap_size = right_n_bits(NumLowBitsNarrow - Shift); > 36: if (UseCompactObjectHeaders && max_heap_size > max_narrow_heap_size * HeapWordSize) { > 37: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); Maybe a log-info/warning would be nice. src/hotspot/share/gc/shared/gcForwarding.hpp line 36: > 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in > 35: * a way that preserves upper N bits of object mark-words, which contain crucial > 36: * Klass* information when running with compact headers. The encoding is similar to This doc suggests this forwarding is only for compact headers, so I wonder if we can check `UseCompactObjectHeaders` directly instead of the heap size in `GCForwarding::initialize`. src/hotspot/share/gc/shared/gcForwarding.hpp line 40: > 38: * heap-base, shifts that difference into the right place, and sets the lowest two > 39: * bits (to indicate 'forwarded' state as usual). > 40: */ > "can use 40 bits for forwardee encoding. That's enough for 8TB of heap." I feel this 8TB constraint is significant and should be in the doc.
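Here is the arithmetic behind the 8TB figure. With 8-byte heap words, a 40-bit word offset covers 2^40 * 8 = 2^43 bytes, which is 8 TiB. The sketch below models the encoding described in the quoted `gcForwarding.hpp` comment: subtract the heap base, shift the difference past the two tag bits, and set the tag. The constant values are assumptions chosen to reproduce the 40 bits mentioned in the thread. They are not copied from HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants picked to match "40 bits ... 8TB" from the thread.
constexpr int kNumLowBitsNarrow = 42;          // mark-word bits below the 22 Klass bits
constexpr int kShift = 2;                      // lowest two bits carry the 'forwarded' tag
constexpr std::uint64_t kHeapWordSize = 8;     // bytes per heap word on 64-bit
constexpr std::uint64_t kForwardedTag = 0b11;  // 'forwarded' pattern in the low two bits

constexpr std::uint64_t right_n_bits(int n) { return (std::uint64_t{1} << n) - 1; }

// Mirrors the quoted check: the heap fits if every word offset fits in 40 bits.
constexpr bool forwarding_fits(std::uint64_t max_heap_bytes) {
  return max_heap_bytes <= right_n_bits(kNumLowBitsNarrow - kShift) * kHeapWordSize;
}

// Subtract the heap base, scale to words, shift past the tag, set the tag,
// and keep the upper (Klass) bits of the old mark word.
std::uint64_t encode_forwardee(std::uint64_t mark, std::uint64_t heap_base, std::uint64_t to) {
  const std::uint64_t words = (to - heap_base) / kHeapWordSize;
  assert(words <= right_n_bits(kNumLowBitsNarrow - kShift));
  return (mark & ~right_n_bits(kNumLowBitsNarrow)) | (words << kShift) | kForwardedTag;
}

std::uint64_t decode_forwardee(std::uint64_t mark, std::uint64_t heap_base) {
  const std::uint64_t words = (mark & right_n_bits(kNumLowBitsNarrow)) >> kShift;
  return heap_base + words * kHeapWordSize;
}

// 2^40 words * 8 bytes/word = 2^43 bytes = 8 TiB.
static_assert((right_n_bits(kNumLowBitsNarrow - kShift) + 1) * kHeapWordSize ==
                  (std::uint64_t{1} << 43),
              "40 bits of word offset cover 8 TiB");
```

`forwarding_fits` mirrors the quoted `right_n_bits` check, so the largest heap it accepts is one heap word short of 8 TiB.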
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727708193 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727727638 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727732496 From ayang at openjdk.org Thu Aug 22 20:16:05 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 Aug 2024 20:16:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 16:23:48 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> ... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove hashcode leftovers from SA src/hotspot/share/gc/parallel/mutableSpace.cpp line 232: > 230: p += obj->forwardee()->size(); > 231: } else { > 232: p += obj->size(); I feel it's more correct to go through the forwardee for forwarded objs even for the non-COMPACT_HEADERS case. (This method is meant to cover all objs, so it should not be perf-critical.) IOW, the `false` case should just be dropped. src/hotspot/share/gc/serial/defNewGeneration.cpp line 707: > 705: } else if (obj->is_forwarded()) { > 706: // To restore the klass-bits in the header. > 707: obj->forward_safe_init_mark(); I wonder if not modifying successfully forwarded objs is cleaner.
Something like:

    reset_self_forwarded_in_space(space) {
      cur = space->bottom();
      top = space->top();
      while (cur < top) {
        obj = cast_to_oop(cur);
        if (obj->is_self_forwarded()) {
          obj->unset_self_forwarded();
          obj_size = obj->size();
        } else {
          assert(obj->is_forwarded(), "inv");
          obj_size = obj->forwardee()->size();
        }
        cur += obj_size;
      }
    }

    reset_self_forwarded_in_space(eden());
    reset_self_forwarded_in_space(from());

src/hotspot/share/gc/serial/serialArguments.cpp line 33: > 31: void SerialArguments::initialize_heap_flags_and_sizes() { > 32: GenArguments::initialize_heap_flags_and_sizes(); > 33: GCForwarding::initialize_flags(MaxNewSize + MaxOldSize); Can one use `MaxHeapSize` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727547638 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727524479 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727548413 From tonyp at openjdk.org Thu Aug 22 21:16:04 2024 From: tonyp at openjdk.org (Antonios Printezis) Date: Thu, 22 Aug 2024 21:16:04 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 10:26:18 GMT, Hamlin Li wrote:
>> ## Performance
>> benchmarks run on CanVM-K230
>>
>> data
>>
>> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement (-intrinsic / +intrinsic)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op ...

> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert misc LGTM ------------- Marked as reviewed by tonyp (Reviewer).
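To tie the Base64 thread together, here is a portable C++ sketch of decoding one 4-character quantum with the two load strategies compared earlier. The first uses four byte loads, like the `lbu` variant. The second uses one 32-bit load followed by shifting and masking, like the `lw` variant. The sketch also rounds the input length down to a multiple of 4, the same way `andi(length, length, -4)` does. It is only an illustration and does not model the RISC-V stub. A C++ compiler is free to merge or split these loads, so the sketch says nothing about the measured timings. Padding (`=`) and the URL alphabet are not handled, and the word-load variant assumes a little-endian host.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

using DecodeTable = std::array<std::int8_t, 256>;

// Reverse lookup for the standard RFC 4648 alphabet; -1 marks an invalid byte.
DecodeTable make_decode_table() {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  DecodeTable t;
  t.fill(-1);
  for (int i = 0; i < 64; ++i) {
    t[static_cast<unsigned char>(kAlphabet[i])] = static_cast<std::int8_t>(i);
  }
  return t;
}

// Combine four 6-bit values into three bytes; false if any input byte is invalid.
bool combine(const DecodeTable& t, const unsigned char in[4], unsigned char out[3]) {
  const int v0 = t[in[0]], v1 = t[in[1]], v2 = t[in[2]], v3 = t[in[3]];
  if ((v0 | v1 | v2 | v3) < 0) return false;
  const std::uint32_t bits = (std::uint32_t(v0) << 18) | (std::uint32_t(v1) << 12) |
                             (std::uint32_t(v2) << 6) | std::uint32_t(v3);
  out[0] = static_cast<unsigned char>(bits >> 16);
  out[1] = static_cast<unsigned char>(bits >> 8);
  out[2] = static_cast<unsigned char>(bits);
  return true;
}

// Strategy 1: four separate byte loads (the lbu approach).
bool decode_quantum_bytes(const DecodeTable& t, const unsigned char* src, unsigned char out[3]) {
  const unsigned char in[4] = {src[0], src[1], src[2], src[3]};
  return combine(t, in, out);
}

// Strategy 2: one 32-bit load, then shift/mask each byte out (the lw approach).
// Assumes a little-endian host, as on RISC-V.
bool decode_quantum_word(const DecodeTable& t, const unsigned char* src, unsigned char out[3]) {
  std::uint32_t w;
  std::memcpy(&w, src, sizeof w);
  unsigned char in[4];
  for (int i = 0; i < 4; ++i) in[i] = static_cast<unsigned char>(w >> (8 * i));
  return combine(t, in, out);
}

// Decode whole quanta only: len & ~3 rounds down to a multiple of 4,
// like andi(length, length, -4). Stops at the first invalid quantum.
std::string decode_full_quanta(const std::string& s, bool use_word_loads) {
  static const DecodeTable table = make_decode_table();
  const std::size_t usable = s.size() & ~std::size_t{3};
  const auto* p = reinterpret_cast<const unsigned char*>(s.data());
  std::string out;
  for (std::size_t i = 0; i < usable; i += 4) {
    unsigned char buf[3];
    const bool ok = use_word_loads ? decode_quantum_word(table, p + i, buf)
                                   : decode_quantum_bytes(table, p + i, buf);
    if (!ok) break;
    out.append(reinterpret_cast<const char*>(buf), 3);
  }
  return out;
}
```

For example, `decode_full_quanta("TWFuTWFuTW", false)` decodes the first 8 characters to `ManMan` and ignores the trailing partial quantum.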
PR Review: https://git.openjdk.org/jdk/pull/20026#pullrequestreview-2255672794 From dholmes at openjdk.org Thu Aug 22 22:52:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 22 Aug 2024 22:52:05 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Thu, 22 Aug 2024 08:54:56 GMT, Magnus Ihse Bursie wrote: >> Sorry but I don't understand the point of changing build-time constructs using `ifdef STATIC_BUILD` into what appear to be runtime checks, but the result of which is already determined at build time. These are not really runtime checks. > > @dholmes-ora > >> Sorry but I don't understand the point of changing build-time constructs using ifdef STATIC_BUILD into what appear to be runtime checks, but the result of which is already determined at build time. > > I apologize. I did not express the intent of this change clearly enough. > > The background is that we want to build and test statically linked native libraries. Currently, building a statically linked version requires recompiling all native source code into a completely new set of .o files, which are then linked into a static library. > > This is extremely wasteful. Most of the code is completely identical for static and dynamic libraries. Jiangli, her team, and I have been working on a way to get around this. > > By moving the ifdef check to a new file that just contains a single function, we only need to compile this single file twice -- once for the static library, and once for the dynamic library. All other .o files are compiled just once, and then you link "all other files" + "the one special file for your kind of library" to get what you want. > > Unfortunately, there is also one more blocker before this can be achieved. That is the reason the corresponding change in the build system is not included in this patch.
(So this is a preparation for these future changes, but not the complete solution.) The missing part is that the `[JNI|Agent]_On[Un]Load` functions need to be able to use the statically linked naming scheme, even for dynamically linked libraries. This is trivial per se, but requires a spec change, which has not yet happened. > > The reason I want to get this partial solution into the mainline right now, instead of waiting for the spec change and the complete build system fix, is that these new functions for checking for static/dynamic builds are needed by additional changes that Jiangli has created upstream, and that I am trying to help her get integrated. (The goal of these changes is not just to make static libraries, but to link these static libraries with the java launcher into a statically linked launcher, which is a prerequisite for the rest of the Hermetic Java story.) @magicus is the final intent here that this one magic file will be compiled first with an inline declaration, such that when the other files containing the apparent runtime check get compiled, it can actually be determined at build time and so have the same effect as the old ifdef logic? Otherwise it concerns me that a build-time issue that affects a handful of people becomes a runtime issue that affects every single instance of a running Java program. There are also other source-level solutions possible here: refactoring the code that has static vs dynamic linking dependencies into its own files, so that the build system can then select which set of files to compile.
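Here is a sketch of the single-file approach Magnus describes. The function name is hypothetical because the thread does not show the real one. Only the first function depends on `STATIC_BUILD`, so everything else can be compiled once. The second function follows the naming rule from the JNI specification: a statically linked library `L` provides `JNI_OnLoad_L` instead of plain `JNI_OnLoad`.

```cpp
#include <cassert>
#include <string>

// ---- Conceptually its own file (hypothetical name, e.g. static_build.cpp) ----
// The only translation unit compiled twice: once with -DSTATIC_BUILD for the
// static variant, once without for the dynamic variant.
bool is_static_build() {
#ifdef STATIC_BUILD
  return true;
#else
  return false;
#endif
}

// ---- Shared code: compiled once, linked into both variants ----
// Per the JNI specification, a statically linked library L exports
// JNI_OnLoad_L; a dynamically linked library exports plain JNI_OnLoad.
std::string jni_onload_symbol(const std::string& lib_name) {
  return is_static_build() ? "JNI_OnLoad_" + lib_name : std::string("JNI_OnLoad");
}
```

This also illustrates David's concern. When `is_static_build()` is defined in a separate translation unit, callers see an opaque function call, and the compiler cannot fold its result at build time unless link-time optimization inlines it or the definition is made visible to the callers.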
------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2305875408 From kvn at openjdk.org Fri Aug 23 01:05:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 Aug 2024 01:05:13 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> Message-ID: On Tue, 20 Aug 2024 05:22:40 GMT, David Holmes wrote: > This simple enhancement allows for `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed). > > Testing: > - tiers 1-3 > > Thanks src/hotspot/share/utilities/events.cpp line 169: > 167: h_exception->print_value_on(&st); > 168: if (message != nullptr) { > 169: int len = message_length_limit > 0 ? message_length_limit : (int)strlen(message); Do we need to check that `message_length_limit <= (int)strlen(message)`? Or it is intentional to reserve bigger space in output than `message`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20638#discussion_r1728088097 From dholmes at openjdk.org Fri Aug 23 01:33:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 23 Aug 2024 01:33:15 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> Message-ID: On Fri, 23 Aug 2024 01:02:29 GMT, Vladimir Kozlov wrote: >> This simple enhancement allows for `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. 
We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed). >> >> Testing: >> - tiers 1-3 >> >> Thanks > > src/hotspot/share/utilities/events.cpp line 169: > >> 167: h_exception->print_value_on(&st); >> 168: if (message != nullptr) { >> 169: int len = message_length_limit > 0 ? message_length_limit : (int)strlen(message); > > Do we need to check that `message_length_limit <= (int)strlen(message)`? > Or it is intentional to reserve bigger space in output than `message`? I don't quite understand the question. We are specifying the maximum number of characters to print: it is either `message_length_limit` if that is > 0 otherwise it is no limit i.e. the whole string i.e. `strlen(message)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20638#discussion_r1728105202 From kvn at openjdk.org Fri Aug 23 02:04:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 Aug 2024 02:04:03 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> Message-ID: <1s7yBKLxjKiACuOVr-WitPwiiEHX9q06lnNdrZZEXC0=.1c7077af-c6d6-4b3c-aa9e-80f8deeb4987@github.com> On Fri, 23 Aug 2024 01:30:19 GMT, David Holmes wrote: >> src/hotspot/share/utilities/events.cpp line 169: >> >>> 167: h_exception->print_value_on(&st); >>> 168: if (message != nullptr) { >>> 169: int len = message_length_limit > 0 ? message_length_limit : (int)strlen(message); >> >> Do we need to check that `message_length_limit <= (int)strlen(message)`? >> Or it is intentional to reserve bigger space in output than `message`? > > I don't quite understand the question. We are specifying the maximum number of characters to print: it is either `message_length_limit` if that is > 0 otherwise it is no limit i.e. the whole string i.e. `strlen(message)`. 
Someone can pass `message_length_limit` which is > strlen(message). What happens in this case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20638#discussion_r1728142135 From dholmes at openjdk.org Fri Aug 23 02:27:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 23 Aug 2024 02:27:08 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: <1s7yBKLxjKiACuOVr-WitPwiiEHX9q06lnNdrZZEXC0=.1c7077af-c6d6-4b3c-aa9e-80f8deeb4987@github.com> References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> <1s7yBKLxjKiACuOVr-WitPwiiEHX9q06lnNdrZZEXC0=.1c7077af-c6d6-4b3c-aa9e-80f8deeb4987@github.com> Message-ID: On Fri, 23 Aug 2024 02:01:47 GMT, Vladimir Kozlov wrote: >> I don't quite understand the question. We are specifying the maximum number of characters to print: it is either `message_length_limit` if that is > 0 otherwise it is no limit i.e. the whole string i.e. `strlen(message)`. > > Someone can pass `message_length_limit` which is > strlen(message). What happens in this case? Printing stops at the end of `message`. Given this: const char* msg = "This is the message"; printf(">>%.*s<<\n", 40, msg); we get: >>This is the message<< Having a limit >> strlen(message) is the normal expected case. 
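David's answer follows from how C defines the precision on a `%s` conversion. The precision is an upper bound on the number of bytes written, and output stops at the terminating NUL if that comes first. Below is a minimal standalone sketch of the same `len` computation quoted from events.cpp line 169. `format_message` is a hypothetical helper, not HotSpot code. It formats into a buffer instead of an output stream so that the result can be checked:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the len computation quoted from
 * events.cpp: a limit of zero (the default) means "no limit",
 * i.e. the whole message is printed. */
static int format_message(char *out, size_t out_size,
                          const char *message, int message_length_limit) {
    assert(message != NULL);
    int len = message_length_limit > 0 ? message_length_limit
                                       : (int)strlen(message);
    /* "%.*s" writes at most len bytes and stops early at the
     * terminating NUL, so a limit larger than strlen(message) is safe. */
    return snprintf(out, out_size, ">>%.*s<<", len, message);
}
```

With the 40-byte limit from David's example it produces `>>This is the message<<`. A limit of 4 produces `>>This<<`, and the default of 0 prints the whole message.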
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20638#discussion_r1728160978 From kvn at openjdk.org Fri Aug 23 02:38:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 Aug 2024 02:38:12 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> Message-ID: <6jjzCROmX6qT0p8DfByUcFO2yLoFc0Bt2oN5JOdSf2E=.6536565c-ac01-4e08-aa49-fe104a389e5d@github.com> On Tue, 20 Aug 2024 05:22:40 GMT, David Holmes wrote: > This simple enhancement allows for `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed). > > Testing: > - tiers 1-3 > > Thanks Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20638#pullrequestreview-2256124508 From dholmes at openjdk.org Fri Aug 23 02:38:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 23 Aug 2024 02:38:12 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: <6jjzCROmX6qT0p8DfByUcFO2yLoFc0Bt2oN5JOdSf2E=.6536565c-ac01-4e08-aa49-fe104a389e5d@github.com> References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> <6jjzCROmX6qT0p8DfByUcFO2yLoFc0Bt2oN5JOdSf2E=.6536565c-ac01-4e08-aa49-fe104a389e5d@github.com> Message-ID: On Fri, 23 Aug 2024 02:33:27 GMT, Vladimir Kozlov wrote: >> This simple enhancement allows for `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. 
We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed). >> >> Testing: >> - tiers 1-3 >> >> Thanks > > Marked as reviewed by kvn (Reviewer). Thanks for the review @vnkozlov ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20638#issuecomment-2306074143 From kvn at openjdk.org Fri Aug 23 02:38:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 Aug 2024 02:38:13 GMT Subject: RFR: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> <1s7yBKLxjKiACuOVr-WitPwiiEHX9q06lnNdrZZEXC0=.1c7077af-c6d6-4b3c-aa9e-80f8deeb4987@github.com> Message-ID: On Fri, 23 Aug 2024 02:24:32 GMT, David Holmes wrote: >> Someone can pass `message_length_limit` which is > strlen(message). What happens in this case? > > Printing stops at the end of `message`. Given this: > > const char* msg = "This is the message"; > printf(">>%.*s<<\n", 40, msg); > > we get: > >>>This is the message<< > > Having a limit >> strlen(message) is the normal expected case. Good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20638#discussion_r1728166895 From dholmes at openjdk.org Fri Aug 23 02:38:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 23 Aug 2024 02:38:13 GMT Subject: Integrated: 8328880: Events::log_exception should limit the size of the logging message In-Reply-To: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> References: <8KXxdCRCjeJny7i7Sv-d-7vTuGNJwUzq94r5jnG1o3I=.c59b2513-5066-46ea-9731-de3faf7e40f0@github.com> Message-ID: On Tue, 20 Aug 2024 05:22:40 GMT, David Holmes wrote: > This simple enhancement allows for `Exceptions::_throw` to limit the message length printed by `Events::log_exception` in the same way that unified logging is limited. 
We simply allow a `message_length_limit` variable to be passed down - default value zero which means no limit (i.e. the full `strlen` of the message will be printed). > > Testing: > - tiers 1-3 > > Thanks This pull request has now been integrated. Changeset: ea337098 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/ea3370982bfd3da4b200b738dd3b8c16cebb3a34 Stats: 18 lines in 3 files changed: 10 ins; 1 del; 7 mod 8328880: Events::log_exception should limit the size of the logging message Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20638 From david.holmes at oracle.com Fri Aug 23 04:33:24 2024 From: david.holmes at oracle.com (David Holmes) Date: Fri, 23 Aug 2024 14:33:24 +1000 Subject: RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths In-Reply-To: References: Message-ID: I've had some internal feedback which has been incorporated in the CSR request: https://bugs.openjdk.org/browse/JDK-8338709 The proposed name of the new function is now GetStringUTFLengthAsLong. David On 20/08/2024 8:34 am, David Holmes wrote: > > Broadening the audience to hotspot-dev as there was zero response on > hotspot-runtime-dev. > > David > > On 13/08/2024 4:12 pm, David Holmes wrote: >> >> Comment is sought on this proposed update to the JNI Specification >> >> https://bugs.openjdk.org/browse/JDK-8328877 >> >> The modified UTF-8 format used by the VM can lead to UTF-8 sequences >> that exceed the maximum value of an int, due to multi-byte encoding, >> but the JNI GetStringUTFLength returns a jsize, which is (perhaps >> incorrectly) a jint, i.e. an int. As a result the current >> implementation will return a truncated version of the length of the >> sequence. To address this we propose to do two things in the JNI spec: >> >> 1.
We deprecate GetStringUTFLength >> >> +### GetStringUTFLength (Deprecated) >> >> `jsize GetStringUTFLength(JNIEnv *env, jstring string);` >> >> Returns the length in bytes of the modified UTF-8 representation of a string. >> >> +As the capacity of a `jsize` variable is not sufficient to hold the length of >> +all possible modified UTF-8 string representations (due to multi-byte encodings) >> +this function is deprecated in favor of [`GetLargeStringUTFLength()`](#getlargestringutflength). >> +If the modified UTF-8 representation of `string` has a length that exceeds the capacity >> +of a `jsize` variable, then the length as of the last character that could be fully >> +encoded without exceeding that capacity, is returned. >> >> 2. We add a new function GetLargeStringUTFLength >> >> +### GetLargeStringUTFLength >> + >> +`jlong GetLargeStringUTFLength(JNIEnv *env, jstring string);` >> + >> +Returns the complete length in bytes of the modified UTF-8 representation of a string. >> >> In addition we tweak the wording of GetStringUTFChars so that it: >> >> a) refers to a byte sequence instead of a byte array (to avoid >> suggesting the returned sequence is limited by the capacity of a Java >> array); and >> >> b) references the new GetLargeStringUTFLength function instead of the >> Deprecated GetStringUTFLength >> >> Note that GetStringUTFRegion is still using an int length so can't be >> used to obtain a giant region, but we don't expect this to be a >> practical concern. >> >> The JNI version will also be bumped for this API addition. >> >> Thanks, >> David >>
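The sketch below makes the truncation rule concrete. It computes the modified UTF-8 length of a UTF-16 string twice: once as a full 64-bit value, which is the role of the proposed `jlong`-returning function, and once truncated at a byte cap. The function names are hypothetical, and this is not the HotSpot implementation. It assumes the per-code-unit rules of modified UTF-8: U+0000 takes two bytes, and each half of a surrogate pair is encoded separately in three bytes. Passing `INT32_MAX` as the cap models the documented behavior of the deprecated `jsize` function, while a small cap makes that behavior testable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one UTF-16 code unit in modified UTF-8:
 * U+0001..U+007F -> 1 byte; U+0000 and U+0080..U+07FF -> 2 bytes;
 * U+0800..U+FFFF (including each half of a surrogate pair) -> 3 bytes. */
static int mutf8_unit_length(uint16_t c) {
    if (c != 0 && c <= 0x7F) return 1;
    if (c <= 0x7FF) return 2;
    return 3;
}

/* Full length as a 64-bit value (cannot overflow for any Java string). */
static int64_t mutf8_length(const uint16_t *s, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) total += mutf8_unit_length(s[i]);
    return total;
}

/* Length up to the last code unit that fits entirely within cap bytes;
 * cap == INT32_MAX models the deprecated jsize-returning function. */
static int64_t mutf8_length_capped(const uint16_t *s, size_t n, int64_t cap) {
    assert(cap >= 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        int len = mutf8_unit_length(s[i]);
        if (total + len > cap) break;  /* next unit would not fit: stop */
        total += len;
    }
    return total;
}
```

For "Aé€" (1 + 2 + 3 bytes) the full length is 6. With a 5-byte cap the result is 3, because the three-byte € would not fit entirely.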
URL: From dholmes at openjdk.org Fri Aug 23 04:31:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 23 Aug 2024 04:31:07 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: <3Z33v_k5LPdJndHtRPY6JnKHWsJWilQRyYxa7DFUftM=.2954c811-04db-49a8-8316-b21cdab28558@github.com> References: <3Z33v_k5LPdJndHtRPY6JnKHWsJWilQRyYxa7DFUftM=.2954c811-04db-49a8-8316-b21cdab28558@github.com> Message-ID: On Wed, 21 Aug 2024 09:54:08 GMT, Thomas Stuefe wrote: >> Hi all, >> >> This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681) enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. >> >> As mentioned in this comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492) where we enabled support for `%p` for filenames specified in jcmd. >> >> With this patch, I propose: >> - Expanding the utility function `Arguments::copy_expand_pid` to `Arguments::copy_expand_arguments` to deal with `%p` expansions for pid and `%t` expansions for timestamps. >> - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. >> - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly support filename expansion to support `%t` for jcmd as well. >> >> Testing: >> - [x] Added test cases pass with all platforms (verified with a GHA job). >> - [x] Tier 1 passes with GHA. >> >> Looking forward to hearing your thoughts! >> >> Thanks, >> Sonia > > I think this could be very useful, but it needs more preparation and decisions. Possibly a CSR. > > - copy_expand_xxx is used in many places. While I think all of these places would benefit from more expansions than just %p, there is a potential backward compatibility issue if clients use %t for whatever reason today > - Do we want the time of the dump or the JVM start? 
If the JVM runs for a week, then produces a JFR file, should the file be named by the JVM start date? I think in most cases the *current* time makes more sense > - Do we want the printout as a human-readable date or as a numeric timestamp? Both makes sense depending on the post-processing clients want to do. > - Do we want to improve this function further, potentially adding more replacement options? > > One possible way to solve this: > - use different characters for timestamp (number) and datetime (human readable date) > - use always the current time > - If we want to add further replacements: > - come up with a new replacement character that does not clash with libc sprintf (IMHO using percent was not a good idea in the first place). E.g. `$` > - Add a new switch to guard this new replacement logic. By default off. If on, the contract is that any character following a `$` may be either now or in the future replaced with something different. Client must not use `$` as a normal character. > - We probably should remove all non-matching `$` from the input. > - The first replacements could be: `$p` for pid, `$t` for timestamp (numeric), `$d` for datetime > - later replacements can be added later. Since we guard the new feature with a switch and forbid the use of `$`, we are then free to do so without breaking backward compatibility. > > I would like to hear @dholmes-ora take on this. > > We had a similar system at SAP in our proprietary JVM, which was really useful, so I like this idea in general. I don't object (don't really have strong views) on adding this functionality, but as @tstuefe notes there are a few things to consider. I'm not really averse to using the `%` character precisely because it is commonly identified as a format specifier - and I think `$` would be very problematic due to shell issues. 
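For reference, the kind of expansion being discussed can be sketched as follows. This is a hypothetical helper, not `Arguments::copy_expand_pid` and not the code in the PR. It makes the following assumptions:

- `%p` expands to the pid.
- `%t` expands to a timestamp string that the caller has already formatted. This keeps the "dump time vs. JVM start" and "numeric vs. human-readable" choices outside the function.
- `%%` expands to a literal percent sign.
- Any other `%x` is copied through unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: expand %p (pid), %t (caller-supplied timestamp)
 * and %% (literal '%'); any other "%x" is copied through unchanged.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int expand_filename(const char *pattern, long pid, const char *timestamp,
                           char *out, size_t out_size) {
    assert(pattern != NULL && timestamp != NULL && out_size > 0);
    size_t pos = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[32];
        const char *insert = piece;
        if (p[0] == '%' && p[1] == 'p') {
            snprintf(piece, sizeof piece, "%ld", pid);
            p++;                          /* consume the 'p' */
        } else if (p[0] == '%' && p[1] == 't') {
            insert = timestamp;
            p++;                          /* consume the 't' */
        } else if (p[0] == '%' && p[1] == '%') {
            piece[0] = '%'; piece[1] = '\0';
            p++;                          /* consume the second '%' */
        } else {
            piece[0] = *p; piece[1] = '\0';
        }
        size_t len = strlen(insert);
        if (pos + len >= out_size) return -1;  /* keep room for the NUL */
        memcpy(out + pos, insert, len);
        pos += len;
    }
    out[pos] = '\0';
    return 0;
}
```

Such a helper would still change the meaning of any existing `%t` in a user-supplied path. That is the backward-compatibility concern raised above.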
At the risk of seeking the perfect instead of just doing what is immediately "good enough", we might also look at the unified logging decorators as potential formats: Available log decorators: time (t), utctime (utc), uptime (u), timemillis (tm), uptimemillis (um), timenanos (tn), uptimenanos (un), hostname (hn), pid (p), tid (ti), level (l), tags (tg), and it may also allow for some code sharing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20568#issuecomment-2306234386 From jbhateja at openjdk.org Fri Aug 23 05:46:29 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Aug 2024 05:46:29 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v4] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], this patch adds support for the following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                     (size)   Mode  Cnt       Score  Error  Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2    2041.762         ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2    1028.550         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2     962.605         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2     479.004         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2     359.758         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2     178.192         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2    1463.459         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2     727.556         ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2   33254.830         ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2   17313.174         ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2   10756.804         ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Defaulting to index wrapping scheme.
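The two-vector semantics described above can be modeled on plain arrays. This scalar C sketch is only an illustration, not the Vector API or its intrinsic. `VLEN` and `select_from` are assumed names. The floor-mod wrapping of indexes into `[0, 2*VLEN)` is also an assumption, chosen to match the "defaulting to index wrapping scheme" update rather than the `IndexOutOfBoundsException` behavior in the original description:

```c
#include <assert.h>

#define VLEN 4  /* hypothetical vector length (number of lanes) */

/* Floor-mod wrap of an arbitrary index into [0, 2*VLEN). */
static int wrap_two_vector_index(int idx) {
    int m = idx % (2 * VLEN);
    return m < 0 ? m + 2 * VLEN : m;
}

/* result[i] is the table entry selected by index[i]: table positions
 * 0..VLEN-1 come from v1 and VLEN..2*VLEN-1 come from v2. */
static void select_from(const int index[VLEN], const int v1[VLEN],
                        const int v2[VLEN], int result[VLEN]) {
    for (int i = 0; i < VLEN; i++) {
        int j = wrap_two_vector_index(index[i]);
        assert(j >= 0 && j < 2 * VLEN);
        result[i] = j < VLEN ? v1[j] : v2[j - VLEN];
    }
}
```

Take `v1 = {10, 11, 12, 13}`, `v2 = {20, 21, 22, 23}` and indexes `{0, 5, 7, 2}`: the result is `{10, 21, 23, 12}`. Under the assumed wrapping, an out-of-range index such as 9 becomes 1 and selects `v1[1]`.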
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/e24632cb..d7ad6887 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=02-03 Stats: 424 lines in 39 files changed: 0 ins; 361 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From aboldtch at openjdk.org Fri Aug 23 05:50:08 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 23 Aug 2024 05:50:08 GMT Subject: RFR: 8338810: PPC, s390x: LightweightSynchronizer::exit asserts, missing lock In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 06:20:23 GMT, Axel Boldt-Christmas wrote: > [JDK-8338638](https://bugs.openjdk.org/browse/JDK-8338638) made me realise that PPC and s390x have the same issue. > > The issue is that the C2 unlock path will check if the monitor is inflated after popping off the last entry on the lock stack. With UseObjectMonitorTable (without the cache lookup implemented), the slow path is incorrectly taken without restoring the popped oop. Currently the runtime expects the lock stack to be consistent (have an entry) in exit if the monitor is anonymously inflated. > > I'll provide a bandaid fix which pushes back the oop before calling into the runtime. > > A future enhancement for all platforms would be to allow the C2 entry point to redo the push when taking the slow path and it realises that the monitor is anonymously inflated or it is fast locked and the lock stack does not contain the oop. (Removing all the push back logic from the emitted C2 unlock nodes) > > I am unable to test ppc and s390x, so I have not verified that the issue is reproduced nor that this fixes it.
> Hopefully @TheRealMDoerr and @offamitkumar can assist me here with running `test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java` with and without the patch on PPC and s390x respectively. > Thanks in advance. (And sorry for integrating without better testing of your respective platforms) Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20672#issuecomment-2306336456 From aboldtch at openjdk.org Fri Aug 23 05:50:09 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 23 Aug 2024 05:50:09 GMT Subject: Integrated: 8338810: PPC, s390x: LightweightSynchronizer::exit asserts, missing lock In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 06:20:23 GMT, Axel Boldt-Christmas wrote: > [JDK-8338638](https://bugs.openjdk.org/browse/JDK-8338638) made me realise that PPC and s390x have the same issue. > > The issue is that the C2 unlock path will check if the monitor is inflated after popping off the last entry on the lock stack. With UseObjectMonitorTable (without the cache lookup implemented), the slow path is incorrectly taken without restoring the popped oop. Currently the runtime expects the lock stack to be consistent (have an entry) in exit if the monitor is anonymously inflated. > > I'll provide a bandaid fix which pushes back the oop before calling into the runtime. > > A future enhancement for all platforms would be to allow the C2 entry point to redo the push when taking the slow path and it realises that the monitor is anonymously inflated or it is fast locked and the lock stack does not contain the oop. (Removing all the push back logic from the emitted C2 unlock nodes) > > I am unable to test ppc and s390x, so I have not verified that the issue is reproduced nor that this fixes it.
> Thanks in advance. (And sorry for integrating without better testing of your respective platforms) This pull request has now been integrated. Changeset: e06652ad Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/e06652ad3c02dfe54104eaa04eaa3d117699b27f Stats: 12 lines in 2 files changed: 10 ins; 0 del; 2 mod 8338810: PPC, s390x: LightweightSynchronizer::exit asserts, missing lock Reviewed-by: mdoerr, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/20672 From jbhateja at openjdk.org Fri Aug 23 05:58:05 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Aug 2024 05:58:05 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v4] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 23 Aug 2024 05:46:29 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. 
>> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Defaulting to index wrapping scheme. Hi @rose00 , @PaulSandoz , @sviswa7, The latest patch removes the explicit wrap argument passed to the selectFrom API and instead uses the wrapping scheme as a mitigation strategy to handle OOB partially wrapped indexes. Summarizing the new scheme for index wrapping:- - Shuffle always holds indexes in the valid vector index range or partially wraps OOB indexes. - The following are the shuffle creation intercepts - VectorShuffle.fromArray - Partially wraps OOB indexes - iotaShuffle - Accepts an explicit wrap argument to choose between wrapping and partial wrapping of OOB indexes. - Vector.toShuffle - Partially wraps OOB indexes.
- Partial wrapping generates negative indexes for OOB indices after wrapping them into the valid index range. - The objective is to delegate the mitigation strategy to subsequent APIs, which can either throw an IndexOutOfBoundsException or create a valid index by adding the vector length. - An important point to mention here is that the partially wrapped indexing scheme first wraps an OOB index (index < 0 OR index >= VECLEN) into the valid index range and then subtracts VECLEN from the wrapped index to generate a negative number in the [-VECLEN: -1] range. - With the new scheme we are choosing wrapping as the default mitigation strategy, hence the only client that makes effective use of partially wrapped indexes is two vector re-arrange, which uses them to compute the mask for blending two permuted vectors. - Two vector re-arrange and the selectFrom API differ in terms of acceptable index range: while the former accepts shuffle indices in the single vector index range [0:VECLEN), the latter operates on the two vector index range [0:2*VECLEN). Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2306344606 From thomas.stuefe at gmail.com Fri Aug 23 05:59:57 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Fri, 23 Aug 2024 07:59:57 +0200 Subject: RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths In-Reply-To: References: Message-ID: Hi David, had a read through the CSR. --- `In addition we tweak the wording of GetStringUTFChars so that it: ... b) references the new GetStringUTFLengthAsLong function instead of the Deprecated GetStringUTFLength` (b) refers to GetStringUTFRegion, or? GetStringUTFChars has no such wording, nor a len argument --- I was initially surprised that we return a fake length from GetStringUTFLength upon overflow instead of a clear error indicator like -1. Now folks will work with potentially truncated strings.
Typically those are documents stored in string form, and truncation errors are not obvious. But probably there is no better way: Returning 0 would be an option - it would cause clearer and more immediate data errors (missing document contents). But it can be confused with "have no data" which can be a valid state. Returning -1 is potentially dangerous and can lead to overflows. Returning MAX_INT is not much better than returning up to the last valid encoding; we just get a weird character at the end of the document. Cheers, Thomas From jbhateja at openjdk.org Fri Aug 23 06:09:48 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Aug 2024 06:09:48 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v5] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <7e5pWnvjqk-dQYNeaZjFzXcd5WlzniZPl5T4l1rKQGE=.0882bcd4-e307-4a29-aa41-5496ee029a60@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], this patch adds support for the following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API.
> - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... 
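Jatin summarizes a partial wrapping scheme and its use by two-vector rearrange. In that scheme, an out-of-range index is wrapped into `[0, VLEN)` and then has `VLEN` subtracted, landing in `[-VLEN, -1]`. Both steps can be sketched in scalar form. The names `partial_wrap` and `rearrange2` are hypothetical, and floor-mod wrapping is assumed. The point is that the sign of a partially wrapped index is what selects between the two source vectors:

```c
#include <assert.h>

#define VLEN 4  /* hypothetical vector length (number of lanes) */

/* In-range indexes are kept as-is; an out-of-range index is wrapped
 * into [0, VLEN) and then shifted by -VLEN, giving [-VLEN, -1]. */
static int partial_wrap(int idx) {
    if (idx >= 0 && idx < VLEN) return idx;
    int w = idx % VLEN;
    if (w < 0) w += VLEN;
    return w - VLEN;
}

/* Two-vector rearrange: a non-negative shuffle index selects from v1,
 * a negative (partially wrapped) one selects from v2 after adding VLEN.
 * The sign effectively acts as the blend mask between the two sources. */
static void rearrange2(const int v1[VLEN], const int shuffle[VLEN],
                       const int v2[VLEN], int result[VLEN]) {
    for (int i = 0; i < VLEN; i++) {
        int s = partial_wrap(shuffle[i]);
        result[i] = s >= 0 ? v1[s] : v2[s + VLEN];
    }
}
```

For example, with `VLEN` = 4 the index 5 becomes -3 and selects `v2[1]`, while an in-range index such as 3 selects `v1[3]`.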
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Removing redundant checkIndex routine ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/d7ad6887..6cb1a46d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=03-04 Stats: 35 lines in 7 files changed: 0 ins; 35 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From david.holmes at oracle.com Fri Aug 23 07:54:32 2024 From: david.holmes at oracle.com (David Holmes) Date: Fri, 23 Aug 2024 17:54:32 +1000 Subject: RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths In-Reply-To: References: Message-ID: <89008d1e-4f05-48a3-8ab3-8333aba5c2ed@oracle.com> Hi Thomas, On 23/08/2024 3:59 pm, Thomas Stüfe wrote: > Hi David, > > had a read through the CSR. Thanks for taking a look. > --- > > `In addition we tweak the wording of GetStringUTFChars so that it: > ... > b) references the new GetStringUTFLengthAsLong function instead of the > Deprecated GetStringUTFLength` > > (b) refers to GetStringUTFRegion, or? GetStringUTFChars has no such > wording, nor a len argument Oops thanks - fixed (two different functions tweaked - I misread the diff) > --- > > I was initially surprised that we return a fake length from > GetStringUTFLength upon overflow instead of a clear error indicator like > -1. Now folks will work with potentially truncated strings. Typically > those are documents stored in string form, and truncation errors are not > obvious. But probably there is no better way: > > Returning 0 would be an option - it would cause clearer and more > immediate data errors (missing document contents).
But > it can be confused with "have no data" which can be a valid state. > Returning -1 is potentially dangerous and can lead to overflows. > Returning MAX_INT is not much better than returning up to the last valid > encoding, we just get a weird character at the end of the document. Yes, all of these possibilities were evaluated when that change was made (not in public unfortunately as it was considered a security issue), and each has its pros and cons. We settled on what seemed the least terrible option - truncation to the length of a valid UTF8 sequence. Thanks, David ----- > Cheers, Thomas From aph at openjdk.org Fri Aug 23 07:57:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 23 Aug 2024 07:57:05 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <5Pqrq4T-PezYLvvQ-YKEWMccrDHjNlOyHHoxuwXN1WU=.e693a2a1-fe92-41fc-ae74-345e3ac40313@github.com> On Thu, 22 Aug 2024 15:53:26 GMT, Andrew Haley wrote: > > It's really not important, though. I take that back. It _might_ be important because it enables optimization for short strings. The average string in Java applications is about 32 characters long, so if we moved the block length down to 16 we'd get some benefit for typical Java `String`s from this patch. It's not just about making the code compact and elegant, it's about real-world performance too.
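To make the block-length point concrete, here is a plain-Java sketch (not the AArch64 intrinsic) of the polynomial hash computed by `Arrays.hashCode(byte[])`, together with a block-wise reformulation. Over a block of `B` elements the recurrence `h = 31*h + b` unrolls to `h' = 31^B * h + sum(b[j] * 31^(B-1-j))`, and the per-element products are independent of each other, which is what makes the hash vectorizable. The class and method names are hypothetical. With `block = 32`, a 31-element input never enters the block loop and is hashed entirely by the scalar tail; with `block = 16`, 16 elements go through the block path and only 15 through the tail.

```java
import java.util.Arrays;

/** Plain-Java sketch of the polynomial hash; not the AArch64 intrinsic. */
public final class PolyHashSketch {

    // Same recurrence as Arrays.hashCode(byte[]): h = 1; h = 31*h + b.
    static int scalarHash(byte[] a) {
        int h = 1;
        for (byte b : a) {
            h = 31 * h + b;
        }
        return h;
    }

    // Block-wise form: per block of `block` elements,
    //   h' = 31^block * h + sum_j a[i+j] * 31^(block-1-j)
    // The products are independent, so a vector unit could compute them
    // in parallel. Leftover elements are handled by a scalar tail.
    static int blockedHash(byte[] a, int block) {
        int[] pow = new int[block + 1]; // pow[k] = 31^k, wrapping like int math
        pow[0] = 1;
        for (int k = 1; k <= block; k++) {
            pow[k] = pow[k - 1] * 31;
        }
        int h = 1;
        int i = 0;
        for (; i + block <= a.length; i += block) {
            int sum = 0;
            for (int j = 0; j < block; j++) {
                sum += a[i + j] * pow[block - 1 - j];
            }
            h = h * pow[block] + sum;
        }
        for (; i < a.length; i++) { // scalar tail
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = new byte[31];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ('a' + i % 26);
        }
        System.out.println(scalarHash(data) == Arrays.hashCode(data)); // true
        System.out.println(blockedHash(data, 16) == scalarHash(data)); // true
        System.out.println(blockedHash(data, 32) == scalarHash(data)); // true
    }
}
```

All three comparisons print `true`: the blocked form is an exact algebraic rearrangement of the scalar loop, and `int` overflow wraps identically in both.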
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2306508397 From mli at openjdk.org Fri Aug 23 08:45:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 23 Aug 2024 08:45:38 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4] In-Reply-To: References: Message-ID: > ## Performance > benchmarks run on CanVM-K230 > > data >
> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement (Score -intrinsic / Score +intrinsic)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547
> Base64Decode.testBase64MIMEDecode | 0 | ...
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20026/files - new: https://git.openjdk.org/jdk/pull/20026/files/b29e927f..d1899de7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20026/head:pull/20026 PR: https://git.openjdk.org/jdk/pull/20026 From mli at openjdk.org Fri Aug 23 08:45:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 23 Aug 2024 08:45:38 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 17:40:08 GMT, Hamlin Li wrote: >> Great. Maybe add a comment to that effect here? > > Sure, I can do it later when addressing other comments. Fixed the comments, thanks!
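For context on the two benchmark families above, here is a small sketch using the public `java.util.Base64` API: `testBase64Decode` exercises the basic decoder, while `testBase64MIMEDecode` exercises the MIME decoder, which must skip line separators. The class and method names are hypothetical, and the 4-character line length presumably mirrors the `lineSize=4` benchmark parameter. The intrinsic itself is, as far as we can tell, attached to the decoder's internal block-decoding method, so it is not visible at this API level.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Basic vs MIME Base64 decoding with the public java.util.Base64 API. */
public final class Base64DecodeDemo {

    // Basic encoding: a single run of alphabet characters, no line breaks.
    static byte[] roundTripBasic(byte[] data) {
        String encoded = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(encoded);
    }

    // MIME encoding with CRLF-separated lines of `lineLength` characters
    // (rounded down to a multiple of 4 by the encoder).
    static String encodeMime(byte[] data, int lineLength) {
        return Base64.getMimeEncoder(lineLength, new byte[] {'\r', '\n'})
                     .encodeToString(data);
    }

    // The MIME decoder ignores line separators and other non-alphabet characters.
    static byte[] decodeMime(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] data = "Hello, RISC-V!".getBytes(StandardCharsets.US_ASCII);
        String mime = encodeMime(data, 4);
        System.out.println(Arrays.equals(data, roundTripBasic(data))); // true
        System.out.println(Arrays.equals(data, decodeMime(mime)));     // true
        try {
            Base64.getDecoder().decode(mime); // basic decoder rejects CR/LF
        } catch (IllegalArgumentException e) {
            System.out.println("basic decoder rejected MIME input");
        }
    }
}
```

The extra per-character filtering the MIME path needs is one plausible reason its numbers behave differently from the basic path in the table.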
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1728606072 From mcimadamore at openjdk.org Fri Aug 23 09:03:32 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 23 Aug 2024 09:03:32 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v10] In-Reply-To: References: Message-ID: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` flag was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access` flag is used at least once. > > Here, a new flag is introduced, namely `--illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above.
Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into restricted_jni - Merge branch 'master' into restricted_jni - Address review comments - Add note on --illegal-native-access default value in the launcher help - Address review comment - Refine warning text for JNI method binding - Address review comments Improve warning for JNI methods, similar to what's described in JEP 472 Beef up tests - Address review comments - Fix another typo - Fix typo - ... and 3 more: https://git.openjdk.org/jdk/compare/f7ea738c...04622748 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/ff51ac6a..04622748 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=08-09 Stats: 51278 lines in 1477 files changed: 28775 ins; 15348 del; 7155 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From mdoerr at openjdk.org Fri Aug 23 10:02:34 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 23 Aug 2024 10:02:34 GMT Subject: RFR: 8338814: [PPC64] Unify interface of cmpxchg for different types Message-ID: PPC64 code has very complicated cmpxchg functions in MacroAssembler. We should have at least a unified argument list for the different types and the features should be usable with all types. I have also cleaned up the `RegisterOrConstant` functions because they are used by the cmpxchg code. One difference in the argument list still exists: `cmpxchgb` and `cmpxchgh` use extra temp registers to support older processors. 
They should get removed with [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). ------------- Commit messages: - 8338814: [PPC64] Unify interface of cmpxchg for different types Changes: https://git.openjdk.org/jdk/pull/20689/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20689&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338814 Stats: 123 lines in 9 files changed: 38 ins; 3 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/20689.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20689/head:pull/20689 PR: https://git.openjdk.org/jdk/pull/20689 From ihse at openjdk.org Fri Aug 23 10:07:03 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 23 Aug 2024 10:07:03 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Thu, 22 Aug 2024 22:49:50 GMT, David Holmes wrote: > is the final intent here that this one magic file will be compiled first with an inline declaration such that when the other files containing the apparent runtime check get compiled, it can actually be determined at build time and so have the same effects as the old ifdef logic? Theoretically, this is a valid complaint: what is now inlined at compilation time will require an additional function call. And yes, if that had been a performance issue, I would have needed to do something like that. But, if you look at the actual functions that are affected, you can see that it is just a handful of calls that are all done at startup time. Adding like half a dozen calls to a trivial function before loading a DLL will not even be measurable, compared to all the work the OS will do afterwards when loading the DLL. So no, I do not intend to complicate the code further. Any impact of this code is measured in a few additional machine code operations. 
> There are also other source-level solutions possible here by refactoring the code that has static vs dynamic linking dependencies into its own files and the build system can then select which set of files to compile. There are definitely refactoring/restructuring opportunities to be had, both in Hotspot and in the JDK libraries! Overall, I have found a lot of redundant work, duplicated code, and legacy code that does not make sense anymore (like trying to differentiate between a JRE and a JDK) when setting up the initial environment with respect to the basic dynamic libraries. But in the grand scheme of things, I don't think it is reasonable that we spend too much effort on cleaning that up. While it is definitely a "lava flow" anti-pattern, it mostly works, and starting to poke around will risk breaking things. We don't have a good testing story for the JDK bootstrapping. This is the same problem as the build is facing: you would need to have a ton of differently configured environments to be able to test all possible installations and paths etc. The patch presented here seems to me to be a cautious middle ground -- fixing what is needed to be able to progress, but doing so in a way that every single change is trivially and obviously correct.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2306752185 From aph at openjdk.org Fri Aug 23 10:41:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 23 Aug 2024 10:41:07 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: <5Pqrq4T-PezYLvvQ-YKEWMccrDHjNlOyHHoxuwXN1WU=.e693a2a1-fe92-41fc-ae74-345e3ac40313@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <5Pqrq4T-PezYLvvQ-YKEWMccrDHjNlOyHHoxuwXN1WU=.e693a2a1-fe92-41fc-ae74-345e3ac40313@github.com> Message-ID: On Fri, 23 Aug 2024 07:54:02 GMT, Andrew Haley wrote: > if we moved the block length down to 16 we'd get some benefit for typical Java `String`s from this patch. Just to provide some substance, this patch runs at 55.5 cycles for a (Latin1) `String` of 31 chars, and a very creditable 16.5 cycles for 32 chars. If we had a 16-wide vectorized hash for `byte[]`, `String[31].hashCode()` would come down to about 19 cycles, I think. P.S. `String.hashCode()` uses `Arrays.hashCode` internally, for either `char[]` or `byte[]`, depending on whether all of the chars in the string are between 0...127. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2306811750 From stefank at openjdk.org Fri Aug 23 10:56:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 Aug 2024 10:56:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding I've looked through the changes to the gc/ directory and have a couple of proposed changes. Please have a look: https://github.com/openjdk/jdk/compare/pr/20677...stefank:jdk:lilliput_review_gc_1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2306834883 From lucy at openjdk.org Fri Aug 23 11:37:04 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 23 Aug 2024 11:37:04 GMT Subject: RFR: 8338814: [PPC64] Unify interface of cmpxchg for different types In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 09:58:21 GMT, Martin Doerr wrote: > PPC64 code has very complicated cmpxchg functions in MacroAssembler. We should have at least a unified argument list for the different types and the features should be usable with all types. > I have also cleaned up the `RegisterOrConstant` functions because they are used by the cmpxchg code. > > One difference in the argument list still exists: `cmpxchgb` and `cmpxchgh` use extra temp registers to support older processors. They should get removed with [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). Changes look good to me. I like uniform interfaces! Thanks for all the tedious work. ------------- Marked as reviewed by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20689#pullrequestreview-2257020158 From mdoerr at openjdk.org Fri Aug 23 12:14:04 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 23 Aug 2024 12:14:04 GMT Subject: RFR: 8338814: [PPC64] Unify interface of cmpxchg for different types In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 09:58:21 GMT, Martin Doerr wrote: > PPC64 code has very complicated cmpxchg functions in MacroAssembler. We should have at least a unified argument list for the different types and the features should be usable with all types. > I have also cleaned up the `RegisterOrConstant` functions because they are used by the cmpxchg code. > > One difference in the argument list still exists: `cmpxchgb` and `cmpxchgh` use extra temp registers to support older processors. They should get removed with [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). Thank you for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20689#issuecomment-2306963690 From tonyp at openjdk.org Fri Aug 23 13:26:10 2024 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 23 Aug 2024 13:26:10 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 08:40:26 GMT, Hamlin Li wrote: >> Sure, I can do it later when addressing other comments. > > Fixed the comments, Thanks! Comment is very helpful, thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1728969357 From mdoerr at openjdk.org Fri Aug 23 13:31:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 23 Aug 2024 13:31:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> Message-ID: On Mon, 19 Aug 2024 14:25:13 GMT, Roberto Castañeda Lozano wrote: >> In case of heap base != null, a branch already exists which makes the other null check redundant. So, we have null check, region crossing check, another null check. Maybe this compressed oop mode is not important enough. >> >> For the other compressed oop modes, yes, this means moving the null check above the region crossing check. On PPC64, the null check can be combined with the shift instruction, so we save one compare instruction. Technically, it would even be possible to use only one branch instruction for both checks, but I'm not sure if it's worth the complexity. I'll think about it. > > OK, thanks. I just ran some benchmarks with zero-based OOP compression ([prototype here](https://github.com/robcasloz/jdk/tree/JDK-8334060-g1-late-barrier-expansion-x64-optimizations)) and could not observe any significant performance effect on three different x64 implementations. I think I will keep the `g1StoreN` implementation as-is in the x64 and aarch64 backends, for simplicity. Again, we can revisit this in follow-up work if need be. I have an experimental implementation for PPC64.
I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`: https://github.com/TheRealMDoerr/jdk/blob/a48598075862f17e7b1cfbec29af4c2431809257/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476 This has 2 advantages: - Reduce replicated code in the .ad file. - Make the discussed optimization easy. Please take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1728978594 From mdoerr at openjdk.org Fri Aug 23 13:36:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 23 Aug 2024 13:36:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> Message-ID: <6DcMr9PUa8OZEhO861hkhqTMYlKDs96tgPs4Fu1u72I=.9dd2d0d4-9075-4c36-9ab6-ff886e257b4b@github.com> On Mon, 19 Aug 2024 12:20:21 GMT, Roberto Castañeda Lozano wrote: >> Note that we had such an optimization already in C2: https://github.com/openjdk/jdk8u-dev/blob/4106121e0ae42d644e45c6eab9037874110ed670/hotspot/src/share/vm/opto/library_call.cpp#L3114 >> But, it's probably not a big deal. Maybe I can try it on PPC64 which may be more sensitive to accesses on contended memory. > > Thanks for the reference, I would still prefer to keep this part as is for simplicity. We can always optimize the atomic barriers in follow-up work, if a need arises. After thinking more about this, I figured out that we can optimize more when moving the pre_barrier after the cmpxchg. We can skip all G1 barriers if the cmpxchg fails: https://github.com/TheRealMDoerr/jdk/blob/a48598075862f17e7b1cfbec29af4c2431809257/src/hotspot/cpu/ppc/gc/g1/g1_ppc.ad#L171 This may reduce load on GC queue handling and related work for GC threads. I'm testing this version and I actually like it more than the version I had before.
Please take a look. (Note that my final version will need https://github.com/openjdk/jdk/pull/20689 to be integrated and merged into your PR.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1728987173 From stuefe at openjdk.org Fri Aug 23 15:23:06 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 23 Aug 2024 15:23:06 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore wrote: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. Hi Coleen, IIUC, the new "is_in_klass_space" function that now is present in all Metadata children only exists because of the template function in MetadataFactory, right? Just for the purpose of deallocation? If so, see this proposed addition to your patch: https://gist.github.com/tstuefe/5111c735b12f6d9c3c1d32699d0820f6 This would make Metaspace::deallocate smarter - Metaspace knows whether a given pointer is in class space or not, it can do automatically the right thing. There should be no need to tell it how to deallocate that storage. (If you are worried, in debug builds there are also sanity checks). If you do this, I think you could remove all variants of "is_in_klass_space" apart from the one in Klass. 
src/hotspot/share/memory/allocation.hpp line 319:
> 317: f(SharedClassPathEntry) \
> 318: f(RecordComponent) \
> 319: f(AbstractClass)
This is a minor nit: I assume this new constant is just there to steer allocation down in metaspace away from Class space? This breaks a bit with the established pattern, because the type `Klass::type()` returns is still "ClassType", so this new constant never really appears anywhere. Its only point is "it's not ClassType". You could probably hand down any other constant to Metaspace::allocate, as long as it's not `ClassType`. What we should eventually do, but in a follow-up RFE: modify `Metaspace::allocate` to replace the `MetaspaceObj::Type` parameter with a `MetadataType mdType` parameter. Or a plain "allocate_in_classspace_please" boolean parameter. Because Metaspace::allocate does not really need to know the type of the metadata object. It just needs to know if the caller insists on having the data in class space. But let's do this in a follow-up.
src/hotspot/share/oops/instanceKlass.cpp line 456:
> 454:
> 455: InstanceKlass* ik;
> 456: MetaspaceObj::Type type = (parser.is_interface() || parser.is_abstract()) ? MetaspaceObj::AbstractClassType : MetaspaceObj::ClassType;
Small nit: const or constexpr?
src/hotspot/share/oops/klass.hpp line 205:
> 203:
> 204: void* operator new(size_t size, ClassLoaderData* loader_data, size_t word_size, TRAPS) throw();
> 205:
Oh, ArrayKlass never used this? It's good to move it to InstanceKlass.
src/hotspot/share/oops/klass.hpp line 214:
> 212: virtual bool is_klass() const { return true; }
> 213:
> 214: bool is_in_klass_space() const { return !is_interface() && !is_abstract(); }
This name is misleading. As a caller, I expect a function with this name to check that the Klass* lies inside the class space range. This is more of a "should be".
(We also write class space with `c` throughout hotspot; it's weird to have it with `k` now.) How about, instead: "needs_narrow_klass_id" or "must_be_narrow_encodable" or similar? That clearly says what you want, that this class needs to be encodable with a narrow id for whatever reason. This leaves room for future changes (e.g. a possible future where we need narrow klass ids for reasons other than making heap objects smaller, or where there is no class space anymore). ------------- PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2256433389 PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728452223 PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728421710 PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728421197 PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1728417646 From lmesnik at openjdk.org Fri Aug 23 16:37:05 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 23 Aug 2024 16:37:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding Changes requested by lmesnik (Reviewer).
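Regarding the quoted statement that full GC forwarding has 40 bits available and can encode up to 8TB of heap: the figure follows if the forwardee is stored as an offset in 8-byte heap words rather than bytes, since 2^40 words * 2^3 bytes/word = 2^43 bytes = 8 TiB. Below is a Java sketch of such an encoding. The bit layout is purely an assumption, chosen so that the pieces add up to 64 bits (2 lock bits, 40 offset bits, and the upper 22 bits reserved for the narrow Klass* as described above); the actual `GCForwarding` layout in the PR may differ, and all names are hypothetical.

```java
/** Hypothetical forwarding encoding; NOT the actual GCForwarding layout. */
public final class ForwardingEncodingSketch {
    // Assumed layout of a 64-bit mark word (illustration only):
    //   bits  0..1  lock bits, 0b11 = forwarded
    //   bits  2..41 forwardee offset from heap base, in 8-byte words (40 bits)
    //   bits 42..63 narrow Klass* (22 bits), preserved untouched
    static final int LOCK_BITS = 2;
    static final long LOCK_MASK = (1L << LOCK_BITS) - 1;
    static final long FORWARDED = 0b11L;
    static final int OFFSET_BITS = 40;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;
    static final int LOG_WORD_BYTES = 3; // 8-byte heap words

    // 2^40 words * 8 bytes = 2^43 bytes = 8 TiB
    static final long MAX_HEAP_BYTES = 1L << (OFFSET_BITS + LOG_WORD_BYTES);

    static long forward(long mark, long heapBase, long forwardee) {
        long byteOffset = forwardee - heapBase;
        if (byteOffset < 0 || (byteOffset & ((1L << LOG_WORD_BYTES) - 1)) != 0) {
            throw new IllegalArgumentException("forwardee must be word-aligned and above heap base");
        }
        long wordOffset = byteOffset >>> LOG_WORD_BYTES;
        if (wordOffset > OFFSET_MASK) {
            throw new IllegalArgumentException("forwardee beyond 40-bit encodable range");
        }
        // Clear lock and offset bits, keeping only the upper 22 (Klass) bits.
        long klassBits = mark & ~((OFFSET_MASK << LOCK_BITS) | LOCK_MASK);
        return klassBits | (wordOffset << LOCK_BITS) | FORWARDED;
    }

    static boolean isForwarded(long mark) {
        return (mark & LOCK_MASK) == FORWARDED;
    }

    static long forwardee(long mark, long heapBase) {
        long wordOffset = (mark >>> LOCK_BITS) & OFFSET_MASK;
        return heapBase + (wordOffset << LOG_WORD_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(MAX_HEAP_BYTES == 8L << 40); // true: 8 TiB
    }
}
```

Storing word offsets rather than byte addresses is what buys the extra factor of 8; a heap larger than 8 TiB simply cannot be expressed in this many bits, which is consistent with the quoted fallback for larger heaps.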
make/Images.gmk line 135:
> 133: #
> 134: # Param1 - VM variant (e.g., server, client, zero, ...)
> 135: # Param2 - _nocoops, _coh, _nocoops_coh, or empty
The -XX:+UseCompactObjectHeaders flag seems to be incompatible with the Zero VM. The Zero VM build starts failing while generating the shared archive with +UseCompactObjectHeaders. Generation should be disabled by default for Zero so as not to break the build. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2257621775 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1729222671 From mli at openjdk.org Fri Aug 23 18:49:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 23 Aug 2024 18:49:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects.
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 170: > 168: mv(tmp1, (int32_t)(intptr_t)markWord::prototype().value()); > 169: sd(tmp1, Address(obj, oopDesc::mark_offset_in_bytes())); > 170: // Todo UseCompactObjectHeaders Can I ask, will this pr fullly support riscv? 
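The "40 bits ... up to 8TB" figure is consistent with encoding the forwardee as a heap-base-relative offset in 8-byte words: 2^40 words * 8 bytes = 2^43 bytes = 8 TB. Below is a hedged C++ sketch of such an encoding that leaves the upper 22 Klass bits untouched. The bit positions (lock bits in 0..1, offset in 2..41) are an assumption chosen so the fields add up to 64 bits. They are not taken from the PR's GCForwarding code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed mark-word layout while an object is forwarded (illustration only):
//   bits  0..1  lock bits, 0b11 = "marked/forwarded"
//   bits  2..41 forwardee offset from the heap base, in 8-byte words
//   bits 42..63 narrow Klass*, which must survive forwarding
constexpr int      kLockBits   = 2;
constexpr uint64_t kLockMask   = 0x3;
constexpr uint64_t kMarkedBits = 0x3;
constexpr int      kFwdBits    = 40;
constexpr int      kLogWord    = 3;  // log2 of an 8-byte heap word
constexpr uint64_t kFwdMask    = ((uint64_t(1) << kFwdBits) - 1) << kLockBits;

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TB.
constexpr uint64_t kMaxEncodableHeap = uint64_t(1) << (kFwdBits + kLogWord);

inline uint64_t forward_to(uint64_t mark, uint64_t heap_base, uint64_t new_addr) {
  const uint64_t offset_words = (new_addr - heap_base) >> kLogWord;
  assert(new_addr >= heap_base && offset_words < (uint64_t(1) << kFwdBits));
  const uint64_t klass_bits = mark & ~(kFwdMask | kLockMask);  // bits 42..63
  return klass_bits | (offset_words << kLockBits) | kMarkedBits;
}

inline bool is_forwarded(uint64_t mark) {
  return (mark & kLockMask) == kMarkedBits;
}

inline uint64_t forwardee(uint64_t mark, uint64_t heap_base) {
  const uint64_t offset_words = (mark & kFwdMask) >> kLockBits;
  return heap_base + (offset_words << kLogWord);
}
```

An offset that does not fit in 40 words-bits cannot be represented at all, which is why the description mentions a fallback once the heap exceeds 8 TB.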
src/hotspot/share/oops/oop.inline.hpp line 94: > 92: > 93: void oopDesc::init_mark() { > 94: if (UseCompactObjectHeaders) { Isn't `set_mark(prototype_mark());` alone sufficient for both cases? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1729383247 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1728833750 From lmesnik at openjdk.org Fri Aug 23 19:06:04 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 23 Aug 2024 19:06:04 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java line 59: > 57: public static void main(String... args) throws Exception { > 58: String zGenerational = args[0]; > 59: String compactHeaders = "-XX:" + (zGenerational.equals("-XX:+ZGenerational") ? "+" : "-") + "UseCompactObjectHeaders"; The test fails with stdout: [[0.176s][info][cds] trying to map /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa [0.176s][info][cds] Opened archive /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa. [0.176s][info][cds] Archive was created with UseCompressedOops = 0, UseCompressedClassPointers = 1 [0.176s][info][cds] The shared archive file's UseCompactObjectHeaders setting (enabled) does not equal the current UseCompactObjectHeaders setting (disabled). 
[0.176s][info][cds] Initialize static archive failed. [0.176s][info][cds] Unable to map shared spaces [0.176s][error][cds] An error has occurred while processing the shared archive file. [0.176s][error][cds] Unable to map shared spaces Error occurred during initialization of VM Unable to use shared archive.
]; stderr: [] exitValue = 1 java.lang.RuntimeException: 'Hello World' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252) at TestZGCWithCDS.main(TestZGCWithCDS.java:123) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:573) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) at java.base/java.lang.Thread.run(Thread.java:1575) JavaTest Message: Test threw exception: java.lang.RuntimeException JavaTest Message: shutting down test ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1729404477 From coleenp at openjdk.org Fri Aug 23 19:29:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 23 Aug 2024 19:29:06 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore wrote: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. Yes, is_in_klass_space was just to direct where to deallocate the metaspace pointer. In your patch, isn't the metaspace "contains" call still very slow? Or I suppose for class space it's not, because it's a fixed space. But it's not an inlined call at all, because I had to search in cpp files for the range check.
+ const bool is_class = Metaspace::contains_in_class_space(ptr); I sort of think it might be better for the outside runtime code to control this, and for the metaspace call to assert if it's wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2307688050 From coleenp at openjdk.org Fri Aug 23 20:46:39 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 23 Aug 2024 20:46:39 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v2] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 20:43:52 GMT, Coleen Phillimore wrote: > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass MetaspaceObj::Type. Thanks for reviewing this and your comments Thomas.
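On the cost question: if class space is a single contiguous reservation, checking whether a pointer lies in it takes two comparisons that can be inlined at the call site, unlike a general metaspace lookup. The struct, enum and function names in this C++ sketch are hypothetical stand-ins, not the Metaspace API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a single contiguous class-space reservation.
struct ClassSpaceRange {
  uintptr_t base;  // first byte of the reservation
  uintptr_t end;   // one past the last byte

  // Two comparisons, no table lookups: cheap enough to inline everywhere.
  bool contains(const void* p) const {
    const uintptr_t a = reinterpret_cast<uintptr_t>(p);
    return a >= base && a < end;
  }
};

// Deallocation routing in the spirit of the quoted diff line
// (`is_class = ...contains_in_class_space(ptr)`); the enum is hypothetical.
enum class Arena { ClassSpace, NonClassSpace };

inline Arena arena_for(const ClassSpaceRange& cs, const void* p) {
  return cs.contains(p) ? Arena::ClassSpace : Arena::NonClassSpace;
}
```

Under Coleen's alternative, where callers decide up front and the metaspace code only asserts, a check like `arena_for` would become a debug-only sanity check rather than the routing decision.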
------------- PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2258059047 From coleenp at openjdk.org Fri Aug 23 20:46:40 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 23 Aug 2024 20:46:40 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v2] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 06:43:24 GMT, Thomas Stuefe wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass MetaspaceObj::Type. > > src/hotspot/share/memory/allocation.hpp line 319: > >> 317: f(SharedClassPathEntry) \ >> 318: f(RecordComponent) \ >> 319: f(AbstractClass) > > This is a minor nit: I assume this new constant is just there to steer allocation in metaspace away from class space? This breaks a bit with the established pattern, because the type `Klass::type()` returns is still "ClassType", so this new constant never really appears anywhere. Its only point is "it's not ClassType". You could probably hand down any other constant to Metaspace::allocate, as long as it's not `ClassType`. > > What we should eventually do, in a follow-up RFE, is modify `Metaspace::allocate` to replace the `MetaspaceObj::Type` parameter with a `MetadataType mdType` parameter, or a plain "allocate_in_classspace_please" boolean parameter. Metaspace::allocate does not really need to know the type of the metadata object; it just needs to know whether the caller insists on having the data in class space. > > But let's do this in a follow-up. Yes, this is a bit inconsistent. I like your suggestion for an improvement. I was going to try to do this in this patch, but the MetaspaceObj::Type is used for the report_metadata_oom event, so it's still needed in Metaspace::allocate. That is unfortunate, because I always get these two, MetadataType and MetaspaceObj::Type, confused.
> src/hotspot/share/oops/instanceKlass.cpp line 456: > >> 454: >> 455: InstanceKlass* ik; >> 456: MetaspaceObj::Type type = (parser.is_interface() || parser.is_abstract()) ? MetaspaceObj::AbstractClassType : MetaspaceObj::ClassType; > > Small nit: const or constexpr? I changed this line now (and used const for the bool). > src/hotspot/share/oops/klass.hpp line 214: > >> 212: virtual bool is_klass() const { return true; } >> 213: >> 214: bool is_in_klass_space() const { return !is_interface() && !is_abstract(); } > > This name is misleading. As a caller, I expect a function with this name to perform a range check that the Klass* lies inside the class space range. This is more of a "should be". > > (We also write class space with `c` throughout hotspot; it's weird to have it with `k` now.) > > How about, instead: "needs_narrow_klass_id" or "must_be_narrow_encodable" or similar? That clearly says what you want: that this class needs to be encodable with a narrow id, for whatever your reason is. This leaves room for future changes (e.g. a possible future where we need narrow klass ids for reasons other than making heap objects smaller, or where there is no class space anymore). I renamed this to is_in_class_space() with a lower-case 'c'. It still directs the metaspace allocation, or indicates where the object was allocated. Your name is a little better, but I think not enough until we want to expand the things we want allocated in the class space. As we talked about, with Tiny Class Pointers, class space will have different things in it (not that these new things need compressed pointers). But I think we're better off having fewer things in the space where their pointers can be compressed, since this space is constrained.
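Thomas's suggested follow-up would have the allocator take a plain "put this in class space" decision instead of a `MetaspaceObj::Type`. It might look roughly like the sketch below. All names are hypothetical: `needs_class_space` borrows the name Thomas proposes later in this thread. As Coleen notes, the real `Metaspace::allocate` still needs the type for the report_metadata_oom event.

```cpp
#include <cassert>

// Hypothetical summary of the class properties that matter for placement.
struct KlassKind {
  bool is_interface;
  bool is_abstract;
};

// Only classes that can have instances need a narrow Klass id, because only
// heap objects carry a compressed class pointer in their header.
inline bool needs_class_space(const KlassKind& k) {
  return !k.is_interface && !k.is_abstract;
}

enum class Space { ClassSpace, NonClassSpace };

// Hypothetical allocator entry point: a boolean, not a metadata type,
// decides where the Klass goes.
inline Space choose_space(bool in_class_space) {
  return in_class_space ? Space::ClassSpace : Space::NonClassSpace;
}
```

Keeping the predicate separate from the allocator also leaves room for the naming debate above: the allocator only ever sees a boolean.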
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1729487783 PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1729488157 PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1729490999 From coleenp at openjdk.org Fri Aug 23 20:46:40 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 23 Aug 2024 20:46:40 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v2] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 20:37:55 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/klass.hpp line 214: >> >>> 212: virtual bool is_klass() const { return true; } >>> 213: >>> 214: bool is_in_klass_space() const { return !is_interface() && !is_abstract(); } >> >> This name is misleading. As a caller, I expect a function with this name to perform a range check that the Klass* lies inside the class space range. This is more of a "should be". >> >> (We also write class space with `c` throughout hotspot; it's weird to have it with `k` now.) >> >> How about, instead: "needs_narrow_klass_id" or "must_be_narrow_encodable" or similar? That clearly says what you want: that this class needs to be encodable with a narrow id, for whatever your reason is. This leaves room for future changes (e.g. a possible future where we need narrow klass ids for reasons other than making heap objects smaller, or where there is no class space anymore). > > I renamed this to is_in_class_space() with a lower-case 'c'. It still directs the metaspace allocation, or indicates where the object was allocated. Your name is a little better, but I think not enough until we want to expand the things we want allocated in the class space. As we talked about, with Tiny Class Pointers, class space will have different things in it (not that these new things need compressed pointers). But I think we're better off having fewer things in the space where their pointers can be compressed, since this space is constrained.
I have to think about this more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1729492034 From coleenp at openjdk.org Fri Aug 23 20:46:39 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 23 Aug 2024 20:46:39 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v2] In-Reply-To: References: Message-ID: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass MetaspaceObj::Type. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19157/files - new: https://git.openjdk.org/jdk/pull/19157/files/da077055..c58278a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=00-01 Stats: 37 lines in 16 files changed: 3 ins; 1 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/19157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19157/head:pull/19157 PR: https://git.openjdk.org/jdk/pull/19157 From psandoz at openjdk.org Fri Aug 23 22:33:09 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 23 Aug 2024 22:33:09 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v5] In-Reply-To: <7e5pWnvjqk-dQYNeaZjFzXcd5WlzniZPl5T4l1rKQGE=.0882bcd4-e307-4a29-aa41-5496ee029a60@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7e5pWnvjqk-dQYNeaZjFzXcd5WlzniZPl5T4l1rKQGE=.0882bcd4-e307-4a29-aa41-5496ee029a60@github.com> Message-ID: <2_P1qPMS46tgh4RUSuitcjXYnd0koS_BxfRRRmj79EY=.c3baeeaa-87f7-47d4-bc70-ae2afd9de745@github.com> On Fri, 23 Aug 2024 06:09:48 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using the index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table whose elements are selected based on the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two-vector index range [0, 2*VLEN); otherwise an IndexOutOfBoundsException is thrown.
>> >> Summary of changes: >> - Java-side implementation of the new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platform. >> - Optimized x86 backend implementation for AVX512 and legacy targets. >> - Functional tests covering the new API. >> >> The JMH micro included with this patch shows around a 10-15x gain over the existing rearrange API: >> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>>
>> Benchmark                                     (size)  Mode   Cnt      Score  Error  Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>> S...
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing redundant checkIndex routine
API changes look good. (Note: at the moment we are not proposing to change how shuffles work; as you point out, the two-vector `selectFrom` and `rearrange` differ in the index representation.)
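The two-vector `selectFrom` semantics quoted above, and the fallback shape Paul describes next (two single-table rearranges followed by a blend), can be checked against each other with a scalar model. This C++ sketch uses plain `std::array`s and `int` index lanes. It illustrates the semantics only. It is not the Vector API, where index lanes have the vector's own element type, and it is not the C2 lowering.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Scalar model of two-vector selectFrom: lane i of the result is
// (v1 ++ v2)[index[i]], and every index must lie in [0, 2*N).
template <typename T, std::size_t N>
std::array<T, N> select_from(const std::array<int, N>& index,
                             const std::array<T, N>& v1,
                             const std::array<T, N>& v2) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    if (index[i] < 0 || index[i] >= static_cast<int>(2 * N)) {
      throw std::out_of_range("index outside [0, 2*VLEN)");
    }
    const std::size_t u = static_cast<std::size_t>(index[i]);
    out[i] = u < N ? v1[u] : v2[u - N];
  }
  return out;
}

// Fallback shape: rearrange each table with the index wrapped to [0, N),
// then blend lane-wise, taking v2's lane wherever index >= N.
// Assumes the indices were already range-checked.
template <typename T, std::size_t N>
std::array<T, N> select_from_fallback(const std::array<int, N>& index,
                                      const std::array<T, N>& v1,
                                      const std::array<T, N>& v2) {
  std::array<T, N> r1{}, r2{}, out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t wrapped = static_cast<std::size_t>(index[i]) % N;
    r1[i] = v1[wrapped];  // single-table rearrange of v1
    r2[i] = v2[wrapped];  // single-table rearrange of v2
  }
  for (std::size_t i = 0; i < N; ++i) {
    const bool from_v2 = static_cast<std::size_t>(index[i]) >= N;  // blend mask
    out[i] = from_v2 ? r2[i] : r1[i];
  }
  return out;
}
```

For example, with `idx = {0, 5, 3, 6}` and four-lane tables `a` and `b`, both functions return `{a[0], b[1], a[3], b[2]}`.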
IIUC, if the more direct two-table instruction is not available, you fall back to calling two single-arg rearranges with a blend, as a lowering transformation, similar to the fallback Java expression. The float/double conversion bothers me. I'm not suggesting we do something about it here, just noting it down for any future conversation on shuffles. Ideally we would want the equivalent integral vector (int or long) to represent the index, which is tricky to express in the API. Alternatively, we could treat it as a bitwise no-op conversion (there is also an impact on `toShuffle`). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2307886044 From sviswanathan at openjdk.org Fri Aug 23 23:33:14 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 23 Aug 2024 23:33:14 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max. >> . UMIN : Unsigned min. >> >> >> The new vector operators are applicable only to integral types, since their values wrap around in overflow/underflow scenarios after setting the appropriate status flags. For floating-point types, as per the IEEE 754 spec, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflow or overflow scenarios.
>> >> Summary of changes: >> - Java-side implementation of the new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for the new vector operators and their predicated counterparts. >> - Extends the existing VectorAPI jtreg test suite to cover the new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of the new core-lib APIs will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/java.base/share/classes/java/lang/Byte.java line 647: > 645: */ > 646: public static byte subSaturating(byte a, byte b) { > 647: byte res = (byte)(a - b); Could we not do subSaturating as an int operation, along similar lines to addSaturating?
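Sandhya's suggestion amounts to computing the difference in a wider type, where two byte operands cannot overflow, and then clamping to the byte range. A C++ sketch of that approach for the signed (SSUB) and unsigned (SUSUB) byte cases follows. The function names are illustrative, and this is not the `java.lang.Byte` source from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Signed saturating subtraction: widen to int (two 8-bit operands cannot
// overflow there), then clamp to [-128, 127].
inline int8_t sub_saturating(int8_t a, int8_t b) {
  const int wide = static_cast<int>(a) - static_cast<int>(b);
  return static_cast<int8_t>(std::clamp(wide, -128, 127));
}

// Unsigned saturating subtraction: results below zero clamp to 0.
inline uint8_t usub_saturating(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(a - b) : uint8_t{0};
}
```

For contrast, a plain `(byte)(a - b)` wraps instead of saturating: in Java, `(byte)(-128 - 1)` is 127.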
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1729570756 From fyang at openjdk.org Sat Aug 24 07:59:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 24 Aug 2024 07:59:06 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 08:45:38 GMT, Hamlin Li wrote: >> ## Performance >> benchmarks run on CanVM-K230 >> >> data
>>
>> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op ...
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5397: > 5395: * c_rarg4 - dp, dst start offset > 5396: * c_rarg5 - isURL, Base64 or URL character set > 5397: * c_rarg6 - isMIME, Decoding MIME block - unused here It seems "unused here" in the code comment is not accurate, since `isMIME` is checked in the code. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5460: > 5458: > 5459: // passed in length (send - soff) is guaranteed to be > 4, > 5460: // and in this intrinsic we only processe data of length in multiple of 4, Nit: s/processe/process/ src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5476: > 5474: __ BIND(ProcessData); > 5475: > 5476: Nit: I think a single empty line is enough here. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5492: > 5490: } > 5491: > 5492: __ BIND(ScalarLoop); Why not move this `ScalarLoop` body to immediately before `Exit`? It seems to me that we would then have a clearer separation of scalar and vector code, and a simpler control flow. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5539: > 5537: } > 5538: > 5539: Similar here. A single empty line is enough.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1728640550 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1728628017 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1729789588 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1729794073 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1729789889 From fyang at openjdk.org Sat Aug 24 08:35:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 24 Aug 2024 08:35:07 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 08:45:38 GMT, Hamlin Li wrote: > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5517: > 5515: __ orr(byte0, byte0, byte1); > 5516: __ orr(byte0, byte0, byte3); > 5517: __ slliw(byte2, byte2, 6); Is it correct to shift left and modify `byte0` through `byte3` before their original values are OR-ed into `combined32Bits`?
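For context on the question above, here is the scalar shape of Base64 decoding: four 6-bit values are looked up, validated, and combined into 24 bits that yield three bytes. This is a plain C++ sketch using the RFC 4648 standard and URL-safe alphabets, with hypothetical helper names. It is not the RISC-V stub. In this sketch the validity check is done on the unshifted values, so shifting cannot disturb it. Whether the stub's register reuse preserves the same property is what Fei is asking.

```cpp
#include <cassert>
#include <cstdint>

// Map one Base64 character to its 6-bit value, or -1 if it is not in the
// alphabet. url selects the RFC 4648 URL-safe alphabet ('-' and '_').
inline int b64_value(char c, bool url) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == (url ? '-' : '+')) return 62;
  if (c == (url ? '_' : '/')) return 63;
  return -1;
}

// Decode one 4-character group into 3 bytes; returns false on a bad char.
inline bool decode_quad(const char* src, uint8_t* dst, bool url) {
  const int v0 = b64_value(src[0], url);
  const int v1 = b64_value(src[1], url);
  const int v2 = b64_value(src[2], url);
  const int v3 = b64_value(src[3], url);
  // OR the *unshifted* values: any -1 makes the combined result negative.
  if ((v0 | v1 | v2 | v3) < 0) return false;
  const uint32_t bits = (uint32_t(v0) << 18) | (uint32_t(v1) << 12) |
                        (uint32_t(v2) << 6) | uint32_t(v3);
  dst[0] = static_cast<uint8_t>(bits >> 16);
  dst[1] = static_cast<uint8_t>(bits >> 8);
  dst[2] = static_cast<uint8_t>(bits);
  return true;
}
```

For example, `"TWFu"` decodes to the bytes of `"Man"`.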
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1729807905

From aph at openjdk.org Sat Aug 24 08:57:06 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 24 Aug 2024 08:57:06 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5]
In-Reply-To:
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID: <5_fQVPqr44djn4RFKw9p_c34z3vncTFouoMqNK5jiEY=.b48ed049-6485-41f5-81b0-1c969cb4bb77@github.com>

On Wed, 21 Aug 2024 16:11:25 GMT, Mikhail Ablakatov wrote:

>> Hello,
>>
>> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>>
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>>
>> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>> Version                                         Baseline                 This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                 (size)  Mode Cnt      Score      Error      Score      Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes           1  avgt  15      1.249 ±    0.060      1.247 ±    0.062  ns/op
>> ArraysHashCode.bytes          10  avgt  15      8.754 ±    0.028      4.387 ±    0.015  ns/op
>> ArraysHashCode.bytes         100  avgt  15     98.596 ±    0.051     26.655 ±    0.097  ns/op
>> ArraysHashCode.bytes       10000  avgt  15  10150.578 ±    1.352   2649.962 ±  216.744  ns/op
>> ArraysHashCode.chars           1  avgt  15      1.286 ±    0.062      1.246 ±    0.054  ns/op
>> ArraysHashCode.chars          10  avgt  15      8.731 ±    0.002      5.344 ±    0.003  ns/op
>> ArraysHashCode.chars         100  avgt  15     98.632 ±    0.048     23.023 ±    0.142  ns/op
>> ArraysHashCode.chars       10000  avgt  15  10150.658 ±    3.374   2410.504 ±    8.872  ns/op
>> ArraysHashCode.ints            1  avgt  15      1.189 ±    0.005      1.187 ±    0.001  ns/op
>> ArraysHashCode.ints           10  avgt  15      8.730 ±    0.002      5.676 ±    0.001  ns/op
>> ArraysHashCode.ints          100  avgt  15     98.559 ±    0.016     24.378 ±    0.006  ns/op
>> ArraysHashCode.ints        10000  avgt  15  10148.752 ±    1.336   2419.015 ±    0.492  ns/op
>> ArraysHashCode.multibytes      1  avgt  15      1.037 ±    0.001      1.037 ±    0.001 ...
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>
>   cleanup: use a constexpr function for intpow instead of a templated class

Thinking some more, another problem which makes this PR inefficient with typical-length strings is the serial `ldrb; madd` iteration which handles the tail. Suggestions:

- Maybe replace the serial tail-handling iteration with the 4-wide vectorized version which you presented earlier.
- Maybe get rid of tail handling altogether. Instead, pad the first block with zeroes at the left, performing the calculation over a whole number of blocks.

It's still worth reducing the block size from 32 to 16 for byte arrays.
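Both suggestions rest on the block form of the polynomial hash that `Arrays.hashCode` computes (`h = 31*h + a[i]`, starting from 1). The C++ sketch below is only an illustration of that arithmetic, not the AArch64 intrinsic: it assumes `int32_t` elements instead of bytes or chars, uses 4-element blocks in place of Neon vectors, and all function names are hypothetical. It checks a serial reference against a blocked loop with a serial tail, and against a left-zero-padded variant with no tail. Padding works because leading zeros add nothing to the polynomial sum as long as the accumulator starts at 0; the initial value's contribution, `init * 31^n`, is added separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Serial reference: h = 31 * h + a[i], using unsigned 32-bit arithmetic
// so that overflow wraps around as it does for Java ints.
int32_t hash_serial(const int32_t* a, size_t n, uint32_t init = 1) {
  uint32_t h = init;
  for (size_t i = 0; i < n; i++) h = 31u * h + static_cast<uint32_t>(a[i]);
  return static_cast<int32_t>(h);
}

// 31^k modulo 2^32.
static uint32_t pow31(size_t k) {
  uint32_t r = 1;
  for (size_t i = 0; i < k; i++) r *= 31u;
  return r;
}

// One 4-element block: h * 31^4 + x0 * 31^3 + x1 * 31^2 + x2 * 31 + x3.
static uint32_t block4(uint32_t h, uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3) {
  return h * 923521u + x0 * 29791u + x1 * 961u + x2 * 31u + x3;
}

// Blocked loop followed by a serial tail of up to 3 elements.
int32_t hash_blocked_tail(const int32_t* a, size_t n, uint32_t init = 1) {
  uint32_t h = init;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = block4(h, static_cast<uint32_t>(a[i]), static_cast<uint32_t>(a[i + 1]),
               static_cast<uint32_t>(a[i + 2]), static_cast<uint32_t>(a[i + 3]));
  }
  for (; i < n; i++) h = 31u * h + static_cast<uint32_t>(a[i]);  // serial tail
  return static_cast<int32_t>(h);
}

// No tail: treat the input as if left-padded with zeros to a multiple of 4.
int32_t hash_blocked_padded(const int32_t* a, size_t n, uint32_t init = 1) {
  const size_t pad = (4 - n % 4) % 4;
  // Element at virtual index v of the padded sequence (the zeros come first).
  auto at = [&](size_t v) -> uint32_t {
    return v < pad ? 0u : static_cast<uint32_t>(a[v - pad]);
  };
  uint32_t h = 0;  // starting from 0, the leading zeros contribute nothing
  for (size_t v = 0; v < n + pad; v += 4) {
    h = block4(h, at(v), at(v + 1), at(v + 2), at(v + 3));
  }
  // Add back the contribution of the initial value: init * 31^n.
  return static_cast<int32_t>(h + init * pow31(n));
}
```

All three functions should agree for every length. The padded variant replaces the tail loop with one extra `31^n` multiplier, at the cost of the extra bookkeeping in the first block.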
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2308224466

From stuefe at openjdk.org Sat Aug 24 10:56:06 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 24 Aug 2024 10:56:06 GMT
Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 23 Aug 2024 20:46:39 GMT, Coleen Phillimore wrote:

>> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on the number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final, but you can't have both. Java classfile access flags have no way of specifying something like AllStatic.
>>
>> Tested with tier1-8.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass MetaspaceObj::Type.

> I renamed this is_in_class_space() with the lower case 'c'. It's still directing metaspace or indicating where the object was allocated. Your name is a little better, but I think not enough until we want to expand the things we want allocated in the class space. As we talked about, with Tiny Class Pointers, class space will have different things in it (not that these new things need compressed pointers). But I think we're better off having fewer things in the space where their pointers can be compressed, since this space is constrained.

How about "needs_class_space" then?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2308353198

From stuefe at openjdk.org Sat Aug 24 10:56:05 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 24 Aug 2024 10:56:05 GMT
Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace
In-Reply-To:
References:
Message-ID:

On Fri, 23 Aug 2024 19:26:46 GMT, Coleen Phillimore wrote:

> Yes, is_in_klass_space was just to direct where to deallocate the metaspace pointer. In your patch, isn't the contains metaspace call still very slow? Or I suppose for class space it's not, because it's a fixed space. But it's not an inlined call at all, because I had to search in cpp files for the range check.
>
> * const bool is_class = Metaspace::contains_in_class_space(ptr);
>
> I sort of think it might be better for the outside runtime code to control this, and for the metaspace call to assert if it's wrong.

No, I think my way is better, and it will be needed anyway for TinyCP/Lilliput. We only need to do two address comparisons; that should be simple and fast. I opened a PR to separate the change, and in that PR I also inline the check: https://github.com/openjdk/jdk/pull/20701

I don't think the cost of two address comparisons matters, not with the comparatively few deallocations that happen (a few hundred or a few thousand). If deallocate is hot, we are using metaspace wrong.
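The "two address comparisons" argument depends only on class space being one fixed, contiguous reservation. Below is a minimal C++ sketch under that assumption. `ClassSpaceRange`, `SpaceKind` and `space_for` are hypothetical names chosen for illustration. They are not HotSpot's `Metaspace` API, and the sketch makes no claim about how the non-class part of metaspace is organized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one contiguous, fixed class-space reservation [base, end).
struct ClassSpaceRange {
  uintptr_t base;  // inclusive lower bound
  uintptr_t end;   // exclusive upper bound

  // Two address comparisons; small enough to inline at every call site.
  bool contains(const void* p) const {
    const uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    return addr >= base && addr < end;
  }
};

// Hypothetical routing of a deallocation to the space the pointer came from.
enum class SpaceKind { ClassSpace, NonClassSpace };

SpaceKind space_for(const ClassSpaceRange& cs, const void* p) {
  return cs.contains(p) ? SpaceKind::ClassSpace : SpaceKind::NonClassSpace;
}
```

With this shape the caller does not have to say where an object was allocated. The pointer itself answers the question, which is the design Thomas argues for over having runtime code pass the information in.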
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2308352940

From fyang at openjdk.org Sat Aug 24 13:02:10 2024
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 24 Aug 2024 13:02:10 GMT
Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2]
In-Reply-To: <41Gz085AoCNBu48scw1l0qoN7aHibe1e0u1mxtBpTZY=.6231d6a0-5b92-42f9-a835-65d247c1b7fb@github.com>
References: <41Gz085AoCNBu48scw1l0qoN7aHibe1e0u1mxtBpTZY=.6231d6a0-5b92-42f9-a835-65d247c1b7fb@github.com>
Message-ID:

On Tue, 20 Aug 2024 12:37:58 GMT, Gui Cao wrote:

> > Hey, not a functional review yet.
> >
> > But this code has been patched too many times with "let me just add this". We shadow the tmp1 register with tmp1_mark, then we shadow tmp1 with tmp1_monitor. And similarly for other tmp registers.
> >
> > If we create two methods, one for "{ // Lightweight locking" and one for "{ // Handle inflated monitor.", this code will be so much better.
> >
> > If you are not up for the task, I can do it.
> >
> > (your patch actually slightly improves this, so I'm not saying it's your doing)
>
> Hi, thanks for having a look! Your suggestion makes sense to me. But I feel that it's better to go with another PR. Will leave it for you :-)

Hi, please keep the current shape for a while, as I am trying to sync regularly with the latest Loom fiber changes, which touch this part.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20621#issuecomment-2308386092

From fyang at openjdk.org Sat Aug 24 14:56:03 2024
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 24 Aug 2024 14:56:03 GMT
Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso
In-Reply-To:
References:
Message-ID:

On Wed, 21 Aug 2024 10:01:21 GMT, Robbin Ehn wrote:

> Hi please consider,
>
> On TSO we don't need the synthetic data dependency in between the loads.
> Also added some comment about this.
>
> Sanity tested

This change looks fine. But I have a question about the code comment.
src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp line 281:

> 279:   // Embed an synthetic data dependency to order the guard load
> 280:   // before the epoch load. (xor + add is standard way)
> 281:   // Note: This may be slower than using a membar(load|load) (fence r,r).

But the RV ISA spec says that this is a lightweight ordering mechanism compared with a FENCE R, R. Here is what I read from the spec:

Like other modern memory models, the RVWMO memory model uses syntactic rather than semantic dependencies.
In other words, this definition depends on the identities of the registers being accessed by different instructions,
not the actual contents of those registers. This means that an address, control, or data dependency must be enforced
even if the calculation could seemingly be “optimized away”. This choice ensures that RVWMO remains compatible
with code that uses these false syntactic dependencies as a lightweight ordering mechanism.

ld a1,0(s0)
xor a2,a1,a1
add s1,s1,a2
ld a5,0(s1)

Figure A.10: A syntactic address dependency

For example, there is a syntactic address dependency from the memory operation generated by the
first instruction to the memory operation generated by the last instruction in Figure A.10, even
though a1 XOR a1 is zero and hence has no effect on the address accessed by the second load.
The benefit of using dependencies as a lightweight synchronization mechanism is that the ordering
enforcement requirement is limited only to the specific two instructions in question.
Other non-dependent instructions may be freely reordered by aggressive implementations.
One alternative would be to use a load-acquire, but this would enforce ordering for the first load
with respect to all subsequent instructions. Another would be to use a FENCE R,R, but this would
include all previous and all subsequent loads, making this option more expensive

-------------

PR Review: https://git.openjdk.org/jdk/pull/20661#pullrequestreview-2258753836
PR Review Comment: https://git.openjdk.org/jdk/pull/20661#discussion_r1730010163

From aph at openjdk.org Sun Aug 25 08:36:09 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 25 Aug 2024 08:36:09 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5]
In-Reply-To: <5_fQVPqr44djn4RFKw9p_c34z3vncTFouoMqNK5jiEY=.b48ed049-6485-41f5-81b0-1c969cb4bb77@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <5_fQVPqr44djn4RFKw9p_c34z3vncTFouoMqNK5jiEY=.b48ed049-6485-41f5-81b0-1c969cb4bb77@github.com>
Message-ID: <5vu58p8MnayNro1Af65GeRTaxULfPslsXpfjm4XG07g=.0c1fa790-528c-46b1-b08e-10fcf4fdf44c@github.com>

On Sat, 24 Aug 2024 08:54:47 GMT, Andrew Haley wrote:

> * Maybe get rid of tail handling altogether. Instead, pad the first block with zeroes at the left, performing the calculation over a whole number of blocks.

Upon mature consideration, while this would work, it's probably too fiddly to do for small blocks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2308732988

From rehn at openjdk.org Sun Aug 25 12:23:07 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Sun, 25 Aug 2024 12:23:07 GMT
Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso
In-Reply-To:
References:
Message-ID:

On Sat, 24 Aug 2024 14:50:17 GMT, Fei Yang wrote:

>> Hi please consider,
>>
>> On TSO we don't need the synthetic data dependency in between the loads.
>> Also added some comment about this.
>>
>> Sanity tested
>
> src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp line 281:
>
>> 279:   // Embed an synthetic data dependency to order the guard load
>> 280:   // before the epoch load. (xor + add is standard way)
>> 281:   // Note: This may be slower than using a membar(load|load) (fence r,r).
>
> But the RV ISA spec says that this is a lightweight ordering mechanism compared with a FENCE R, R.
> Here is what I read from the spec:
>
> Like other modern memory models, the RVWMO memory model uses syntactic rather than semantic dependencies.
> In other words, this definition depends on the identities of the registers being accessed by different instructions,
> not the actual contents of those registers. This means that an address, control, or data dependency must be enforced
> even if the calculation could seemingly be “optimized away”. This choice ensures that RVWMO remains compatible
> with code that uses these false syntactic dependencies as a lightweight ordering mechanism.
>
> ld a1,0(s0)
> xor a2,a1,a1
> add s1,s1,a2
> ld a5,0(s1)
>
> Figure A.10: A syntactic address dependency
>
> For example, there is a syntactic address dependency from the memory operation generated by the
> first instruction to the memory operation generated by the last instruction in Figure A.10, even
> though a1 XOR a1 is zero and hence has no effect on the address accessed by the second load.
> The benefit of using dependencies as a lightweight synchronization mechanism is that the ordering
> enforcement requirement is limited only to the specific two instructions in question.
> Other non-dependent instructions may be freely reordered by aggressive implementations.
> One alternative would be to use a load-acquire, but this would enforce ordering for the first load
> with respect to all subsequent instructions. Another would be to use a FENCE R,R, but this would
> include all previous and all subsequent loads, making this option more expensive

Not sure what you mean, but there is no contradiction here.
The manual says:

load guard
<epoch data dep>
load epoch
load thread_guard_epoch // unaffected by data dep.

In RVWMO, the load of thread_guard_epoch is unaffected, yes, and it can be loaded earlier, yes.

But we branch on the value of epoch (plus guard), so delaying the load of epoch more than necessary means we delay the branch instruction. As that branch has a control dependency, it stops all following instructions:

Control dependencies behave differently from address and data dependencies in the sense that a
control dependency always extends to all instructions following the original target in program order.

Which means the main goal is to get through the branch as quickly as possible.

My comment says that delaying the load of epoch in favour of loading thread_guard_epoch earlier may be slower.
I have not looked too deeply, but it seems like we could also move the load of thread_guard_epoch before the data dep? (i.e. before any such fence r,r)

As the load of guard and the load of epoch cannot overlap, they happen sequentially (due to the data dep).
`fence r,r` only says the loads will happen in global memory order; it does not force them to be sequential.

As we are getting very close to the CPU implementation here, there may be differences.
So the point of the comment was: maybe revisit this in a few years.

Thanks! I'll wait until we can agree on whether this is a good comment or whether it should be worded differently.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20661#discussion_r1730325122

From rehn at openjdk.org Sun Aug 25 12:29:06 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Sun, 25 Aug 2024 12:29:06 GMT
Subject: RFR: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation [v2]
In-Reply-To:
References: <41Gz085AoCNBu48scw1l0qoN7aHibe1e0u1mxtBpTZY=.6231d6a0-5b92-42f9-a835-65d247c1b7fb@github.com>
Message-ID:

On Sat, 24 Aug 2024 12:59:06 GMT, Fei Yang wrote:

> Hi, please keep the current shape for a while, as I am trying to sync regularly with the latest Loom fiber changes, which touch this part.

Okay, I have a big patch :) It turns out we can reuse the same code in C1 and C2 with some minor changes.
I'll hold off.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20621#issuecomment-2308812400

From alanb at openjdk.org Sun Aug 25 15:00:03 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Sun, 25 Aug 2024 15:00:03 GMT
Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2]
In-Reply-To:
References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com>
Message-ID:

On Fri, 23 Aug 2024 10:04:42 GMT, Magnus Ihse Bursie wrote:

> But, if you look at the actual functions that are affected, you can see that it is just a handful of calls that are all done at startup time.

That is true for now, but there are 30-50 other places that will need attention once this effort is further along. Everywhere that deals with user-editable configuration (the conf tree) will change, as will everywhere that reads JDK internal files in the lib directory. It will be possible to abstract some of this, but I'm just saying that a lot of it will be in Java code rather than native code. There will also be specific changes that go with some of this, which is something for later if there is a JEP that proposes builds to produce a static image.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2308884916

From mli at openjdk.org Sun Aug 25 22:21:16 2024
From: mli at openjdk.org (Hamlin Li)
Date: Sun, 25 Aug 2024 22:21:16 GMT
Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v5]
In-Reply-To:
References:
Message-ID:

> ## Performance
> benchmarks run on CanVM-K230
>
> data
>
> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547
> Base64Decode.testBase64MIMEDecode | 0 | ...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  refine

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/20026/files
- new: https://git.openjdk.org/jdk/pull/20026/files/d1899de7..1848c2fd

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=04
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=03-04

Stats: 121 lines in 1 file changed: 51 ins; 64 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/20026.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20026/head:pull/20026

PR: https://git.openjdk.org/jdk/pull/20026

From mli at openjdk.org Sun Aug 25 22:21:17 2024
From: mli at openjdk.org (Hamlin Li)
Date: Sun, 25 Aug 2024 22:21:17 GMT
Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4]
In-Reply-To:
References:
Message-ID: <9CaMDk7tdTqmlXequDf4H-5ozalxbrVCb4E5E6AjkVE=.51211e3a-ad04-46e0-8aa4-a07c2452e625@github.com>

On Sat, 24 Aug 2024 07:55:16 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   comments
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5492:
>
>> 5490:   }
>> 5491:
>> 5492:   __ BIND(ScalarLoop);
>
> Why not move this `ScalarLoop` body to immediately before `Exit`? It seems to me that we would then have a clearer separation between scalar code and vector code, and a simpler control flow.

You're right, fixed. Thanks!
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5517:
>
>> 5515:     __ orr(byte0, byte0, byte1);
>> 5516:     __ orr(byte0, byte0, byte3);
>> 5517:     __ slliw(byte2, byte2, 6);
>
> Is it correct to shift left and modify `byte0` - `byte3` before their original values are OR-ed into `combined32Bits`?

I may have misunderstood your question.
After line 5518, combined32Bits will be byte0[23:18] | byte1[17:12] | byte2[11:6] | byte3[5:0], so before that, we need to move every byte (6 bits) to its correct position, except byte3.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1730440530
PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1730440537

From fyang at openjdk.org Mon Aug 26 02:06:31 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 26 Aug 2024 02:06:31 GMT
Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso
In-Reply-To:
References:
Message-ID:

On Sun, 25 Aug 2024 12:20:47 GMT, Robbin Ehn wrote:

>> src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp line 281:
>>
>>> 279:   // Embed an synthetic data dependency to order the guard load
>>> 280:   // before the epoch load. (xor + add is standard way)
>>> 281:   // Note: This may be slower than using a membar(load|load) (fence r,r).
>>
>> But the RV ISA spec says that this is a lightweight ordering mechanism compared with a FENCE R, R.
>> Here is what I read from the spec:
>>
>> Like other modern memory models, the RVWMO memory model uses syntactic rather than semantic dependencies.
>> In other words, this definition depends on the identities of the registers being accessed by different instructions,
>> not the actual contents of those registers. This means that an address, control, or data dependency must be enforced
>> even if the calculation could seemingly be “optimized away”. This choice ensures that RVWMO remains compatible
>> with code that uses these false syntactic dependencies as a lightweight ordering mechanism.
>>
>> ld a1,0(s0)
>> xor a2,a1,a1
>> add s1,s1,a2
>> ld a5,0(s1)
>>
>> Figure A.10: A syntactic address dependency
>>
>> For example, there is a syntactic address dependency from the memory operation generated by the
>> first instruction to the memory operation generated by the last instruction in Figure A.10, even
>> though a1 XOR a1 is zero and hence has no effect on the address accessed by the second load.
>> The benefit of using dependencies as a lightweight synchronization mechanism is that the ordering
>> enforcement requirement is limited only to the specific two instructions in question.
>> Other non-dependent instructions may be freely reordered by aggressive implementations.
>> One alternative would be to use a load-acquire, but this would enforce ordering for the first load
>> with respect to all subsequent instructions. Another would be to use a FENCE R,R, but this would
>> include all previous and all subsequent loads, making this option more expensive
>
> Not sure what you mean, but there is no contradiction here.
> The manual says:
>
> load guard
> <epoch data dep>
> load epoch
> load thread_guard_epoch // unaffected by data dep.
>
> In RVWMO, the load of thread_guard_epoch is unaffected, yes, and it can be loaded earlier, yes.
>
> But we branch on the value of epoch (plus guard), so delaying the load of epoch more than necessary means we delay the branch instruction. As that branch has a control dependency, it stops all following instructions:
>
> Control dependencies behave differently from address and data dependencies in the sense that a
> control dependency always extends to all instructions following the original target in program order.
>
> Which means the main goal is to get through the branch as quickly as possible.
>
> My comment says that delaying the load of epoch in favour of loading thread_guard_epoch earlier may be slower.
> I have not looked too deeply, but it seems like we could also move the load of thread_guard_epoch before the data dep? (i.e. before any such fence r,r)
>
> As the load of guard and the load of epoch cannot overlap, they happen sequentially (due to the data dep).
> `fence r,r` only says the loads will happen in global memory order; it does not force them to be sequential.
>
> As we are coming very close to the CPU implementation here, there may be differences.
> So the point of the comment was: maybe revisit this in a few years.
>
> Thanks! I'll wait until we can agree on whether this is a good comment or whether it should be worded differently.

All right then. Can you change the following line of the comment to be more specific?

    // Embed an synthetic data dependency to order the guard load

=>

    // Embed a syntactic address dependency to order the guard load

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20661#discussion_r1730543230

From dholmes at openjdk.org Mon Aug 26 02:10:03 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 26 Aug 2024 02:10:03 GMT
Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2]
In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com>
References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com>
Message-ID:

On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote:

>> As a preparation for Hermetic Java, we need to have a way to look up at runtime whether we are using a statically linked library or not.
>>
>> This change will be the first step needed towards compiling the object files only once, and then linking them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which need to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work, though.
>>
>> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out.
>
> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision:
>
>   Also update build to link properly

I understand the cost overhead experienced by any individual Java run may be lost in the noise, but it still impacts every single Java run just to save some time/resources for the handful of builders of statically linked VMs. I am not a fan.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2309162391

From fyang at openjdk.org Mon Aug 26 02:57:04 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 26 Aug 2024 02:57:04 GMT
Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4]
In-Reply-To: <9CaMDk7tdTqmlXequDf4H-5ozalxbrVCb4E5E6AjkVE=.51211e3a-ad04-46e0-8aa4-a07c2452e625@github.com>
References: <9CaMDk7tdTqmlXequDf4H-5ozalxbrVCb4E5E6AjkVE=.51211e3a-ad04-46e0-8aa4-a07c2452e625@github.com>
Message-ID:

On Sun, 25 Aug 2024 22:18:33 GMT, Hamlin Li wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5517:
>>
>>> 5515:     __ orr(byte0, byte0, byte1);
>>> 5516:     __ orr(byte0, byte0, byte3);
>>> 5517:     __ slliw(byte2, byte2, 6);
>>
>> Is it correct to shift left and modify `byte0` - `byte3` before their original values are OR-ed into `combined32Bits`?
>
> I may have misunderstood your question.
> After line 5518, combined32Bits will be byte0[23:18] | byte1[17:12] | byte2[11:6] | byte3[5:0], so before that, we need to move every byte (6 bits) to its correct position by shifting, except byte3.

Sorry for not being clear on this. The Java code snippet of decodeBlock:

    795                 int b1 = base64[src[sp++] & 0xff];
    796                 int b2 = base64[src[sp++] & 0xff];
    797                 int b3 = base64[src[sp++] & 0xff];
    798                 int b4 = base64[src[sp++] & 0xff];
    799                 if ((b1 | b2 | b3 | b4) < 0) {    // non base64 byte
    800                     return new_dp - dp;
    801                 }
    802                 int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4;

L799 simply ORs together the initial values of `b1`-`b4` and compares the result with zero.
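To make the ordering in the quoted `decodeBlock` snippet concrete, here is a C++ sketch of decoding one 4-character group. It is an illustration, not the RISC-V stub. It assumes a 256-entry lookup table with -1 for every non-alphabet byte and the standard RFC 4648 alphabet, and `decode_group` is a hypothetical name. The validity check runs on the raw table values before any shifting, as line 799 does. The shifts then place the four 6-bit values at bits 23:18, 17:12, 11:6 and 5:0, which is the layout described above for `combined32Bits`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed table: alphabet characters map to 0..63, every other byte maps to -1.
static std::array<int, 256> make_decode_table() {
  std::array<int, 256> t{};
  t.fill(-1);
  const char* alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  for (int i = 0; i < 64; i++) t[static_cast<unsigned char>(alphabet[i])] = i;
  return t;
}

// Decodes 4 input characters into 3 output bytes.
// Returns 3 on success, or -1 if any character is not in the alphabet.
int decode_group(const unsigned char* src, unsigned char* dst) {
  static const std::array<int, 256> table = make_decode_table();
  const int b1 = table[src[0]];
  const int b2 = table[src[1]];
  const int b3 = table[src[2]];
  const int b4 = table[src[3]];
  // Error check on the unshifted table values (mirrors line 799).
  if ((b1 | b2 | b3 | b4) < 0) return -1;
  // Only now combine: b1 -> bits 23:18, b2 -> 17:12, b3 -> 11:6, b4 -> 5:0.
  const uint32_t bits0 = static_cast<uint32_t>(b1) << 18 | static_cast<uint32_t>(b2) << 12 |
                         static_cast<uint32_t>(b3) << 6 | static_cast<uint32_t>(b4);
  dst[0] = static_cast<unsigned char>(bits0 >> 16);
  dst[1] = static_cast<unsigned char>(bits0 >> 8);
  dst[2] = static_cast<unsigned char>(bits0);
  return 3;
}
```

For example, decoding `TWFu` yields the bytes of `Man`, while a group containing `*` is rejected before any bits are combined.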
I think this should be reflected in the value of `combined32Bits` when it is used for the following error check. Correspondingly, it should be the OR-ed result of the initially loaded values in `byte0` - `byte3`.

    // error check
    __ bltz(combined32Bits, Exit);

Anything I missed?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1730591429

From sjayagond at openjdk.org Mon Aug 26 04:28:59 2024
From: sjayagond at openjdk.org (Sidraya Jayagond)
Date: Mon, 26 Aug 2024 04:28:59 GMT
Subject: RFR: 8327652: S390x: Implements SLP support [v8]
In-Reply-To:
References:
Message-ID:

> This PR adds SIMD support on s390x.

Sidraya Jayagond has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:

- Merge branch 'openjdk:master' into Vector_support_s390x
- PopCountVI supported by z14 onwards.
- Fix cosmetic review comments
- Address code cleanup review comments
- Use Op_VecX instead of Op_RegF

  Signed-off-by: Sidraya
- Add proper comments and cosmetic changes.
- Address review comments
- Remove extra spcaes
- Implements SIMD support on s390x

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/18162/files
- new: https://git.openjdk.org/jdk/pull/18162/files/042c3966..20a12fcf

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=07
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=06-07

Stats: 744880 lines in 9749 files changed: 244305 ins; 134355 del; 366220 mod
Patch: https://git.openjdk.org/jdk/pull/18162.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18162/head:pull/18162

PR: https://git.openjdk.org/jdk/pull/18162

From sjayagond at openjdk.org Mon Aug 26 05:17:47 2024
From: sjayagond at openjdk.org (Sidraya Jayagond)
Date: Mon, 26 Aug 2024 05:17:47 GMT
Subject: RFR: 8327652: S390x: Implements SLP support [v9]
In-Reply-To:
References:
Message-ID:

> This PR adds SIMD support on s390x.

Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision:

  Add rebase changes from jdk master

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/18162/files
- new: https://git.openjdk.org/jdk/pull/18162/files/20a12fcf..3f2af99e

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=07-08

Stats: 8 lines in 1 file changed: 0 ins; 4 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/18162.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18162/head:pull/18162

PR: https://git.openjdk.org/jdk/pull/18162

From rehn at openjdk.org Mon Aug 26 06:24:15 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 26 Aug 2024 06:24:15 GMT
Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2]
In-Reply-To:
References:
Message-ID:

> Hi please consider,
>
> On TSO we don't need the synthetic data dependency in between the loads.
> Also added some comment about this.
> > Sanity tested Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Comment update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20661/files - new: https://git.openjdk.org/jdk/pull/20661/files/925b527f..0a7727a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20661&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20661&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20661/head:pull/20661 PR: https://git.openjdk.org/jdk/pull/20661 From rehn at openjdk.org Mon Aug 26 06:24:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 26 Aug 2024 06:24:16 GMT Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 02:01:22 GMT, Fei Yang wrote: >> Not sure what you mean, but there is no contradiction here. >> Manual says: >> >> load guard >> epoch data dep> >> load epoch >> load thread_gurad_epoch //unaffected by data dep. >> >> In RVWMO load thread_gurad_epoch is uneffected yes, and can be loaded eariler, yes. >> >> But we branch on the value of epoch (plus guard), delaying the load of epoch more than neccessary means we delay the branch instruction. As that branch have a control dependency it stop the all following instructions: >> >> Control dependencies behave differently from address and data dependencies in the sense that a >> control dependency always extends to all instructions following the original target in program order. >> >> >> Which means the main goal is get throught the branch as quick as possible. >> >> My comment says delaying the load of epoch in favour of loading thread_gurad_epoch eariler may be slower. >> I have not look to deep but it seems like we can also move the load of thread_gurad_epoch before data dep? (i.e. 
before any such fence r,r) >> >> As the load of guard and the load of epoch cannot overlap, they happen sequentially (due to the data dep). >> `fence r,r` only says the loads will happen in global memory order; it does not force them to be sequential. >> >> As we are coming very close to CPU implementation details here, there may be differences. >> So the point of the comment was: maybe revisit this in a few years. >> >> Thanks! I'll wait until we can agree on whether this is a good comment or whether it should be worded differently. > > All right then. Can you change the following line of comment to be more specific? > > > // Embed an synthetic data dependency to order the guard load > > => > > // Embed a syntactic address dependency to order the guard load Let me know what you think about the update, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20661#discussion_r1730727524 From fyang at openjdk.org Mon Aug 26 06:33:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 26 Aug 2024 06:33:06 GMT Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 06:24:15 GMT, Robbin Ehn wrote: >> Hi please consider, >> >> On TSO we don't need the synthetic data dependency in between the loads. >> Also added some comment about this. >> >> Sanity tested > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment update Looks good. Thanks for the update. ------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20661#pullrequestreview-2259870642 From rcastanedalo at openjdk.org Mon Aug 26 07:26:08 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Aug 2024 07:26:08 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> Message-ID: <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> On Fri, 23 Aug 2024 13:28:03 GMT, Martin Doerr wrote: >> OK, thanks. I just ran some benchmarks with zero-based OOP compression ([prototype here](https://github.com/robcasloz/jdk/tree/JDK-8334060-g1-late-barrier-expansion-x64-optimizations)) and could not observe any significant performance effect on three different x64 implementations. I think I will keep the `g1StoreN` implementation as-is in the x64 and aarch64 backends, for simplicity. Again, we can revisit this in follow-up work if need be. > > I have an experimental implementation for PPC64. I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`: > https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476 > This has 2 advantages: > - Reduce replicated code in the .ad file. > - Make the discussed optimization easy. Please take a look. Great that you already have an experimental port! Thanks for the heads-up, I agree that the OOP decoding + null check fusion becomes less intrusive, but I still prefer the current decoupled implementation for x64 and aarch64 (even simpler, IMO).
In the benchmarks I have run (admittedly, only on x64), I could not observe any positive effect, whereas I found a slight regression in one case using zero-based OOP compression. I have not investigated further, but I wonder if hoisting the null check above the region-crossing test could have a negative impact on branch predictability. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730806021 From mdoerr at openjdk.org Mon Aug 26 07:46:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 Aug 2024 07:46:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> Message-ID: On Mon, 26 Aug 2024 07:23:40 GMT, Roberto Castañeda Lozano wrote: >> I have an experimental implementation for PPC64. I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`: >> https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476 >> This has 2 advantages: >> - Reduce replicated code in the .ad file. >> - Make the discussed optimization easy. Please take a look. > > Great that you already have an experimental port! Thanks for the heads-up, I agree that the OOP decoding + null check fusion becomes less intrusive, but I still prefer the current decoupled implementation for x64 and aarch64 (even simpler, IMO).
In the benchmarks I have run (admittedly, only on x64), I could not observe any positive effect, whereas I found a slight regression in one case using zero-based OOP compression. I have not investigated further, but I wonder if hoisting the null check above the region-crossing test could have a negative impact on branch predictability. It can be implemented like this: - If oop decoding requires a null check, redirect the branch to jump over the barrier code. - Else insert the null check after the region crossing check. This way, I don't see how it can have a negative effect. But I leave you free to decide about x86 and aarch64. Optimizations could be done later if needed as you already mentioned. Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730832653 From stuefe at openjdk.org Mon Aug 26 08:06:09 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 26 Aug 2024 08:06:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 19:03:19 GMT, Leonid Mesnik wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java line 59: > >> 57: public static void main(String... args) throws Exception { >> 58: String zGenerational = args[0]; >> 59: String compactHeaders = "-XX:" + (zGenerational.equals("-XX:+ZGenerational") ? 
"+" : "-") + "UseCompactObjectHeaders"; > > The test failing with > stdout: [[0.176s][info][cds] trying to map /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa > [0.176s][info][cds] Opened archive /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa. > [0.176s][info][cds] Archive was created with UseCompressedOops = 0, UseCompressedClassPointers = 1 > [0.176s][info][cds] The shared archive file's UseCompactObjectHeaders setting (enabled) does not equal the current UseCompactObjectHeaders setting (disabled). > [0.176s][info][cds] Initialize static archive failed. > [0.176s][info][cds] Unable to map shared spaces > [0.176s][error][cds] An error has occurred while processing the shared archive file. > [0.176s][error][cds] Unable to map shared spaces > Error occurred during initialization of VM > Unable to use shared archive. 
> ]; > stderr: [] > exitValue = 1 > > java.lang.RuntimeException: 'Hello World' missing from stdout/stderr > at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252) > at TestZGCWithCDS.main(TestZGCWithCDS.java:123) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:573) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1575) > > JavaTest Message: Test threw exception: java.lang.RuntimeException > JavaTest Message: shutting down test Roman has two weeks of vacation; I am taking a look at this one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1730855152 From rcastanedalo at openjdk.org Mon Aug 26 08:32:06 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Aug 2024 08:32:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <6DcMr9PUa8OZEhO861hkhqTMYlKDs96tgPs4Fu1u72I=.9dd2d0d4-9075-4c36-9ab6-ff886e257b4b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> <6DcMr9PUa8OZEhO861hkhqTMYlKDs96tgPs4Fu1u72I=.9dd2d0d4-9075-4c36-9ab6-ff886e257b4b@github.com> Message-ID: On Fri, 23 Aug 2024 13:33:09 GMT, Martin Doerr wrote: >> Thanks for the reference, I would still prefer to keep this part as is for simplicity. We can always optimize the atomic barriers in follow-up work, if a need arises. > > After thinking more about this, I figured out that we can optimize more when moving the pre_barrier after the cmpxchg.
We can skip all G1 barriers if the cmpxchg fails: > https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1_ppc.ad#L171 > The cmpxchg jumps to no_update on failure. This may reduce load on GC queue handling and related work for GC threads. I'm testing this version and I actually like it more than the version I had before. Please take a look. > > (Note that my final version will need https://github.com/openjdk/jdk/pull/20689 to be integrated and merged into your PR.) Right, that makes sense since for PPC's cmpxchg implementation (unlike x64 or aarch64+LSE) you are already explicitly branching on failure anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730891297 From rcastanedalo at openjdk.org Mon Aug 26 08:41:08 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Aug 2024 08:41:08 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> Message-ID: <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> On Mon, 26 Aug 2024 07:43:39 GMT, Martin Doerr wrote: > This way, I don't see how it can have a negative effect. I agree, this is the implementation I tried out originally (https://github.com/openjdk/jdk/pull/19746#discussion_r1719811953). > Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. Yes, thanks, I "unresolved" it now.
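For readers less familiar with the ordering Martin describes for PPC64 (barriers placed after the cmpxchg), here is a minimal C++ sketch. It assumes a toy model: a `std::atomic<std::uintptr_t>` stands in for a reference field, and two `std::vector`s stand in for G1's SATB and dirty-card queues. `BarrierQueues` and `cas_with_late_barriers` are hypothetical names, not HotSpot code. Because both barriers run only after a successful compare-and-exchange, a failed CAS skips all barrier work, and on success the overwritten value is known to be exactly the expected value.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for G1's SATB queue and dirty-card queue.
struct BarrierQueues {
  std::vector<std::uintptr_t> satb;   // previous values logged by the pre-barrier
  std::vector<std::uintptr_t> cards;  // field addresses logged by the post-barrier
};

// CAS on a reference field with both barriers placed *after* the exchange.
// On failure the field is unchanged, so no barrier work is needed.
bool cas_with_late_barriers(std::atomic<std::uintptr_t>& field,
                            std::uintptr_t expected,
                            std::uintptr_t new_val,
                            BarrierQueues& q) {
  std::uintptr_t observed = expected;
  if (!field.compare_exchange_strong(observed, new_val)) {
    return false;  // the "no_update" path: skip all barriers
  }
  // On success, the value that was overwritten is exactly `expected`.
  if (expected != 0) {
    q.satb.push_back(expected);  // pre-barrier (assumes marking is active)
  }
  if (new_val != 0) {
    // Post-barrier; the real region-crossing and card filters are omitted.
    q.cards.push_back(reinterpret_cast<std::uintptr_t>(&field));
  }
  return true;
}
```

The sketch deliberately leaves out the marking-active check, the post-barrier filters, and the memory-ordering details that a real backend has to handle; it only shows why a failing CAS can bypass the barriers entirely.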
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730905873 From rcastanedalo at openjdk.org Mon Aug 26 08:49:06 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Aug 2024 08:49:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> Message-ID: On Mon, 26 Aug 2024 08:38:39 GMT, Roberto Castañeda Lozano wrote: >> It can be implemented like this: >> >> - If oop decoding requires a null check, redirect the branch to jump over the barrier code. >> - Else insert the null check after the region crossing check. >> >> This way, I don't see how it can have a negative effect. But I leave you free to decide about x86 and aarch64. Optimizations could be done later if needed as you already mentioned. >> >> Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. > >> This way, I don't see how it can have a negative effect. > > I agree, this is the implementation I tried out originally (https://github.com/openjdk/jdk/pull/19746#discussion_r1719811953). > >> Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. > > Yes, thanks, I "unresolved" it now. > I have an experimental implementation for PPC64.
An unrelated comment about your PPC64 implementation: did you try running `test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java`? It expects the ADL instructions that implement `GetAndSetP` and `GetAndSetN` to be called `g1XChgP` and `g1XChgN`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730916202 From mcimadamore at openjdk.org Mon Aug 26 09:20:18 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 26 Aug 2024 09:20:18 GMT Subject: Integrated: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI In-Reply-To: References: Message-ID: <3nep7-Z8_feW39di9cTU1O07lgFQD4WmSHQia-UUS7c=.18fb2776-9f8a-47c8-be58-8cd1dd30d45f@github.com> On Mon, 13 May 2024 10:42:26 GMT, Maurizio Cimadamore wrote: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` flag was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access` flag is used at least once. > > Here, a new flag is introduced, namely `--illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`.
The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. This pull request has now been integrated. Changeset: 20d8f58c Author: Maurizio Cimadamore URL: https://git.openjdk.org/jdk/commit/20d8f58c92009a46dfb91b951e7d87b4cb8e8b41 Stats: 532 lines in 107 files changed: 341 ins; 52 del; 139 mod 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI Reviewed-by: jpai, prr, ihse, kcr, alanb ------------- PR: https://git.openjdk.org/jdk/pull/19213 From ihse at openjdk.org Mon Aug 26 09:39:04 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 26 Aug 2024 09:39:04 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Sun, 25 Aug 2024 14:57:22 GMT, Alan Bateman wrote: > That is true for now but there are 30-50 other places that will need attention once this effort is further along. Everywhere that deals with user editable configuration (conf tree) will change, as will everywhere that reads JDK internal files in the lib directory. Well, yes and no. What you are talking about is the Hermetic Java project as a whole. This is about making the build smarter and more efficient at producing static libraries. While static libraries are a sine qua non for Hermetic Java, they are also needed for other reasons (for instance, by GraalVM and the Mobile Project).
It is true that Hermetic Java was what actually got me to prioritize getting static builds to work properly, but the fact is that it has been a sore point for a long time, and the fact that we need to build all object files twice is actually having a noticeable effect on the Oracle CI system. So please don't confuse the cleanup of static builds with the eventual goals of Hermetic Java. The runtime changes needed to read config files from a single binary will need to stand on their own. The changes in this PR are good for us, regardless of how Hermetic Java proceeds. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2309780980 From ihse at openjdk.org Mon Aug 26 09:42:04 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 26 Aug 2024 09:42:04 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: <4zUGEcC6eLmdq0wAqDCgAjsU17u6-sQNv8KZVQ8pCKc=.f8801e0b-8351-4af9-9825-70ccfa63847a@github.com> On Mon, 26 Aug 2024 02:07:39 GMT, David Holmes wrote: > but it still impacts every single Java run just to save some time/resources for the handful of builders of statically linked VMs. Seriously? I challenge you to prove there is any effect at all. :-/ Also, there is not a "handful" of builders of static libraries. Our internal CI system builds static libraries all the time, and I for one would be glad to use these resources on more productive stuff than building all object files twice. Also, the intention is to enable static builds by default on GHA, once the entire process of making static builds "dirt cheap" is finished, to avoid regressions.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2309786734 From mdoerr at openjdk.org Mon Aug 26 09:45:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 Aug 2024 09:45:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> Message-ID: <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> On Mon, 26 Aug 2024 08:46:10 GMT, Roberto Castañeda Lozano wrote: >>> This way, I don't see how it can have a negative effect. >> >> I agree, this is the implementation I tried out originally (https://github.com/openjdk/jdk/pull/19746#discussion_r1719811953). >> >>> Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. >> >> Yes, thanks, I "unresolved" it now. > >> I have an experimental implementation for PPC64. > > An unrelated comment about your PPC64 implementation: did you try running `test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java`? It expects the ADL instructions that implement `GetAndSetP` and `GetAndSetN` to be called `g1XChgP` and `g1XChgN`. That one is among the failing tests. Can we agree on better names than `g1XChgP` and `g1XChgN`? They are not very readable IMHO. All the other nodes have nice names.
Regardless of whether you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2` because it makes the .ad file shorter: you can get rid of the replicated `decode_heap_oop`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730991651 From amitkumar at openjdk.org Mon Aug 26 10:36:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 26 Aug 2024 10:36:07 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v7] In-Reply-To: References: <1SGOMkL6TvnkQDt1WkH3FbPVrbCUOD_cA3e23QK5-jg=.b9b066f9-d50a-4710-a8a5-76c2d9b83236@github.com> Message-ID: On Mon, 1 Apr 2024 09:01:30 GMT, Martin Doerr wrote: >>> > I think we shouldn't allow `MacroAssembler::string_compress(...)` and `MacroAssembler::string_expand(...)` to use vector registers without specifying this effect. That can be solved by adding a KILL effect for all vector registers which are killed. Alternatively, we could revert to the old implementation before [d5adf1d](https://github.com/openjdk/jdk/commit/d5adf1df921e5ecb8ff4c7e4349a12660069ed28) which doesn't use vector registers. The benefit was not huge if I remember correctly. >>> >>> Agreed. My proposed circumvention is too dirty a hack. >>> >>> I would prefer to add KILL effects to the match rules. I believe the vector implementation had a substantial performance effect. Unfortunately, I can't find any records of performance results from back then. >>> >>> Reverting the commit @TheRealMDoerr mentioned is not possible. It contains many additions that may have been used by unrelated code. The vector code is well encapsulated and could be removed by deleting the >>> >>> ``` >>> if (VM_Version::has_VectorFacility()) { >>> } >>> ``` >>> >>> block. I would not like that, though. >> >> I didn't mean to back out the whole commit. Only the implementation of string_compress and string_expand.
The benefit of the vector version certainly depends on what kind of strings are used. (The effect may also be negative in some cases.) I think that classical benchmarks didn't show a significant performance impact, but I don't remember exactly, either. I'll leave the s390 maintainers free to decide if they want to adapt the vector version or go for the short and simple implementation. > >> @TheRealMDoerr and @RealLucy Just for my understanding: why aren't GPRs and FPRs affected in intrinsic code, as they are also allocated outside of the register allocator? Why is only vector register usage in intrinsic code affected, or am I missing anything here? > > GPRs and FPRs already have an `effect` specified in the match rules. (If a GPR or FPR is used by a `MachNode` without proper specification, it is a critical bug.) Before this PR, it is legal to use VRs without `effect` because they are not used by register allocation. This is exploited in `MacroAssembler::string_compress(...)` and `MacroAssembler::string_expand(...)`. > > With your change, these 2 intrinsics may overwrite live values! Unfortunately, they use many VRs. So, specifying a `KILL` effect for all of them may cause the register allocator to insert a lot of spill code. This may impact performance and code size.
Hi @TheRealMDoerr, I added changes on top of SLP PR: https://github.com/openjdk/jdk/commit/3fd8074a52f3cc8b9c3a8e588dd352f4e6c25a76 , and faced build break with this error: * For target buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/amit/jdk/src/hotspot/share/opto/chaitin.cpp:997), pid=3968896, tid=3969099 # assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) failed: vector should be aligned # # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.amit.jdk) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.amit.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-s390x) # Problematic frame: # V [libjvm.so+0x5a2d8a] PhaseChaitin::gather_lrg_masks(bool) [clone .constprop.1]+0x175a # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/amit/jdk/make/core.3968896) # # An error report file with more information is saved as: # /home/amit/jdk/make/hs_err_pid3968896.log BT: Stack: [0x000003ff2f900000,0x000003ff2fd00000], sp=0x000003ff2fcfb7d8, free space=4077k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x5a2d8a] PhaseChaitin::gather_lrg_masks(bool) [clone .constprop.1]+0x175a (chaitin.cpp:997) V [libjvm.so+0x5aa122] PhaseChaitin::Register_Allocate()+0x1ba (chaitin.cpp:405) V [libjvm.so+0x6f181e] Compile::Code_Gen()+0x2fe (compile.cpp:2966) V [libjvm.so+0x6f4154] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1794 (compile.cpp:885) V [libjvm.so+0x52d32a] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1b2 (c2compiler.cpp:142) V [libjvm.so+0x7011ae] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xce6 (compileBroker.cpp:2303) V [libjvm.so+0x701cb2] 
CompileBroker::compiler_thread_loop()+0x512 (compileBroker.cpp:1961) V [libjvm.so+0xb649c6] JavaThread::thread_main_inner()+0xfe (javaThread.cpp:758) V [libjvm.so+0x137066c] Thread::call_run()+0xc4 (thread.cpp:225) V [libjvm.so+0x109470a] thread_native_entry(Thread*)+0x132 (os_linux.cpp:858) Do you think using `Op_VecX` is causing the issue here, and should we revert https://github.com/openjdk/jdk/pull/18162/commits/3caa470c0f89be306e5b43c5da4ca9e625abfe6b ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2309888699 From mdoerr at openjdk.org Mon Aug 26 11:08:08 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 Aug 2024 11:08:08 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: References: Message-ID: <6JiVYm6Z0V2uE_DZTxYPnPYZRhZnXGSqTQJpsNNB4gM=.8c75cd00-bf33-4830-99fd-5ce49516cb6a@github.com> On Mon, 26 Aug 2024 05:17:47 GMT, Sidraya Jayagond wrote: >> This PR Adds SIMD support on s390x. > > Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: > > Add rebase changes from jdk master Please don't revert it. The old code wasn't good. The problem should be understood and fixed. Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2309941494 From rcastanedalo at openjdk.org Mon Aug 26 13:26:07 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 26 Aug 2024 13:26:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: On Mon, 26 Aug 2024 09:42:29 GMT, Martin Doerr wrote: > That one is among the failing tests. Can we agree on better names than g1XChgP and g1XChgN? They are not readable very well IMHO. Sure, I agree that `g1GetAndSetP` and `g1GetAndSetN` are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. > Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. Thanks, will try it out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1731240303 From aph at openjdk.org Mon Aug 26 13:42:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 26 Aug 2024 13:42:07 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: <6JiVYm6Z0V2uE_DZTxYPnPYZRhZnXGSqTQJpsNNB4gM=.8c75cd00-bf33-4830-99fd-5ce49516cb6a@github.com> References: <6JiVYm6Z0V2uE_DZTxYPnPYZRhZnXGSqTQJpsNNB4gM=.8c75cd00-bf33-4830-99fd-5ce49516cb6a@github.com> Message-ID: On Mon, 26 Aug 2024 11:05:08 GMT, Martin Doerr wrote: > Please don't revert it. The old code wasn't good. The problem should be understood and fixed. Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802. Mmm, but C2 is very much designed around 32-bit slots, and the register allocator assumes that every vec is some multiple of 32-bit slots. Here: // SlotsPerLong is 2, since slots are 32 bits and longs are 64 bits. // Also, consider the maximum alignment size for a normally allocated // value. Since we allocate register pairs but not register quads (at // present), this alignment is SlotsPerLong (== 2). A normally // aligned allocated register is either a single register, or a pair // of adjacent registers, the lower-numbered being even. // See also is_aligned_Pairs() below, and the padding added before // Matcher::_new_SP to keep allocated pairs aligned properly. // If we ever go to quad-word allocations, SlotsPerQuad will become // the controlling alignment constraint. Note that this alignment // requirement is internal to the allocator, and independent of any // particular platform. 
enum { SlotsPerLong = 2, SlotsPerVecA = 4, SlotsPerVecS = 1, SlotsPerVecD = 2, SlotsPerVecX = 4, SlotsPerVecY = 8, SlotsPerVecZ = 16, SlotsPerRegVectMask = X86_ONLY(2) NOT_X86(1) }; This change that removes the subwords from the `reg_def` is wrong: https://github.com/openjdk/jdk/pull/18162/commits/3caa470c0f89be306e5b43c5da4ca9e625abfe6b . It is, in essence, lying to the register allocator. Sure, you can get away with this as long as you never have to refer directly to individual registers, which is what Amit's patch is doing. We must ensure that every vector register, as declared, is a multiple of the correct number of slots in order for all this to work correctly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2310243935 From coleenp at openjdk.org Mon Aug 26 14:00:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 26 Aug 2024 14:00:09 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Sat, 24 Aug 2024 10:51:53 GMT, Thomas Stuefe wrote: > I don't think the costs for two address comparisons matter, not with the comparatively few deallocations that happen (few hundreds or few thousand). If deallocate is hot, we are using metaspace wrong. MethodData does a lot of deallocations from metaspace because it's allocated racily. It might be using Metaspace wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2310284365 From mdoerr at openjdk.org Mon Aug 26 14:10:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 Aug 2024 14:10:07 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: References: <6JiVYm6Z0V2uE_DZTxYPnPYZRhZnXGSqTQJpsNNB4gM=.8c75cd00-bf33-4830-99fd-5ce49516cb6a@github.com> Message-ID: On Mon, 26 Aug 2024 13:39:35 GMT, Andrew Haley wrote: >> Please don't revert it. The old code wasn't good. The problem should be understood and fixed. 
Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802. Does the CPU you were building on support the vector instructions (SuperwordUseVX and UseSFPV)? > >> Please don't revert it. The old code wasn't good. The problem should be understood and fixed. Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802. > > Mmm, but C2 is very much designed around 32-bit slots, and the register allocator assumes that every vec is some multiple of 32-bit slots. Here: > > > // SlotsPerLong is 2, since slots are 32 bits and longs are 64 bits. > // Also, consider the maximum alignment size for a normally allocated > // value. Since we allocate register pairs but not register quads (at > // present), this alignment is SlotsPerLong (== 2). A normally > // aligned allocated register is either a single register, or a pair > // of adjacent registers, the lower-numbered being even. > // See also is_aligned_Pairs() below, and the padding added before > // Matcher::_new_SP to keep allocated pairs aligned properly. > // If we ever go to quad-word allocations, SlotsPerQuad will become > // the controlling alignment constraint. Note that this alignment > // requirement is internal to the allocator, and independent of any > // particular platform. > enum { SlotsPerLong = 2, > SlotsPerVecA = 4, > SlotsPerVecS = 1, > SlotsPerVecD = 2, > SlotsPerVecX = 4, > SlotsPerVecY = 8, > SlotsPerVecZ = 16, > SlotsPerRegVectMask = X86_ONLY(2) NOT_X86(1) > }; > > > This change that removes the subwords from the `reg_def` is wrong: https://github.com/openjdk/jdk/pull/18162/commits/3caa470c0f89be306e5b43c5da4ca9e625abfe6b . It is, in essence, lying to the register allocator. 
Sure, you can get away with this as long as you never have to refer directly to individual registers, which is what Amit's patch is doing. > > We must ensure that every vector register, as declared, is a multiple of the correct number of slots in order for all this to work correctly. Thanks for the explanation, @theRealAph! I see aarch64 has a similar implementation. So, I'm ok with reverting. But please don't do a complete backout! At least the formatting changes etc. should be kept. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2310307928 From mdoerr at openjdk.org Mon Aug 26 14:21:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 Aug 2024 14:21:07 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: References: <6JiVYm6Z0V2uE_DZTxYPnPYZRhZnXGSqTQJpsNNB4gM=.8c75cd00-bf33-4830-99fd-5ce49516cb6a@github.com> Message-ID: <4qL5dsLiU8B_K5ZFnN039mIWN0QjW5DgGmzg7WlJXyw=.906b5cd0-4469-4a63-8e91-7465efd39b9f@github.com> On Mon, 26 Aug 2024 13:39:35 GMT, Andrew Haley wrote: >> Please don't revert it. The old code wasn't good. The problem should be understood and fixed. Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802. Does the CPU you were building on support the vector instructions (SuperwordUseVX and UseSFPV)? > >> Please don't revert it. The old code wasn't good. The problem should be understood and fixed. Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802. > > Mmm, but C2 is very much designed around 32-bit slots, and the register allocator assumes that every vec is some multiple of 32-bit slots. Here: > > > // SlotsPerLong is 2, since slots are 32 bits and longs are 64 bits. 
> // Also, consider the maximum alignment size for a normally allocated > // value. Since we allocate register pairs but not register quads (at > // present), this alignment is SlotsPerLong (== 2). A normally > // aligned allocated register is either a single register, or a pair > // of adjacent registers, the lower-numbered being even. > // See also is_aligned_Pairs() below, and the padding added before > // Matcher::_new_SP to keep allocated pairs aligned properly. > // If we ever go to quad-word allocations, SlotsPerQuad will become > // the controlling alignment constraint. Note that this alignment > // requirement is internal to the allocator, and independent of any > // particular platform. > enum { SlotsPerLong = 2, > SlotsPerVecA = 4, > SlotsPerVecS = 1, > SlotsPerVecD = 2, > SlotsPerVecX = 4, > SlotsPerVecY = 8, > SlotsPerVecZ = 16, > SlotsPerRegVectMask = X86_ONLY(2) NOT_X86(1) > }; > > > This change that removes the subwords from the `reg_def` is wrong: https://github.com/openjdk/jdk/pull/18162/commits/3caa470c0f89be306e5b43c5da4ca9e625abfe6b . It is, in essence, lying to the register allocator. Sure, you can get away with this as long as you never have to refer directly to individual registers, which is what Amit's patch is doing. > > We must ensure that every vector register, as declared, is a multiple of the correct number of slots in order for all this to work correctly. @theRealAph: I didn't get the "lying to the register allocator" part. `SlotsPerVecX` is defined to be 4 slots and a slot has 4 Bytes. 16 Bytes is the correct vector width. Note that VecX is also used on PPC64 for example. 
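Both positions in this exchange rest on the same slot arithmetic. A slot is 32 bits, so a 16-byte vector needs `SlotsPerVecX` = 4 slots, which is Martin's point; Andrew's objection is that each s390 vector register must also be declared to the allocator with that many slots. The C++ sketch below is a toy model of that bookkeeping, not HotSpot code: `kSlotBytes`, `slots_for_bytes` and `declaration_matches_width` are hypothetical names, and the slot constants are simply copied from the quoted enum.

```cpp
#include <cassert>

// Toy model of C2-style slot bookkeeping (illustration only, not HotSpot code).

// Register and stack storage is counted in 32-bit slots.
constexpr int kSlotBytes = 4;

// Slot counts copied from the quoted enum, with the byte width each implies.
constexpr int SlotsPerVecS = 1;   //  4 bytes
constexpr int SlotsPerVecD = 2;   //  8 bytes
constexpr int SlotsPerVecX = 4;   // 16 bytes
constexpr int SlotsPerVecY = 8;   // 32 bytes
constexpr int SlotsPerVecZ = 16;  // 64 bytes

// Number of 32-bit slots occupied by a vector of `bytes` bytes.
inline int slots_for_bytes(int bytes) {
  assert(bytes > 0 && bytes % kSlotBytes == 0);
  return bytes / kSlotBytes;
}

// A register declared with `declared_slots` reg_def entries describes its
// width truthfully only if that count equals the slots its width requires.
// A 16-byte register declared with a single entry (no _H/_J/_K companions)
// under-reports its size.
inline bool declaration_matches_width(int declared_slots, int register_bytes) {
  return declared_slots == slots_for_bytes(register_bytes);
}
```

With these helpers, `declaration_matches_width(1, 16)` is false and `declaration_matches_width(4, 16)` is true, which is the distinction between one `VMReg` per vector register and four.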
------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2310333928 From duke at openjdk.org Mon Aug 26 15:38:19 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 26 Aug 2024 15:38:19 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm Message-ID: The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup -- | -- | -- | -- MathBench.tanhDouble | 70900 | 95618 | 1.35x ------------- Commit messages: - Fix bug in NaN path - 8338694: x86_64 intrinsic for tanh using libm Changes: https://git.openjdk.org/jdk/pull/20657/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338694 Stats: 565 lines in 25 files changed: 556 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From darcy at openjdk.org Mon Aug 26 15:50:03 2024 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 26 Aug 2024 15:50:03 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 00:25:03 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method. 
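A regression test for a new `tanh` intrinsic could start with the special cases that the `java.lang.Math.tanh` specification pins down: NaN maps to NaN, a zero argument yields a zero of the same sign, and the infinities map to +1.0 and -1.0. Given the "Fix bug in NaN path" commit, the NaN case seems worth covering explicitly. The sketch below expresses these checks in C++ against any callable; `passes_tanh_special_cases` is a hypothetical helper, and an actual JDK test would be a jtreg test written in Java.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical special-case checks for a tanh implementation `f`, following
// the java.lang.Math.tanh specification:
//   NaN -> NaN, +0.0 -> +0.0, -0.0 -> -0.0, +inf -> +1.0, -inf -> -1.0.
// Also checks odd symmetry and that finite results stay within [-1, 1].
template <typename F>
bool passes_tanh_special_cases(F f) {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double inf = std::numeric_limits<double>::infinity();

  if (!std::isnan(f(nan))) return false;

  // A zero argument must produce a zero with the same sign.
  const double pz = f(0.0);
  const double nz = f(-0.0);
  if (pz != 0.0 || std::signbit(pz)) return false;
  if (nz != 0.0 || !std::signbit(nz)) return false;

  if (f(inf) != 1.0 || f(-inf) != -1.0) return false;

  // Finite samples, including large arguments whose results round to +/-1.0.
  const double samples[] = {1e-300, 0.5, 1.0, 3.0, 20.0, 710.0, 1e300};
  for (double x : samples) {
    const double y = f(x);
    if (!(y >= -1.0 && y <= 1.0)) return false;  // also rejects NaN results
    if (f(-x) != -y) return false;               // tanh is an odd function
  }
  return true;
}
```

An implementation that returns `+0.0` for `-0.0` fails these checks, which is the kind of subtle difference from the existing code path that such a test is meant to catch.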
------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2310523202 From mli at openjdk.org Mon Aug 26 16:49:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 26 Aug 2024 16:49:05 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4] In-Reply-To: References: <9CaMDk7tdTqmlXequDf4H-5ozalxbrVCb4E5E6AjkVE=.51211e3a-ad04-46e0-8aa4-a07c2452e625@github.com> Message-ID: <1gDwqRoefl8BhrD4CF6oNyWoi_6nuMTG44zdDhVaoH8=.7983dfe7-2026-40b7-9e21-5296a84ae1e9@github.com> On Mon, 26 Aug 2024 02:54:24 GMT, Fei Yang wrote: >> I may have misunderstood your question. >> After line 5518, combined32Bits will be byte0[23:18] | byte1[17:12] | byte2[11:6] | byte3[5:0], so before it, we need to move every byte(6 bits) to the right position by shifting except of byte3. > > Sorry for not being clear on this. The java code snippet of decodeBlock: > > 795 int b1 = base64[src[sp++] & 0xff]; > 796 int b2 = base64[src[sp++] & 0xff]; > 797 int b3 = base64[src[sp++] & 0xff]; > 798 int b4 = base64[src[sp++] & 0xff]; > 799 if ((b1 | b2 | b3 | b4) < 0) { // non base64 byte > 800 return new_dp - dp; > 801 } > 802 int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4; > > L799 simply OR-ed all the initial values of `b1`-`b4` and compare the result with zero. I think this should be reflected on the value of `combined32Bits` when it is used to do following error check. Correspondingly, It should be the OR-ed result of the initial loaded values in `byte0` - `byte3`. > > // error check > __ bltz(combined32Bits, Exit); > > > Anything I missed? `(b1 | b2 | b3 | b4) < 0`, this java code is to tell if any of b1-b4 is < 0. 
On the other side, `bltz(combined32Bits, Exit)` is doing the similar thing (same effect, but different way), as when loading byte1-4, `lb` will sign-extend it, if any of byte1-4 < 0, then the top bit will be 1, and after shift left, top bit will still be 1, so as a result, we can use combined32Bits to tell if any of byte1-4 < 0, and at the same time, we can use combined32Bits to decode the final data (from 4 bytes to 3 bytes). Hope this answers your question? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1731515640 From aph at openjdk.org Mon Aug 26 16:57:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 26 Aug 2024 16:57:06 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: References: <6JiVYm6Z0V2uE_DZTxYPnPYZRhZnXGSqTQJpsNNB4gM=.8c75cd00-bf33-4830-99fd-5ce49516cb6a@github.com> Message-ID: <1d5uxUmWTtW5OVgkEVgvClKLh_lm2hiKAIhCoqVvX04=.acd31921-25c9-4b50-a6ed-bf19cf3ff595@github.com> On Mon, 26 Aug 2024 13:39:35 GMT, Andrew Haley wrote: >> Please don't revert it. The old code wasn't good. The problem should be understood and fixed. Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802. Does the CPU you were building on support the vector instructions (SuperwordUseVX and UseSFPV)? > >> Please don't revert it. The old code wasn't good. The problem should be understood and fixed. Otherwise, this PR should not get integrated. It's a high risk to integrate buggy vector code. We had tons of crashes on PPC64 with the initial version. Also see https://bugs.openjdk.org/browse/JDK-8188802. > > Mmm, but C2 is very much designed around 32-bit slots, and the register allocator assumes that every vec is some multiple of 32-bit slots. Here: > > > // SlotsPerLong is 2, since slots are 32 bits and longs are 64 bits. 
> // Also, consider the maximum alignment size for a normally allocated > // value. Since we allocate register pairs but not register quads (at > // present), this alignment is SlotsPerLong (== 2). A normally > // aligned allocated register is either a single register, or a pair > // of adjacent registers, the lower-numbered being even. > // See also is_aligned_Pairs() below, and the padding added before > // Matcher::_new_SP to keep allocated pairs aligned properly. > // If we ever go to quad-word allocations, SlotsPerQuad will become > // the controlling alignment constraint. Note that this alignment > // requirement is internal to the allocator, and independent of any > // particular platform. > enum { SlotsPerLong = 2, > SlotsPerVecA = 4, > SlotsPerVecS = 1, > SlotsPerVecD = 2, > SlotsPerVecX = 4, > SlotsPerVecY = 8, > SlotsPerVecZ = 16, > SlotsPerRegVectMask = X86_ONLY(2) NOT_X86(1) > }; > > > This change that removes the subwords from the `reg_def` is wrong: https://github.com/openjdk/jdk/pull/18162/commits/3caa470c0f89be306e5b43c5da4ca9e625abfe6b . It is, in essence, lying to the register allocator. Sure, you can get away with this as long as you never have to refer directly to individual registers, which is what Amit's patch is doing. > > We must ensure that every vector register, as declared, is a multiple of the correct number of slots in order for all this to work correctly. > @theRealAph: I didn't get the "lying to the register allocator" part. `SlotsPerVecX` is defined to be 4 slots and a slot has 4 Bytes. 16 Bytes is the correct vector width. Note that VecX is also used on PPC64 for example. Do you mean that we need to multiply the encoding() by 4? Can't that be fixed without reverting most of the new code? 
(only `max_vr = max_fpr + VectorRegister::number_of_registers * VectorRegister::max_slots_per_register` and `(encoding() * VectorRegister::max_slots_per_register) + ConcreteRegisterImpl::max_fpr`)

No, I mean that we need this:

reg_def Z_VR16 ( SOC, SOC, Op_RegF, 16, Z_V16->as_VMReg() );
reg_def Z_VR16_H ( SOC, SOC, Op_RegF, 16, Z_V16->as_VMReg()->next() );
reg_def Z_VR16_J ( SOC, SOC, Op_RegF, 16, Z_V16->as_VMReg()->next(2) );
reg_def Z_VR16_K ( SOC, SOC, Op_RegF, 16, Z_V16->as_VMReg()->next(3) );

and

reg_class z_v_reg(
  // Attention: Only these ones are saved & restored at safepoint by RegisterSaver.
  //1st 16 VRs overlaps with 1st 16 FPRs.
  Z_VR16, Z_VR16_H, Z_VR16_J, Z_VR16_K,

Because without the extra dummy register declarations, `Z_VR16` is only allocated a single `VMReg`, but it needs to be allocated 4.

I have no intention of touching anything else in this commit, just restoring the vector register declarations, which were working just fine.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2310647831

From mdoerr at openjdk.org Mon Aug 26 21:00:07 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 26 Aug 2024 21:00:07 GMT
Subject: RFR: 8327652: S390x: Implements SLP support [v9]
In-Reply-To:
References:
Message-ID: <2u6QOgOLGnFfTH4p_1atCbXiTF0zcaNyBQA6174dkZc=.b8070aec-a8bd-4c27-9837-ccc73861071b@github.com>

On Mon, 26 Aug 2024 05:17:47 GMT, Sidraya Jayagond wrote:

>> This PR adds SIMD support on s390x.
>
> Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision:
>
> Add rebase changes from jdk master

Ok, so on PPC64 we only have one VMReg per VectorRegister. Seems to work. I don't know why s390 needs the individual ones, but that may be better. Except in terms of resource usage and register allocation performance I guess.
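Hamlin Li's argument in the RISC-V Base64 review above, that `bltz` on the combined value rejects exactly the inputs that the Java check `(b1 | b2 | b3 | b4) < 0` rejects, can be modelled in portable C++. This sketch assumes that a non-Base64 byte shows up as a negative decoded value held in a signed byte, and it models `lb`-style sign extension to a 64-bit register with a cast. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of the Base64 decode error check (illustration only).
// Each input is a decoded value (0..63) or, for a non-Base64 input byte,
// a negative marker such as -1, held in a signed byte.

// RISC-V `lb` sign-extends a byte into a 64-bit register; model it with a cast.
inline int64_t load_byte_sign_extended(int8_t b) {
  return static_cast<int64_t>(b);
}

// Shift in unsigned arithmetic to avoid undefined behaviour on negative
// values, then reinterpret as signed (two's complement assumed).
inline int64_t shift_left(int64_t v, unsigned amount) {
  assert(amount < 64);
  return static_cast<int64_t>(static_cast<uint64_t>(v) << amount);
}

// b0 << 18 | b1 << 12 | b2 << 6 | b3, computed on sign-extended values.
// For a negative byte, bits 63..7 are all ones, so a shift by at most 18
// leaves bit 63 (the sign bit) set, and OR-ing cannot clear it.
inline int64_t combine(int8_t b0, int8_t b1, int8_t b2, int8_t b3) {
  return shift_left(load_byte_sign_extended(b0), 18) |
         shift_left(load_byte_sign_extended(b1), 12) |
         shift_left(load_byte_sign_extended(b2), 6) |
         load_byte_sign_extended(b3);
}

// The Java check: (b1 | b2 | b3 | b4) < 0.
inline bool invalid_java_style(int8_t b0, int8_t b1, int8_t b2, int8_t b3) {
  return (b0 | b1 | b2 | b3) < 0;
}

// The stub's check: bltz(combined32Bits, Exit).
inline bool invalid_combined(int8_t b0, int8_t b1, int8_t b2, int8_t b3) {
  return combine(b0, b1, b2, b3) < 0;
}

// For valid input, the low 24 bits hold the three decoded output bytes.
inline uint8_t output_byte(int64_t combined, int index) {  // index 0..2
  assert(index >= 0 && index < 3);
  return static_cast<uint8_t>((combined >> (16 - 8 * index)) & 0xFF);
}
```

For example, "TWFu" decodes to the values 19, 22, 5, 46, whose combined value is `0x4D616E` ("Man"). This shows that one register serves as both the error flag and the decoded payload.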
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2311081123

From cslucas at openjdk.org Mon Aug 26 21:13:09 2024
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 26 Aug 2024 21:13:09 GMT
Subject: RFR: 8327652: S390x: Implements SLP support [v9]
In-Reply-To:
References:
Message-ID:

On Mon, 26 Aug 2024 05:17:47 GMT, Sidraya Jayagond wrote:

>> This PR adds SIMD support on s390x.
>
> Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision:
>
> Add rebase changes from jdk master

src/hotspot/cpu/s390/vmreg_s390.cpp line 48:

> 46:
> 47: VectorRegister vreg = ::as_VectorRegister(0);
> 48: for (; i < ConcreteRegisterImpl::max_vr;) {

NIT: this really looks like a `while` loop.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18162#discussion_r1731832016

From cjplummer at openjdk.org Mon Aug 26 21:56:04 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 26 Aug 2024 21:56:04 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote:

>> This is the main body of JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan. Specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix bit counts in GCForwarding

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 85:

> 83:
> 84: private static Klass getKlass(Mark mark) {
> 85: assert(VM.getVM().isCompactObjectHeadersEnabled());

`mark.getKlass()` already does this assert. I don't see any value in this `getKlass()` method. The caller should just call `getMark().getKlass()` rather than `getKlass(getMark())`.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169:

> 167: } else {
> 168: visitor.doMetadata(klass, true);
> 169: }

Why is there no `visitor.doMetadata()` call for the compressed object header case?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1731849434
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1731866842

From darcy at openjdk.org Mon Aug 26 22:16:04 2024
From: darcy at openjdk.org (Joe Darcy)
Date: Mon, 26 Aug 2024 22:16:04 GMT
Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4]
In-Reply-To:
References:
Message-ID: <3hk6EiDY3Qxq_sjSBoL7SBsk_5_FsuRa7iZ0caxSs8s=.6db958ed-8f74-49d1-b949-a7da94357592@github.com>

On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD : Saturating signed addition.
>> . SUSUB : Saturating unsigned subtraction.
>> . SSUB : Saturating signed subtraction.
>> . UMAX : Unsigned max
>> . UMIN : Unsigned min.
>>
>>
>> The new vector operators are applicable only to integral types, since their values wrap around in overflow/underflow scenarios after setting the appropriate status flags.
>> For floating-point types, as per the IEEE 754 spec, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value.
>>
>> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>>
>> Summary of changes:
>> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
>> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
>>
>> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments resolutions.

Some general impressions of the API change in the `java.lang` classes.

I don't think the change as-is, especially the new constant fields, is a great fit for the current API, and I think those constants would look worse in a future where there was an "UnsignedInt" value class, or similar fuller platform support.
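The scalar semantics described in this PR (clamp to the bounds of the lane type instead of wrapping around) can be pinned down with a small reference model. The C++ sketch below covers byte lanes only. It is not the Java implementation or the new `java.lang` APIs under review, and the function names are hypothetical. Lanes are stored as signed bytes, as in Java, and the unsigned operators reinterpret the same 8 bits as a value in 0..255.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference model for byte lanes. Intermediate results
// are computed in int, so nothing overflows before clamping.

// Keep the low 8 bits of an int as a signed lane (two's complement).
inline int8_t to_lane(int v) {
  return static_cast<int8_t>(static_cast<uint8_t>(v));
}

// The lane's bits read as an unsigned value in 0..255.
inline int as_unsigned(int8_t lane) { return static_cast<uint8_t>(lane); }

// SADD / SSUB: clamp the exact signed result to [-128, 127].
inline int8_t sadd(int8_t a, int8_t b) {
  return static_cast<int8_t>(std::min(std::max(int(a) + int(b), -128), 127));
}
inline int8_t ssub(int8_t a, int8_t b) {
  return static_cast<int8_t>(std::min(std::max(int(a) - int(b), -128), 127));
}

// SUADD / SUSUB: clamp the exact unsigned result to [0, 255].
inline int8_t suadd(int8_t a, int8_t b) {
  return to_lane(std::min(as_unsigned(a) + as_unsigned(b), 255));
}
inline int8_t susub(int8_t a, int8_t b) {
  return to_lane(std::max(as_unsigned(a) - as_unsigned(b), 0));
}

// UMAX / UMIN: compare the lanes as unsigned values.
inline int8_t umax(int8_t a, int8_t b) {
  return to_lane(std::max(as_unsigned(a), as_unsigned(b)));
}
inline int8_t umin(int8_t a, int8_t b) {
  return to_lane(std::min(as_unsigned(a), as_unsigned(b)));
}

// For contrast: ordinary two's-complement addition wraps around.
inline int8_t wrapping_add(int8_t a, int8_t b) { return to_lane(int(a) + int(b)); }
```

The contrast between `sadd(127, 1)` (stays at 127) and `wrapping_add(127, 1)` (becomes -128) is exactly the behaviour the PR description distinguishes. `umax(-1, 1)` returns -1 because the bit pattern `0xFF` is 255 when read as unsigned.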
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2311195882

From sviswanathan at openjdk.org Mon Aug 26 23:15:11 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 26 Aug 2024 23:15:11 GMT
Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD : Saturating signed addition.
>> . SUSUB : Saturating unsigned subtraction.
>> . SSUB : Saturating signed subtraction.
>> . UMAX : Unsigned max
>> . UMIN : Unsigned min.
>>
>>
>> The new vector operators are applicable only to integral types, since their values wrap around in overflow/underflow scenarios after setting the appropriate status flags. For floating-point types, as per the IEEE 754 spec, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value.
>>
>> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>>
>> Summary of changes:
>> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
>> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>>
>> Kindly review and share your feedback.
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/cpu/x86/assembler_x86.cpp line 3454: > 3452: > 3453: void Assembler::evmovdquw(XMMRegister dst, KRegister mask, XMMRegister src, bool merge, int vector_len) { > 3454: assert(VM_Version::supports_avx512vlbw(), ""); vl not needed for 512 bit. src/hotspot/cpu/x86/assembler_x86.cpp line 4583: > 4581: void Assembler::evpcmpgtb(KRegister kdst, XMMRegister nds, Address src, int vector_len) { > 4582: assert(VM_Version::supports_avx512vlbw(), ""); > 4583: assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), ""); The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw(). If it helps, the vl check is already happening in vex_prefix() if we use the higher bank registers for length < 512 bit. src/hotspot/cpu/x86/assembler_x86.cpp line 4596: > 4594: void Assembler::evpcmpgtb(KRegister kdst, KRegister mask, XMMRegister nds, Address src, int vector_len) { > 4595: assert(VM_Version::supports_avx512vlbw(), ""); > 4596: assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), ""); The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw(). src/hotspot/cpu/x86/assembler_x86.cpp line 4611: > 4609: void Assembler::evpcmpub(KRegister kdst, XMMRegister nds, XMMRegister src, ComparisonPredicate vcc, int vector_len) { > 4610: assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), ""); > 4611: assert(VM_Version::supports_avx512vlbw(), ""); I think you meant this to be supports_avx512bw(). 
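Several of the assembler comments above turn on the difference between `supports_avx512vlbw()`, which as its name suggests requires both AVX512VL and AVX512BW, and the weaker condition these byte/word instructions need: AVX512BW always, and AVX512VL only for 128- and 256-bit vector lengths. A toy C++ model of the two predicates follows; `CpuFeatures`, `VectorLen` and both functions are hypothetical and are not HotSpot's `VM_Version` API.

```cpp
#include <cassert>

// Toy model of the assertion pattern (hypothetical names, not HotSpot code).
enum class VectorLen { k128, k256, k512 };

struct CpuFeatures {
  bool avx512bw;  // AVX-512 byte/word instructions
  bool avx512vl;  // EVEX encodings at 128- and 256-bit vector lengths
};

// Over-strict check, analogous to asserting supports_avx512vlbw():
// it demands VL even when the operation is 512 bits wide.
inline bool ok_vlbw(const CpuFeatures& f, VectorLen /*len*/) {
  return f.avx512bw && f.avx512vl;
}

// Check suggested in the review: BW always, VL only below 512 bits.
inline bool ok_bw_plus_vl_if_needed(const CpuFeatures& f, VectorLen len) {
  return f.avx512bw && (len == VectorLen::k512 || f.avx512vl);
}
```

The two predicates differ only for a BW-without-VL configuration at 512 bits, where the over-strict form rejects an encoding that is actually valid.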
src/hotspot/cpu/x86/assembler_x86.cpp line 4620: > 4618: void Assembler::evpcmpuw(KRegister kdst, XMMRegister nds, XMMRegister src, ComparisonPredicate vcc, int vector_len) { > 4619: assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), ""); > 4620: assert(VM_Version::supports_avx512vlbw(), ""); The check for supports_avx512vlbw() in this function need to be changed to supports_avx512bw(). src/hotspot/cpu/x86/assembler_x86.cpp line 4645: > 4643: void Assembler::evpcmpuw(KRegister kdst, XMMRegister nds, Address src, ComparisonPredicate vcc, int vector_len) { > 4644: assert(VM_Version::supports_avx512vlbw(), ""); > 4645: assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), ""); The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw(). src/hotspot/cpu/x86/assembler_x86.cpp line 4672: > 4670: void Assembler::evpcmpeqb(KRegister kdst, KRegister mask, XMMRegister nds, Address src, int vector_len) { > 4671: assert(VM_Version::supports_avx512vlbw(), ""); > 4672: assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), ""); The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw(). src/hotspot/cpu/x86/assembler_x86.cpp line 8191: > 8189: void Assembler::vpminub(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8190: assert(UseAVX > 0 && (vector_len == Assembler::AVX_512bit || (!needs_evex(dst, nds, src) || VM_Version::supports_avx512vl())), ""); > 8191: assert(!needs_evex(dst, nds, src) || VM_Version::supports_avx512bw(), ""); It will be good to keep the assert similar to vpaddsb for new vmin/vmax instructions. src/hotspot/cpu/x86/assembler_x86.cpp line 8311: > 8309: } > 8310: > 8311: void Assembler::evpminud(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) { assert(VM_Version::supports_evex(), "") check missing. 
src/hotspot/cpu/x86/assembler_x86.cpp line 8340:

> 8338:
> 8339: void Assembler::evpminuq(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {
> 8340: assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), "");

assert(VM_Version::supports_evex(), "") check missing.

src/hotspot/cpu/x86/assembler_x86.cpp line 8402:

> 8400: void Assembler::vpmaxuw(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8401: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() :
> 8402: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), "");

Why is there a supports_avx() check here only, and not in the other newly added v* integral instructions? On AVX1 platforms, the only supported integral vector width is 128 bits.

src/hotspot/cpu/x86/assembler_x86.cpp line 8478:

> 8476:
> 8477: void Assembler::evpmaxud(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {
> 8478: assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), "");

assert(VM_Version::supports_evex(), "") is missing.

src/hotspot/cpu/x86/assembler_x86.cpp line 8506:

> 8504:
> 8505: void Assembler::evpmaxuq(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {
> 8506: assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), "");

assert(VM_Version::supports_evex(), "") check missing.

src/hotspot/cpu/x86/assembler_x86.cpp line 10229:

> 10227: InstructionMark im(this);
> 10228: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10229: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.
src/hotspot/cpu/x86/assembler_x86.cpp line 10256: > 10254: InstructionMark im(this); > 10255: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 10256: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); vex_w could be false here. src/hotspot/cpu/x86/assembler_x86.cpp line 10283: > 10281: InstructionMark im(this); > 10282: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 10283: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); vex_w could be false here. src/hotspot/cpu/x86/assembler_x86.cpp line 10310: > 10308: InstructionMark im(this); > 10309: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 10310: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); vex_w could be false here. src/hotspot/cpu/x86/assembler_x86.cpp line 10337: > 10335: InstructionMark im(this); > 10336: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 10337: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); vex_w could be false here. src/hotspot/cpu/x86/assembler_x86.cpp line 10364: > 10362: InstructionMark im(this); > 10363: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 10364: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); vex_w could be false here. 
src/hotspot/cpu/x86/assembler_x86.cpp line 10391: > 10389: InstructionMark im(this); > 10390: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 10391: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); vex_w could be false here. src/hotspot/cpu/x86/assembler_x86.cpp line 10419: > 10417: InstructionMark im(this); > 10418: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 10419: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); vex_w could be false here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731912227 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731608860 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731609177 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731917735 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731612730 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731726012 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731726337 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731748671 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731769490 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731771330 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731823750 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731870793 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731870288 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731888852 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731889468 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20507#discussion_r1731890265 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731909994 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731910246 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731910516 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731910755 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731911129 From coleenp at openjdk.org Mon Aug 26 23:44:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 26 Aug 2024 23:44:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. 
This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts This looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20560#pullrequestreview-2261826956 From coleenp at openjdk.org Mon Aug 26 23:44:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 26 Aug 2024 23:44:06 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: <5ned4M2iUF1GfIM3E5uMRhYsM3f8trrPaX1yuXVY__g=.b2811553-0d1f-48b5-a0ed-61c3b118af1b@github.com> References: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> <5ned4M2iUF1GfIM3E5uMRhYsM3f8trrPaX1yuXVY__g=.b2811553-0d1f-48b5-a0ed-61c3b118af1b@github.com> Message-ID: On Mon, 19 Aug 2024 23:08:35 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 639: >> >>> 637: if (length == 0) { >>> 638: return 0; >>> 639: } >> >> Maybe assert length > 0 here? > > Why "> 0" ? Because length is an int which could be negative but you're passing it to size_t. -Wsign-conversion might complain because you're changing signs. I guess you know from context that it's a positive number, so ok. >> src/hotspot/share/prims/jni.cpp line 2226: >> >>> 2224: HOTSPOT_JNI_GETSTRINGUTFLENGTH_ENTRY(env, string); >>> 2225: oop java_string = JNIHandles::resolve_non_null(string); >>> 2226: jsize ret = java_lang_String::utf8_length_as_int(java_string); >> >> So the spec says that this should be jsize (signed int), which is why this is, right? > > Yes. Hence the other change to add a new JNI API. Ok.
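To make the length arithmetic in the PR description concrete, here is a minimal standalone C++ sketch. It is not HotSpot's `UNICODE` code: `utf8_size`, `utf8_length` and `utf8_length_truncated` are hypothetical stand-ins. The truncating variant assumes that an `int`-returning length stops at the last whole character that still fits, which is one reading of "truncating behaviour" rather than a confirmed description of `utf8_length_as_int`.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Worst case from the description: every UTF-16 code unit needs 3 bytes,
// so a String of INT_MAX code units needs 3 * INT_MAX bytes.
constexpr unsigned long long kWorstCaseUtf8Bytes = 3ULL * INT_MAX;  // 6442450941
static_assert(kWorstCaseUtf8Bytes > static_cast<unsigned long long>(INT_MAX),
              "the worst-case UTF-8 length does not fit in an int");

// Bytes used by one UTF-16 code unit in modified UTF-8 (JVMS 4.4.7).
// U+0000 takes two bytes (0xC0 0x80); each half of a surrogate pair is
// encoded on its own as three bytes.
inline size_t utf8_size(uint16_t c) {
  if (c >= 0x0001 && c <= 0x007F) return 1;
  if (c <= 0x07FF) return 2;
  return 3;
}

// Full length as size_t: never overflows for any Java String.
size_t utf8_length(const uint16_t* base, size_t length) {
  size_t result = 0;
  for (size_t i = 0; i < length; i++) {
    result += utf8_size(base[i]);
  }
  return result;
}

// Hypothetical int-returning variant: stops before the first character that
// would push the total past `limit`, so the result is a whole number of
// characters. `limit` must not exceed INT_MAX.
int utf8_length_truncated(const uint16_t* base, size_t length,
                          size_t limit = static_cast<size_t>(INT_MAX)) {
  size_t result = 0;
  for (size_t i = 0; i < length; i++) {
    size_t sz = utf8_size(base[i]);
    if (result + sz > limit) break;
    result += sz;
  }
  return static_cast<int>(result);
}
```

With the default limit, the truncated variant agrees with the `size_t` one for every string whose encoding fits in an `int`; only longer strings are cut short.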
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1731939815 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1731940575 From lmesnik at openjdk.org Tue Aug 27 00:23:17 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 27 Aug 2024 00:23:17 GMT Subject: RFR: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument Message-ID: Method frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument it usually is called with nullptr as second arg except JavaThread::trace_frames() where it is called with this. It seems that thread has never been used since 2007 so makes sense just to get rid of it. Tested building all builds available in CI and running tier13 ------------- Commit messages: - 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument Changes: https://git.openjdk.org/jdk/pull/20721/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20721&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339030 Stats: 16 lines in 7 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/20721.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20721/head:pull/20721 PR: https://git.openjdk.org/jdk/pull/20721 From jiangli at openjdk.org Tue Aug 27 00:44:04 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 27 Aug 2024 00:44:04 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <5_BKiz0spEIxGN2mZJHiAoaSOWOdnH8kf5POgG9sQ9g=.9339d838-9f04-4d28-93b8-647ad90e805a@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> <5_BKiz0spEIxGN2mZJHiAoaSOWOdnH8kf5POgG9sQ9g=.9339d838-9f04-4d28-93b8-647ad90e805a@github.com> Message-ID: On Thu, 22 Aug 2024 00:30:07 GMT, Jiangli Zhou wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional 
commit since the last revision: >> >> Also update build to link properly > > I compared the extracted changes in this PR with the related parts in https://github.com/openjdk/jdk/pull/19478. They look ok. My concern (as discussed in https://github.com/openjdk/jdk/pull/19478#issuecomment-2278421931) is that these runtime changes for static JDK can't be tested, even if they are relatively simple, without the actual linking change. Any timeline for the static linking changes? > @jianglizhou > > > [...] these runtime changes for static JDK can't be tested [...] > > Yes, they can. This is just a pure refactoring of existing code. I have deliberately left out the addition of the new places where static linking exceptions are needed in the code. Hi @magicus, perhaps the answer is both `yes` and `no`. Your `src/hotspot/share/runtime/linkType.cpp` change removes the need for the `#ifdef STATIC_BUILD` macro in the various affected JDK source files that handle the differences between dynamic linking and static linking. In that sense, it's probably `yes` (can be tested as before), as `linkType.cpp` still uses `#ifdef STATIC_BUILD`, and the dynamic vs. static differences are still determined at build time and not at runtime, as @dholmes-ora and @TheShermanTanker have pointed out. In theory, things (especially the dynamic case) could be tested as before since the fundamentals are unchanged. That's different from the changes in https://github.com/openjdk/leyden/tree/hermetic-java-runtime, which do the actual runtime checks. Since the mainline doesn't yet have the build changes needed to link a `javastatic` binary, none of the `static` cases in the PR can be tested yet. We could test them by integrating into https://github.com/openjdk/leyden/tree/hermetic-java-runtime and a downstream codebase (with full hermetic Java support) after the PR is approved/submitted in the mainline. That might help.
To ease some of @dholmes-ora's concern (and my concern as well) that the initial change could affect all Java instances, perhaps providing the build support for statically linking `javastatic` should be done as an immediate follow-up step (I'm continually nudging you in that direction :-)). We have multiple goals to achieve in the build system for just the static-Java-only part, and we probably want to consider adding the support in the following sequence: 1) Capability of building a fully statically linked `javastatic` executable 2) Allow linking both `java` (with dynamic linking support) and `javastatic` using the same set of `.o` object files - Eliminate the need for the `#ifdef STATIC_BUILD` macro. Your `linkType.cpp` change seems to confine the macro usage to one file, so only that single file needs to be conditionally compiled. So that helps. - May involve spec changes for `JNI_OnLoad` and friends to use `JNI_OnLoad_` naming for dynamic linking support. The needed spec change for the static linking case (built-in library support) has already been done in the past by others.
3) General solution for duplicating symbol issue - `objcopy` for symbol hiding ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2311351447 From dholmes at openjdk.org Tue Aug 27 01:07:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 01:07:13 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int In-Reply-To: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> References: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> Message-ID: On Thu, 15 Aug 2024 20:44:52 GMT, Coleen Phillimore wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. 
>> >> Testing: >> - tiers 1-4 >> - GHA > > It doesn't look like GHA is configured for you here. Thanks for the review @coleenp ! Anyone up for a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2311374023 From dholmes at openjdk.org Tue Aug 27 01:07:14 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 01:07:14 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: <1of0cndqphEvQjJD8q54cUrYMAKOPh-Y7hkZiZ-uooU=.46215ef2-b141-465f-9247-071f3eec483e@github.com> <5ned4M2iUF1GfIM3E5uMRhYsM3f8trrPaX1yuXVY__g=.b2811553-0d1f-48b5-a0ed-61c3b118af1b@github.com> Message-ID: <7QmRnDFdHSTWbIXoezlIWQSmJYuW0ciwwfzZ6MR71Vc=.8d46b3b4-ef23-4325-a630-6ead22d97541@github.com> On Mon, 26 Aug 2024 23:37:15 GMT, Coleen Phillimore wrote: >> Why "> 0" ? > > Because length is an in which could be negative but you're passing it to size_t. -Wsign-conversion might complain because you're changing signs. I guess you know from context that it's a positive number, so ok. Right - array lengths must be >= 0 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1731990488 From dlong at openjdk.org Tue Aug 27 01:12:06 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 01:12:06 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: <8_lpkeW6gtzUixDbpePNOfM6ZBadNASYoqhW9KhUciA=.34f548f5-843a-439c-b3cc-940d3932028a@github.com> On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. 
Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts src/hotspot/share/classfile/javaClasses.cpp line 307: > 305: { > 306: ResourceMark rm; > 307: size_t utf8_len = static_cast(length); I think there should be an assert that length is not negative, probably at the very beginning of this function. 
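The suggested assert can be sketched as a small conversion helper. This is a hypothetical illustration (`array_length_as_size` and `unchecked_length_as_size` are not HotSpot functions); since Java array lengths are never negative, the assert only makes that invariant explicit and never fires in a correct VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: turn a Java array length (a signed int) into a size_t.
// The assert documents that array lengths are never negative, and the
// explicit static_cast makes the sign change visible to readers and to
// -Wsign-conversion.
inline size_t array_length_as_size(int length) {
  assert(length >= 0 && "Java array lengths are never negative");
  return static_cast<size_t>(length);
}

// What the assert guards against: without it, a negative int silently
// becomes a huge unsigned value (e.g. -1 becomes SIZE_MAX).
inline size_t unchecked_length_as_size(int length) {
  return static_cast<size_t>(length);
}
```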
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1731992852 From dlong at openjdk.org Tue Aug 27 01:19:10 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 01:19:10 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: <6RLRfWy8b0ByQvC4Ivqcof7JgxOc_Krsyy9NlyHgCpw=.54d83b99-60af-42dd-a5da-29718f7d9306@github.com> On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. 
>> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts src/hotspot/share/classfile/javaClasses.cpp line 350: > 348: // This check is too strict when the input string is not a valid UTF8. > 349: // For example, it may be created with arbitrary content via jni_NewStringUTF. > 350: if (UTF8::is_legal_utf8((const unsigned char*)utf8_str, strlen(utf8_str), false)) { Most of the time we use `is_legal_utf8`, we have a char* and have to cast it to unsigned char*. How about adding an inlined overload for is_legal_utf8(const char*, size_t) for convenience? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1731997319 From dholmes at openjdk.org Tue Aug 27 02:12:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 02:12:06 GMT Subject: RFR: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument In-Reply-To: References: Message-ID: <4Sj49UNtEbfus-W0RrEdIq6W8FZmrgUzk6fOpA9xbbA=.64aaf45c-6fe2-4151-aa5d-62275fa40e51@github.com> On Tue, 27 Aug 2024 00:19:04 GMT, Leonid Mesnik wrote: > Method > frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument > > it usually is called with nullptr as second arg except > JavaThread::trace_frames() > where it is called with this. > > It seems that thread has never been used since 2007 so makes sense just to get rid of it. > > Tested building all builds available in CI and running tier13 Looks like it has been unused since [JDK-4894843](https://bugs.openjdk.org/browse/JDK-4894843) was integrated back in 2003. :) Thanks for the cleanup. One nit below. src/hotspot/share/runtime/frame.hpp line 437: > 435: public: > 436: void print_value() const { print_value_on(tty); } > 437: void print_value_on(outputStream *st) const; Nit: the * should be at the end of `outputStream` not the start of `st`.
------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20721#pullrequestreview-2261953035 PR Review Comment: https://git.openjdk.org/jdk/pull/20721#discussion_r1732025523 From fyang at openjdk.org Tue Aug 27 02:43:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 Aug 2024 02:43:03 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4] In-Reply-To: <1gDwqRoefl8BhrD4CF6oNyWoi_6nuMTG44zdDhVaoH8=.7983dfe7-2026-40b7-9e21-5296a84ae1e9@github.com> References: <9CaMDk7tdTqmlXequDf4H-5ozalxbrVCb4E5E6AjkVE=.51211e3a-ad04-46e0-8aa4-a07c2452e625@github.com> <1gDwqRoefl8BhrD4CF6oNyWoi_6nuMTG44zdDhVaoH8=.7983dfe7-2026-40b7-9e21-5296a84ae1e9@github.com> Message-ID: <-_qE1z1Gn-8fZzM5_xT41eAqUhjow3zRMsNX4tPOxlI=.451e56b1-5d1d-45d4-a2fd-44cc376d0b4c@github.com> On Mon, 26 Aug 2024 16:46:23 GMT, Hamlin Li wrote: >> Sorry for not being clear on this. The java code snippet of decodeBlock: >> >> 795 int b1 = base64[src[sp++] & 0xff]; >> 796 int b2 = base64[src[sp++] & 0xff]; >> 797 int b3 = base64[src[sp++] & 0xff]; >> 798 int b4 = base64[src[sp++] & 0xff]; >> 799 if ((b1 | b2 | b3 | b4) < 0) { // non base64 byte >> 800 return new_dp - dp; >> 801 } >> 802 int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4; >> >> L799 simply OR-ed all the initial values of `b1`-`b4` and compare the result with zero. I think this should be reflected on the value of `combined32Bits` when it is used to do following error check. Correspondingly, It should be the OR-ed result of the initial loaded values in `byte0` - `byte3`. >> >> // error check >> __ bltz(combined32Bits, Exit); >> >> >> Anything I missed? > > `(b1 | b2 | b3 | b4) < 0`, this java code is to tell if any of b1-b4 is < 0. 
> On the other side, `bltz(combined32Bits, Exit)` is doing the similar thing (same effect, but different way), as when loading byte1-4, `lb` will sign-extend it, if any of byte1-4 < 0, then the top bit will be 1, and after shift left, top bit will still be 1, so as a result, we can use combined32Bits to tell if any of byte1-4 < 0, and at the same time, we can use combined32Bits to decode the final data (from 4 bytes to 3 bytes). Or to put it another way, combined32Bits is constructed for 2 purposes at the same time. > Hope this answers your question? Yeah. Interesting. Could you please add a small code comment for this? Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1732043633 From dlong at openjdk.org Tue Aug 27 02:55:04 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 02:55:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. 
>> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts src/hotspot/share/classfile/javaClasses.cpp line 555: > 553: bool is_latin1 = java_lang_String::is_latin1(java_string); > 554: > 555: if (length == 0) return nullptr; Should this be checking for length <= 0? It looks like length can indeed be negative if UTF8::unicode_length() tries to return the length of a utf8 string with length > 0x7fffffff. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732050412 From dlong at openjdk.org Tue Aug 27 03:17:03 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 03:17:03 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. 
>> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts src/hotspot/share/classfile/javaClasses.cpp line 588: > 586: size_t utf8_len = static_cast(length); > 587: const char* base = UNICODE::as_utf8(position, utf8_len); > 588: Symbol* sym = SymbolTable::new_symbol(base, checked_cast(utf8_len)); With the current limitations of checked_cast(), we would also need to check if the result is negative on 32-bit platforms, because then size_t and int will be the same size, and checked_cast will never complain. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732062256 From dlong at openjdk.org Tue Aug 27 03:39:04 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 03:39:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: <9EDjaihyjacDFA_ky8NRAAgGw_ojoPMGOKVdqfJKH5M=.7d1dae77-6679-4c31-bd6e-ae655945dfae@github.com> On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. 
Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts src/hotspot/share/classfile/javaClasses.cpp line 633: > 631: } > 632: > 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop value) { Why not call java_lang_String::utf8_length() here instead of duplicating the code? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732075517 From fyang at openjdk.org Tue Aug 27 04:28:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 Aug 2024 04:28:04 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v5] In-Reply-To: References: Message-ID: On Sun, 25 Aug 2024 22:21:16 GMT, Hamlin Li wrote: >> ## Performance >> benchmarks run on CanVM-K230 >> >> data >> >> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 >> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 >> 
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 >> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5478: > 5476: // vector version > 5477: if (UseRVV) { > 5478: __ bnez(isMIME, ScalarLoop); BTW: I think this branch on `isMIME` also deserves a code comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1732107337 From jbhateja at openjdk.org Tue Aug 27 05:29:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 Aug 2024 05:29:04 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: Message-ID: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> On Wed, 21 Aug 2024 00:25:03 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 305: > 303: #define __ _masm-> > 304: > 305: address StubGenerator::generate_libmTanh() { Please add the link to original source references from where algorithm is ported / disassembled. 
src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 437: > 435: __ mulpd(xmm1, xmm1); > 436: __ movdqu(xmm4, ExternalAddress(pv + 32), r11 /*rscratch*/); > 437: __ mulpd(xmm2, xmm1); I would encourage you to either add detailed comments or give the registers meaningful names to ease the review process. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1732140581 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1732144262 From fyang at openjdk.org Tue Aug 27 05:42:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 Aug 2024 05:42:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 18:42:28 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 170: > >> 168: mv(tmp1, (int32_t)(intptr_t)markWord::prototype().value()); >> 169: sd(tmp1, Address(obj, oopDesc::mark_offset_in_bytes())); >> 170: // Todo UseCompactObjectHeaders > > Can I ask, will this PR fully support RISC-V? @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet.
To avoid duplicate work, please let me know if anyone is interested or has been working on it :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1732153574 From dholmes at openjdk.org Tue Aug 27 07:12:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 07:12:06 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: <8_lpkeW6gtzUixDbpePNOfM6ZBadNASYoqhW9KhUciA=.34f548f5-843a-439c-b3cc-940d3932028a@github.com> References: <8_lpkeW6gtzUixDbpePNOfM6ZBadNASYoqhW9KhUciA=.34f548f5-843a-439c-b3cc-940d3932028a@github.com> Message-ID: On Tue, 27 Aug 2024 01:09:09 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> more missing casts > > src/hotspot/share/classfile/javaClasses.cpp line 307: > >> 305: { >> 306: ResourceMark rm; >> 307: size_t utf8_len = static_cast(length); > > I think there should be an assert that length is not negative, probably at the very beginning of this function. Why? As I explained to Coleen this is the length obtained from a Java array. All of the existing code relies on length being >= 0 and doesn't assert that anywhere. > src/hotspot/share/classfile/javaClasses.cpp line 350: > >> 348: // This check is too strict when the input string is not a valid UTF8. >> 349: // For example, it may be created with arbitrary content via jni_NewStringUTF. >> 350: if (UTF8::is_legal_utf8((const unsigned char*)utf8_str, strlen(utf8_str), false)) { > > Most of the time we use `is_legal_utf8`, we have a char* and have to cast it to unsigned char*. How about adding an inlined overload for is_legal_utf8(const char*, size_t) for conenience? Sorry out of scope for this change. > src/hotspot/share/classfile/javaClasses.cpp line 555: > >> 553: bool is_latin1 = java_lang_String::is_latin1(java_string); >> 554: >> 555: if (length == 0) return nullptr; > > Should this be checking for length <= 0? 
It looks like length can indeed be negative if UTF8::unicode_length() tries to return the length of a utf8 string with length > 0x7fffffff. ??? length is assigned on line 552 and again comes from the length of a Java array. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732264231 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732264978 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732267249 From dholmes at openjdk.org Tue Aug 27 07:23:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 07:23:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 03:13:59 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> more missing casts > > src/hotspot/share/classfile/javaClasses.cpp line 588: > >> 586: size_t utf8_len = static_cast(length); >> 587: const char* base = UNICODE::as_utf8(position, utf8_len); >> 588: Symbol* sym = SymbolTable::new_symbol(base, checked_cast(utf8_len)); > > With the current limitations of checked_cast(), we would also need to check if the result is negative on 32-bit platforms, because then size_t and int will be the same size, and checked_cast will never complain. I'm trying to reason if on 32-bit we could even create a large enough string for this to be a problem? Once we have the giant string `as_utf8` will have to allocate an array that is just as large if not larger. So for overflow to be an issue we need a string of length INT_MAX - which is limited to 2GB and then we have to allocate a resource array of 2GB as well. So we need to have allocated 4GB which is our entire address space on 32-bit. So I don't think we can ever hit a problem on 32-bit where the size_t utf8 length would convert to a negative int. 
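Dean's point about `checked_cast` can be illustrated with a round-trip check (narrow, widen back, compare): when `size_t` and `int` have the same width, a length above `INT_MAX` survives the round trip even though the narrowed value is negative, whereas an explicit range check rejects it. The sketch below is a simplified, hypothetical model, not HotSpot's `checked_cast`; the names `round_trip_ok`, `fits_in` and `utf8_len_to_int` are illustrative only, and `uint32_t` stands in for a 32-bit `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Round-trip check: narrow, widen back, compare.
// Simplified model of the kind of check discussed above, not HotSpot's
// checked_cast. Converting an out-of-range unsigned value to a signed type
// wraps modulo 2^N in C++20 (implementation-defined before that, but
// wrapping on mainstream compilers).
template <typename To, typename From>
bool round_trip_ok(From v) {
  To narrowed = static_cast<To>(v);
  return static_cast<From>(narrowed) == v;
}

// Range check: an unsigned length fits only if it does not exceed To's max.
template <typename To, typename From>
bool fits_in(From v) {
  static_assert(!std::numeric_limits<From>::is_signed,
                "expects an unsigned length type such as size_t");
  return static_cast<std::uintmax_t>(v) <=
         static_cast<std::uintmax_t>(std::numeric_limits<To>::max());
}

// Hypothetical helper: narrow a size_t UTF-8 length to int, rejecting values
// that would turn negative whether size_t is 32 or 64 bits wide.
inline int utf8_len_to_int(std::size_t len) {
  assert(fits_in<int>(len) && "UTF-8 length does not fit in int");
  return static_cast<int>(len);
}
```

With `uint32_t` as the source, `round_trip_ok<int32_t>(0x80000000u)` is true although the narrowed value is `INT32_MIN`, while `fits_in<int32_t>` is false; with a 64-bit source the round trip already fails for the same value, which is why the problem is specific to 32-bit platforms.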
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732281358 From dholmes at openjdk.org Tue Aug 27 07:26:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 07:26:05 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: <9EDjaihyjacDFA_ky8NRAAgGw_ojoPMGOKVdqfJKH5M=.7d1dae77-6679-4c31-bd6e-ae655945dfae@github.com> References: <9EDjaihyjacDFA_ky8NRAAgGw_ojoPMGOKVdqfJKH5M=.7d1dae77-6679-4c31-bd6e-ae655945dfae@github.com> Message-ID: On Tue, 27 Aug 2024 03:36:00 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> more missing casts > > src/hotspot/share/classfile/javaClasses.cpp line 633: > >> 631: } >> 632: >> 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop value) { > > Why not call java_lang_String::utf8_length() here instead of duplicating the code? Because we have to call `UNICODE::utf8_length_as_int` to get the proper truncation - the code is not an exact duplicate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732284976 From rcastanedalo at openjdk.org Tue Aug 27 07:30:46 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 Aug 2024 07:30:46 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/92112802..daf38d3f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=08-09 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From dholmes at openjdk.org Tue Aug 27 07:31:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 07:31:06 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations.
Higher-level APIs can still use `int` if they know the strings (e.g. symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of `UNICODE::utf8_length`, we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`, use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts Thanks for looking at this @dean-long! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2311768153 From dlong at openjdk.org Tue Aug 27 07:34:03 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 07:34:03 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: <8_lpkeW6gtzUixDbpePNOfM6ZBadNASYoqhW9KhUciA=.34f548f5-843a-439c-b3cc-940d3932028a@github.com> Message-ID: On Tue, 27 Aug 2024 07:07:11 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 307: >> >>> 305: { >>> 306: ResourceMark rm; >>> 307: size_t utf8_len = static_cast(length); >> >> I think there should be an assert that length is not negative, probably at the very beginning of this function. > > Why? As I explained to Coleen this is the length obtained from a Java array. All of the existing code relies on length being >= 0 and doesn't assert that anywhere. If called from jni_NewString --> java_lang_String::create_oop_from_unicode, there's no Java String yet. Yes, it has probably been like this forever.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732297010 From rcastanedalo at openjdk.org Tue Aug 27 07:38:08 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 Aug 2024 07:38:08 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: On Mon, 26 Aug 2024 13:23:16 GMT, Roberto Castañeda Lozano wrote: > Sure, I agree that g1GetAndSetP and g1GetAndSetN are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. Done (commit daf38d3). @offamitkumar @feilongjiang @snazarkin please note that the ADL instructions `g1XChgP` and `g1XChgN` have been renamed to `g1GetAndSetP` and `g1GetAndSetN`, and the same naming is expected across all platforms by the test `compiler/gcbarriers/TestG1BarrierGeneration.java` included in this changeset.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1732301224 From dlong at openjdk.org Tue Aug 27 07:38:05 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 07:38:05 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: <8_lpkeW6gtzUixDbpePNOfM6ZBadNASYoqhW9KhUciA=.34f548f5-843a-439c-b3cc-940d3932028a@github.com> Message-ID: On Tue, 27 Aug 2024 07:09:33 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 555: >> >>> 553: bool is_latin1 = java_lang_String::is_latin1(java_string); >>> 554: >>> 555: if (length == 0) return nullptr; >> >> Should this be checking for length <= 0? It looks like length can indeed be negative if UTF8::unicode_length() tries to return the length of a utf8 string with length > 0x7fffffff. > > ??? length is assigned on line 552 and again comes from the length of a Java array. OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732302004 From mli at openjdk.org Tue Aug 27 07:43:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 07:43:19 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v6] In-Reply-To: References: Message-ID: <2U6vmyL7j1lBEIqBDfLdeV46YHqV4Rz2DL98glJID4Q=.4ffe1c9b-df61-466c-baad-90272b020124@github.com>
> ## Performance
> benchmarks run on CanVM-K230
>
> data
>
> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547
> Base64Decode.testBase64MIMEDecode | 0 | ...
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20026/files - new: https://git.openjdk.org/jdk/pull/20026/files/1848c2fd..362cbfae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20026&range=04-05 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20026/head:pull/20026 PR: https://git.openjdk.org/jdk/pull/20026 From mli at openjdk.org Tue Aug 27 07:43:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 07:43:19 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 04:25:33 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5478: > >> 5476: // vector version >> 5477: if (UseRVV) { >> 5478: __ bnez(isMIME, ScalarLoop); > > BTW: I think this branch on `isMIME` also deserves a code comment about why it's needed. also added. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1732308448 From mli at openjdk.org Tue Aug 27 07:43:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 07:43:19 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v4] In-Reply-To: <-_qE1z1Gn-8fZzM5_xT41eAqUhjow3zRMsNX4tPOxlI=.451e56b1-5d1d-45d4-a2fd-44cc376d0b4c@github.com> References: <9CaMDk7tdTqmlXequDf4H-5ozalxbrVCb4E5E6AjkVE=.51211e3a-ad04-46e0-8aa4-a07c2452e625@github.com> <1gDwqRoefl8BhrD4CF6oNyWoi_6nuMTG44zdDhVaoH8=.7983dfe7-2026-40b7-9e21-5296a84ae1e9@github.com> <-_qE1z1Gn-8fZzM5_xT41eAqUhjow3zRMsNX4tPOxlI=.451e56b1-5d1d-45d4-a2fd-44cc376d0b4c@github.com> Message-ID: On Tue, 27 Aug 2024 02:40:45 GMT, Fei Yang wrote: >> `(b1 | b2 | b3 | b4) < 0`, this java code is to tell if any of b1-b4 is < 0. >> On the other side, `bltz(combined32Bits, Exit)` is doing the similar thing (same effect, but different way), as when loading byte1-4, `lb` will sign-extend it, if any of byte1-4 < 0, then the top bit will be 1, and after shift left, top bit will still be 1, so as a result, we can use combined32Bits to tell if any of byte1-4 < 0, and at the same time, we can use combined32Bits to decode the final data (from 4 bytes to 3 bytes). Or to put it another way, combined32Bits is constructed for 2 purposes at the same time. >> Hope this answers your question? > > Yeah. Interesting. Could you please add a small code comment for this? Thanks. Sure, added. Thanks! 
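The trick Hamlin describes, building one combined word from the four sign-extended decode-table entries so that a single sign test does the job of `(b1 | b2 | b3 | b4) < 0`, can be sketched in portable C++. This is a hypothetical scalar model, not the RISC-V stub: it assumes a decode table that yields 0..63 for valid Base64 characters and -1 for invalid ones, and it models the combined value as a 32-bit word. For example, the entries 19, 22, 5, 46 (the table values for "TWFu") decode to the bytes "Man".

```cpp
#include <cassert>
#include <cstdint>

// Combine four decode-table entries (0..63 valid, -1 invalid) into one word.
// Shifting is done on the unsigned bit pattern of the sign-extended values
// (left-shifting a negative int is undefined before C++20), which matches
// what a sign-extending byte load followed by shifts leaves in a register.
inline std::int32_t combine_quad(std::int8_t b1, std::int8_t b2,
                                 std::int8_t b3, std::int8_t b4) {
  std::uint32_t bits = (static_cast<std::uint32_t>(std::int32_t{b1}) << 18) |
                       (static_cast<std::uint32_t>(std::int32_t{b2}) << 12) |
                       (static_cast<std::uint32_t>(std::int32_t{b3}) << 6) |
                       static_cast<std::uint32_t>(std::int32_t{b4});
  // A -1 entry contributes all-ones above its shift position, so the top bit
  // is set whenever any entry is negative; valid inputs stay below 2^24.
  return static_cast<std::int32_t>(bits);
}

// Decodes one 4-character group into 3 bytes; returns false on invalid input.
inline bool decode_quad(std::int8_t b1, std::int8_t b2, std::int8_t b3,
                        std::int8_t b4, std::uint8_t out[3]) {
  assert(out != nullptr);
  std::int32_t bits = combine_quad(b1, b2, b3, b4);
  if (bits < 0) return false;  // same effect as (b1 | b2 | b3 | b4) < 0
  out[0] = static_cast<std::uint8_t>(bits >> 16);
  out[1] = static_cast<std::uint8_t>(bits >> 8);
  out[2] = static_cast<std::uint8_t>(bits);
  return true;
}
```

The same combined word serves both purposes: its sign answers "was any input invalid?" and its low 24 bits are the decoded payload.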
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1732308194 From mli at openjdk.org Tue Aug 27 07:46:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 07:46:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 05:37:30 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 170: >> >>> 168: mv(tmp1, (int32_t)(intptr_t)markWord::prototype().value()); >>> 169: sd(tmp1, Address(obj, oopDesc::mark_offset_in_bytes())); >>> 170: // Todo UseCompactObjectHeaders >> >> Can I ask, will this PR fully support riscv? > > @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet. To avoid duplicate work, please let me know if anyone is interested or has been working on it :-) Yes, I'm interested in it. Thanks for raising the discussion. :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1732312058 From dlong at openjdk.org Tue Aug 27 07:54:04 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 07:54:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 07:20:27 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 588: >> >>> 586: size_t utf8_len = static_cast(length); >>> 587: const char* base = UNICODE::as_utf8(position, utf8_len); >>> 588: Symbol* sym = SymbolTable::new_symbol(base, checked_cast(utf8_len)); >> >> With the current limitations of checked_cast(), we would also need to check if the result is negative on 32-bit platforms, because then size_t and int will be the same size, and checked_cast will never complain. > > I'm trying to reason if on 32-bit we could even create a large enough string for this to be a problem? Once we have the giant string `as_utf8` will have to allocate an array that is just as large if not larger.
So for overflow to be an issue we need a string of length INT_MAX - which is limited to 2GB and then we have to allocate a resource array of 2GB as well. So we need to have allocated 4GB which is our entire address space on 32-bit. So I don't think we can ever hit a problem on 32-bit where the size_t utf8 length would convert to a negative int. I think the Java string would only need to be INT_MAX/3 in length, if all the characters require surrogate encoding. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732326074 From dlong at openjdk.org Tue Aug 27 07:58:05 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 07:58:05 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: <9EDjaihyjacDFA_ky8NRAAgGw_ojoPMGOKVdqfJKH5M=.7d1dae77-6679-4c31-bd6e-ae655945dfae@github.com> Message-ID: <1dWEh2WCVFI2FpWngQiolei0-ElCe499Twk5Oqx15w4=.1a8fff06-e76c-4adf-8501-f04dce40c31c@github.com> On Tue, 27 Aug 2024 07:23:14 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 633: >> >>> 631: } >>> 632: >>> 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop value) { >> >> Why not call java_lang_String::utf8_length() here instead of duplicating the code? > > Because we have to call `UNICODE::utf8_length_as_int` to get the proper truncation - the code is not an exact duplicate. OK, I missed that. It's probably not worth it to refactor out the common code using C++ lambda expressions. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732331731 From rehn at openjdk.org Tue Aug 27 08:02:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 27 Aug 2024 08:02:04 GMT Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 06:24:15 GMT, Robbin Ehn wrote: >> Hi please consider, >> >> On TSO we don't need the synthetic data dependency in between the loads. >> Also added some comment about this. >> >> Sanity tested > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment update I'm still confused by the re-review thingy. If I integrate now, will @Hamlin-Li still get credit, or does he need to re-review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20661#issuecomment-2311826203 From mli at openjdk.org Tue Aug 27 08:12:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 08:12:03 GMT Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 06:24:15 GMT, Robbin Ehn wrote: >> Hi please consider, >> >> On TSO we don't need the synthetic data dependency in between the loads. >> Also added some comment about this. >> >> Sanity tested > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment update Marked as reviewed by mli (Reviewer). Yeah, the new process is a bit tedious, especially for an additional minor change after one approval. Maybe we can improve it a bit ourselves in the riscv area? For minor changes after approval, can we skip re-review/approval?
------------- PR Review: https://git.openjdk.org/jdk/pull/20661#pullrequestreview-2262498987 PR Comment: https://git.openjdk.org/jdk/pull/20661#issuecomment-2311845544 From aph at openjdk.org Tue Aug 27 08:20:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 27 Aug 2024 08:20:06 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: <2u6QOgOLGnFfTH4p_1atCbXiTF0zcaNyBQA6174dkZc=.b8070aec-a8bd-4c27-9837-ccc73861071b@github.com> References: <2u6QOgOLGnFfTH4p_1atCbXiTF0zcaNyBQA6174dkZc=.b8070aec-a8bd-4c27-9837-ccc73861071b@github.com> Message-ID: <8c2LNTjTARrjDpqWc8hGnJXvUd9m5LQuJLWxL68jSB0=.1b6daed4-a6fb-4a31-8c71-278c503d348f@github.com> On Mon, 26 Aug 2024 20:57:02 GMT, Martin Doerr wrote: > Ok, so on PPC64 we only have one VMReg per VectorRegister. Seems to work. I don't know why s390 needs the individual ones, but that may be better. Except in terms of resource usage and register allocation performance I guess. Sure, I've wondered many times how PPC64 works. It's odd, and it's certainly not how the register allocator was ever intended to work, but it certainly seems to. Maybe one day I'll dig into it, but not today. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2311862042 From dlong at openjdk.org Tue Aug 27 08:26:05 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 08:26:05 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`.
Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level APIs can still use `int` if they know the strings (e.g. symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of `UNICODE::utf8_length`, we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`, use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts src/hotspot/share/utilities/utf8.cpp line 127: > 125: prev = c; > 126: } > 127: return checked_cast(num_chars); Ideally, this function would return size_t.
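Dean's remark that this function should ideally return `size_t` follows from the per-character bound in the PR description. In modified UTF-8, each UTF-16 code unit takes 1 to 3 bytes (U+0000 takes 2, and the two halves of a surrogate pair are encoded separately), so `n` code units can need up to `3n` bytes, which already exceeds `INT_MAX` once `n > INT_MAX / 3`. The sketch below is a hypothetical standalone version over a plain `char16_t` array, not HotSpot's `UNICODE::utf8_length`.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one UTF-16 code unit in modified UTF-8.
// U+0000 takes 2 bytes (so no encoded byte is zero), and each surrogate of a
// pair is encoded on its own, so no code unit ever needs more than 3 bytes.
inline std::size_t modified_utf8_size(char16_t c) {
  if (c >= 0x0001 && c <= 0x007F) return 1;
  if (c <= 0x07FF) return 2;
  return 3;
}

// Hypothetical size_t-returning length function (not HotSpot's API):
// n code units need at most 3 * n bytes, which can exceed INT_MAX
// once n > INT_MAX / 3, so the result is never narrowed to int here.
inline std::size_t modified_utf8_length(const char16_t* s, std::size_t n) {
  assert(s != nullptr || n == 0);
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; i++) {
    total += modified_utf8_size(s[i]);
  }
  return total;
}
```

Callers that genuinely need an `int`, such as the truncating JNI `GetStringUTFLength` path, would then narrow explicitly at their own boundary rather than inside the low-level routine.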
Changeset: aefdbdc7 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/aefdbdc7e54ae92b5c2113504ce17abf00681e62 Stats: 13 lines in 1 file changed: 8 ins; 0 del; 5 mod 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso Reviewed-by: mli, fyang ------------- PR: https://git.openjdk.org/jdk/pull/20661 From jzhu at openjdk.org Tue Aug 27 09:35:32 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Tue, 27 Aug 2024 09:35:32 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only Message-ID: Please review this minor enhancement that skips verify_sve_vector_length after native calls. It works on SVE architecture that only supports 128-bit vector length. ------------- Commit messages: - [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only Changes: https://git.openjdk.org/jdk/pull/20724/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20724&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339063 Stats: 54 lines in 6 files changed: 32 ins; 14 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20724.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20724/head:pull/20724 PR: https://git.openjdk.org/jdk/pull/20724 From jbhateja at openjdk.org Tue Aug 27 09:58:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 Aug 2024 09:58:44 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v6] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. 
> > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table whose elements are selected based on the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of the new selectFrom API. > - C2 compiler IR and inline expander changes. > - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platform. > - Optimized x86 backend implementation for AVX512 and legacy targets. > - Functional tests covering the new API.
>
> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
>
> Benchmark (size) Mode Cnt Score Error Units
> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms
> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms
> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms
> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms
> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms
> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms
> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms
> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms
> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms
> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms
> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms
> SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions.
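The two-vector `selectFrom` semantics and the lowering mentioned in the summary of changes (dismantling the operation into constituent operations when the ISA lacks a direct two-vector permute) can be modeled with scalar code: two single-vector rearranges on the low index bits, then a blend keyed on whether the index is at least `VLEN`. The sketch below is a hypothetical C++ model over `std::array` lanes, not the Vector API or the C2 lowering itself; it assumes the index vector has already been converted to unsigned integral lane indices, and the lowered form additionally assumes a power-of-two `VLEN`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Reference semantics: lane i takes v1[k] if k < VLEN, else v2[k - VLEN],
// where k = idx[i]; indices outside [0, 2*VLEN) are rejected, mirroring the
// IndexOutOfBoundsException described for the API.
template <typename T, std::size_t VLEN>
std::array<T, VLEN> select_from(const std::array<std::size_t, VLEN>& idx,
                                const std::array<T, VLEN>& v1,
                                const std::array<T, VLEN>& v2) {
  std::array<T, VLEN> result{};
  for (std::size_t i = 0; i < VLEN; i++) {
    std::size_t k = idx[i];
    if (k >= 2 * VLEN) throw std::out_of_range("index outside [0, 2*VLEN)");
    result[i] = k < VLEN ? v1[k] : v2[k - VLEN];
  }
  return result;
}

// Lowered form for targets without a two-table permute: rearrange each table
// with the low index bits, then blend on the bit that selects the table.
// Assumes idx has already been validated to [0, 2*VLEN).
template <typename T, std::size_t VLEN>
std::array<T, VLEN> select_from_lowered(const std::array<std::size_t, VLEN>& idx,
                                        const std::array<T, VLEN>& v1,
                                        const std::array<T, VLEN>& v2) {
  static_assert(VLEN > 0 && (VLEN & (VLEN - 1)) == 0,
                "VLEN must be a power of two");
  std::array<T, VLEN> from_v1{}, from_v2{}, result{};
  for (std::size_t i = 0; i < VLEN; i++) {  // two single-vector rearranges
    from_v1[i] = v1[idx[i] & (VLEN - 1)];
    from_v2[i] = v2[idx[i] & (VLEN - 1)];
  }
  for (std::size_t i = 0; i < VLEN; i++) {  // blend: mask set when idx >= VLEN
    assert(idx[i] < 2 * VLEN);
    result[i] = (idx[i] & VLEN) ? from_v2[i] : from_v1[i];
  }
  return result;
}
```

For every valid index vector the two functions must agree; that equivalence is the property any lowering of the two-vector operation has to preserve.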
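------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/6cb1a46d..408a8694 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=04-05 Stats: 112 lines in 7 files changed: 91 ins; 14 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Tue Aug 27 10:04:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 Aug 2024 10:04:04 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v5] In-Reply-To: <2_P1qPMS46tgh4RUSuitcjXYnd0koS_BxfRRRmj79EY=.c3baeeaa-87f7-47d4-bc70-ae2afd9de745@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7e5pWnvjqk-dQYNeaZjFzXcd5WlzniZPl5T4l1rKQGE=.0882bcd4-e307-4a29-aa41-5496ee029a60@github.com> <2_P1qPMS46tgh4RUSuitcjXYnd0koS_BxfRRRmj79EY=.c3baeeaa-87f7-47d4-bc70-ae2afd9de745@github.com> Message-ID: On Fri, 23 Aug 2024 22:29:46 GMT, Paul Sandoz wrote: > API changes look good. (Note at the moment we are not proposing to change how shuffles works - as you point out the two vector `selectFrom` and `rearrange` differ in the index representation.) > > IIUC if the more direct two-table instruction is not available you fall back to calling two single arg rearranges with a blend, as a lowering transformation, similar to the fallback Java expression. > > The float/double conversion bothers me, not suggesting we do something about it here, noting down for any future conversation on shuffles. Ideally we would want the equivalent integral vector (int or long) to represent the index, tricky to express in the API, or alternative treat as a bitwise no-op conversion (there is also impact on `toShuffle` too). Thanks @PaulSandoz, > IIUC if the more direct two-table instruction is not available you fall back to calling two single arg rearranges with a blend, as > a lowering transformation, similar to the fallback Java expression. The idea here is to be as performant as possible and to avoid the additional boxing penalties incurred by failed intrinsification when the target does not directly support two-vector permutation but does support its constituents. I have now unwrapped and optimized the fallback implementation to operate directly over index vector lanes instead of going through an intermediate shuffle. > > The float/double conversion bothers me, not suggesting we do something about it here, noting down for any future conversation on shuffles. I agree. > Ideally we would want the equivalent integral vector (int or long) to represent the index, tricky to express in the API, or alternative treat as a bitwise no-op conversion (there is also impact on `toShuffle` too). Since a floating-point index vector may carry special values like NaN, POSITIVE_INFINITY and NEGATIVE_INFINITY, with default wrapping semantics it's necessary to convert it into an integral vector followed by wrapping normalization to the valid two-vector index range. With the existing sequence we bypass the partial wrapping (part of toShuffle) altogether, which may save a few instructions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2312089987 From duke at openjdk.org Tue Aug 27 10:30:07 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 27 Aug 2024 10:30:07 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky wrote: > Hello All, > > Please review these changes to enable the _vectorizedMismatch intrinsic on the RISC-V platform when RVV instructions are supported.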
Thanks @PaulSandoz, > IIUC if the more direct two-table instruction is not available you fall back to calling two single arg rearranges with a blend, as a lowering transformation, similar to the fallback Java expression. The idea here is to be as performant as possible and to save the additional boxing penalties incurred by failed intrinsification when the target does not directly support two-vector permutation but does support its constituents. I have now unwrapped and optimized the fallback implementation to operate directly on the index vector lanes instead of going through an intermediate shuffle. > > The float/double conversion bothers me, not suggesting we do something about it here, noting down for any future conversation on shuffles. I agree. > Ideally we would want the equivalent integral vector (int or long) to represent the index, tricky to express in the API, or alternative treat as a bitwise no-op conversion (there is also impact on `toShuffle` too). Since a floating-point index vector may carry special values like NaN, POSITIVE_INFINITY and NEGATIVE_INFINITY, with the default wrapping semantics it is necessary to convert it into an integral vector, followed by wrapping normalization to the valid two-vector index range. With the existing sequence we bypass partial wrapping (part of toShuffle) altogether, which may save a few instructions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2312089987 From duke at openjdk.org Tue Aug 27 10:30:07 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 27 Aug 2024 10:30:07 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky wrote: > Hello All, > > Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported.
> > Thank you, > -Yuri Gaevsky > > **Correctness checks:** > hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2312146364 From aph at openjdk.org Tue Aug 27 10:57:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 27 Aug 2024 10:57:08 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> Message-ID: On Tue, 27 Aug 2024 05:24:34 GMT, Jatin Bhateja wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 437: > >> 435: __ mulpd(xmm1, xmm1); >> 436: __ movdqu(xmm4, ExternalAddress(pv + 32), r11 /*rscratch*/); >> 437: __ mulpd(xmm2, xmm1); > > I would encourage you to either add detailed comments or give meaningful names to the registers to ease the review process. I agree, this is all rather obscure. Ideally, use the same names as wherever this comes from. Where does the algorithm come from? What are its accuracy guarantees? In addition, given the rarity of hyperbolic tangents in Java applications, do we need this?
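One way to start on the accuracy question is to measure the distance between a candidate result and a reference in units in the last place (ulps); the `java.lang.Math.tanh` specification requires results within 2.5 ulps of the exact value. The sketch below compares `Math.tanh` (which may be intrinsified) with `StrictMath.tanh` over evenly spaced inputs. Using `StrictMath` as the reference is an assumption of this sketch, not a description of how the PR was validated, and it only measures distance from another approximation rather than from the exact result. The class and method names are hypothetical.

```java
/** Hypothetical harness: ulp distance between Math.tanh and StrictMath.tanh. */
public final class TanhUlpCheck {

    /** Distance between actual and reference, measured in ulps of the reference. */
    static double ulpError(double actual, double reference) {
        if (actual == reference) {
            return 0.0; // also treats +0.0 and -0.0 as identical
        }
        return Math.abs(actual - reference) / Math.ulp(reference);
    }

    /** Largest observed error over samples + 1 evenly spaced inputs in [-range, range]. */
    static double maxUlpError(double range, int samples) {
        double worst = 0.0;
        for (int i = 0; i <= samples; i++) {
            double x = -range + (2.0 * range * i) / samples;
            worst = Math.max(worst, ulpError(Math.tanh(x), StrictMath.tanh(x)));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("max ulp error in [-25, 25]: " + maxUlpError(25.0, 1_000_000));
    }
}
```

A real accuracy study would compare against a higher-precision reference (for example `BigDecimal`-based evaluation) and concentrate samples near zero and near the saturation region, where implementations typically switch approximations.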
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1732613573 From jbhateja at openjdk.org Tue Aug 27 11:13:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 Aug 2024 11:13:09 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 00:25:03 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Hi @vamsi-parasa, kindly also add a JMH micro benchmark. I did a first run and see around a 4% performance drop with the attached micro on Sapphire Rapids. [test.txt](https://github.com/user-attachments/files/16761142/test.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2312254535 From sroy at openjdk.org Tue Aug 27 11:14:03 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 27 Aug 2024 11:14:03 GMT Subject: RFR: 8338814: [PPC64] Unify interface of cmpxchg for different types In-Reply-To: References: Message-ID: <_dUodALMeSkSlgTW9JAT-hUl4O2kvDy19I66QgwHI9k=.d4b1f994-b5f2-4a83-bb20-49a64931df83@github.com> On Fri, 23 Aug 2024 09:58:21 GMT, Martin Doerr wrote: > PPC64 code has very complicated cmpxchg functions in MacroAssembler. We should have at least a unified argument list for the different types and the features should be usable with all types. > I have also cleaned up the `RegisterOrConstant` functions because they are used by the cmpxchg code. > > One difference in the argument list still exists: `cmpxchgb` and `cmpxchgh` use extra temp registers to support older processors. They should get removed with [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). LGTM. Thanks for unifying interfaces.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20689#issuecomment-2312258399 From mdoerr at openjdk.org Tue Aug 27 11:54:09 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 27 Aug 2024 11:54:09 GMT Subject: RFR: 8338814: [PPC64] Unify interface of cmpxchg for different types In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 09:58:21 GMT, Martin Doerr wrote: > PPC64 code has very complicated cmpxchg functions in MacroAssembler. We should have at least a unified argument list for the different types and the features should be usable with all types. > I have also cleaned up the `RegisterOrConstant` functions because they are used by the cmpxchg code. > > One difference in the argument list still exists: `cmpxchgb` and `cmpxchgh` use extra temp registers to support older processors. They should get removed with [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). Thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20689#issuecomment-2312347059 From mdoerr at openjdk.org Tue Aug 27 11:54:09 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 27 Aug 2024 11:54:09 GMT Subject: Integrated: 8338814: [PPC64] Unify interface of cmpxchg for different types In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 09:58:21 GMT, Martin Doerr wrote: > PPC64 code has very complicated cmpxchg functions in MacroAssembler. We should have at least a unified argument list for the different types and the features should be usable with all types. > I have also cleaned up the `RegisterOrConstant` functions because they are used by the cmpxchg code. > > One difference in the argument list still exists: `cmpxchgb` and `cmpxchgh` use extra temp registers to support older processors. They should get removed with [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). This pull request has now been integrated. 
Changeset: 2edf574f Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/2edf574f62837678e621e1dfdd8d8a77dbe17ad6 Stats: 123 lines in 9 files changed: 38 ins; 3 del; 82 mod 8338814: [PPC64] Unify interface of cmpxchg for different types Reviewed-by: lucy ------------- PR: https://git.openjdk.org/jdk/pull/20689 From jzhu at openjdk.org Tue Aug 27 12:08:02 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Tue, 27 Aug 2024 12:08:02 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only In-Reply-To: References: Message-ID: <3xANA_53kzHFctOOX8vAJfxEvlzoyjpx497IAUa2XWQ=.2377fd89-03e0-4ee8-9f71-d7433e593a77@github.com> On Tue, 27 Aug 2024 09:28:52 GMT, Joshua Zhu wrote: > Please review this minor enhancement that skips verify_sve_vector_length after native calls. > It applies to SVE micro-architectures that only support a 128-bit vector length. Some more background: the maximum SVE vector length "VLmax" is determined by the hardware: 16 <= VLmax <= 256 (in bytes). The value of VL can be configured at runtime: 16 <= VL <= VLmax, where VL must be a multiple of 16. Once we find that the CPU's VLmax is only 16 bytes, the verification `verify_sve_vector_length()` after native calls is not required, because VL cannot then be configured to any value other than 16.
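The arithmetic behind this can be spelled out in a few lines. The following plain-Java sketch enumerates the vector lengths permitted by the constraints stated in this comment (multiples of 16 bytes between 16 and VLmax) and treats a post-call length check as meaningful only when more than one value is possible. The class and method names are hypothetical, and the sketch does not reflect how HotSpot's AArch64 port actually detects or verifies VL.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the SVE vector-length constraints; not HotSpot code. */
public final class SveVectorLengths {

    static final int GRANULE_BYTES = 16;    // VL must be a multiple of 16 bytes
    static final int ARCH_MAX_BYTES = 256;  // upper bound for VLmax, in bytes

    /** All VL values (in bytes) that software could configure on a CPU with the given VLmax. */
    static List<Integer> configurableLengths(int vlMaxBytes) {
        if (vlMaxBytes < GRANULE_BYTES || vlMaxBytes > ARCH_MAX_BYTES || vlMaxBytes % GRANULE_BYTES != 0) {
            throw new IllegalArgumentException("invalid VLmax: " + vlMaxBytes);
        }
        List<Integer> lengths = new ArrayList<>();
        for (int vl = GRANULE_BYTES; vl <= vlMaxBytes; vl += GRANULE_BYTES) {
            lengths.add(vl);
        }
        return lengths;
    }

    /** Verifying VL after a native call only makes sense if VL could have changed. */
    static boolean needsVectorLengthCheck(int vlMaxBytes) {
        return configurableLengths(vlMaxBytes).size() > 1;
    }

    public static void main(String[] args) {
        System.out.println(configurableLengths(16));     // [16]
        System.out.println(needsVectorLengthCheck(16));  // false
        System.out.println(configurableLengths(64));     // [16, 32, 48, 64]
        System.out.println(needsVectorLengthCheck(64));  // true
    }
}
```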
------------- PR Comment: https://git.openjdk.org/jdk/pull/20724#issuecomment-2312380459 From dholmes at openjdk.org Tue Aug 27 12:13:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 12:13:05 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 08:22:57 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> more missing casts > > src/hotspot/share/utilities/utf8.cpp line 127: > >> 125: prev = c; >> 126: } >> 127: return checked_cast(num_chars); > > Ideally, this function would return size_t. Why? I think that would have a large flow on effect. And this length does fit in an int. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732727140 From dholmes at openjdk.org Tue Aug 27 12:23:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 12:23:06 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 07:51:38 GMT, Dean Long wrote: >> I'm trying to reason if on 32-bit we could even create a large enough string for this to be a problem? Once we have the giant string `as_utf8` will have to allocate an array that is just as large if not larger. So for overflow to be an issue we need a string of length INT_MAX - which is limited to 2GB and then we have to allocate a resource array of 2GB as well. So we need to have allocated 4GB which is our entire address space on 32-bit. So I don't think we can ever hit a problem on 32-bit where the size_t utf8 length would convert to a negative int. > > I think the Java string would only need to be INT_MAX/3 in length, if all the characters require surrogate encoding. IIUC for compact strings, with non-latin-1 each pair of bytes would require at most 3-bytes to encode so you'd need 2/3 of INT_MAX. 
With latin-1 it would be 1/2 INT_MAX. But yes I suppose in theory you might be able to get an overflow on 32-bit. Need to think more about what could even be done for this case ... and whether it is worth trying ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732741739 From rcastanedalo at openjdk.org Tue Aug 27 12:39:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 Aug 2024 12:39:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: On Tue, 27 Aug 2024 07:34:57 GMT, Roberto Castañeda Lozano wrote: >>> That one is among the failing tests. Can we agree on better names than g1XChgP and g1XChgN? They are not readable very well IMHO. >> >> Sure, I agree that `g1GetAndSetP` and `g1GetAndSetN` are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. >> >>> Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. >> >> Thanks, will try it out.
> >> Sure, I agree that g1GetAndSetP and g1GetAndSetN are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. > > Done (commit daf38d3). > > @offamitkumar @feilongjiang @snazarkin please note that the ADL instructions `g1XChgP` and `g1XChgN` have been renamed to `g1GetAndSetP` and `g1GetAndSetN`, and the same naming is expected across all platforms by the test `compiler/gcbarriers/TestG1BarrierGeneration.java` included in this changeset. > Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. I tried this refactoring [here](https://github.com/openjdk/jdk/commit/d4e83fd7d77c5415700b33556752e8c8da811dea), thanks again Martin for the suggestion. In my opinion, the result is similar in terms of readability/maintainability because the benefit of removing explicit `decode_heap_oop` operations in the ADL file is somewhat negated by the increased complexity of the `write_barrier_post` and `g1_write_barrier_post_c2` signatures. For aarch64, moving non-destructive `decode_heap_oop` operations would probably require passing both source and destination registers to these functions explicitly, which would make them more complex. I feel that this refactoring is only amortized when the new-value decoding operations are more complex, as in your PPC implementation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1732770143 From aturbanov at openjdk.org Tue Aug 27 12:55:09 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 27 Aug 2024 12:55:09 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 15:07:17 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681), enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. > > As mentioned in the comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492), where we enabled support for `%p` for filenames specified in jcmd. > > With this patch, I propose: > - Expanding the utility function `Arguments::copy_expand_pid` to `Arguments::copy_expand_arguments` to deal with `%p` expansions for pid and `%t` expansions for timestamps. > - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. > - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly support filename expansion to support `%t` for jcmd as well. > > Testing: > - [x] Added test cases pass on all platforms (verified with a GHA job). > - [x] Tier 1 passes with GHA. > > Looking forward to hearing your thoughts!
> > Thanks, > Sonia test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpFilenameExpansion.java line 53: > 51: try { > 52: Object[] oa = new Object[Integer.MAX_VALUE]; > 53: for(int i = 0; i < oa.length; i++) { Suggestion: for (int i = 0; i < oa.length; i++) { test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpFilenameExpansion.java line 90: > 88: Pattern pattern = Pattern.compile("file\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}"); > 89: File[] files = new File(".").listFiles(); > 90: if(files != null) { Suggestion: if (files != null) { test/jdk/sun/tools/jcmd/TestJcmdArgumentSubstitution.java line 87: > 85: Pattern pattern = Pattern.compile("myfile\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}"); > 86: File[] files = new File(test_dir).listFiles(); > 87: if(files != null) { Suggestion: if (files != null) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20568#discussion_r1732794735 PR Review Comment: https://git.openjdk.org/jdk/pull/20568#discussion_r1732794999 PR Review Comment: https://git.openjdk.org/jdk/pull/20568#discussion_r1732795548 From stuefe at openjdk.org Tue Aug 27 13:09:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 27 Aug 2024 13:09:04 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 12:20:04 GMT, David Holmes wrote: >> I think the Java string would only need to be INT_MAX/3 in length, if all the characters require surrogate encoding. > > IIUC for compact strings, with non-latin-1 each pair of bytes would require at most 3-bytes to encode so you'd need 2/3 of INT_MAX. With latin-1 it would be 1/2 INT_MAX. But yes I suppose in theory you might be able to get an overflow on 32-bit. Need to think more about what could even be done for this case ... and whether it is worth trying ... 
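The overflow estimates in this exchange follow from per-character encoding costs. The following Java sketch (an illustration of the arithmetic, not the HotSpot C++ under review; all names are hypothetical) sums the modified UTF-8 size of a string into a `long`, using the costs documented for modified UTF-8 in `java.io.DataInput` (1 byte for U+0001 to U+007F, 2 bytes for U+0000 and U+0080 to U+07FF, 3 bytes otherwise, with each surrogate encoded separately). It also computes the smallest number of chars at a given per-char cost whose encoding no longer fits in an `int`.

```java
/** Hypothetical illustration: modified UTF-8 length computed without int overflow. */
public final class Utf8LengthSketch {

    /** Bytes needed for one UTF-16 char in modified UTF-8 (see java.io.DataInput). */
    static int utf8Size(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        } else if (c <= 0x07FF) {
            return 2;   // includes U+0000, which modified UTF-8 encodes in 2 bytes
        } else {
            return 3;   // includes each half of a surrogate pair
        }
    }

    /** Total encoded length as a long, so that the sum cannot wrap around. */
    static long modifiedUtf8Length(CharSequence s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += utf8Size(s.charAt(i));
        }
        return total;
    }

    static boolean fitsInInt(long length) {
        return length <= Integer.MAX_VALUE;
    }

    /** Smallest number of chars, each costing bytesPerChar bytes, whose encoding exceeds INT_MAX. */
    static long minCharsToOverflow(int bytesPerChar) {
        return (long) Integer.MAX_VALUE / bytesPerChar + 1;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtf8Length("abc"));    // 3
        System.out.println(modifiedUtf8Length("\u20ac")); // 3
        System.out.println(minCharsToOverflow(3));        // 715827883
        System.out.println(minCharsToOverflow(2));        // 1073741824
    }
}
```

With 3 bytes per char this gives 715827883 chars, matching the INT_MAX/3 estimate; with the 2-byte cost of non-ASCII Latin-1 chars it gives 1073741824, roughly INT_MAX/2.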
SymbolTable does check the length and truncates with a warning (see https://github.com/openjdk/jdk/blob/0c332e9de919184d8a4678bfd7c274fcef02b3e2/src/hotspot/share/classfile/symbolTable.cpp#L351-L360), though it does not seem to check for values < 0. Maybe we should add that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732816650 From ihse at openjdk.org Tue Aug 27 13:50:03 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 27 Aug 2024 13:50:03 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <4zUGEcC6eLmdq0wAqDCgAjsU17u6-sQNv8KZVQ8pCKc=.f8801e0b-8351-4af9-9825-70ccfa63847a@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> <4zUGEcC6eLmdq0wAqDCgAjsU17u6-sQNv8KZVQ8pCKc=.f8801e0b-8351-4af9-9825-70ccfa63847a@github.com> Message-ID: On Mon, 26 Aug 2024 09:39:28 GMT, Magnus Ihse Bursie wrote: >> I understand the cost overhead experienced by any individual Java run may be lost in the noise, but it still impacts every single Java run just to save some time/resources for the handful of builders of statically linked VMs. I am not a fan. > >> but it still impacts every single Java run just to save some time/resources for the handful of builders of statically linked VMs. > > Seriously? I challenge you to prove there is any effect at all. :-/ > > Also, there is not a "handful" of builders of static libraries. Our internal CI system builds static libraries all the time, and I for one would be glad to use these resources on more productive stuff than building all object files twice. > > Also, the intention is to enable static builds by default on GHA, once the entire process of making static builds "dirt cheap" is finished, to avoid regressions. > Hi @magicus, perhaps the answer is both yes and no. [...]
Since the mainline doesn't have the needed build changes to have the ability to link a javastatic binary, from that point of view all the static cases in the PR cannot be tested yet. I don't think that is correct. This PR just modifies the existing places where static and dynamic libraries are handled differently. These have been put in place by prior users of static libraries (mobile, graal), and do not require the Hermetic Java "javastatic" launcher to test. I honestly thought this part was going to be a no-brainer, a simple preparation for future things to come. I'm surprised that it seems to be so controversial. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2312613235 From ihse at openjdk.org Tue Aug 27 13:58:07 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 27 Aug 2024 13:58:07 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. 
> > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly And the discussion whether the checks are made "dynamically" or "statically" is too simplified to be really helpful. Currently, we compile two sets of all object files, with slightly different compiler arguments, one for dynamic libraries and one for static libraries. Files that are doing things differently for these two modes have an #ifdef, so the alternative way of doing things are not included in the object file. In your branch, you still have a separate compilation of all files for static builds, but you also try to figure out through various means (which involves jumping through some hoops to get the bootstrapping right) if this is a static build or a dynamic build. In a way, one could argue that this is just worse than the current solution, since you are still recompiling all files separately for static libraries so you could "know" at build time if you are static or not. What I am trying to do is to get to a point where we can compile almost all files just once, and then have two trivially small files that are compiled twice, with just a different value of a define that makes the difference. To propagate this information to all other object files, they need to call the function provided in this object file. So, is it then a "build time" lookup or a "runtime lookup", or a "static lookup" vs "dynamic lookup"? The semantics does not really matter. The whole point is that the difference in build is reduced to an absolute minimum. Sure, this single "lookup" function could be created more like the way you are doing in your branch to try to figure this out without the help of the build system, but there is really no point in that. This is a simple and elegant solution. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2312637272 From zzambers at openjdk.org Tue Aug 27 14:15:08 2024 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 27 Aug 2024 14:15:08 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 17:34:46 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support If I am not mistaken, the new test requires that the test suite is run as superuser (root), because it writes to `/etc/systemd/system` and runs certain systemd commands. Should the test be skipped for non-root users?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2312691761 From mbaesken at openjdk.org Tue Aug 27 14:27:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 27 Aug 2024 14:27:07 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 17:34:46 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision:
>
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Add Whitebox check for host cpu
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Fix comments
>  - 8333446: Add tests for hierarchical container support

I added the PR to our internal build/test queue.

------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2312720521 From fyang at openjdk.org Tue Aug 27 14:54:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 Aug 2024 14:54:11 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v6] In-Reply-To: <2U6vmyL7j1lBEIqBDfLdeV46YHqV4Rz2DL98glJID4Q=.4ffe1c9b-df61-466c-baad-90272b020124@github.com> References: <2U6vmyL7j1lBEIqBDfLdeV46YHqV4Rz2DL98glJID4Q=.4ffe1c9b-df61-466c-baad-90272b020124@github.com> Message-ID: On Tue, 27 Aug 2024 07:43:19 GMT, Hamlin Li wrote:

>> ## Performance
>> benchmarks run on CanVM-K230
>>
>> data
>>
>> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64
>> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622
>> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   comments

Updated change looks good modulo another minor question. Thanks.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5363:

> 5361:         Label NoFailure;
> 5362:         __ beq(failedIdx, minusOne, NoFailure);
> 5363:         __ vsetvli(x0, failedIdx, Assembler::e8, lmul, Assembler::mu, Assembler::tu);

Is it necessary to switch to `Assembler::mu, Assembler::tu` for the second vsetvli? Seems that we could still use `Assembler::ma, Assembler::ta` as the first vsetvli.

------------- Marked as reviewed by fyang (Reviewer).
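The failure-index question is easier to see in scalar form. A Base64 decoder consumes 4-character groups, each producing 3 bytes, and when a group contains a character outside the alphabet it must stop without writing output for that group or anything after it. The following Java sketch assumes the basic RFC 4648 alphabet with no padding and no line separators; it is neither the RISC-V stub nor `java.util.Base64`'s decoder, and its names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical scalar model of group-wise Base64 decoding that stops at the first bad group. */
public final class Base64GroupDecodeSketch {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Returns the 6-bit value of c, or -1 if c is not in the basic alphabet. */
    static int sextet(byte c) {
        return ALPHABET.indexOf(c & 0xFF);
    }

    /**
     * Decodes complete 4-char groups from src into dst and returns the number of bytes written.
     * Decoding stops before the first group containing an invalid char, so dst is never
     * written for that group or anything after it.
     */
    static int decodeGroups(byte[] src, byte[] dst) {
        int written = 0;
        for (int sp = 0; sp + 4 <= src.length; sp += 4) {
            int a = sextet(src[sp]), b = sextet(src[sp + 1]);
            int c = sextet(src[sp + 2]), d = sextet(src[sp + 3]);
            if ((a | b | c | d) < 0) {
                break; // failure found: leave the remainder of dst untouched
            }
            int bits = (a << 18) | (b << 12) | (c << 6) | d;
            dst[written++] = (byte) (bits >> 16);
            dst[written++] = (byte) (bits >> 8);
            dst[written++] = (byte) bits;
        }
        return written;
    }

    public static void main(String[] args) {
        byte[] dst = new byte[6];
        int n = decodeGroups("TWFuTW!u".getBytes(StandardCharsets.US_ASCII), dst);
        // prints "3 Man": the second group contains '!', so only the first group is decoded
        System.out.println(n + " " + new String(dst, 0, n, StandardCharsets.US_ASCII));
    }
}
```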
PR Review: https://git.openjdk.org/jdk/pull/20026#pullrequestreview-2263638862 PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1733008404 From sgehwolf at openjdk.org Tue Aug 27 15:05:05 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 27 Aug 2024 15:05:05 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 14:12:40 GMT, Zdenek Zambersky wrote: > If I am not mistaken, the new test requires that the test suite is run as superuser (root), because it writes to `/etc/systemd/system` and runs certain systemd commands. Should the test be skipped for non-root users? Thanks! I can add that. FWIW, container tests are in a similar situation (applying CPU/memory limits is not allowed in rootless mode on cgroups v1) and they don't check for it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2312812395 From sgehwolf at openjdk.org Tue Aug 27 15:05:05 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 27 Aug 2024 15:05:05 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 14:24:18 GMT, Matthias Baesken wrote: > I added the PR to our internal build/test queue. Thanks, Matthias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2312813813 From stuefe at openjdk.org Tue Aug 27 15:41:08 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 27 Aug 2024 15:41:08 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 13:57:16 GMT, Coleen Phillimore wrote: > > I don't think the costs of two address comparisons matter, not with the comparatively few deallocations that happen (a few hundred or a few thousand). If deallocate is hot, we are using metaspace wrong. > > MethodData does a lot of deallocations from metaspace because it's allocated racily.
It might be using Metaspace wrong. I think that should be okay. This should still be an exception. I have never seen that many deallocations happening in customer cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2312905514 From mli at openjdk.org Tue Aug 27 16:18:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 16:18:10 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: <0CsRTA7DLHnrYQuBL4LBO-z_Z0Fysx-Chyw7w57yyYU=.411be3a8-1587-4bef-94e8-4ffa8d48ea2c@github.com> References: <0CsRTA7DLHnrYQuBL4LBO-z_Z0Fysx-Chyw7w57yyYU=.411be3a8-1587-4bef-94e8-4ffa8d48ea2c@github.com> Message-ID: On Wed, 21 Aug 2024 11:53:42 GMT, Robbin Ehn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> revert misc > > Seems good to me, thanks. Thanks @robehn @gctony @RealFYang for your reviewing. Thanks @camel-cdr for sharing your thoughts! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2312995120 From mli at openjdk.org Tue Aug 27 16:18:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 16:18:11 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v6] In-Reply-To: References: <2U6vmyL7j1lBEIqBDfLdeV46YHqV4Rz2DL98glJID4Q=.4ffe1c9b-df61-466c-baad-90272b020124@github.com> Message-ID: On Tue, 27 Aug 2024 14:49:02 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> comments > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5363: > >> 5361: Label NoFailure; >> 5362: __ beq(failedIdx, minusOne, NoFailure); >> 5363: __ vsetvli(x0, failedIdx, Assembler::e8, lmul, Assembler::mu, Assembler::tu); > > Is it necessary to switch to `Assembler::mu, Assembler::tu` for the second vsetvli? Seems that we could still use `Assembler::ma, Assembler::ta` as the first vsetvli. 
(Although the jtreg test shows no difference between mu/tu and ma/ta.) Yes, I think it's safe to use mu/tu here, in particular because of the code `__ vsseg3e8_v(outputV1, dst);`. It seems to me that, by the spec, if we use ma/ta an implementation could touch dst data after the fail index, which is not expected. And if the code goes through this path, it's the end of the loop, so the performance impact is limited to the last round, when there is an error in the input data. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20026#discussion_r1733178202 From mli at openjdk.org Tue Aug 27 16:23:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 16:23:13 GMT Subject: Integrated: 8314124: RISC-V: implement Base64 intrinsic - decoding In-Reply-To: References: Message-ID: On Thu, 4 Jul 2024 10:09:41 GMT, Hamlin Li wrote: > ## Performance > benchmarks run on CanVM-K230 > > data > > Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736 >
Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64 > Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622 > Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547 > Base64Decode.testBase64MIMEDecode | 0 | ... This pull request has now been integrated. Changeset: 44d3a68d Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/44d3a68d8a73c119b64772687d74e5ce25926f4f Stats: 276 lines in 2 files changed: 276 ins; 0 del; 0 mod 8314124: RISC-V: implement Base64 intrinsic - decoding Reviewed-by: fyang, rehn, tonyp ------------- PR: https://git.openjdk.org/jdk/pull/20026 From aph at openjdk.org Tue Aug 27 16:25:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 27 Aug 2024 16:25:12 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 21 Aug 2024 16:11:25 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). 
It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ...
> > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: use a constexpr function for intpow instead of a templated class This is what I'm seeing now. Scorching fast with large blocks, poor with smaller ones. Benchmark (size) Mode Cnt Score Error Units ArraysHashCode.bytes 1 avgt 5 0.532 ± 0.036 ns/op ArraysHashCode.bytes 2 avgt 5 0.812 ± 0.011 ns/op ArraysHashCode.bytes 4 avgt 5 1.104 ± 0.020 ns/op ArraysHashCode.bytes 8 avgt 5 2.136 ± 0.032 ns/op ArraysHashCode.bytes 12 avgt 5 3.596 ± 0.061 ns/op ArraysHashCode.bytes 16 avgt 5 5.278 ± 0.240 ns/op ArraysHashCode.bytes 20 avgt 5 7.390 ± 0.043 ns/op ArraysHashCode.bytes 24 avgt 5 9.606 ± 0.059 ns/op ArraysHashCode.bytes 28 avgt 5 12.144 ± 0.064 ns/op ArraysHashCode.bytes 32 avgt 5 3.898 ± 0.096 ns/op ArraysHashCode.bytes 36 avgt 5 4.468 ± 0.113 ns/op ArraysHashCode.bytes 40 avgt 5 4.481 ± 0.082 ns/op ArraysHashCode.bytes 44 avgt 5 5.143 ± 0.060 ns/op ArraysHashCode.bytes 48 avgt 5 6.727 ± 0.103 ns/op ArraysHashCode.bytes 52 avgt 5 8.844 ± 0.029 ns/op ArraysHashCode.bytes 56 avgt 5 11.108 ± 0.108 ns/op ArraysHashCode.bytes 60 avgt 5 13.864 ± 0.071 ns/op ArraysHashCode.bytes 64 avgt 5 5.796 ± 0.146 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2313009835 From mli at openjdk.org Tue Aug 27 16:31:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 16:31:10 GMT Subject: RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding [v3] In-Reply-To: <0CsRTA7DLHnrYQuBL4LBO-z_Z0Fysx-Chyw7w57yyYU=.411be3a8-1587-4bef-94e8-4ffa8d48ea2c@github.com> References: <0CsRTA7DLHnrYQuBL4LBO-z_Z0Fysx-Chyw7w57yyYU=.411be3a8-1587-4bef-94e8-4ffa8d48ea2c@github.com> Message-ID: On Wed, 21 Aug 2024 11:53:42 GMT, Robbin Ehn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> revert misc > > Seems good to me, thanks.
@robehn Please check the commit log of this PR, https://github.com/openjdk/jdk/commit/44d3a68d8a73c119b64772687d74e5ce25926f4f. It seems all previous reviewers are added to the commit log. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2313021544 From dlong at openjdk.org Tue Aug 27 16:54:07 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 16:54:07 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 12:10:36 GMT, David Holmes wrote: >> src/hotspot/share/utilities/utf8.cpp line 127: >> >>> 125: prev = c; >>> 126: } >>> 127: return checked_cast<int>(num_chars); >> >> Ideally, this function would return size_t. > > Why? I think that would have a large flow on effect. And this length does fit in an int. The worst case is len == SIZE_MAX and therefore num_chars == SIZE_MAX, which won't fit in an int. If we say this will never happen because current callers never use sizes bigger than int, that makes the code fragile against scenarios where a developer might add a new caller. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733226733 From galder at openjdk.org Tue Aug 27 17:12:04 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 27 Aug 2024 17:12:04 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño wrote: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
> > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the Java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. > > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL.
> > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... Working on it ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2313102213 From lmesnik at openjdk.org Tue Aug 27 17:14:38 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 27 Aug 2024 17:14:38 GMT Subject: RFR: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument [v2] In-Reply-To: References: Message-ID: > Method > frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument > > it usually is called with nullptr as second arg except > JavaThread::trace_frames() > where it is called with this. > > It seems that thread has never been used since 2007 so makes sense just to get rid of it. > > Tested building all builds available in CI and running tier13 Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed identation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20721/files - new: https://git.openjdk.org/jdk/pull/20721/files/9e0304da..cd2f8192 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20721&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20721&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20721.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20721/head:pull/20721 PR: https://git.openjdk.org/jdk/pull/20721 From lmesnik at openjdk.org Tue Aug 27 17:14:38 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 27 Aug 2024 17:14:38 GMT Subject: RFR: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument [v2] In-Reply-To: 
<4Sj49UNtEbfus-W0RrEdIq6W8FZmrgUzk6fOpA9xbbA=.64aaf45c-6fe2-4151-aa5d-62275fa40e51@github.com> References: <4Sj49UNtEbfus-W0RrEdIq6W8FZmrgUzk6fOpA9xbbA=.64aaf45c-6fe2-4151-aa5d-62275fa40e51@github.com> Message-ID: On Tue, 27 Aug 2024 02:08:46 GMT, David Holmes wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed identation > > src/hotspot/share/runtime/frame.hpp line 437: > >> 435: public: >> 436: void print_value() const { print_value_on(tty); } >> 437: void print_value_on(outputStream *st) const; > > Nit: the * should be at the end of `outputStream` not the start of `st`. Thanks! I missed that IDE "helped" me here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20721#discussion_r1733250525 From coleenp at openjdk.org Tue Aug 27 17:27:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 27 Aug 2024 17:27:19 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v3] In-Reply-To: References: Message-ID: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - With JDK-8338929 we don't need is_in_class_space(). - Merge branch 'master' into anon - Incorporated a set of Thomas Stuefe's comments. 
Take out AbstractClass MetaspaceObj::Type. - 8338526: Don't store abstract and interface Klasses in class metaspace ------------- Changes: https://git.openjdk.org/jdk/pull/19157/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=02 Stats: 79 lines in 19 files changed: 30 ins; 11 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/19157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19157/head:pull/19157 PR: https://git.openjdk.org/jdk/pull/19157 From coleenp at openjdk.org Tue Aug 27 17:27:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 27 Aug 2024 17:27:20 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v3] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 06:13:29 GMT, Thomas Stuefe wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - With JDK-8338929 we don't need is_in_class_space(). >> - Merge branch 'master' into anon >> - Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass MetaspaceObj::Type. >> - 8338526: Don't store abstract and interface Klasses in class metaspace > > src/hotspot/share/oops/klass.hpp line 205: > >> 203: >> 204: void* operator new(size_t size, ClassLoaderData* loader_data, size_t word_size, TRAPS) throw(); >> 205: > > Oh, ArrayKlass never used this? It's good to move it to InstanceKlass. ArrayKlass has its own operator new.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1733266271 From mdoerr at openjdk.org Tue Aug 27 17:41:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 27 Aug 2024 17:41:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: On Tue, 27 Aug 2024 12:36:39 GMT, Roberto Castañeda Lozano wrote: >> > Sure, I agree that g1GetAndSetP and g1GetAndSetN are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. >> >> Done (commit daf38d3). >> >> @offamitkumar @feilongjiang @snazarkin please note that the ADL instructions `g1XChgP` and `g1XChgN` have been renamed to `g1GetAndSetP` and `g1GetAndSetN`, and the same naming is expected across all platforms by the test `compiler/gcbarriers/TestG1BarrierGeneration.java` included in this changeset. > >> Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. > > I tried this refactoring [here](https://github.com/openjdk/jdk/commit/d4e83fd7d77c5415700b33556752e8c8da811dea), thanks again Martin for the suggestion.
In my opinion, the result is similar in terms of readability/maintainability because the benefit of removing explicit `decode_heap_oop` operations in the ADL file is somewhat negated by the increased complexity of the `write_barrier_post` and `g1_write_barrier_post_c2` signatures. For aarch64, moving non-destructive `decode_heap_oop` operations would probably require passing both source and destination registers to these functions explicitly, which would make them more complex. I feel that this refactoring is only amortized when the new-value decoding operations are more complex, as in your PPC implementation. Thanks for trying! I like your refactored version. I prefer moving stuff out of the .ad files. The complexity of the barrier code is not significantly higher. Note that you could use `decode_heap_oop_not_null(Register dst, Register src)` with `tmp2` as dst (`generate_post_barrier_fast_path` can deal with new_val == tmp2). That would save a move instruction in some cases. I haven't looked into the aarch64 code. I leave you free to decide. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1733283320 From jiangli at openjdk.org Tue Aug 27 17:56:03 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 27 Aug 2024 17:56:03 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: <3fVMocxS8IOcl2YdBhyFtAd7U8oUkR_arFRFKQNrktI=.e896558e-9e20-49bc-a298-13de3e0f3f11@github.com> On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. 
>> >> This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly Marked as reviewed by jiangli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20666#pullrequestreview-2264106008 From jiangli at openjdk.org Tue Aug 27 17:56:04 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 27 Aug 2024 17:56:04 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: <8qxU0njJxPJtQiUvMkU2NlOY7xhV3xz17YBXPSDi11E=.69e5a2eb-96fd-462c-9f5d-b9af254701fa@github.com> On Tue, 27 Aug 2024 13:55:51 GMT, Magnus Ihse Bursie wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Also update build to link properly > > And the discussion whether the checks are made "dynamically" or "statically" is too simplified to be really helpful. > > Currently, we compile two sets of all object files, with slightly different compiler arguments, one for dynamic libraries and one for static libraries. Files that are doing things differently for these two modes have an #ifdef, so the alternative way of doing things are not included in the object file. 
> > In your branch, you still have a separate compilation of all files for static builds, but you also try to figure out through various means (which involves jumping through some hoops to get the bootstrapping right) if this is a static build or a dynamic build. In a way, one could argue that this is just worse than the current solution, since you are still recompiling all files separately for static libraries so you could "know" at build time if you are static or not. > > What I am trying to do is to get to a point where we can compile almost all files just once, and then have two trivially small files that are compiled twice, with just a different value of a define that makes the difference. To propagate this information to all other object files, they need to call the function provided in this object file. So, is it then a "build time" lookup or a "runtime lookup", or a "static lookup" vs "dynamic lookup"? The semantics does not really matter. The whole point is that the difference in build is reduced to an absolute minimum. Sure, this single "lookup" function could be created more like the way you are doing in your branch to try to figure this out without the help of the build system, but there is really no point in that. This is a simple and elegant solution. We had a zoom discussion with @magicus and others on this PR (as part of regular hermetic Java meeting) this morning. @magicus mentioned that he has a PR in progress with the static linking part, which helps address my specific concern. > In your branch, you still have a separate compilation of all files for static builds, but you also try to figure out through various means (which involves jumping through some hoops to get the bootstrapping right) if this is a static build or a dynamic build. In a way, one could argue that this is just worse than the current solution, since you are still recompiling all files separately for static libraries so you could "know" at build time if you are static or not. 
Yes, the .o files are currently recompiled to create the static libraries. That causes a noticeably large overhead in both memory and build duration when building the JDK itself. In real-world constrained environments, both overheads are problematic and cause build issues. So building the `.so` and `.a` from the same set of `.o` object files should be one of our end goals (just to re-iterate its importance), but it would be "ok" to go without it during the initial phases when we are building/integrating hermetic/static support in JDK mainline. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2313180693 From sviswanathan at openjdk.org Tue Aug 27 18:30:08 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 27 Aug 2024 18:30:08 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], the patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable only to integral types, since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
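The lane semantics of the operators listed above can be pinned down with a scalar reference model. Below is a minimal sketch for 32-bit lanes, assuming two's-complement `int32_t`/`uint32_t` lanes. The function names are hypothetical: they are neither Vector API methods nor the new box-class methods the patch adds. The unsigned variants reinterpret the lane bits as unsigned values, which is what separates SUADD, SUSUB, UMAX and UMIN from their signed counterparts.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar reference model of SADD, SSUB, SUADD, SUSUB, UMAX, UMIN
// for 32-bit lanes.
constexpr int32_t kIntMin = std::numeric_limits<int32_t>::min();
constexpr int32_t kIntMax = std::numeric_limits<int32_t>::max();
constexpr uint32_t kUintMax = std::numeric_limits<uint32_t>::max();

// Clamp an exact 64-bit result into the int32_t range instead of wrapping.
int32_t clamp_to_int32(int64_t v) {
  if (v > kIntMax) return kIntMax;
  if (v < kIntMin) return kIntMin;
  return static_cast<int32_t>(v);
}

// SADD / SSUB: compute exactly in 64 bits, then saturate.
int32_t sadd(int32_t a, int32_t b) { return clamp_to_int32(int64_t{a} + b); }
int32_t ssub(int32_t a, int32_t b) { return clamp_to_int32(int64_t{a} - b); }

// SUADD: unsigned addition wraps modulo 2^32; a wrapped sum is smaller than a.
uint32_t suadd(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;
  return sum < a ? kUintMax : sum;
}

// SUSUB: unsigned subtraction saturates at zero.
uint32_t susub(uint32_t a, uint32_t b) { return a > b ? a - b : 0u; }

// UMAX / UMIN on signed lanes: compare the bit patterns as unsigned values.
int32_t umax(int32_t a, int32_t b) {
  return static_cast<uint32_t>(a) >= static_cast<uint32_t>(b) ? a : b;
}
int32_t umin(int32_t a, int32_t b) {
  return static_cast<uint32_t>(a) <= static_cast<uint32_t>(b) ? a : b;
}
```

For example, `sadd(kIntMax, 1)` stays at `kIntMax` rather than wrapping to `kIntMin`. `umax(-1, 1)` returns `-1`, because the bit pattern `0xFFFFFFFF` is the largest unsigned value.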
>> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/cpu/x86/x86.ad line 1773: > 1771: return false; > 1772: } > 1773: if (bt == T_LONG && !VM_Version::supports_avx512vl()) { we should be able to support bt == T_LONG for 512 bit irrespective of avx512vl. src/hotspot/cpu/x86/x86.ad line 1953: > 1951: if (UseAVX < 1 || size_in_bits < 128 || (size_in_bits == 512 && !VM_Version::supports_avx512bw())) { > 1952: return false; > 1953: } UseAVX < 1 could be written as UseAVX == 0. Could we not do register version for size_in_bit < 128? src/hotspot/cpu/x86/x86.ad line 1962: > 1960: return false; // Implementation limitation > 1961: } > 1962: break; Could we not do register version for size_in_bit < 128? src/hotspot/cpu/x86/x86.ad line 2143: > 2141: if (is_subword_type(bt) && !VM_Version::supports_avx512bw()) { > 2142: return false; // Implementation limitation > 2143: } UMinV and UMaxV are supported on AVX1, AVX2 platforms. src/hotspot/cpu/x86/x86.ad line 2155: > 2153: return false; // Implementation limitation > 2154: } > 2155: return true; Byte/Short saturating vector add is supported for AVX1, AVX2 platforms.
Could we not do register version for size_in_bit < 128? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1733330892 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1733333203 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1733333608 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1733336005 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1733338300 From psandoz at openjdk.org Tue Aug 27 20:03:07 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 27 Aug 2024 20:03:07 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v6] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 27 Aug 2024 09:58:44 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], the patch adds support for the following new two-vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table, whose elements are selected based on the index value vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target.
>> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >>
>> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>> S...
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. I think we should leave the fallback expression as `vec2.rearrange(vec1.toShuffle(), vec3);`, let's address that separately if needed. Otherwise, you have introduced an additional code path that requires more explicit testing. My comment was related to understanding what `SelectFromTwoVectorNode::Ideal` and `VectorRearrangeNode::Ideal` are doing - the former lowers, if needed, into the rearrange expression and the latter adjusts, if needed, the index vector (a comment describing this transformation would be useful, like you have in the former method). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2313401788 From dcubed at openjdk.org Tue Aug 27 20:45:09 2024 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Tue, 27 Aug 2024 20:45:09 GMT Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 07:59:10 GMT, Robbin Ehn wrote: > I'm still confused by the re-review thingy. > If I integrate now will @Hamlin-Li still get credit or do he need to re-review? Folks who have officially reviewed (and approved) do not get removed. The re-review "thingy" is just to make sure that all changes are reviewed before integration. Make a change after folks have reviewed, you just need someone to re-review to verify that the latest change was looked at. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20661#issuecomment-2313497307 From dholmes at openjdk.org Tue Aug 27 21:14:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 21:14:08 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 13:06:26 GMT, Thomas Stuefe wrote: >> IIUC for compact strings, with non-latin-1 each pair of bytes would require at most 3-bytes to encode so you'd need 2/3 of INT_MAX. With latin-1 it would be 1/2 INT_MAX. But yes I suppose in theory you might be able to get an overflow on 32-bit. Need to think more about what could even be done for this case ... and whether it is worth trying ... > > SymbolTable does check the length and truncates with a warning (see https://github.com/openjdk/jdk/blob/0c332e9de919184d8a4678bfd7c274fcef02b3e2/src/hotspot/share/classfile/symbolTable.cpp#L351-L360) though it does not seem to check for values < 0. Maybe we should add that. A negative value should only come from integer overflow and we have been eradicating the sources for that at the higher levels. But maybe it is worth adding the negative check in `symbolTable` ... 
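The negative-length check floated above can be sketched as follows. This is a minimal illustration, not the symbolTable.cpp code linked in the thread. The helper name `checked_symbol_length` and the constant `kMaxSymbolLength` are hypothetical, and the 0xFFFF limit is an assumption based on class-file CONSTANT_Utf8 lengths being u2 values. Over-long input is truncated with a warning, which matches the current behavior Thomas describes. Negative input is rejected with an assert, since David notes it can only arise from an upstream int overflow.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical limit: CONSTANT_Utf8 lengths in a class file are u2 values.
constexpr int kMaxSymbolLength = 0xFFFF;

// Hypothetical guard modeled on the discussion above; not HotSpot code.
int checked_symbol_length(int len) {
  // A negative length can only come from an int overflow in some caller.
  assert(len >= 0 && "negative symbol length: upstream integer overflow?");
  if (len > kMaxSymbolLength) {
    // Mirror the existing behavior: warn, then truncate.
    std::fprintf(stderr, "warning: symbol length %d truncated to %d\n",
                 len, kMaxSymbolLength);
    return kMaxSymbolLength;
  }
  return len;
}
```

Without assertions enabled (for example, with NDEBUG defined), a negative value would pass through this guard unchanged. That gap is what the open question about adding an explicit check is about.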
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733520437 From gziemski at openjdk.org Tue Aug 27 21:16:31 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 27 Aug 2024 21:16:31 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:31:14 GMT, David Holmes wrote: >>> I agree with @tstuefe here. MemFlag and MemType sound far too general when this is NMT specific. >> >> Yes, it is not very specific, but it is also not hard to learn and then know what this type is all about. >> >>> My preference to keep the "flags" part of the type was to avoid needing to rename many parameters. The usage of MEMFLAGS flags is quite extensive. I would not want to see a partial approach here where we end up with a non-flag type name but a flag variable name. >> >> I think we should rename all the 'flags' variables in the same change. >> >>> NMTTypeFlag would I hope satisfy Thomas's requirement and avoid the need to do variable renames. >> >> * To me, that's really not an appealing name for a type that is going to be used by all parts of the HotSpot code base. I much prefer a shorter name that is easy on the eyes over a longer and more specific name that is an eyesore. >> >> * And even as a longer name, it doesn't tell what it is going to be used for. What is a Native Memory Tracker Type Flag? >> >> * I don't want us to select a bad name just so that we don't have to change the variable names. >> >> * Whatever we choose we also need to consider the mt prefix of things like mtGC, mtClass, etc. >> >> With all that said, I hope it is clear that we various reviewers have different opinions around this and that we don't integrate this before we have some kind of consensus about the way forward with this. > >> What is a Native Memory Tracker Type Flag? > > It is a flag telling us the type of native memory being tracked.
> >> Whatever we choose we also need to consider the mt prefix of things like mtGC, mtClass, etc. > > And what does that stand for: memory type? memory tracker? Arguably they should have been nmtGC etc. > >> I think we should rename all the 'flags' variables in the same change. > > Okay. That's a big change but I'd prefer it to any half-way measures. @dholmes-ora @tstuefe @stefank @kimbarrett @afshin-zafari @jdksjolen hi all, I would like to see if we can give this another go, now that we have had some time to sleep on it. How about this - I created a table with some name candidates, so everyone can see where everyone else is. We can all choose 3 candidates and rank them 1, 2 and 3. At the end we tabulate the answers and the one with the highest score wins?

| developer | MemType | MemTypeFlag | NMTCat | NMTGroup | NMT_MemType | NMT::MemType |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| gerard | 1 | 0 | 0 | 0 | 2 | 3 |
| David | ? | ? | ? | ? | ? | ? |
| Thomas | ? | ? | ? | ? | ? | ? |
| Johan | ? | ? | ? | ? | ? | ? |
| Afshin | ? | ? | ? | ? | ? | ? |
| Stefan | ? | ? | ? | ? | ? | ? |
| Kim | ? | ? | ? | ? | ? | ? |

------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2313584428 From dholmes at openjdk.org Tue Aug 27 21:23:23 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Aug 2024 21:23:23 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 16:51:21 GMT, Dean Long wrote: >> Why? I think that would have a large flow on effect. And this length does fit in an int. > > The worst case is len == SIZE_MAX and therefore num_chars == SIZE_MAX, which won't fit in an int. If we say this will never happen because current callers never use sizes bigger than int, that makes the code fragile against scenarios where a developer might add a new caller. (A whitebox test or gtest could be written that makes the checked_cast fail.)
If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical. Code can and does take advantage of the expected calling context, which here limits lengths to int (and typically < 64K). The checked_cast serves to catch such misuses in my opinion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733539059 From duke at openjdk.org Tue Aug 27 21:25:40 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 27 Aug 2024 21:25:40 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: Message-ID: <5RUXvY7Tb8B_QYxg0iLNaC5d6fcNMdHUYXdEzyBoQ_U=.19f4d853-620a-459c-acc1-d57bfd6fb7bc@github.com> On Mon, 26 Aug 2024 15:47:13 GMT, Joe Darcy wrote: > This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method. Thank you, Joe, for the suggestion. Will add more tests. (This PR passes the tier-1 tanh tests in HyperbolicTests.java) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2313603036 From mli at openjdk.org Tue Aug 27 21:43:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 21:43:21 GMT Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 20:42:39 GMT, Daniel D. Daugherty wrote: > Folks who have officially reviewed (and approved) do not get removed. The re-review "thingy" is just to make sure that all changes are reviewed before integration. Make a change after folks have reviewed, you just need someone to re-review to verify that the latest change was looked at.
Thanks for confirmation, I also verified this via [this pr](https://github.com/openjdk/jdk/pull/20026#issuecomment-2313021544) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20661#issuecomment-2313629036 From duke at openjdk.org Tue Aug 27 22:26:21 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 27 Aug 2024 22:26:21 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> Message-ID: On Tue, 27 Aug 2024 10:54:11 GMT, Andrew Haley wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 437: >> >>> 435: __ mulpd(xmm1, xmm1); >>> 436: __ movdqu(xmm4, ExternalAddress(pv + 32), r11 /*rscratch*/); >>> 437: __ mulpd(xmm2, xmm1); >> >> I would encourage either you add detailed comments or give meaningful names to the registers to ease the review process. > > I agree, this is all rather obscure. Ideally the same names that are used in wherever this comes from. > > Where does the algorithm come from? What are its accuracy guarantees? > > In addition, given the rarity of hyperbolic tangents in Java applications, do we need this? @theRealAph, this implementation is based on Intel libm math library and meets the accuracy requirements. The algorithm is provided in the comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1733589125 From sviswanathan at openjdk.org Tue Aug 27 22:28:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 27 Aug 2024 22:28:21 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: <4k6vX8rkREK9CYMZjs0KfHikLJJ1NWbMtWYYzLcYPc0=.53547148-1abb-4a7f-8238-944c13a26304@github.com> On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . 
SUADD : Saturating unsigned addition.
>> . SADD : Saturating signed addition.
>> . SUSUB : Saturating unsigned subtraction.
>> . SSUB : Saturating signed subtraction.
>> . UMAX : Unsigned max.
>> . UMIN : Unsigned min.
>>
>> New vector operators are applicable only to integral types, since integral values wrap around in overflowing/underflowing scenarios after setting appropriate status flags. For floating-point types, IEEE 754 specifies multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates a floating-point value to infinity.
>>
>> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and does not wrap around in underflowing or overflowing scenarios.
>>
>> Summary of changes:
>> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
>> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
>>
>> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments resolutions.
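The six operators listed above can be modeled for a single byte lane as follows. This is a sketch of the semantics only, not the PR's Java fallback code or its x86 backend, and the helper names are hypothetical. Each operation computes in `int`, which cannot overflow here, and then clamps to the lane's range. The unsigned variants reinterpret the lane's bits as `uint8_t`, which is how unsigned operations can be offered on Java's signed integral types. The sketch needs C++17 for `constexpr std::clamp`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical scalar model of one byte lane; illustrative only.
using lane = std::int8_t;

// Reinterpret a lane's bits as an unsigned value in 0..255.
constexpr int as_unsigned(lane v) { return static_cast<std::uint8_t>(v); }

// Map 0..255 back onto lane bits (values above 127 become negative lanes;
// modular by definition since C++20, and the same on mainstream compilers before).
constexpr lane from_unsigned(int u) {
  return static_cast<lane>(static_cast<std::uint8_t>(u));
}

constexpr lane sadd(lane a, lane b) {   // SADD: clamp to [-128, 127]
  return static_cast<lane>(std::clamp(a + b, -128, 127));
}
constexpr lane ssub(lane a, lane b) {   // SSUB
  return static_cast<lane>(std::clamp(a - b, -128, 127));
}
constexpr lane suadd(lane a, lane b) {  // SUADD: cap at 255
  return from_unsigned(std::min(as_unsigned(a) + as_unsigned(b), 255));
}
constexpr lane susub(lane a, lane b) {  // SUSUB: floor at 0
  return from_unsigned(std::max(as_unsigned(a) - as_unsigned(b), 0));
}
constexpr lane umax(lane a, lane b) {   // UMAX: compare as unsigned
  return as_unsigned(a) >= as_unsigned(b) ? a : b;
}
constexpr lane umin(lane a, lane b) {   // UMIN
  return as_unsigned(a) <= as_unsigned(b) ? a : b;
}
```

For example, `suadd(-56, -56)` treats both lanes as 200. It saturates at 255 and returns the bit pattern 0xFF, which reads back as -1 in a signed lane. By contrast, `umax(-1, 1)` returns -1 because 0xFF is the larger unsigned value.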
src/hotspot/cpu/x86/x86.ad line 10635: > 10633: %} > 10634: > 10635: instruct saturating_unsigned_add_reg_avx(vec dst, vec src1, vec src2, vec xtmp1, vec xtmp2, vec xtmp3, vec xtmp4) Should the temp here and all the places related to !avx512vl() be legVec instead of vec? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1733588147 From darcy at openjdk.org Tue Aug 27 22:47:18 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 27 Aug 2024 22:47:18 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: <5RUXvY7Tb8B_QYxg0iLNaC5d6fcNMdHUYXdEzyBoQ_U=.19f4d853-620a-459c-acc1-d57bfd6fb7bc@github.com> References: <5RUXvY7Tb8B_QYxg0iLNaC5d6fcNMdHUYXdEzyBoQ_U=.19f4d853-620a-459c-acc1-d57bfd6fb7bc@github.com> Message-ID: On Tue, 27 Aug 2024 21:22:26 GMT, Srinivas Vamsi Parasa wrote: > > This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method. > > Thank you, Joe, for the suggestion. Will add more tests. (This PR passes the tier-1 tanh tests in HyperbolicTests.java) Yes @vamsi-parasa; running that test is a good backstop, and it is written to be applicable to any implementation of {sinh, cosh, tanh} that meets the general quality-of-implementation criteria for java.lang.Math. To be explicit, the WorstCaseTests.java file, and for good measure all the java.lang.Math tests, should also be run for a change like this. For a hypothetical example, if an intrinsic used different polynomials for different ranges of the input, it would be a reasonable regression test _for that implementation_ to probe around the boundary of the transition between the polynomials to make sure the monotonicity requirements were being met. That kind of check could be written to be generally applicable and be suitable for a regression test in java/lang/Math, or could be suitable for a regression test in the HotSpot area.
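The boundary probe Joe describes can be sketched as follows. A real check would be a Java jtreg test built on `Math.tanh` and `Math.nextUp`. This C++ version only illustrates the idea, using `std::nextafter` to step one ulp at a time. The function name is hypothetical, and `boundary` stands in for whatever range-split point a given implementation uses; no such point is given in the thread.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical probe: start `steps` ulps below `boundary`, walk 2 * steps
// consecutive doubles upward, and report whether f never decreases.
template <typename F>
bool non_decreasing_near(F f, double boundary, int steps) {
  assert(steps > 0 && std::isfinite(boundary));
  const double inf = std::numeric_limits<double>::infinity();
  double x = boundary;
  for (int i = 0; i < steps; ++i) {
    x = std::nextafter(x, -inf);   // step down one ulp
  }
  double prev = f(x);
  for (int i = 0; i < 2 * steps; ++i) {
    x = std::nextafter(x, inf);    // step up one ulp
    double cur = f(x);
    if (cur < prev) {
      return false;                // monotonicity violated near the boundary
    }
    prev = cur;
  }
  return true;
}
```

In practice `f` would call the implementation under test, and the probe would be repeated at each point where that implementation switches polynomials.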
HTH ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2313699961 From jiangli at openjdk.org Tue Aug 27 23:17:19 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 27 Aug 2024 23:17:19 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Tue, 27 Aug 2024 13:55:51 GMT, Magnus Ihse Bursie wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Also update build to link properly > > And the discussion whether the checks are made "dynamically" or "statically" is too simplified to be really helpful. > > Currently, we compile two sets of all object files, with slightly different compiler arguments, one for dynamic libraries and one for static libraries. Files that are doing things differently for these two modes have an #ifdef, so the alternative way of doing things are not included in the object file. > > In your branch, you still have a separate compilation of all files for static builds, but you also try to figure out through various means (which involves jumping through some hoops to get the bootstrapping right) if this is a static build or a dynamic build. In a way, one could argue that this is just worse than the current solution, since you are still recompiling all files separately for static libraries so you could "know" at build time if you are static or not. > > What I am trying to do is to get to a point where we can compile almost all files just once, and then have two trivially small files that are compiled twice, with just a different value of a define that makes the difference. To propagate this information to all other object files, they need to call the function provided in this object file. 
So, is it then a "build time" lookup or a "runtime lookup", or a "static lookup" vs "dynamic lookup"? The semantics does not really matter. The whole point is that the difference in build is reduced to an absolute minimum. Sure, this single "lookup" function could be created more like the way you are doing in your branch to try to figure this out without the help of the build system, but there is really no point in that. This is a simple and elegant solution. @magicus please also specify the contributor properly so it's clear that part of the change is based on/extracted from https://github.com/openjdk/leyden/tree/hermetic-java-runtime. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2313737779 From john.r.rose at oracle.com Tue Aug 27 23:44:28 2024 From: john.r.rose at oracle.com (John Rose) Date: Tue, 27 Aug 2024 16:44:28 -0700 Subject: RFR: 8338023: Support two vector selectFrom API [v5] In-Reply-To: <2_P1qPMS46tgh4RUSuitcjXYnd0koS_BxfRRRmj79EY=.c3baeeaa-87f7-47d4-bc70-ae2afd9de745@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7e5pWnvjqk-dQYNeaZjFzXcd5WlzniZPl5T4l1rKQGE=.0882bcd4-e307-4a29-aa41-5496ee029a60@github.com> <2_P1qPMS46tgh4RUSuitcjXYnd0koS_BxfRRRmj79EY=.c3baeeaa-87f7-47d4-bc70-ae2afd9de745@github.com> Message-ID: On 23 Aug 2024, at 15:33, Paul Sandoz wrote: > The float/double conversion bothers me, not suggesting we do something about it here, noting down for any future conversation on shuffles. Yes, it's a pain which is noticeable in the vector/shuffle conversions. In the worst case it adds dynamic reformatting operations to get from the artificially "uniform" float/double index format into the real format the hardware requires. As a workaround, the user could convert the float/double payloads bitwise into int/long payloads, and then do the shuffling in the uniform int/long API, later reconverting back to float/double after the payloads are reordered.
Those conversions don't actually use any dynamic operations. For prototyping, it seems fine to take the hit and ignore the fact that the index vectors are in an odd (though "uniform") format. From dlong at openjdk.org Tue Aug 27 23:56:25 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 27 Aug 2024 23:56:25 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 21:21:01 GMT, David Holmes wrote: > If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical. I'm basically arguing for Functional Testing here, or at least having some invariants that would allow functional testing. It may seem impractical to retrofit existing code, but when we are changing the input from int to size_t, that seems like the perfect time to enforce the new invariants. If we expect "len" to be <= INT_MAX instead of SIZE_MAX, something that is not obvious from its type, then why not check that with an assert or at least document it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733651059 From sviswanathan at openjdk.org Wed Aug 28 00:15:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 28 Aug 2024 00:15:19 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:59:53 GMT, Jatin Bhateja wrote: >>> its usage in existing patch is limited to [type comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542) >> >> Ah, that makes sense to me. I took a closer look and I think since the patch is creating a `VectorReinterpret` node after unsigned vector nodes, it might be able to avoid cases where the type might get filtered/joined, like with `PhiNode::Value`. That might lead to errors since `empty_type->filter(other_type) == TOP`.
That might lead to errors since `empty_type->filter(other_type) == TOP`. It's unfortunate that it's not really possible to disambiguate between an empty type and an unsigned range, which would allow us to solve this elegantly. > > Hey @jaskarth , Central idea behind introducing VectorReinterpretNode after unsigned vector IR is to facilitate unboxing-boxing optimization, this explicit reinterpretation ensures type compatibility between value being boxed and box type which is always signed vector types. > > As mentioned previously my plan is to address is handle value range related concerns in a follow up patch along with intrisification and auto-vectorization of newly created scalar saturating IR, this patch is not generating scalar IR with newly defined unsigned types. Wonder if it would have been simpler if we added unsigned vector operators like Op_SaturatingUnsignedAddVB etc. We are not adding unsigned data types to Java, only supporting unsigned (saturating) operations on existing signed integral types. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1733659843 From dholmes at openjdk.org Wed Aug 28 00:41:20 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Aug 2024 00:41:20 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) Hmmmm .... I like the structure of `NMT::MemType` but also prefer to keep "flag" in the name to avoid renaming parameters (or having oddly named parameters). 
Can I vote for `NMT::MemTypeFlag`? :) Otherwise:

- `MemTypeFlag` - 3 points
- `NMT::MemType` - 2 points
- `NMT_MemType` - 1 point (but maybe `NMTMemType`?)

------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2313858939 From dholmes at openjdk.org Wed Aug 28 01:27:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Aug 2024 01:27:28 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 23:54:08 GMT, Dean Long wrote: >> If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical. Code can and does take advantage of the expected calling context, which here limits lengths to int (and typically < 64K). The checked_cast serves to catch such misuses in my opinion. > >> If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical. > > I'm basically arguing for Functional Testing here, or at least having some invariants the would allow functional testing. It may seem impractical to retrofit existing code, but when we are changing the input from int to size_t, that seems like the perfect time to enforce the new invariants. If we expect "len" to be <= INT_MAX instead of SIZE_MAX, something that is not obvious from its type, then why not check that with an assert or at least document it? Note that I do already document the assumptions here in the general comment in utf8.hpp: There is an additional assumption/expectation that our UTF8 API's are never dealing with invalid UTF8, and more generally that all UTF8 sequences could form valid Strings. Consequently the Unicode length of a UTF8 sequence is assumed to always be representable by an int. The checked_cast is then the assert that verifies that.
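The size_t-versus-int split debated above can be made concrete with a small sketch. It is not HotSpot's UNICODE/UTF8 API, and both function names are hypothetical. In the JVM's modified UTF-8, U+0000 takes two bytes, and a supplementary character is stored as two separately encoded surrogates. As a result, no single UTF-16 code unit needs more than 3 bytes, yet the total byte count for a long String can exceed INT_MAX. The sketch therefore computes the byte length as `size_t`. It narrows to `int` only behind an assert, in the spirit of `checked_cast<int>` (this is not its implementation).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical: modified UTF-8 byte length of a UTF-16 sequence.
std::size_t modified_utf8_length(const std::uint16_t* chars, std::size_t count) {
  std::size_t len = 0;
  for (std::size_t i = 0; i < count; ++i) {
    std::uint16_t c = chars[i];
    if (c >= 0x0001 && c <= 0x007F) {
      len += 1;   // ASCII except NUL
    } else if (c <= 0x07FF) {
      len += 2;   // includes U+0000, encoded as C0 80
    } else {
      len += 3;   // includes each surrogate of a pair
    }
  }
  return len;
}

// Hypothetical narrowing helper: the caller's context promises the value fits.
int length_as_int(std::size_t len) {
  assert(len <= static_cast<std::size_t>(INT_MAX) && "length does not fit in int");
  return static_cast<int>(len);
}
```

For the sequence U+0041, U+0000, U+00E9, U+20AC, followed by the surrogate pair D83D DE00, this sketch yields 1 + 2 + 2 + 3 + 3 + 3 = 14 bytes.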
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733751739 From dholmes at openjdk.org Wed Aug 28 02:09:18 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Aug 2024 02:09:18 GMT Subject: RFR: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument [v2] In-Reply-To: References: Message-ID: <2crv29muJbKk9v7Ghd32lt0kgQbn7GKOocFlfDwid6U=.de22ee1b-1161-4e11-830a-bf239f30d1ef@github.com> On Tue, 27 Aug 2024 17:14:38 GMT, Leonid Mesnik wrote: >> Method >> frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument >> >> it usually is called with nullptr as second arg except >> JavaThread::trace_frames() >> where it is called with this. >> >> It seems that thread has never been used since 2007 so makes sense just to get rid of it. >> >> Tested building all builds available in CI and running tier13 > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed identation Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20721#pullrequestreview-2264845506 From dlong at openjdk.org Wed Aug 28 03:49:21 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 28 Aug 2024 03:49:21 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. 
The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > more missing casts Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20560#pullrequestreview-2265064454 From dlong at openjdk.org Wed Aug 28 03:49:21 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 28 Aug 2024 03:49:21 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 01:24:43 GMT, David Holmes wrote: >>> If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical. >> >> I'm basically arguing for Functional Testing here, or at least having some invariants the would allow functional testing. It may seem impractical to retrofit existing code, but when we are changing the input from int to size_t, that seems like the perfect time to enforce the new invariants. 
If we expect "len" to be <= INT_MAX instead of SIZE_MAX, something that is not obvious from its type, then why not check that with an assert or at least document it? > > Note that I do already document the assumptions here in the general comment in utf8.hpp: > > There is an additional assumption/expectation that our UTF8 API's are never dealing with > invalid UTF8, and more generally that all UTF8 sequences could form valid Strings. > Consequently the Unicode length of a UTF8 sequence is assumed to always be representable > by an int. > > the check_cast is then the assert that verifies that. OK, that's good enough for me. Thanks, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733935851 From amitkumar at openjdk.org Wed Aug 28 04:31:27 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 28 Aug 2024 04:31:27 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation Message-ID: s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; Testing: - tier1-test (fastdebug) - tier1-test with UseObjectMonitorTable ------------- Commit messages: - s390-port Changes: https://git.openjdk.org/jdk/pull/20740/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20740&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338658 Stats: 169 lines in 7 files changed: 94 ins; 23 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/20740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20740/head:pull/20740 PR: https://git.openjdk.org/jdk/pull/20740 From dholmes at openjdk.org Wed Aug 28 04:43:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Aug 2024 04:43:21 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v5] In-Reply-To: References: Message-ID: <70PFnoEfExk65lPEy4h9AeG-7vT6tDbuoDu-FBLjTQQ=.748cf8d3-c698-421a-a2c4-c2bc45f2aa7d@github.com> On Wed, 28 Aug 2024 03:46:43 GMT, Dean Long wrote: >> Note that I do 
already document the assumptions here in the general comment in utf8.hpp: >> >> There is an additional assumption/expectation that our UTF8 API's are never dealing with >> invalid UTF8, and more generally that all UTF8 sequences could form valid Strings. >> Consequently the Unicode length of a UTF8 sequence is assumed to always be representable >> by an int. >> >> the check_cast is then the assert that verifies that. > > OK, that's good enough for me. Thanks, Thanks for the review @dean-long ! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733973531 From dholmes at openjdk.org Wed Aug 28 04:59:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Aug 2024 04:59:56 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v6] In-Reply-To: References: Message-ID: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. 
> > Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. > > Testing: > - tiers 1-4 > - GHA David Holmes has updated the pull request incrementally with one additional commit since the last revision: Extra assertion requested by tstuefe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20560/files - new: https://git.openjdk.org/jdk/pull/20560/files/0c332e9d..3d36ba52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20560.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560 PR: https://git.openjdk.org/jdk/pull/20560 From stuefe at openjdk.org Wed Aug 28 05:01:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 28 Aug 2024 05:01:25 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <7S3vfLFTqQ00fyeqqLQ0INQxAUpIktjo7XTnzlFSSY8=.6dd191e9-2ddf-426a-8da7-7983e1c12e47@github.com> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
> > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) NMTCat, NMTGroup 3 points ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2314317309 From iklam at openjdk.org Wed Aug 28 06:02:23 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 28 Aug 2024 06:02:23 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v3] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 17:27:19 GMT, Coleen Phillimore wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - With JDK-8338929 we don't need is_in_class_space(). > - Merge branch 'master' into anon > - Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass MetaspaceObj::Type. > - 8338526: Don't store abstract and interface Klasses in class metaspace Looks OK to me. Maybe we should add an assert in `CompressedKlassPointers::encode_not_null()` to check that we never encode abstract and interface classes? src/hotspot/share/oops/metadata.hpp line 2: > 1: /* > 2: * Copyright (c) 2011, 2024, Oracle and/or its affiliates. All rights reserved. No more change in this file? 
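A minimal model of the placement rule in this PR and of the assert suggested in the review above. `KlassSketch`, `needs_class_space` and `encode_klass_not_null` are hypothetical names, and the encoding assumes the usual base-plus-shift scheme for compressed class pointers; this is not the real `CompressedKlassPointers::encode_not_null`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a class: only the flags relevant to placement.
struct KlassSketch {
  bool is_interface;
  bool is_abstract;
};

// Placement rule described in the PR: classes that can never have
// instances (interfaces, abstract classes) never need a compressed class
// pointer, so they can live outside class space.
bool needs_class_space(const KlassSketch& k) {
  return !(k.is_interface || k.is_abstract);
}

// Hypothetical base + shift encoding, guarded by the suggested assert:
// only classes placed in class space should ever be encoded.
uint32_t encode_klass_not_null(uintptr_t addr, uintptr_t base, unsigned shift,
                               const KlassSketch& k) {
  assert(needs_class_space(k) && "abstract/interface classes are never encoded");
  assert(addr >= base);
  return static_cast<uint32_t>((addr - base) >> shift);
}
```

As reported later in the thread, adding such an assert to the real function is not straightforward, because CDS calls it with Klass* pointers that are not well formed.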
------------- PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2265222694 PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1734036194 From rehn at openjdk.org Wed Aug 28 06:21:22 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 28 Aug 2024 06:21:22 GMT Subject: RFR: 8338727: RISC-V: Avoid synthetic data dependency in nmethod barrier on Ztso [v2] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 20:42:39 GMT, Daniel D. Daugherty wrote: > > I'm still confused by the re-review thingy. > > If I integrate now will @Hamlin-Li still get credit or does he need to re-review? > > Folks who have officially reviewed (and approved) do not get removed. The re-review "thingy" is just to make sure that all changes are reviewed before integration. Make a change after folks have reviewed, you just need someone to re-review to verify that the latest change was looked at. Thanks, Dan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20661#issuecomment-2314408072 From azafari at openjdk.org Wed Aug 28 08:04:20 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 28 Aug 2024 08:04:20 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) - `MemTypeFlag` : 3 - `NMT::MemType` : 2 IMHO, `MemType` without NMT is a global name and could become a source of confusion in the code or for developers.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2314608812 From kbarrett at openjdk.org Wed Aug 28 08:27:25 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 28 Aug 2024 08:27:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: Message-ID: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> On Tue, 27 Aug 2024 07:30:46 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names I've only looked at the changes in gc directories (shared and cpu-specific). src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 160: > 158: * To reduce the number of updates to the remembered set, the post-barrier > 159: * filters out updates to fields in objects located in the Young Generation, the > 160: * same region as the reference, when the null is being written, or if the card s/the null/null/ src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 166: > 164: * post-barrier completely, if it is possible during compile time to prove the > 165: * object is newly allocated and that no safepoint exists between the allocation > 166: * and the store.
It might be worth saying explicitly that this is a compile-time version of the above mentioned young generation filter. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 229: > 227: } > 228: > 229: void refine_barrier_by_new_val_type(Node* n) { This function should probably be `static`. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2259069811 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734167614 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734196887 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734207820 From kbarrett at openjdk.org Wed Aug 28 08:27:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 28 Aug 2024 08:27:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 08:53:30 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 218: > 216: __ cbz(new_val, done); > 217: } > 218: // Storing region crossing non-null, is card already dirty? s/already dirty/young/ src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 280: > 278: > 279: #undef __ > 280: #define __ masm-> These "changes" to `__` are unnecessary and confusing. We have the same define near the top of the file, unconditionally. This one is conditional on COMPILER2, but is left in place at the end of the conditional block, affecting following unconditional code. src/hotspot/share/opto/memnode.cpp line 3468: > 3466: // Capture an unaliased, unconditional, simple store into an initializer. > 3467: // Or, if it is independent of the allocation, hoist it above the allocation. > 3468: if (ReduceFieldZeroing && ReduceInitialCardMarks && /*can_reshape &&*/ It's not obvious to me how this is related to the late barrier changes.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730194278 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730238757 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730246320 From kbarrett at openjdk.org Wed Aug 28 08:27:28 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 28 Aug 2024 08:27:28 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Wed, 28 Aug 2024 08:09:44 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 166: > >> 164: * post-barrier completely, if it is possible during compile time to prove the >> 165: * object is newly allocated and that no safepoint exists between the allocation >> 166: * and the store. > > It might be worth saying explicitly that this is a compile-time version of the above mentioned young > generation filter. We can similarly elide the post-barrier if we can prove at compile-time that the value being written is null. That case isn't handled here though. Instead that's checked for in `refine_barrier_by_new_val_type` and in `get_store_barrier`. I'm not sure why it's structured that way.
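The post-barrier filters quoted above can be modeled in a few lines, in the order the aarch64 code applies them: cross-region check, null check, then the card-table checks. This is a single-threaded sketch under stated assumptions. The card states, region size, card size and queue are hypothetical stand-ins, not G1's actual constants or data structures. The compile-time elisions discussed in the review correspond to not emitting this code at all when C2 can prove that `new_val` is null, or that the object is newly allocated and no safepoint intervenes (the compile-time form of the young filter).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative card states; G1's real encodings are not assumed here.
enum class Card : uint8_t { Clean, Young, Dirty };

// Hypothetical heap model: fixed-size regions and fixed-size cards.
struct HeapSketch {
  unsigned log_region_bytes;              // e.g. 20 for 1 MiB regions
  unsigned log_card_bytes;                // e.g. 9 for 512-byte cards
  uintptr_t heap_base;
  std::vector<Card> cards;                // one entry per card
  std::vector<size_t> dirty_card_queue;   // card indices awaiting refinement
};

// Returns true only if the slow path (dirty the card and enqueue it) runs.
bool post_barrier(HeapSketch& h, uintptr_t field_addr, uintptr_t new_val) {
  // Filter 1: field and new value in the same region, no cross-region pointer.
  if (((field_addr ^ new_val) >> h.log_region_bytes) == 0) return false;
  // Filter 2: writing null never needs a remembered-set entry.
  if (new_val == 0) return false;
  size_t card = (field_addr - h.heap_base) >> h.log_card_bytes;
  // Filter 3: object in the young generation (card pre-marked young).
  if (h.cards[card] == Card::Young) return false;
  // The real barrier has a StoreLoad fence here before re-reading the
  // card; it is omitted in this single-threaded model.
  // Filter 4: card already dirty, so it is already queued.
  if (h.cards[card] == Card::Dirty) return false;
  h.cards[card] = Card::Dirty;
  h.dirty_card_queue.push_back(card);
  return true;
}
```

Only the first store of a cross-region, non-null reference into an old-generation card reaches the slow path. Every later store to the same card is stopped by filter 4 until refinement cleans the card again.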
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734201007 From jzhu at openjdk.org Wed Aug 28 08:51:11 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Wed, 28 Aug 2024 08:51:11 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: > Please review this minor enhancement that skips verify_sve_vector_length after native calls. > It works on SVE micro-architecture that only supports 128-bit vector length. Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: Fix compilation failure with --disable-precompiled-headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20724/files - new: https://git.openjdk.org/jdk/pull/20724/files/b825cbab..c0ec5499 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20724&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20724&range=00-01 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20724.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20724/head:pull/20724 PR: https://git.openjdk.org/jdk/pull/20724 From jsjolen at openjdk.org Wed Aug 28 09:06:26 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 28 Aug 2024 09:06:26 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name.
> > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) NMTGroup 3 points, NMTCat 2 points are my picks (NMTCategory is even better than NMTCat, imho). No 1 point given, I want these to win :P. Thank you for the effort with this table, may the highest point taker win :-). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2314753974 From adinn at openjdk.org Wed Aug 28 09:23:19 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 28 Aug 2024 09:23:19 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 08:51:11 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation failure with --disable-precompiled-headers The code changes look ok. What have you done to test it? ------------- PR Review: https://git.openjdk.org/jdk/pull/20724#pullrequestreview-2265676276 From coleenp at openjdk.org Wed Aug 28 12:33:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 28 Aug 2024 12:33:21 GMT Subject: RFR: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument [v2] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 17:14:38 GMT, Leonid Mesnik wrote: >> Method >> frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument >> >> it usually is called with nullptr as second arg except >> JavaThread::trace_frames() >> where it is called with this. 
>> >> It seems that thread has never been used since 2007 so makes sense just to get rid of it. >> >> Tested building all builds available in CI and running tier13 > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed identation This looks great. Thanks for cleaning this up. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20721#pullrequestreview-2266117942 From coleenp at openjdk.org Wed Aug 28 12:37:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 28 Aug 2024 12:37:20 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) I like NMT::MemType 3 points. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2315202137 From ihse at openjdk.org Wed Aug 28 12:47:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 28 Aug 2024 12:47:19 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Tue, 27 Aug 2024 23:15:03 GMT, Jiangli Zhou wrote: >> And the discussion whether the checks are made "dynamically" or "statically" is too simplified to be really helpful. >> >> Currently, we compile two sets of all object files, with slightly different compiler arguments, one for dynamic libraries and one for static libraries. 
Files that are doing things differently for these two modes have an #ifdef, so the alternative way of doing things is not included in the object file. >> >> In your branch, you still have a separate compilation of all files for static builds, but you also try to figure out through various means (which involves jumping through some hoops to get the bootstrapping right) if this is a static build or a dynamic build. In a way, one could argue that this is just worse than the current solution, since you are still recompiling all files separately for static libraries so you could "know" at build time if you are static or not. >> >> What I am trying to do is to get to a point where we can compile almost all files just once, and then have two trivially small files that are compiled twice, with just a different value of a define that makes the difference. To propagate this information to all other object files, they need to call the function provided in this object file. So, is it then a "build time" lookup or a "runtime lookup", or a "static lookup" vs "dynamic lookup"? The semantics does not really matter. The whole point is that the difference in build is reduced to an absolute minimum. Sure, this single "lookup" function could be created more like the way you are doing in your branch to try to figure this out without the help of the build system, but there is really no point in that. This is a simple and elegant solution. > > @magicus please also specify contributor properly so it's clear part of the change is based on/extracted from https://github.com/openjdk/leyden/tree/hermetic-java-runtime. @jianglizhou Are there any other authors on the `hermetic-java-runtime` branch that should be credited?
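The build arrangement Magnus describes (most files compiled once, one trivially small file compiled twice with a different define) can be sketched as follows, with both files shown in one listing for brevity. `STATIC_BUILD_VALUE`, `is_static_build` and `jni_onload_symbol` are hypothetical names, not the ones used in the PR. The usage function relies on the JNI rule that a statically linked library `L` exports `JNI_OnLoad_L` instead of `JNI_OnLoad`.

```cpp
#include <string>

// ---- static_build_flag.cpp (hypothetical) ----
// The only translation unit built twice: with -DSTATIC_BUILD_VALUE=1 for the
// static image and with -DSTATIC_BUILD_VALUE=0 for the shared-library image.
#ifndef STATIC_BUILD_VALUE
#define STATIC_BUILD_VALUE 0
#endif

bool is_static_build() {
  return STATIC_BUILD_VALUE != 0;
}

// ---- some_other_file.cpp (hypothetical), compiled once for both images ----
// No #ifdef here: the answer comes from whichever flag object is linked in.
bool is_static_build();  // would normally come from a shared header

std::string jni_onload_symbol(const std::string& lib_name) {
  return is_static_build() ? "JNI_OnLoad_" + lib_name
                           : std::string("JNI_OnLoad");
}
```

The tradeoff debated in this thread is visible here: callers pay an out-of-line call instead of seeing a compile-time constant, in exchange for compiling everything except the flag file only once.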
------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2315222006 From yzheng at openjdk.org Wed Aug 28 13:17:20 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 28 Aug 2024 13:17:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 00:25:03 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x src/hotspot/share/jvmci/jvmciCompilerToVM.hpp line 114: > 112: static address dcos; > 113: static address dtan; > 114: static address dtanh; Could you please add the following initializing code? diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp index 9752d7edf99..1db9be70db0 100644 --- a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp +++ b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp @@ -259,6 +259,17 @@ void CompilerToVM::Data::initialize(JVMCI_TRAPS) { SET_TRIGFUNC(dpow); #undef SET_TRIGFUNC + +#define SET_TRIGFUNC_OR_NULL(name) \ + if (StubRoutines::name() != nullptr) { \ + name = StubRoutines::name(); \ + } else { \ + name = nullptr; \ + } + + SET_TRIGFUNC_OR_NULL(dtanh); + +#undef SET_TRIGFUNC_OR_NULL } static jboolean is_c1_supported(vmIntrinsics::ID id){ diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index fea308503cf..189c1465589 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -126,6 +126,7 @@ static_field(CompilerToVM::Data, dsin, address) \ static_field(CompilerToVM::Data, dcos, address) \ static_field(CompilerToVM::Data, dtan, address) \ + static_field(CompilerToVM::Data, dtanh, address) \ static_field(CompilerToVM::Data, dexp, address) \ static_field(CompilerToVM::Data, dlog, 
address) \ static_field(CompilerToVM::Data, dlog10, address) \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1734655447 From erikj at openjdk.org Wed Aug 28 13:41:21 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 28 Aug 2024 13:41:21 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: <9YbB0gtUIqE5SAmbAMmVC_S8wNDa9kKPVop6uvKUxCY=.e000ded5-fcb8-42f9-bde7-97cd1f52ecf9@github.com> On Mon, 26 Aug 2024 02:07:39 GMT, David Holmes wrote: > I understand the cost overhead experienced by any individual Java run may be lost in the noise, but it still impacts every single Java run just to save some time/resources for the handful of builders of statically linked VMs. I am not a fan. I understand your stance and it's a fair principle. My opinion is that we need to weigh the pros and cons with more nuance. We are often in situations where we have to weigh runtime performance against things like (openjdk) developer convenience, maintainability and build performance. As we are building the Java platform, we often give up a lot to eke out the last drops of runtime performance, but we sure aren't always making that tradeoff in favor of performance. As a very clear example, we could enable LTO (Link Time Optimization), which would likely give a measurable (though likely small) performance improvement at runtime, at the cost of a big increase in build time, but we haven't, because we don't think the tradeoff is worth it. My take on the current issue is that the potential savings in build time is easily comparable to using LTO or not, while the difference in runtime performance is likely smaller by orders of magnitude. My point is that we make these kinds of calls quite often.
So in this case, my take is that even if the size difference in the number of people impacted is big, I think the size difference in the actual impact more than makes up for it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2315348318 From jzhu at openjdk.org Wed Aug 28 14:17:19 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Wed, 28 Aug 2024 14:17:19 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 09:21:00 GMT, Andrew Dinn wrote: > The code changes look ok. What have you done to test it? @adinn Thanks for your review! > The maximum SVE vector length "VLmax" is determined by the hardware: 16 <= VLmax <= 256. The value of VL can be configured at runtime: 16 <= VL <= VLmax, where VL must be a multiple of 16. > > Once we find cpu's VLMax is 16 bytes only, the verification "verify_sve_vector_length()" after native calls is not required - in other words, VL cannot be configured to a value other than 16. I checked the behavior of prctl(PR_SVE_SET_VL, value) with a separate C test case. https://github.com/JoshuaZhuwj/openjdk_cases/blob/master/8339063/setSVEVL.c The output matches the above expectation. https://github.com/JoshuaZhuwj/openjdk_cases/blob/master/8339063/output I have aarch64 hardware at hand with only 128-bit SVE vector length. With this change applied, the generated native wrapper and native entry no longer check for SVE VL changes after native calls on that machine. I also ensured there are no regression failures with the jtreg case: test/hotspot/jtreg/compiler/c2/aarch64/TestSVEWithJNI.java There are also no regression failures when the JVM starts up with different MaxVectorSize values.
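The reasoning quoted above reduces to counting the vector lengths that software could select. The helpers below are hypothetical, not HotSpot code, and they enumerate candidates from the stated rule only (16 <= VL <= VLmax, VL a multiple of 16); real hardware may support just a subset of these lengths.

```cpp
#include <vector>

// Candidate SVE vector lengths (in bytes) permitted by the rule quoted
// above for a given hardware maximum.
std::vector<int> selectable_sve_vls(int vl_max_bytes) {
  std::vector<int> vls;
  for (int vl = 16; vl <= vl_max_bytes; vl += 16) {
    vls.push_back(vl);
  }
  return vls;
}

// Re-checking VL after a native call is only useful if native code
// could have switched to a different length.
bool needs_vl_check_after_native_call(int vl_max_bytes) {
  return selectable_sve_vls(vl_max_bytes).size() > 1;
}
```

With VLmax = 16 the only candidate is 16, so the check can be skipped; any larger VLmax leaves room for native code to change VL.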
------------- PR Comment: https://git.openjdk.org/jdk/pull/20724#issuecomment-2315459504 From aph at openjdk.org Wed Aug 28 14:20:20 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 28 Aug 2024 14:20:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> Message-ID: On Tue, 27 Aug 2024 22:23:44 GMT, Srinivas Vamsi Parasa wrote: >> I agree, this is all rather obscure. Ideally the same names that are used in wherever this comes from. >> >> Where does the algorithm come from? What are its accuracy guarantees? >> >> In addition, given the rarity of hyperbolic tangents in Java applications, do we need this? > > @theRealAph, this implementation is based on Intel libm math library and meets the accuracy requirements. The algorithm is provided in the comments. Do you have a copy of this information? Should it be in the commit? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1734776732 From zzambers at openjdk.org Wed Aug 28 14:23:21 2024 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Wed, 28 Aug 2024 14:23:21 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: References: Message-ID: <0MlDf-EYzm6DG3KhZTK9xJlXmb9CRkwhi4VwQGA5xgY=.3c38107b-5b73-4508-b245-45f15fac845a@github.com> On Tue, 27 Aug 2024 15:01:59 GMT, Severin Gehwolf wrote: > > If I am not mistaken, the new test requires that the test suite is run as superuser (root). (Because it writes `/etc/systemd/system`, runs certain systemd commands). Should the test be skipped for non-root? > > Thanks! I can add that. FWIW, container tests are in a similar situation (applying cpu/memory limits is not allowed in rootless on cg v1) and they don't check for it. I think it would be nice to skip on non-root, as tests are often run as non-root. Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2315477927 From coleenp at openjdk.org Wed Aug 28 15:42:58 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 28 Aug 2024 15:42:58 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v3] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 05:57:14 GMT, Ioi Lam wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - With JDK-8338929 we don't need is_in_class_space(). >> - Merge branch 'master' into anon >> - Incorporated a set of Thomas Stuefe's comments. Take out AbstractClass MetaspaceObj::Type. >> - 8338526: Don't store abstract and interface Klasses in class metaspace > > src/hotspot/share/oops/metadata.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2011, 2024, Oracle and/or its affiliates. All rights reserved. > > No more change in this file? I removed the copyright. I tried to add an assert to CompressedKlassPointers::encode_not_null but CDS calls this with ill-formed Klass* pointers so the tests crash. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1734914543 From coleenp at openjdk.org Wed Aug 28 15:42:57 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 28 Aug 2024 15:42:57 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: Message-ID: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too.
It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'anon' of github.com:coleenp/jdk into anon - Fix copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19157/files - new: https://git.openjdk.org/jdk/pull/19157/files/94413cd1..1382ced0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19157/head:pull/19157 PR: https://git.openjdk.org/jdk/pull/19157 From rcastanedalo at openjdk.org Wed Aug 28 15:49:22 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 28 Aug 2024 15:49:22 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> On Tue, 27 Aug 2024 17:38:28 GMT, Martin Doerr wrote: >>> Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop
decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. >> >> I tried this refactoring [here](https://github.com/openjdk/jdk/commit/d4e83fd7d77c5415700b33556752e8c8da811dea), thanks again Martin for the suggestion. In my opinion, the result is similar in terms of readability/maintainability because the benefit of removing explicit `decode_heap_oop` operations in the ADL file is somewhat negated by the increased complexity of the `write_barrier_post` and `g1_write_barrier_post_c2` signatures. For aarch64, moving non-destructive `decode_heap_oop` operations would probably require passing both source and destination registers to these functions explicitly, which would make them more complex. I feel that this refactoring is only amortized when the new-value decoding operations are more complex, as in your PPC implementation. > > Thanks for trying! I like your refactored version. I prefer moving stuff out of the .ad files. The complexity of the barrier code is not significantly higher. Note that you could use `decode_heap_oop_not_null(Register dst, Register src)` with `tmp2` as dst (`generate_post_barrier_fast_path` can deal with new_val == tmp2). That would save a move instruction in some cases. > I haven't looked into the aarch64 code. I leave you free to decide. Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code). 
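The two decode variants discussed in this thread differ only in how they handle a zero narrow oop. Below is a minimal sketch of the arithmetic, assuming a 64-bit host. `NarrowOopMode`, `decode` and `decode_not_null` are hypothetical stand-ins for HotSpot's `decode_heap_oop` / `decode_heap_oop_not_null` and do not reproduce their actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding parameters; HotSpot chooses the real ones at startup
// (for example a zero base and a shift of 3 for 8-byte object alignment).
struct NarrowOopMode {
  uintptr_t base;
  unsigned shift;
};

// Like decode_heap_oop_not_null: no null check, so it is only correct
// when the caller knows `narrow` is non-zero.
uintptr_t decode_not_null(uint32_t narrow, NarrowOopMode mode) {
  assert(narrow != 0);
  return mode.base + (static_cast<uintptr_t>(narrow) << mode.shift);
}

// Like decode_heap_oop: a zero narrow oop must decode to null, which costs
// an extra test whenever the base is non-zero.
uintptr_t decode(uint32_t narrow, NarrowOopMode mode) {
  return narrow == 0 ? 0 : decode_not_null(narrow, mode);
}
```

The two-register `decode_heap_oop_not_null(dst, src)` form Martin mentions additionally lets the result land in a register other than the source.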
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734924686 From adinn at openjdk.org Wed Aug 28 15:55:18 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 28 Aug 2024 15:55:18 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 08:51:11 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation failure with --disable-precompiled-headers Marked as reviewed by adinn (Reviewer). Ok, that sounds like it is sufficient. ------------- PR Review: https://git.openjdk.org/jdk/pull/20724#pullrequestreview-2266680405 PR Comment: https://git.openjdk.org/jdk/pull/20724#issuecomment-2315725636 From sviswanathan at openjdk.org Wed Aug 28 16:08:25 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 28 Aug 2024 16:08:25 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 00:12:26 GMT, Sandhya Viswanathan wrote: >> Hey @jaskarth , Central idea behind introducing VectorReinterpretNode after unsigned vector IR is to facilitate unboxing-boxing optimization, this explicit reinterpretation ensures type compatibility between value being boxed and box type which is always signed vector types. >> >> As mentioned previously my plan is to address is handle value range related concerns in a follow up patch along with intrisification and auto-vectorization of newly created scalar saturating IR, this patch is not generating scalar IR with newly defined unsigned types. 
> > Wonder if it would have been simpler if we added unsigned vector operators like Op_SaturatingUnsignedAddVB etc. We are not adding unsigned data types to Java, only supporting unsigned (saturating) operations on existing signed integral types. If the aim is to reduce the number of nodes, we could merge the Op_SaturatingAddVB, Op_SaturatingAddVS, Op_SaturatingAddVI, and Op_SaturatingAddVL into one Op_SaturatingAddV. Likewise for unsigned saturating add into Op_SaturatingUnsignedAddV. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1734951862 From sgehwolf at openjdk.org Wed Aug 28 16:13:07 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 28 Aug 2024 16:13:07 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: Message-ID: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
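To make the signed/unsigned distinction concrete: both operations work on the same signed `byte` lanes, but they clamp at different bounds. This is one argument for distinct unsigned opcodes, as Sandhya suggests. The following scalar C++17 sketch shows per-lane semantics only. The function names are hypothetical and are not Vector API or C2 names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Signed saturating add on a byte lane: clamp the exact sum to [-128, 127].
int8_t saturating_add(int8_t a, int8_t b) {
  int sum = int(a) + int(b);  // cannot overflow in int
  return static_cast<int8_t>(std::clamp(sum, -128, 127));
}

// Unsigned saturating add on the same signed storage type: the bits are
// reinterpreted as 0..255 and the result clamps at 255 (all bits set,
// which reads back as -1 in the signed lane type).
int8_t saturating_unsigned_add(int8_t a, int8_t b) {
  unsigned sum = unsigned(uint8_t(a)) + unsigned(uint8_t(b));
  return static_cast<int8_t>(static_cast<uint8_t>(std::min(sum, 255u)));
}
```

For example, with lanes holding -56 (bit pattern 200) and 100, the signed add yields 44 while the unsigned add saturates to 255, stored as -1.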
The pull request contains 10 additional commits since the last revision: - Add root check for SystemdMemoryAwarenessTest.java - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Add Whitebox check for host cpu - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comments - 8333446: Add tests for hierarchical container support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/eda249b4..7e8d9ed4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=04-05 Stats: 6081 lines in 315 files changed: 4125 ins; 864 del; 1092 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From sgehwolf at openjdk.org Wed Aug 28 16:13:07 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 28 Aug 2024 16:13:07 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v5] In-Reply-To: <0MlDf-EYzm6DG3KhZTK9xJlXmb9CRkwhi4VwQGA5xgY=.3c38107b-5b73-4508-b245-45f15fac845a@github.com> References: <0MlDf-EYzm6DG3KhZTK9xJlXmb9CRkwhi4VwQGA5xgY=.3c38107b-5b73-4508-b245-45f15fac845a@github.com> Message-ID: On Wed, 28 Aug 2024 14:21:09 GMT, Zdenek Zambersky wrote: > > > If I am not mistaken, new test requires, that testsuite is ran as superuser (root). (Because it writes `/etc/systemd/system`, runs certain systemd commands). Should test be skipped for non-root? > > > > > > Thanks! I can add that. FWIW, container tests are in a similar situation (applying cpu/memory limits is not allowed in rootless on cg v1) and they don't check for it. 
> > I think, it would be nice to skip on non-root, as tests are often ran as non-root. Thanks. Done in https://github.com/openjdk/jdk/pull/19530/commits/7e8d9ed46815096ae8c4502f3320ebf5208438d5 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2315757991 From iklam at openjdk.org Wed Aug 28 16:20:23 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 28 Aug 2024 16:20:23 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 15:42:57 GMT, Coleen Phillimore wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'anon' of github.com:coleenp/jdk into anon > - Fix copyright Marked as reviewed by iklam (Reviewer). 
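The placement rule in 8338526 can be summarized as a predicate on the class's access flags. Below is a simplified C++ sketch. The flag values come from the JVM specification (JVMS 4.1), while `KlassSpace` and `choose_space` are hypothetical. The sketch also ignores array klasses, CDS, and the other cases handled by HotSpot's real allocation path.

```cpp
#include <cassert>
#include <cstdint>

// Class-file access flags from the JVM specification (JVMS 4.1).
constexpr uint16_t ACC_PUBLIC    = 0x0001;
constexpr uint16_t ACC_INTERFACE = 0x0200;
constexpr uint16_t ACC_ABSTRACT  = 0x0400;

enum class KlassSpace { ClassSpace, NonClassMetaspace };

// Hypothetical placement rule: only classes that can have instances need a
// compressed class pointer in an object header, so only they must live in
// the size-limited class space.
KlassSpace choose_space(uint16_t access_flags) {
  bool can_have_instances =
      (access_flags & (ACC_INTERFACE | ACC_ABSTRACT)) == 0;
  return can_have_instances ? KlassSpace::ClassSpace
                            : KlassSpace::NonClassMetaspace;
}
```

Per the JVM specification, an interface must also have `ACC_ABSTRACT` set, so for well-formed class files the interface test is redundant. It is kept here for readability.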
------------- PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2266738043 From gziemski at openjdk.org Wed Aug 28 16:41:22 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 28 Aug 2024 16:41:22 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: <-NKUnExfzAqwhaGAtlJQD81b6wtoEnOhaXF4R779--s=.6d31a953-da4f-4875-8950-6971148ff639@github.com> References: <-NKUnExfzAqwhaGAtlJQD81b6wtoEnOhaXF4R779--s=.6d31a953-da4f-4875-8950-6971148ff639@github.com> Message-ID: <19VwWXE1ZczFuYtfnD2ioXSV-oh40JUlk67aMsNIGN8=.0783ce36-2668-4866-952a-94cbfcbf4ba4@github.com> On Thu, 15 Aug 2024 07:59:21 GMT, Kim Barrett wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > The suggestion of "nmtGC" from @daholmes initially looked somewhat appealing > to me. But that then suggests "NMTType" or (better?) "NMTGroup" or something > like that. But I don't much like the look of or typing those acronyms. (Note > that NMTGroup is HotSpot style, not NmtGroup.) > > So I'm still preferring "MemType". @kimbarrett @stefank Would you like to name your nominees? As last voters you have quite weight behind your choices...

| | MemType | MemTypeFlag | NMTCat | NMTGroup | NMT_MemType | NMT::MemType |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| gerard | 1 | 0 | 0 | 0 | 2 | 3 |
| David | 0 | 3 | 0 | 0 | 1(NMTMemType) | 2 |
| Thomas | 0 | 0 | 2 | 3 | 0 | 0 |
| Johan | 0 | 0 | 2(NMTCategory) | 3 | 0 | 0 |
| Afshin | 0 | 3 | 0 | 0 | 0 | 2 |
| Stefan | ? | ? | ? | ? | ? | ? |
| Kim | ? | ? | ? | ? | ? | ? |
| Coleen | 0 | 0 | 0 | 0 | 0 | 3 |
| | 1 | 6 | 4 | 6 | 3 | 10 |

------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2315814746 From dcubed at openjdk.org Wed Aug 28 17:13:21 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 28 Aug 2024 17:13:21 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) I think it must have `NMT` somewhere in the name. `NMTGroup` sounds like there could more than one value set at one time, but that might be just me. `NMTCat` (and `NMTGroup`) lose the idea that `Type` should be in there somewhere. For ease of typing, I'm not fond of `NMT::MemType` but it has the appeal of being in a namespace... So I'm going with:

NMT::MemType - 3 points
NMT_MemType/NMTMemType(prefer for ease of typing) - 2 points

and I'm skipping a 1 point assignment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2315871575 From jiangli at openjdk.org Wed Aug 28 17:19:23 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 28 Aug 2024 17:19:23 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Tue, 27 Aug 2024 23:15:03 GMT, Jiangli Zhou wrote: >> And the discussion whether the checks are made "dynamically" or "statically" is too simplified to be really helpful.
>> >> Currently, we compile two sets of all object files, with slightly different compiler arguments, one for dynamic libraries and one for static libraries. Files that are doing things differently for these two modes have an #ifdef, so the alternative way of doing things are not included in the object file. >> >> In your branch, you still have a separate compilation of all files for static builds, but you also try to figure out through various means (which involves jumping through some hoops to get the bootstrapping right) if this is a static build or a dynamic build. In a way, one could argue that this is just worse than the current solution, since you are still recompiling all files separately for static libraries so you could "know" at build time if you are static or not. >> >> What I am trying to do is to get to a point where we can compile almost all files just once, and then have two trivially small files that are compiled twice, with just a different value of a define that makes the difference. To propagate this information to all other object files, they need to call the function provided in this object file. So, is it then a "build time" lookup or a "runtime lookup", or a "static lookup" vs "dynamic lookup"? The semantics does not really matter. The whole point is that the difference in build is reduced to an absolute minimum. Sure, this single "lookup" function could be created more like the way you are doing in your branch to try to figure this out without the help of the build system, but there is really no point in that. This is a simple and elegant solution. > > @magicus please also specify contributor properly to so it's clear part of the change is based on/extracted from https://github.com/openjdk/leyden/tree/hermetic-java-runtime. > @jianglizhou Are there any other authors on the `hermetic-java-runtime` branch that should be credited? 
For any commits in https://github.com/openjdk/leyden/compare/master...hermetic-java-runtime contributed by other contributor(s) or additional contributor(s), I documented that the commit message (e.g. https://github.com/openjdk/leyden/commit/4faa3a964ec550e410c741048c7e0ed99ac64b52). The current PR is related to the following. Please refer to those commit messages. - https://github.com/openjdk/leyden/commit/7d75a7f4d6aa020b7580fbbf660b2b3e3a41b274 - https://github.com/openjdk/leyden/commit/22d8c439157b61acdfe99090d39f91c09388b1b1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2315882065 From coleenp at openjdk.org Wed Aug 28 17:19:25 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 28 Aug 2024 17:19:25 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 15:42:57 GMT, Coleen Phillimore wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'anon' of github.com:coleenp/jdk into anon > - Fix copyright Thanks Ioi. 
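Magnus describes an approach for 8338768 in which nearly everything is compiled once and only a trivially small file is compiled twice with a different define. It might look roughly like the sketch below. All names in it (`STATIC_BUILD_VALUE`, `is_static_build`, `library_lookup_strategy`) are hypothetical and are not taken from the PR.

```cpp
#include <cassert>

// Hypothetical tiny translation unit compiled twice by the build system:
//   once with -DSTATIC_BUILD_VALUE=1 (linked into static images) and
//   once with -DSTATIC_BUILD_VALUE=0 (linked into the shared libraries).
// Every other object file is compiled once and calls is_static_build()
// instead of testing #ifdef STATIC_BUILD itself.
#ifndef STATIC_BUILD_VALUE
#define STATIC_BUILD_VALUE 0  // default so this standalone sketch compiles
#endif

bool is_static_build() {
  return STATIC_BUILD_VALUE != 0;
}

// Illustrative caller, compiled once and shared by both link modes.
const char* library_lookup_strategy() {
  return is_static_build() ? "look up symbols in the executable"
                           : "dlopen the shared library";
}
```

With this approach, the rest of the code base pays for a function call rather than a second compilation of every object file.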
------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2315880742 From coleenp at openjdk.org Wed Aug 28 17:19:25 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 28 Aug 2024 17:19:25 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <0UeIWfhsJRNahI3I8r8wKfGlvWuY-0crKHqGxbPXk5o=.6b9ff3f1-f297-40f0-9c81-99507c2b2880@github.com> <77NWcTx23rX8UnhRcnOqS36y4Y-7-zDEf3hyYD1bcbw=.6f02be24-3cd2-44d8-8483-934295aab9fd@github.com> Message-ID: On Wed, 21 Aug 2024 21:32:15 GMT, Chen Liang wrote: >> I feel like making it ACC_INTERFACE might cause some error if there are no public nonstatic methods, which is the case with this class. I don't know what @liach your comment means, but this code got more complicated than it was with the first version of this change. When I talked to @rose00 he thought ACC_ABSTRACT would be okay for this. > > Yes, you are right that we currently don't add ACC_PUBLIC flags on methods, which will fail if we add ACC_INTERFACE. Same for the fields; these LambdaForm classes use fields to store class data that's usually stored as condy, because LambdaForm is the infrastructure that condy uses (like LambdaMetafacotry cannot use lambdas). Those fields are my concerns for the interface migration. Thanks for reviewing this part. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1735039561 From stooke at openjdk.org Wed Aug 28 19:29:29 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 28 Aug 2024 19:29:29 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows Message-ID: This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). 
This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp ------------- Commit messages: - hoist os::Posix::realpath() to os::realpath() Changes: https://git.openjdk.org/jdk/pull/20683/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338851 Stats: 73 lines in 11 files changed: 54 ins; 7 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From lmesnik at openjdk.org Wed Aug 28 20:20:23 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 28 Aug 2024 20:20:23 GMT Subject: Integrated: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 00:19:04 GMT, Leonid Mesnik wrote: > Method > frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument > > it usually is called with nullptr as second arg except > JavaThread::trace_frames() > where it is called with this. > > It seems that thread has never been used since 2007 so makes sense just to get rid of it. > > Tested building all builds available in CI and running tier13 This pull request has now been integrated. 
Changeset: d03ec7aa Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/d03ec7aad41d830b47801b7af75ee5e278128e69 Stats: 16 lines in 7 files changed: 0 ins; 0 del; 16 mod 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20721 From dholmes at openjdk.org Thu Aug 29 02:45:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 02:45:53 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of ` UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. > > Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. > > Testing: > - tiers 1-4 > - GHA David Holmes has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into 8338257-utf8-length - Extra assertion requested by tstuefe - more missing casts - fix cast - missing cast - Fix incorrect comments and size_t use per Dean's review - Add missing cast for signed-to-unsigned converion. - unnecessary cast - Fix comments - Fix off-by-one error - ... and 3 more: https://git.openjdk.org/jdk/compare/6e817a2b...9dce4ffb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20560/files - new: https://git.openjdk.org/jdk/pull/20560/files/3d36ba52..9dce4ffb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=05-06 Stats: 38818 lines in 1231 files changed: 22068 ins; 10873 del; 5877 mod Patch: https://git.openjdk.org/jdk/pull/20560.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560 PR: https://git.openjdk.org/jdk/pull/20560 From iklam at openjdk.org Thu Aug 29 04:23:34 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 29 Aug 2024 04:23:34 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver Message-ID: This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolLinker`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). ----- See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. 
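The sizing arithmetic behind making UTF-8 lengths `size_t` in 8338257 can be checked directly. Modified UTF-8 uses one byte for U+0001..U+007F, two bytes for U+0000 and U+0080..U+07FF, and three bytes for every other UTF-16 code unit, including each surrogate half. The sketch below uses hypothetical function names and is not HotSpot's `UNICODE::utf8_length`.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Bytes needed for one UTF-16 code unit in the JVM's modified UTF-8.
// U+0000 takes two bytes (0xC0 0x80) so encoded strings never contain NUL;
// each surrogate half takes three bytes, so a supplementary character
// (a surrogate pair) takes six.
static size_t modified_utf8_unit_length(uint16_t c) {
  if (c != 0 && c <= 0x007F) return 1;
  if (c <= 0x07FF) return 2;
  return 3;
}

// A size_t-returning length: the sum can exceed INT_MAX because an
// int-length jchar array may need up to 3 bytes per element.
size_t modified_utf8_length(const uint16_t* chars, int len) {
  size_t total = 0;
  for (int i = 0; i < len; i++) {
    total += modified_utf8_unit_length(chars[i]);
  }
  return total;
}

// Worst case for a Java char[] of maximal length.
constexpr unsigned long long kMaxUtf8Bytes = 3ULL * INT_MAX;
static_assert(kMaxUtf8Bytes > INT_MAX, "does not fit in an int");
```

With compact strings, Latin-1 characters need at most two bytes each, which gives the 2*`Integer.MAX_VALUE` bound mentioned in the PR description.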
------------- Depends on: https://git.openjdk.org/jdk/pull/20516 Commit messages: - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - 8338018: Rename ClassPrelinker to AOTConstantPoolResolver Changes: https://git.openjdk.org/jdk/pull/20517/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20517&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338018 Stats: 717 lines in 10 files changed: 352 ins; 350 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20517/head:pull/20517 PR: https://git.openjdk.org/jdk/pull/20517 From dholmes at openjdk.org Thu Aug 29 05:26:20 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 05:26:20 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 00:26:27 GMT, Ioi Lam wrote: > This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolLinker`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). > > ----- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Rename looks fine, but there seems to be an unrelated change. Thanks src/hotspot/share/cds/aotConstantPoolResolver.hpp line 42: > 40: class Klass; > 41: > 42: template class GrowableArray; This doesn't seem part of the rename. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20517#pullrequestreview-2267705000 PR Review Comment: https://git.openjdk.org/jdk/pull/20517#discussion_r1735584155 From stuefe at openjdk.org Thu Aug 29 05:28:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 05:28:21 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 15:42:57 GMT, Coleen Phillimore wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. 
> > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'anon' of github.com:coleenp/jdk into anon > - Fix copyright src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdKlassQueue.cpp line 79: > 77: > 78: static bool can_compress_element(const Klass* klass) { > 79: return Metaspace::is_in_class_space(klass) && Suggestion: return (Metaspace::is_in_class_space(klass) || Metaspace::is_in_shared_metaspace(klass)) && ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1735585478 From jbhateja at openjdk.org Thu Aug 29 05:42:58 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 29 Aug 2024 05:42:58 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. 
> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy target.
> - Function tests covering new API.
>
> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
>
> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2  5398.2...
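The per-lane semantics of the proposed two-vector `selectFrom` can be modeled with scalar code. The sketch below is hypothetical C++ (`select_from_two` is not part of any API). Where the Java API throws an index-out-of-bounds exception, it throws `std::out_of_range`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Scalar model of the two-vector selectFrom: each lane of `index` picks an
// element from the concatenation [v1, v2], so valid indices are [0, 2*VLEN).
template <typename T, std::size_t VLEN>
std::array<T, VLEN> select_from_two(const std::array<int, VLEN>& index,
                                    const std::array<T, VLEN>& v1,
                                    const std::array<T, VLEN>& v2) {
  std::array<T, VLEN> result{};
  for (std::size_t lane = 0; lane < VLEN; lane++) {
    int idx = index[lane];
    if (idx < 0 || idx >= static_cast<int>(2 * VLEN)) {
      throw std::out_of_range("index outside [0, 2*VLEN)");
    }
    // Indices below VLEN read the first table vector, the rest the second.
    result[lane] = idx < static_cast<int>(VLEN) ? v1[idx] : v2[idx - VLEN];
  }
  return result;
}
```

For in-range indices, this corresponds to the description's `v1.rearrange(this.toShuffle(), v2)`.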
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding descriptive comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/408a8694..8d71f175 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=05-06 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From iklam at openjdk.org Thu Aug 29 05:46:24 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 29 Aug 2024 05:46:24 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 05:22:24 GMT, David Holmes wrote: >> This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolLinker`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). >> >> ----- >> See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. > > src/hotspot/share/cds/aotConstantPoolResolver.hpp line 42: > >> 40: class Klass; >> 41: >> 42: template class GrowableArray; > > This doesn't seem part of the rename. This header was using GrowableArray without declaring it. The problem was discovered after this header was moved earlier due to alphabetical sorting.
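Ioi's explanation describes a common include-order pitfall. A header that names a class template without declaring it compiles only when some earlier include happens to declare that template, so re-sorting the includes exposes the bug. Below is a minimal C++ sketch of the fix. All names are hypothetical: `ArrayList` stands in for a growable array and `Resolver` for the header's class. The sketch does not reproduce the exact declaration in the real header.

```cpp
// --- what a header such as "resolver.hpp" (hypothetical) can contain ---
// A forward declaration of the class template is enough to mention pointers
// or references to it, so the header no longer depends on include order.
template <typename E> class ArrayList;

class Resolver {
 public:
  static int count(const ArrayList<int>* list);
};

// --- the full definition, normally in its own header ---
template <typename E> class ArrayList {
 public:
  explicit ArrayList(int len) : _len(len) {}
  int length() const { return _len; }
 private:
  int _len;
};

// --- an implementation file that includes both headers ---
int Resolver::count(const ArrayList<int>* list) {
  return list == nullptr ? 0 : list->length();
}
```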
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20517#discussion_r1735599671 From jbhateja at openjdk.org Thu Aug 29 05:46:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 29 Aug 2024 05:46:24 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v6] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 27 Aug 2024 20:00:56 GMT, Paul Sandoz wrote: > My comment was related to understanding what `SelectFromTwoVectorNode::Ideal` and `VectorRearrangeNode::Ideal` are doing - the former lowers, if needed, into the rearrange expression and the latter adjusts, if needed, the index vector (a comment describing this transformation would be useful, like you have in the former method). Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2316759572 From dholmes at openjdk.org Thu Aug 29 05:48:25 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 05:48:25 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 18:36:39 GMT, Simon Tooke wrote: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp This is okay in principle but a few changes can be made. Also it seems that none of the callers of `realpath` ever check `errno` so I think that can be removed. 
Thanks src/hotspot/os/posix/os_posix.cpp line 896: > 894: char* os::realpath(const char* filename, char* outbuf, size_t outbuflen) { > 895: return os::Posix::realpath(filename, outbuf, outbuflen); > 896: } We don't need `os::Posix::realpath` any more - just rename it to `os::realpath`. src/hotspot/os/windows/os_windows.cpp line 5319: > 5317: > 5318: char* os::realpath(const char* filename, char* outbuf, size_t outbuflen) { > 5319: return os::win32::realpath(filename, outbuf, outbuflen); Again you don't need the indirection here. src/hotspot/os/windows/os_windows.cpp line 5344: > 5342: // In this case, use the user provided buffer but at least check whether _fullpath caused > 5343: // a memory overwrite. > 5344: if (errno == EINVAL) { There is nothing to indicate that `_fullpath` can ever set `EINVAL`; it is only specified to return null on error. This code should not check errno but can just re-try with the user-supplied buffer. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2267709458 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1735587113 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1735588758 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1735593867 From dholmes at openjdk.org Thu Aug 29 05:51:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 05:51:17 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver In-Reply-To: References: Message-ID: <_TSPt39si1s0hTd6vFlPjD2feICfdm6B3Uc7v-xrDvA=.12cbbcff-5d1f-47f3-84f1-4c16bddadbde@github.com> On Fri, 9 Aug 2024 00:26:27 GMT, Ioi Lam wrote: > This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).
> > A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolLinker`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). > > ----- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20517#pullrequestreview-2267734206 From dholmes at openjdk.org Thu Aug 29 05:51:18 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 05:51:18 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 05:44:06 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/aotConstantPoolResolver.hpp line 42: >> >>> 40: class Klass; >>> 41: >>> 42: template class GrowableArray; >> >> This doesn't seem part of the rename. > > This header was using GrowableArray without declaring it. The problem was discovered after this header was moved earlier due to alphabetical sorting. Okay - thanks for explaining. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20517#discussion_r1735602874 From stuefe at openjdk.org Thu Aug 29 05:56:23 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 05:56:23 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 05:52:26 GMT, Thomas Stuefe wrote: > Many of these translations seem awkward, since they convert to size_t only to then convert back to int. > > Proposal: I understand you need to find a good point to tourniquet off the int->size_t conversion to minimize the translations needed. But I'd consider converting SymbolTable functions to size_t too.
SymbolTable already does not use the full width of the int length parameter, so functionally nothing changes (it needs to check the length for validity). > > If you worry that the changes fan out too much, at least consider converting SymbolTable::new_symbol. That appears about ten times. If you want to leave it as it is, that is fine; in that case I'll review the existing patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2316769341 From stuefe at openjdk.org Thu Aug 29 05:56:22 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 05:56:22 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 02:45:53 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs. Hence the maximum number of bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`, though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level APIs can still use `int` if they know the strings (e.g. symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of `UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`, use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence.
This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into 8338257-utf8-length > - Extra assertion requested by tstuefe > - more missing casts > - fix cast > - missing cast > - Fix incorrect comments and size_t use per Dean's review > - Add missing cast for signed-to-unsigned converion. > - unnecessary cast > - Fix comments > - Fix off-by-one error > - ... and 3 more: https://git.openjdk.org/jdk/compare/fb153748...9dce4ffb Many of these translations seem awkward, since they convert to size_t only to then convert back to int. Proposal: I understand you need to find a good point to tourniquet off the int->size_t conversion to minimize the translations needed. But I'd consider converting SymbolTable functions to size_t too. SymbolTable already does not use the full width of the int length parameter, so functionally nothing changes (it needs to check the length for validity). If you worry that the changes fan out too much, at least consider converting SymbolTable::new_symbol. That appears about ten times. ------------- PR Review: https://git.openjdk.org/jdk/pull/20560#pullrequestreview-2267738621 From dholmes at openjdk.org Thu Aug 29 08:02:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 08:02:21 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 05:53:25 GMT, Thomas Stuefe wrote: > Many of these translations seem awkward, since they convert to size_t only to then convert back to int. Can you be more specific here please?
I know the most awkward case is where we have an in/out parameter that brings in an int and sends out a size_t, but there is not much to do about that without converting everything to int and I think there will be a lot of fan out because ultimately these mostly come back to array lengths, and symbol lengths, which are all int (and don't need to be bigger). So I would really prefer to not try and apply more bandages at this stage. I really just want this part fixed so I can proceed with the JNI update. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2316956852 From ihse at openjdk.org Thu Aug 29 08:29:20 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 29 Aug 2024 08:29:20 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need a way to look up at runtime whether we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then linking them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which need to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly Okay. Unless I misunderstand something, there were no additional authors to be credited for these two commits.
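The idea in Magnus's PR description can be sketched in C++: every object file is compiled once, and the choice between a static and a dynamic build becomes a runtime query. That query is answered by a file that is compiled once per link type. The macro `STATIC_BUILD` and the functions `is_static_build()` and `should_dlopen_library()` are assumptions made for this illustration. The sketch does not show the function names the PR actually introduces.

```cpp
// linktype.cpp (sketch): the only translation unit compiled twice, once with
// -DSTATIC_BUILD for the static libraries and once without it for the
// dynamic libraries. The macro and function names are assumptions.
#ifdef STATIC_BUILD
static constexpr bool kIsStaticBuild = true;
#else
static constexpr bool kIsStaticBuild = false;
#endif

bool is_static_build() {
  return kIsStaticBuild;
}

// Any other file (compiled once, linked into both variants) asks at runtime
// instead of testing #ifdef STATIC_BUILD itself.
bool should_dlopen_library() {
  // In a static build the library code is already linked into the executable.
  return !is_static_build();
}
```

With this arrangement, only the tiny linktype object differs between the two link variants. Every other object file can be shared.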
------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2317008624 From ayang at openjdk.org Thu Aug 29 08:40:21 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 29 Aug 2024 08:40:21 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> Message-ID: On Wed, 28 Aug 2024 15:46:57 GMT, Roberto Castañeda Lozano wrote: >> Thanks for trying! I like your refactored version. I prefer moving stuff out of the .ad files. The complexity of the barrier code is not significantly higher. Note that you could use `decode_heap_oop_not_null(Register dst, Register src)` with `tmp2` as dst (`generate_post_barrier_fast_path` can deal with new_val == tmp2). That would save a move instruction in some cases. >> I haven't looked into the aarch64 code. I leave you free to decide. > > Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code).
I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.) If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1735806805 From stuefe at openjdk.org Thu Aug 29 08:41:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 08:41:21 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 07:59:47 GMT, David Holmes wrote: > > Many of these translations seem awkward, since they convert to size_t only to then convert back to int. > > Can you be more specific here please? Certainly. For example, this construct:

    size_t utf8_len = static_cast<size_t>(length);
    const char* base = UNICODE::as_utf8(position, utf8_len);
    Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len));

We introduce `utf8_len` as a `size_t` synonym for len, but since it originates from an `int`, its value must be <= INT_MAX. We also assume it is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely on `length` >= 0, we could just as well give it `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a check. The check feels superfluous since `utf8_len` came from an int. I assume this verbosity is for the benefit of the code reader, to make intent clear. Otherwise, we could have just continued to use length, and just cast it on the fly to unsigned or to size_t when calling `UNICODE::as_utf8`. My thought was that if `SymbolTable::new_symbol` would take size_t too, the introduction of utf8_len would serve more of a purpose.
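It helps to look at the two halves of the int/size_t round trip side by side. The C++ sketch below is not HotSpot's `UNICODE` code, and `modified_utf8_length` and `checked_length_to_int` are hypothetical names. The first half computes the modified UTF-8 length of a UTF-16 array as `size_t`, which per the PR description is at most 3 bytes per code unit. The second half narrows that length back to `int` with an explicit range check. The check matters only when the `size_t` value can really exceed `INT_MAX`, which a computed UTF-8 length can do and a value copied straight from an `int` cannot.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Bytes needed for one UTF-16 code unit in modified UTF-8: U+0000 takes
// two bytes (0xC0 0x80), U+0001..U+007F one byte, U+0080..U+07FF two bytes,
// and everything else (including each half of a surrogate pair) three bytes.
inline std::size_t modified_utf8_size(std::uint16_t c) {
  if (c != 0 && c <= 0x007F) return 1;
  if (c <= 0x07FF) return 2;
  return 3;
}

// Total length as size_t: up to 3 * len, which can exceed INT_MAX even
// though len itself is an int.
inline std::size_t modified_utf8_length(const std::uint16_t* chars, int len) {
  std::size_t result = 0;
  for (int i = 0; i < len; i++) {
    result += modified_utf8_size(chars[i]);
  }
  return result;
}

// Narrowing back to int for an int-based consumer, with a debug-build check
// (similar in spirit to HotSpot's checked_cast, but a hypothetical helper).
inline int checked_length_to_int(std::size_t len) {
  assert(len <= static_cast<std::size_t>(INT_MAX) && "length does not fit in int");
  return static_cast<int>(len);
}
```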
> I know the most awkward case is where we have an in/out parameter that brings in an int and sends out a size_t, but there is not much to do about that without converting everything to int and I think there will be a lot of fan out because ultimately these mostly come back to array lengths, and symbol lengths, which are all int (and don't need to be bigger). > > So I would really prefer to not try and apply more bandages at this stage. I really just want this part fixed so I can proceed with the JNI update. Okay, sure. I'll review the patch for correctness then in its current form. > > Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2317033703 From stuefe at openjdk.org Thu Aug 29 08:58:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 08:58:24 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 02:45:53 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs. Hence the maximum number of bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`, though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level APIs can still use `int` if they know the strings (e.g. symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of `UNICODE::utf8_length` we add back `UNICODE::utf8_length_as_int` for it to use.
>> >> Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`, use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into 8338257-utf8-length > - Extra assertion requested by tstuefe > - more missing casts > - fix cast > - missing cast > - Fix incorrect comments and size_t use per Dean's review > - Add missing cast for signed-to-unsigned converion. > - unnecessary cast > - Fix comments > - Fix off-by-one error > - ... and 3 more: https://git.openjdk.org/jdk/compare/8c60a602...9dce4ffb I think this patch was onerous but valuable work. I have one question inline. Other than that, I think most of the cases where you modified calls to `SymbolTable` to feed in the new size_t length, but checked, could have just continued to use the old int length. But I did not find any errors here. I'll mark this as approved. Up to you if you address my inline concern. src/hotspot/share/classfile/javaClasses.hpp line 138: > 136: // Legacy variants that truncate the length if needed > 137: static int utf8_length_as_int(oop java_string); > 138: static int utf8_length_as_int(oop java_string, typeArrayOop string_value); I don't get the point of this variant of the function. It takes a string and a typearray. What is the contract here: is the only value allowed for typearray the array oop underlying the string? If yes, why do you assert for value equality in the function? That implies I can feed in any typearrayoop here as long as it has the same value as the string.
I can only see a single point where this function is used, so that does not explain much. Maybe I am overlooking something, but why not just inline the code into that one call site? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20560#pullrequestreview-2268097833 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1735827631 From rcastanedalo at openjdk.org Thu Aug 29 09:11:23 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 29 Aug 2024 09:11:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> Message-ID: On Thu, 29 Aug 2024 08:37:24 GMT, Albert Mingkun Yang wrote: >> Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code). > > I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.)
> > If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable? Thanks for looking at it, Albert! Since there is no clear consensus, let's postpone the refactoring. We can come back to it after the JEP is integrated if there is renewed interest. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1735852561 From stefank at openjdk.org Thu Aug 29 09:31:21 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 29 Aug 2024 09:31:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/archiveHeapWriter.cpp line 214: > 212: oopDesc::set_mark(mem, markWord::prototype()); > 213: oopDesc::release_set_klass(mem, k); > 214: } The `UseCompactObjectHeaders` path calls `get_requested_narrow_klass`, while the `else` part directly uses `k`. Is one of these paths incorrect?
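The quoted PR description says that full-GC forwarding uses an encoding similar to compressed oops, with 40 bits that cover up to 8TB of heap. The arithmetic behind that figure is 2^40 possible 8-byte heap words, i.e. 2^43 bytes = 8 TB. Below is a minimal C++ sketch that assumes forwarding targets are stored as a word offset from the heap base. The names and layout are illustrative and are not the PR's actual GC forwarding code. Where the encoded value sits inside the mark word is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants: 8-byte heap words and a 40-bit field for the
// encoded forwardee.
constexpr int kLogHeapWordSize = 3;
constexpr int kForwardingBits = 40;
constexpr std::uint64_t kMaxEncoded = (std::uint64_t{1} << kForwardingBits) - 1;

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TB of addressable heap.
constexpr std::uint64_t kMaxHeapBytes = std::uint64_t{1} << (kForwardingBits + kLogHeapWordSize);
static_assert(kMaxHeapBytes == (std::uint64_t{8} << 40), "40 bits of words cover 8 TB");

// Encode a forwardee address as a word offset from the heap base.
inline std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t forwardee) {
  assert(forwardee >= heap_base && "forwardee must lie above the heap base");
  const std::uint64_t offset = forwardee - heap_base;
  assert((offset & ((std::uint64_t{1} << kLogHeapWordSize) - 1)) == 0 && "must be word aligned");
  const std::uint64_t encoded = offset >> kLogHeapWordSize;
  assert(encoded <= kMaxEncoded && "heap too large for a 40-bit encoding");
  return encoded;
}

// Decode by scaling the word offset back to bytes and adding the heap base.
inline std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t encoded) {
  return heap_base + (encoded << kLogHeapWordSize);
}
```

This arithmetic explains the 8TB threshold the description mentions: beyond it, a 40-bit word offset can no longer reach every possible forwardee.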
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1735881613 From fgao at openjdk.org Thu Aug 29 09:58:21 2024 From: fgao at openjdk.org (Fei Gao) Date: Thu, 29 Aug 2024 09:58:21 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 08:51:11 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It applies to SVE micro-architectures that only support a 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation failure with --disable-precompiled-headers src/hotspot/cpu/aarch64/aarch64_vector.ad line 158: > 156: > 157: int length_in_bytes = vlen * type2aelembytes(bt); > 158: if (UseSVE == 0 && length_in_bytes > FloatRegister::neon_vl) { Should we also update `aarch64_vector_ad.m4` to avoid any mismatch :) ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20724#discussion_r1735911352 From jzhu at openjdk.org Thu Aug 29 10:54:19 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Thu, 29 Aug 2024 10:54:19 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 09:50:33 GMT, Fei Gao wrote: >> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation failure with --disable-precompiled-headers > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 158: > >> 156: >> 157: int length_in_bytes = vlen * type2aelembytes(bt); >> 158: if (UseSVE == 0 && length_in_bytes > FloatRegister::neon_vl) { > > Should we also update `aarch64_vector_ad.m4` to avoid any mismatch :) ? Nice catch! I overlooked this place. Thanks for your reminder!
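The two conditions discussed in this thread can be modeled as plain predicates. The C++ sketch below is hypothetical: the function names are invented, and `kNeonVectorBytes` stands in for `FloatRegister::neon_vl`, which is assumed here to be 16 bytes (128 bits). The first predicate mirrors the quoted `.ad` guard and assumes that the guarded branch rejects the vector. The second captures the PR's premise, as read here: re-checking the SVE vector length after a native call is useful only when the hardware supports some length other than 128 bits.

```cpp
// 128-bit NEON register width in bytes (stand-in for FloatRegister::neon_vl).
constexpr int kNeonVectorBytes = 16;

// Mirrors the quoted guard: without SVE (use_sve == 0), a vector of
// vlen elements of elem_bytes each must fit in a NEON register.
bool vector_size_supported(int use_sve, int vlen, int elem_bytes) {
  const int length_in_bytes = vlen * elem_bytes;
  return !(use_sve == 0 && length_in_bytes > kNeonVectorBytes);
}

// Premise of the PR: if the CPU only implements a 128-bit SVE vector length,
// native code cannot switch to a different length, so checking it again
// after a native call cannot fail and can be skipped.
bool should_verify_sve_length_after_native_call(int use_sve, int max_sve_vector_bytes) {
  return use_sve > 0 && max_sve_vector_bytes > kNeonVectorBytes;
}
```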
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20724#discussion_r1735987350 From stuefe at openjdk.org Thu Aug 29 11:40:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 11:40:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/archiveHeapWriter.hpp line 261: > 259: // at mapping start, these 4G are enough. Therefore, we don't need to shift at all (shift=0).
> 260: static constexpr int precomputed_narrow_klass_shift = 0; > 261: Reviewer Note: move to ArchiveBuilder ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1736042302 From coleenp at openjdk.org Thu Aug 29 11:40:27 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 11:40:27 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 05:24:18 GMT, Thomas Stuefe wrote: >> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'anon' of github.com:coleenp/jdk into anon >> - Fix copyright > > src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdKlassQueue.cpp line 79: > >> 77: >> 78: static bool can_compress_element(const Klass* klass) { >> 79: return Metaspace::is_in_class_space(klass) && > > Suggestion: > > return (Metaspace::is_in_class_space(klass) || Metaspace::is_in_shared_metaspace(klass)) && Is this right? If UseCompressedClassPointers is off, then the shared metaspace isn't in compressed space? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736041738 From coleenp at openjdk.org Thu Aug 29 12:08:37 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 12:08:37 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v5] In-Reply-To: References: Message-ID: <0VQJugX9IulwqoN4WWxCixyhPRhfGs-48Vm5DB0s-VU=.334232df-93b0-453a-aba7-0cf26cecf8d1@github.com> > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on the number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers.
The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add function in Metaspace to tell you if Klass pointer is in compressible space. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19157/files - new: https://git.openjdk.org/jdk/pull/19157/files/1382ced0..ce96165e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=03-04 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19157/head:pull/19157 PR: https://git.openjdk.org/jdk/pull/19157 From dholmes at openjdk.org Thu Aug 29 12:33:25 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 12:33:25 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: <5r0YV_6asNY0Sh0eIMM1050eFE4f1OFMGMXJ1f0rHhY=.49780388-5a3f-4b71-b596-3a3096e7d0b4@github.com> On Thu, 29 Aug 2024 08:52:03 GMT, Thomas Stuefe wrote: >> David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Merge branch 'master' into 8338257-utf8-length >> - Extra assertion requested by tstuefe >> - more missing casts >> - fix cast >> - missing cast >> - Fix incorrect comments and size_t use per Dean's review >> - Add missing cast for signed-to-unsigned converion. 
>> - unnecessary cast >> - Fix comments >> - Fix off-by-one error >> - ... and 3 more: https://git.openjdk.org/jdk/compare/e81bf3da...9dce4ffb > > src/hotspot/share/classfile/javaClasses.hpp line 138: > >> 136: // Legacy variants that truncate the length if needed >> 137: static int utf8_length_as_int(oop java_string); >> 138: static int utf8_length_as_int(oop java_string, typeArrayOop string_value); > > I don't get the point of this variant of the function. It takes a string and a typearray. What is the contract here, is the only value allowed for typearray the array oop underlying the string? If yes, why do you assert for value equality in the function? That implies I can feed in any typearrayoop here as long as it has the same value as the string. > > I can only see a single point where this function is used, so that does not explain much. Maybe I am overlooking something, but why not just inline the code into that one using call site? The as_int versions mirror the existing not-as-int versions. I admit I don't really see the point of unwrapping the array from the string but then pass them both. I assume they are intended/required to be a matching pair. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1736113281 From dholmes at openjdk.org Thu Aug 29 12:40:24 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 12:40:24 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 08:38:41 GMT, Thomas Stuefe wrote: > > > Many of these translations seem awkward, since they convert to size_t only to then convert back to int. > > > > Can you be more specific here please? > > Certainly. 
> For example, this construct:
>
> ```
> size_t utf8_len = static_cast<size_t>(length);
> const char* base = UNICODE::as_utf8(position, utf8_len);
> Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len));
> ```
>
> We introduce `utf8_len` as a `size_t` synonym for len, but since it originates from an `int`, its value must be <= INT_MAX. We also assume it is >= 0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely on `length` >= 0, we could just as well give it `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a check. The check feels superfluous since `utf8_len` came from an int.
>
> I assume this verbosity is for the benefit of the code reader, to make intent clear. Otherwise, we could have just continued to use length, and just cast it on the fly to unsigned or to size_t when calling `UNICODE::as_utf8`.

This is exactly the case I was referring to. The declaration here is: `template<typename T> static char* as_utf8(const T* base, size_t& length);` where length is an IN/OUT parameter: the int array length going in (hence >= 0 and <= INT_MAX), and the size_t UTF-8 sequence length coming out. The outgoing UTF-8 length can theoretically be > INT_MAX, but if that were the case in this code (which expects to be dealing with names that can be symbols, hence < 64K) then that would be a programming error which the checked_cast would catch. And of course new_symbol checks for < 64K.
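To make the IN/OUT contract concrete, here is a minimal standalone C++ sketch; it is not HotSpot code. `as_utf8_sketch` and `checked_cast_sketch` are hypothetical stand-ins for `UNICODE::as_utf8` and HotSpot's `checked_cast`, and the encoder assumes the per-`jchar`, "modified UTF-8"-style scheme in which each UTF-16 unit is encoded independently and NUL takes two bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a checked narrowing cast: converts, then asserts
// that converting back yields the original value (i.e. nothing was lost).
template <typename To, typename From>
To checked_cast_sketch(From value) {
  To result = static_cast<To>(value);
  assert(static_cast<From>(result) == value && "value does not fit in target type");
  return result;
}

// Hypothetical stand-in for UNICODE::as_utf8. `length` is an IN/OUT parameter:
//   IN  = number of UTF-16 units in `base`,
//   OUT = number of bytes in the returned UTF-8 sequence.
std::string as_utf8_sketch(const char16_t* base, std::size_t& length) {
  std::string out;
  for (std::size_t i = 0; i < length; i++) {
    const unsigned c = base[i];
    if (c != 0 && c <= 0x7F) {          // 1 byte (NUL deliberately excluded)
      out.push_back(static_cast<char>(c));
    } else if (c <= 0x7FF) {            // 2 bytes, including NUL
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else {                            // 3 bytes
      out.push_back(static_cast<char>(0xE0 | (c >> 12)));
      out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  length = out.size();  // OUT: may be up to 3x the IN value
  return out;
}

// Same shape as the call site under discussion: an int length goes in,
// a size_t length comes out, and it is narrowed back to int with a check.
int utf8_length_for(const char16_t* chars, int length) {
  assert(length >= 0);
  std::size_t utf8_len = static_cast<std::size_t>(length);
  const std::string utf8 = as_utf8_sketch(chars, utf8_len);
  assert(utf8.size() == utf8_len);
  return checked_cast_sketch<int>(utf8_len);
}
```

Because `length` is taken by reference, nothing at the call site signals that it is overwritten, which is the readability concern raised in the reply about preferring pointers for OUT parameters.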
------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2317534674 From stuefe at openjdk.org Thu Aug 29 13:09:22 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 13:09:22 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 12:37:47 GMT, David Holmes wrote:
> > > > Many of these translations seem awkward, since they convert to size_t only to then convert back to int.
> > >
> > > Can you be more specific here please?
> >
> > Certainly. For example, this construct:
> > ```
> > size_t utf8_len = static_cast<size_t>(length);
> > const char* base = UNICODE::as_utf8(position, utf8_len);
> > Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len));
> > ```
> >
> > We introduce `utf8_len` as a `size_t` synonym for len, but since it originates from an `int`, its value must be <= INT_MAX. We also assume it is >= 0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely on `length` >= 0, we could just as well give it `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a check. The check feels superfluous since `utf8_len` came from an int.
> > I assume this verbosity is for the benefit of the code reader, to make intent clear. Otherwise, we could have just continued to use length, and just cast it on the fly to unsigned or to size_t when calling `UNICODE::as_utf8`.
>
> This is exactly the case I was referring to. The declaration here is:
>
> ```
> template<typename T> static char* as_utf8(const T* base, size_t& length);
> ```
>
> where length is an IN/OUT parameter: the int array length going in (hence >= 0 and <= INT_MAX), and the size_t UTF-8 sequence length coming out.
> The outgoing UTF-8 length can theoretically be > INT_MAX, but if that were the case in this code (which expects to be dealing with names that can be symbols, hence < 64K) then that would be a programming error which the checked_cast would catch. And of course new_symbol checks for < 64K.

Oh, I completely missed that! I wish we would use pointers instead of references in cases like these, since that would make the intent immediately clear when looking at the call site.

------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2317607555 From stuefe at openjdk.org Thu Aug 29 13:09:22 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 13:09:22 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: <5r0YV_6asNY0Sh0eIMM1050eFE4f1OFMGMXJ1f0rHhY=.49780388-5a3f-4b71-b596-3a3096e7d0b4@github.com> References: <5r0YV_6asNY0Sh0eIMM1050eFE4f1OFMGMXJ1f0rHhY=.49780388-5a3f-4b71-b596-3a3096e7d0b4@github.com> Message-ID: On Thu, 29 Aug 2024 12:30:26 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.hpp line 138: >> >>> 136: // Legacy variants that truncate the length if needed >>> 137: static int utf8_length_as_int(oop java_string); >>> 138: static int utf8_length_as_int(oop java_string, typeArrayOop string_value); >> >> I don't get the point of this variant of the function. It takes a string and a typearray. What is the contract here, is the only value allowed for typearray the array oop underlying the string? If yes, why do you assert for value equality in the function? That implies I can feed in any typearrayoop here as long as it has the same value as the string. >> >> I can only see a single point where this function is used, so that does not explain much. Maybe I am overlooking something, but why not just inline the code into that one using call site? > > The as_int versions mirror the existing not-as-int versions.
I admit I don't really see the point of unwrapping the array from the string but then pass them both. I assume they are intended/required to be a matching pair. Thanks for clarifying. Okay then. My approval of the patch stands. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1736173642 From stuefe at openjdk.org Thu Aug 29 13:20:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 13:20:25 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: Message-ID: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> On Thu, 29 Aug 2024 11:37:19 GMT, Coleen Phillimore wrote: >> src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdKlassQueue.cpp line 79: >> >>> 77: >>> 78: static bool can_compress_element(const Klass* klass) { >>> 79: return Metaspace::is_in_class_space(klass) && >> >> Suggestion: >> >> return (Metaspace::is_in_class_space(klass) || Metaspace::is_in_shared_metaspace(klass)) && > > Is this right? If UseCompressedClassPointers is off, then the shared metaspace isn't in compressed space? If UseCompressedClassPointers is off, we don't have a compressed class space. If it's on, Klasses from CDS and from class space are compressible. With your patch, interfaces will live in normal metaspace, not in class space, so those are excluded now. TBH, I am not really sure what this code here does, but I assume it tries to reduce the size of a JFR recording by using a compressed identifier for X if X can be expressed by such. Maybe a JFR person should look at this.
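The condition being debated can be modelled as a plain address-range check. The sketch below is a hypothetical simplification, not HotSpot's `Metaspace` API: it assumes a 64-bit host and a single encoding range that starts at the encoding base and holds the CDS archive followed by class space (the layout described for 8330174), and it gates everything on a flag standing in for `UseCompressedClassPointers`. Anything allocated outside the range, such as interface and abstract Klasses placed in non-class metaspace by this patch, is simply not compressible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a compressed-Klass encoding range (64-bit host assumed):
// [encoding_base, cds_end) holds the CDS archive, [cds_end, class_space_end)
// holds class space. These are not HotSpot's actual data structures.
struct KlassRangeSketch {
  std::uintptr_t encoding_base;    // narrow Klass 0 decodes to this address
  std::uintptr_t cds_end;          // == encoding_base when no archive is mapped
  std::uintptr_t class_space_end;
  unsigned shift;                  // encoding shift
  bool use_compressed;             // stands in for UseCompressedClassPointers

  bool in_shared_metaspace(std::uintptr_t p) const {
    return p >= encoding_base && p < cds_end;
  }
  bool in_class_space(std::uintptr_t p) const {
    return p >= cds_end && p < class_space_end;
  }
  // Compressible only when compression is on AND the address lies in either
  // part of the encoding range.
  bool can_compress(std::uintptr_t p) const {
    return use_compressed && (in_class_space(p) || in_shared_metaspace(p));
  }
  std::uint32_t encode(std::uintptr_t p) const {
    assert(can_compress(p));
    return static_cast<std::uint32_t>((p - encoding_base) >> shift);
  }
};
```

In this model the `use_compressed` gate is what answers the follow-up question in the thread: an address can lie inside the shared (CDS) region and still not be encodable when compression is off.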
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736193248 From fyang at openjdk.org Thu Aug 29 13:38:40 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 29 Aug 2024 13:38:40 GMT Subject: RFR: 8339248: RISC-V: Remove li64 macro assembler routine and related code Message-ID: The macro assembler routine li64 and related code (is_li64_at, patch_imm_in_li64, get_target_of_li64 and check_li64_data_dependency) is unused for now. We should remove this unused code, which will save us some unnecessary runtime checks. We can add it back if it is needed again someday. Testing: - [x] release & fastdebug build - [x] Gtest & Tier1 test (release) ------------- Commit messages: - 8339248: RISC-V: Remove li64 macro assembler routine and related code Changes: https://git.openjdk.org/jdk/pull/20769/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20769&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339248 Stats: 111 lines in 2 files changed: 0 ins; 111 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20769/head:pull/20769 PR: https://git.openjdk.org/jdk/pull/20769 From dchuyko at openjdk.org Thu Aug 29 14:17:53 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Thu, 29 Aug 2024 14:17:53 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic Message-ID: This is an implementation of SHA3 intrinsics for AArch64 that operates on GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than the C2-compiled version; on Graviton 3 it is 8-14% faster than the C2-compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than the C2-compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length.
For instance, for Graviton 2:

| Benchmark (ops/ms) | (digesterName) | (length) | G2 improvement |
| :--- | :---: | :---: | :---: |
| MessageDigests.digest | SHA3-256 | 64 | 28.28% |
| MessageDigests.digest | SHA3-256 | 16384 | 53.58% |
| MessageDigests.digest | SHA3-512 | 64 | 27.97% |
| MessageDigests.digest | SHA3-512 | 16384 | 43.90% |
| MessageDigests.getAndDigest | SHA3-256 | 64 | 26.18% |
| MessageDigests.getAndDigest | SHA3-256 | 16384 | 52.82% |
| MessageDigests.getAndDigest | SHA3-512 | 64 | 24.73% |
| MessageDigests.getAndDigest | SHA3-512 | 16384 | 44.31% |

(Results for intermediate input lengths look like steps.)

The existing intrinsic implementation is now guarded by a flag, `UseSIMDForSHA3Intrinsic`, which is on by default wherever the intrinsic is currently enabled. Sanity tests were modified to cover the new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic` with both `-XX:+PreserveFramePointer` and `-XX:-PreserveFramePointer`) on AArch64 hardware. Existing test cases where the intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`; on platforms where the SHA3 extension is missing, they are still cut off by the isSHA3IntrinsicAvailable() predicate. ------------- Commit messages: - Sanity tests - GPR intrinsic implementation Changes: https://git.openjdk.org/jdk/pull/20422/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20422&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337666 Stats: 744 lines in 5 files changed: 739 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20422.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20422/head:pull/20422 PR: https://git.openjdk.org/jdk/pull/20422 From jiangli at openjdk.org Thu Aug 29 15:00:21 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 29 Aug 2024 15:00:21 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Thu, 29 Aug 2024 08:26:16 GMT, Magnus Ihse Bursie wrote: > Okay.
Unless I misunderstand something, there were no additional authors to be credited for these two commits. That's correct for these. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2317982354 From duke at openjdk.org Thu Aug 29 15:07:25 2024 From: duke at openjdk.org (duke) Date: Thu, 29 Aug 2024 15:07:25 GMT Subject: Withdrawn: 8330174: Establish no-access zone at the start of Klass encoding range In-Reply-To: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com> References: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com> Message-ID: On Sat, 18 May 2024 06:32:03 GMT, Thomas Stuefe wrote: > After having reserved an address range for the Klass encoding range, we either: > a) Place CDS, then class space, into that address range > b) Place only class space in that range (if CDS is off). > > For an nKlass of 0, the decoded Klass pointer points to the beginning of the encoding range. Since nKlass=0 is a special value, both CDS (a) and Metaspace (b) ensure that no Klass is placed right at the start of the Klass range. > > However, it would also be good to establish a no-access zone at the range's start. Dereferencing an nKlass=0 would then result in an immediate, obvious crash instead of reading invalid data. > > This would closely mimic what we do in the compressed-oops-enabled java heap (albeit there we do it for fault-based null checks, too) and what Operating Systems do with low-address ranges. > > --- > > The patch: > > We cannot move the encoding base down one page (the encoding base is carefully chosen to fit the platform's decoding). Nor can we move CDS archive space up one page (since CDS relies on the archive being placed exactly at the encoding base address).
Nor do we want to move class space up (since class space start has a high alignment requirement of 16MB, protection zone would need to be 16MB large, which is a waste of address space). > > Instead, as before, we just let Metaspace and CDS handle the protection zone internally. For Metaspace, this is very simple. We just protect the first page of class space. > > For CDS, it is a tiny bit more complex since we need to leave a "protection-zone-shaped hole" in the first region of the archive when we dump it. We do just that and then give that region a new property, "has protection zone". At runtime, we protect the underlying memory if a mapped region has a protection zone. > > With CDS, because the page size can differ between dump- and runtime, the protection zone is the size of CDS core region alignment, not page-sized (e.g. dumping on Linux aarch64 with 4KB pages shall generate an archive that can be used in Docker on MacOS with 16KB pages). > > ---- > > Tests: > - ran CDS and AppCDS jtreg tests manually on Mac m1 > - manually tested that decoding, then dereferencing an nKlass=0 gives us the new "Fault address is narrow Klass base - dereferencing a zero nKlass?" output in the hs-err file > - GHAs (which include the new regression test) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19290 From gziemski at openjdk.org Thu Aug 29 15:15:23 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 29 Aug 2024 15:15:23 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <8RG45GnRcXMdeZO6qFIo9KdRBk9KC0kdC-4WgjEM_To=.117f02ea-8eef-4492-bb72-24eef56ef219@github.com> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
> > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) Taking Dan's feedback into account we have: ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2318039929 From gziemski at openjdk.org Thu Aug 29 15:22:26 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 29 Aug 2024 15:22:26 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) Taking Dan's feedback into account we have:

| | MemType | MemTypeFlag | NMTCat | NMTGroup | NMT_MemType | NMT::MemType |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| gerard | 1 | 0 | 0 | 0 | 2 | 3 |
| David | 0 | 3 | 0 | 0 | 1 (NMTMemType) | 2 |
| Thomas | 0 | 0 | 2 | 3 | 0 | 0 |
| Johan | 0 | 0 | 2 (NMTCategory) | 3 | 0 | 0 |
| Afshin | 0 | 3 | 0 | 0 | 0 | 2 |
| Coleen | 0 | 0 | 0 | 0 | 0 | 3 |
| Dan | 0 | 0 | 0 | 0 | 2 (NMTMemType) | 3 |
| Total | 1 | 6 | 4 | 6 | 5 | 13 |

`NMT::MemType` it is. Sorry if your choice didn't make it. I am surprised how wide the spectrum of choices was; it took a while to find a consensus. Thank you to everyone who participated!
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2318083570 From mbaesken at openjdk.org Thu Aug 29 15:24:30 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 29 Aug 2024 15:24:30 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Wed, 28 Aug 2024 16:13:07 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: > > - Add root check for SystemdMemoryAwarenessTest.java > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support test/hotspot/jtreg/containers/systemd/SystemdMemoryAwarenessTest.java line 58: > 56: SystemdRunOptions opts = SystemdTestUtils.newOpts("HelloSystemd"); > 57: // 1 GB memory > 58: opts.memoryLimit("1000M"); Just wondering - is 1G here possible (the comment states 1 GB / 1024M)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1736474134 From thomas.stuefe at gmail.com Thu Aug 29 15:25:43 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 29 Aug 2024 17:25:43 +0200 Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: <8RG45GnRcXMdeZO6qFIo9KdRBk9KC0kdC-4WgjEM_To=.117f02ea-8eef-4492-bb72-24eef56ef219@github.com> References: <8RG45GnRcXMdeZO6qFIo9KdRBk9KC0kdC-4WgjEM_To=.117f02ea-8eef-4492-bb72-24eef56ef219@github.com> Message-ID: Note that "NMT::MemType" may lead to many uses of just "MemType" since a lot of the usage will happen inside a future NMT namespace and people will just drop the namespace. Will make it a bit more difficult to grep for, since you need to look for both variants. Adding an NMT namespace also opens other questions. E.g. writing - all lower case is the standard. On Thu 29. Aug 2024 at 17:15, Gerard Ziemski wrote: > On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski > wrote: > > > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`.
> > > > `MEMFLAGS` implies that we can use more than one at the same time, but > those are exclusive values, so `MemType` is much more suitable name. > > > > There is a bunch of other related cleanup that we can do, but I will > leave for follow up issues such as [NMT: rename NMTUtil::flag to > NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > Taking Dan's feedback into account we have: > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2318039929 From gziemski at openjdk.org Thu Aug 29 15:27:19 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 29 Aug 2024 15:27:19 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) I'm planning on doing only the renaming of MEMFLAGS to NMT::MemType in this fix. The renaming of parameters and variables I will leave to a followup. I will start with NMT first and per Stefan's suggestion I am thinking of using `mt` for the variables/parameters names.
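A minimal sketch of what the agreed spelling could look like, assuming a hypothetical `NMT` namespace containing a scoped enum; the enumerators shown are a small illustrative subset, not HotSpot's actual list, and `record_malloc`/`bytes_for` are invented helpers. It illustrates two points from the thread: an exclusive "type" maps naturally onto a scoped enum whose values cannot be OR-ed together, and code inside the namespace can drop the qualifier (the grep concern). Parameters are named `mt`, as proposed for the follow-up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace NMT {
// One type per allocation: a scoped enum has no built-in operator|, so an
// expression like `MemType::mtGC | MemType::mtClass` does not compile.
enum class MemType : std::uint8_t { mtNone, mtGC, mtClass, mtThread, num_types };

// Per-type byte counters (a function-local static keeps this C++11-friendly).
inline std::size_t* counters() {
  static std::size_t table[static_cast<std::size_t>(MemType::num_types)] = {};
  return table;
}

// Inside the namespace, the unqualified name "MemType" is enough...
inline void record_malloc(std::size_t size, MemType mt) {
  assert(mt != MemType::num_types && "num_types is not a real memory type");
  counters()[static_cast<std::size_t>(mt)] += size;
}
}  // namespace NMT

// ...while code outside it has to spell "NMT::MemType", so a grep for the
// type needs to match both forms.
inline std::size_t bytes_for(NMT::MemType mt) {
  return NMT::counters()[static_cast<std::size_t>(mt)];
}
```

The scoped enum also removes implicit conversions to integers, so indexing a table requires the explicit `static_cast` shown above.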
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2318112256 From coleenp at openjdk.org Thu Aug 29 15:48:28 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 15:48:28 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> Message-ID: <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> On Thu, 29 Aug 2024 13:17:50 GMT, Thomas Stuefe wrote: >> Is this right? If UseCompressedClassPointers is off, then the shared metaspace isn't in compressed space? > > If UseCompressedClassPointers is off, we don't have a compressed class space. If it's on, Klasses from CDS and from class space are compressible. With your patch, interfaces will live in normal metaspace, not in class space, so those are excluded now. > > TBH, I am not really sure what this code here does, but I assume it tries to reduce the size of a JFR recording by using a compressed identifier for X if X can be expressed by such. Maybe a JFR person should look at this. With UseCompressedClassPointers off, I think Metaspace::is_in_shared_metaspace() would still return true, but I don't think the compression base is the bottom of the CDS archive. I asked Markus to have a look.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736538545 From sgehwolf at openjdk.org Thu Aug 29 15:52:21 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 29 Aug 2024 15:52:21 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Thu, 29 Aug 2024 15:22:02 GMT, Matthias Baesken wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Add root check for SystemdMemoryAwarenessTest.java >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Add Whitebox check for host cpu >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comments >> - 8333446: Add tests for hierarchical container support > > test/hotspot/jtreg/containers/systemd/SystemdMemoryAwarenessTest.java line 58: > >> 56: SystemdRunOptions opts = SystemdTestUtils.newOpts("HelloSystemd"); >> 57: // 1 GB memory >> 58: opts.memoryLimit("1000M"); > > Just wondering - is 1G here possible (the comment states 1 GB / 1024M) ? I probably shall fix the comment or change it to `1024M`. Either way it has to match the assertion where we look for `1048576000` bytes in the output. 
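The question about the memory limit comes down to unit arithmetic. Assuming systemd's documented base-1024 size suffixes, `1000M` is 1000 MiB (1048576000 bytes, the value the test asserts on), while the "1 GB" in the comment is 1024M (1073741824 bytes), so the comment, the limit string, and the asserted byte count have to change together. `parse_size_sketch` is a hypothetical helper for illustration, not part of the jtreg test library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumption: size suffixes use base 1024, as systemd documents for
// MemoryMax= and related settings. So "1000M" is 1000 MiB, not 1 GiB.
constexpr std::uint64_t kKiB = 1024ull;
constexpr std::uint64_t kMiB = 1024ull * kKiB;
constexpr std::uint64_t kGiB = 1024ull * kMiB;

static_assert(1000 * kMiB == 1048576000ull, "the byte count the test asserts on");
static_assert(1024 * kMiB == kGiB, "only 1024M is exactly 1 GiB");

// Hypothetical helper: parses "<digits>[K|M|G]" into bytes.
std::uint64_t parse_size_sketch(const std::string& s) {
  assert(!s.empty());
  std::uint64_t multiplier = 1;
  std::string digits = s;
  switch (s.back()) {
    case 'K': multiplier = kKiB; digits.pop_back(); break;
    case 'M': multiplier = kMiB; digits.pop_back(); break;
    case 'G': multiplier = kGiB; digits.pop_back(); break;
    default: break;  // no suffix: plain bytes
  }
  return std::stoull(digits) * multiplier;
}
```

Either fix is consistent on its own: keep `1000M` and correct the comment, or switch to `1024M` and update the expected value to 1073741824.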
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1736549823 From coleenp at openjdk.org Thu Aug 29 15:58:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 15:58:56 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags Message-ID: Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. Tested with tier1-7. NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. ------------- Commit messages: - Fix C1 nodes for misc_flags access. - Fix s390 compilation errors. - The test compiler/types/TestSubTypeCheckNewObjectNotConstant.java asserts because the opcode is Op_LoadUB. No idea why. - Fix C2 things I hope. - Fix typeo. - Add in has_finalizer and is_cloneable_fast but doesn't work for C2 yet. - Refix is_hidden_class to use misc_flags. - Move JVM_ACC_IS_VALUE_BASED_CLASS. - Move JVM_ACC_IS_HIDDEN_CLASS. Changes: https://git.openjdk.org/jdk/pull/20719/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339112 Stats: 322 lines in 52 files changed: 165 ins; 43 del; 114 mod Patch: https://git.openjdk.org/jdk/pull/20719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20719/head:pull/20719 PR: https://git.openjdk.org/jdk/pull/20719 From stefank at openjdk.org Thu Aug 29 16:20:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 29 Aug 2024 16:20:19 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
> > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) I much prefer to see MemType, but I'm warming up to NMTCategory.

- MemType: Succinct; matches part of the code (e.g. the mt in mtGC)
- MemTypeFlag: Too many words for my preference.
- NMTCat: Meuw. :)
- NMTCategory: Parts of the code call these categories, so I'm not entirely against this.
- NMTGroup: "Group" is a new name for this that currently isn't reflected at all in the code.
- NMT_MemType: I think we should try to get rid of names using this style.
- NMT::MemType: The `::` makes all function declarations noisier for very little benefit, IMO.

------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2318273393 From gdub at openjdk.org Thu Aug 29 16:25:22 2024 From: gdub at openjdk.org (Gilles Duboscq) Date: Thu, 29 Aug 2024 16:25:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 23:54:22 GMT, Coleen Phillimore wrote:
src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 484: > 482: declare_constant(JVMCINMethodData::SPECULATION_LENGTH_BITS) \ > 483: \ > 484: declare_constant(JVM_ACC_WRITTEN_FLAGS) \ `JVM_ACC_IS_HIDDEN_CLASS` and `JVM_ACC_IS_VALUE_BASED_CLASS` are actually used in the compiler (see [here](https://github.com/search?q=repo%3Aoracle%2Fgraal+path%3Acompiler+JVM_ACC_IS_HIDDEN_CLASS&type=code) and [there](https://github.com/search?q=repo%3Aoracle%2Fgraal+path%3Acompiler+JVM_ACC_IS_VALUE_BASED_CLASS&type=code)) so i think `KlassFlags::_misc_is_hidden_class` and `KlassFlags::_misc_is_value_based_class` also need to be exposed below along the other 2 bits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1736636738 From sgehwolf at openjdk.org Thu Aug 29 16:27:22 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 29 Aug 2024 16:27:22 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Wed, 28 Aug 2024 16:13:07 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). 
Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Add root check for SystemdMemoryAwarenessTest.java > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support Noting here that I'll amend those tests to also cover nested hierarchical limits where the lower limit is higher up the hierarchy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2318291396 From yzheng at openjdk.org Thu Aug 29 16:31:21 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 29 Aug 2024 16:31:21 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 16:22:22 GMT, Gilles Duboscq wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. 
> > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 484: > >> 482: declare_constant(JVMCINMethodData::SPECULATION_LENGTH_BITS) \ >> 483: \ >> 484: declare_constant(JVM_ACC_WRITTEN_FLAGS) \ > > `JVM_ACC_IS_HIDDEN_CLASS` and `JVM_ACC_IS_VALUE_BASED_CLASS` are actually used in the compiler (see [here](https://github.com/search?q=repo%3Aoracle%2Fgraal+path%3Acompiler+JVM_ACC_IS_HIDDEN_CLASS&type=code) and [there](https://github.com/search?q=repo%3Aoracle%2Fgraal+path%3Acompiler+JVM_ACC_IS_VALUE_BASED_CLASS&type=code)) so i think `KlassFlags::_misc_is_hidden_class` and `KlassFlags::_misc_is_value_based_class` also need to be exposed below along the other 2 bits. Yes, please add these two symbols as well diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index f0af57f9513..9d65268f0fe 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -727,6 +727,8 @@ \ declare_constant(InstanceKlassFlags::_misc_has_nonstatic_concrete_methods) \ declare_constant(InstanceKlassFlags::_misc_declares_nonstatic_concrete_methods) \ + declare_constant(KlassFlags::_misc_is_hidden_class) \ + declare_constant(KlassFlags::_misc_is_value_based_class) \ declare_constant(KlassFlags::_misc_has_finalizer) \ declare_constant(KlassFlags::_misc_is_cloneable_fast) \ \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1736649959 From jrose at openjdk.org Thu Aug 29 16:42:22 2024 From: jrose at openjdk.org (John R Rose) Date: Thu, 29 Aug 2024 16:42:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 23:54:22 GMT, Coleen Phillimore wrote: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. 
Also, graal changes. src/hotspot/share/oops/klass.hpp line 198: > 196: #endif > 197: > 198: KlassFlags _misc_flags; On the line above (167) where _access_flags is defined, maybe leave a forwarding comment, something like: // Some flags created by the JVM, not in the class file itself, are in _misc_flags below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1736678013 From mgronlun at openjdk.org Thu Aug 29 17:29:21 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 17:29:21 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> Message-ID: <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> On Thu, 29 Aug 2024 15:45:17 GMT, Coleen Phillimore wrote: >> If UseCompressedClassPointers is off, we don't have a compressed class space. If it's on, Klasses from CDS and from the class space are compressible. With your patch, interfaces will live in normal metaspace, not in class space, so those are excluded now. >> >> TBH, I am not really sure what this code here does, but I assume it tries to reduce the size of a JFR recording by using a compressed identifier for X if X can be expressed by such. Maybe a JFR person should look at this. > > With UseCompressedClassPointers off, I think Metaspace::is_in_shared_metaspace() would still return true, but I don't think the compression base is the bottom of the CDS archive. I asked Markus to have a look. The code supports the JfrTraceID load barrier that enqueues tagged Klass*.
It selects a more compact representation (a single word instead of two words) if a Klass* can be compressed (i.e. there exists a compressed class scheme in place (CompressedKlassPointers::encode(const_cast<Klass*>(klass))) AND the traceid (u8) value is low enough to be represented by only 4 bytes). struct JfrEpochQueueKlassElement { traceid id; const Klass* klass; }; struct JfrEpochQueueNarrowKlassElement { u4 id; narrowKlass compressed_klass; }; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736775534 From mgronlun at openjdk.org Thu Aug 29 17:29:22 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 17:29:22 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:24:47 GMT, Markus Grönlund wrote: >> With UseCompressedClassPointers off, I think Metaspace::is_in_shared_metaspace() would still return true, but I don't think the compression base is the bottom of the CDS archive. I asked Markus to have a look. > > The code supports the JfrTraceID load barrier that enqueues tagged Klass*.
> > struct JfrEpochQueueKlassElement { > traceid id; > const Klass* klass; > }; > > struct JfrEpochQueueNarrowKlassElement { > u4 id; > narrowKlass compressed_klass; > }; // Return TRUE only if UseCompressedClassPointers is True. static bool using_class_space() { return NOT_LP64(false) LP64_ONLY(UseCompressedClassPointers); } I see now that was wrong for 32-bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736779802 From mgronlun at openjdk.org Thu Aug 29 17:33:24 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 17:33:24 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:26:57 GMT, Markus Grönlund wrote: >> The code supports the JfrTraceID load barrier that enqueues tagged Klass*. It selects a more compact representation (a single word instead of two words) if a Klass* can be compressed (i.e. there exists a compressed class scheme in place (CompressedKlassPointers::encode(const_cast<Klass*>(klass))) AND the traceid (u8) value is low enough to be represented by only 4 bytes).
In summary, we are agnostic about which space the Klass* is located in; we only care if a valid means exists to perform an encode() and decode() operation to compress the Klass* (for 64-bit to be clear). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736786419 From mgronlun at openjdk.org Thu Aug 29 17:37:21 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 17:37:21 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:30:26 GMT, Markus Grönlund wrote: >> // Return TRUE only if UseCompressedClassPointers is True. >> static bool using_class_space() { >> return NOT_LP64(false) LP64_ONLY(UseCompressedClassPointers); >> } >> >> I see now that was wrong for 32-bit. > > In summary, we are agnostic about which space the Klass* is located in; we only care if a valid means exists to perform an encode() and decode() operation to compress the Klass* (for 64-bit to be clear). This may now become a function of what space the Klass resides in? It's very rare, if it happens at all, that an abstract or an interface would be tagged in JFR. Tags are for concrete implementations, mostly InstanceKlass*.
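A minimal sketch of the selection described in this thread, under a simplified model: a Klass* is treated as encodable only if it lies inside one [base, base + range) region standing in for the compressed class space, and the traceid must fit in 32 bits. `CompressedKlassSketch`, `can_use_narrow` and the stand-in `Klass` type are hypothetical; only the two element layouts are taken from the thread, and HotSpot's real JFR epoch queue and `CompressedKlassPointers` code differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; these are not the HotSpot definitions.
using traceid = uint64_t;
using narrowKlass = uint32_t;
struct Klass { int placeholder; };

// Simplified compressed-class encoding: only addresses inside
// [base, base + range) can be encoded, mirroring the point that a
// narrowKlass is relative to the compressed class space base.
struct CompressedKlassSketch {
  uintptr_t base;
  uintptr_t range;
  unsigned shift;

  bool is_encodable(const Klass* k) const {
    uintptr_t addr = reinterpret_cast<uintptr_t>(k);
    return addr >= base && addr - base < range;
  }
  narrowKlass encode(const Klass* k) const {
    assert(is_encodable(k) && "Klass* outside the encodable range");
    return static_cast<narrowKlass>((reinterpret_cast<uintptr_t>(k) - base) >> shift);
  }
  const Klass* decode(narrowKlass nk) const {
    return reinterpret_cast<const Klass*>(base + (static_cast<uintptr_t>(nk) << shift));
  }
};

// The two queue element layouts quoted in the thread.
struct JfrEpochQueueKlassElement { traceid id; const Klass* klass; };
struct JfrEpochQueueNarrowKlassElement { uint32_t id; narrowKlass compressed_klass; };

// Use the compact element only if both halves fit: the id in 32 bits and
// the Klass* in the encodable range. Otherwise fall back to the wide
// element, which is larger but always valid.
bool can_use_narrow(const CompressedKlassSketch& cks, traceid id, const Klass* k) {
  return id <= UINT32_MAX && cks.is_encodable(k);
}
```

Under this model, a Klass allocated outside the encodable region (as interfaces and abstract classes would be with this patch) simply takes the wide path instead of producing a bogus narrowKlass.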
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736795292 From mgronlun at openjdk.org Thu Aug 29 17:48:21 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 17:48:21 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:35:07 GMT, Markus Grönlund wrote: >> In summary, we are agnostic about which space the Klass* is located in; we only care if a valid means exists to perform an encode() and decode() operation to compress the Klass* (for 64-bit to be clear). This may now become a function of what space the Klass resides in? > > It's very rare, if it happens at all, that an abstract or an interface would be tagged in JFR. Tags are for concrete implementations, mostly InstanceKlass*. I have now read the JIRA issue. JFR does process loads of java/lang/invoke/LambdaForm$MH and derivatives.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736809566 From mgronlun at openjdk.org Thu Aug 29 17:48:21 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 17:48:21 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:42:20 GMT, Markus Grönlund wrote: >> It's very rare, if it happens at all, that an abstract or an interface would be tagged in JFR. Tags are for concrete implementations, mostly InstanceKlass*. > > I have now read the JIRA issue. JFR does process loads of java/lang/invoke/LambdaForm$MH and derivatives. It's fine to have each Klass* report whether it can be compressed. If not, it will be represented using the non-compressed version, which will be a bit more bloated, but that causes no problems. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736814770 From coleenp at openjdk.org Thu Aug 29 17:48:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 17:48:21 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:45:17 GMT, Markus Grönlund wrote: >> I have now read the JIRA issue. JFR does process loads of java/lang/invoke/LambdaForm$MH and derivatives.
> > It's fine to have each Klass* report whether it can be compressed. If not, it will be represented using the non-compressed version, which will be a bit more bloated, but that causes no problems. narrowKlass is the result of encoding Klass* with CompressedKlassPointers::encode(), which is relative to the compressed base, so if UseCompressedClassPointers is false then the encoding to narrowKlass from some other (CDS?) base isn't valid. using_class_space() above doesn't look wrong for 32 bits. It should return false. With this patch, interface and abstract classes cannot be encoded and decoded to yield a valid compressed narrowKlass, since they're now allocated in the non-class metaspace. Yes, this is now a function of which space the Klass resides in. We do crash for these classes in JFR without this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736816371 From mgronlun at openjdk.org Thu Aug 29 17:58:23 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 17:58:23 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:46:11 GMT, Coleen Phillimore wrote: >> It's fine to have each Klass* report whether it can be compressed. If not, it will be represented using the non-compressed version, which will be a bit more bloated, but that causes no problems. > > narrowKlass is the result of encoding Klass* with CompressedKlassPointers::encode(), which is relative to the compressed base, so if UseCompressedClassPointers is false then the encoding to narrowKlass from some other (CDS?) base isn't valid.
> using_class_space() above doesn't look wrong for 32 bits. It should return false. > > With this patch, interface and abstract classes cannot be encoded and decoded to yield a valid compressed narrowKlass, since they're now allocated in the non-class metaspace. Yes, this is now a function of which space the Klass resides in. We do crash for these classes in JFR without this change. I mean it's wrong from JFR's perspective not to handle 32-bit outside of the call to using_class_space(), because that call will always be false for 32-bit, although the Klass* will still fit in 4 bytes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736834488 From lmesnik at openjdk.org Thu Aug 29 18:25:29 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 29 Aug 2024 18:25:29 GMT Subject: RFR: 8338934: vmTestbase/nsk/jvmti/*Field*Watch/TestDescription.java tests timeout intermittently Message-ID: The tests time out because of a deadlock between the thread that is in transition and the thread changing field watches. They use JvmtiThreadState_lock and JvmtiVTMSTransitionDisabler. Changing a field watch requires the disabler, but the code attempts to use it only when JvmtiThreadState_lock is already held in void JvmtiEventController::change_field_watch(jvmtiEvent event_type, bool added) { MutexLocker mu(JvmtiThreadState_lock); JvmtiEventControllerPrivate::change_field_watch(event_type, added); } whereas transitions need to be disabled first, and only then should JvmtiThreadState_lock be taken. I took a quick look, and most JVMTI methods already do it in this order. I also moved the disabler into jvmtiEnv.cpp to be more consistent with other methods. I was able to verify my fix locally in the loom repo, and ran tier1 + tier5-svc testing in jdk.
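The lock-ordering problem can be modelled with two plain mutexes. This is a deliberately simplified, hypothetical sketch: the real JvmtiVTMSTransitionDisabler waits for in-flight virtual thread mount/unmount transitions rather than taking a mutex, and the names below are illustrative only. A thread in transition holds the "transition" side and then needs the state lock; if change_field_watch took the state lock first and only then tried to disable transitions, each thread could end up holding one lock while waiting for the other. Disabling transitions first gives both paths the same acquisition order.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins, not HotSpot code.
std::mutex g_transition_mutex;  // models "a virtual thread transition is in progress"
std::mutex g_state_lock;        // models JvmtiThreadState_lock
int g_field_watch_count = 0;
int g_transitions = 0;

// RAII guard modelling a transition disabler: while it is alive,
// no thread can be in the middle of a transition.
class TransitionDisablerSketch {
  std::lock_guard<std::mutex> guard_;
 public:
  TransitionDisablerSketch() : guard_(g_transition_mutex) {}
};

// Fixed ordering: disable transitions first, then take the state lock.
// The deadlocking variant acquired g_state_lock before the disabler.
void change_field_watch(bool added) {
  TransitionDisablerSketch disabler;
  std::lock_guard<std::mutex> mu(g_state_lock);
  g_field_watch_count += added ? 1 : -1;
  assert(g_field_watch_count >= 0);
}

// A thread in transition holds the transition side and then needs the
// state lock, i.e. it acquires the two locks in the same order.
void do_transition() {
  std::lock_guard<std::mutex> in_transition(g_transition_mutex);
  std::lock_guard<std::mutex> mu(g_state_lock);
  ++g_transitions;
}
```

The general rule the sketch illustrates is that a consistent global acquisition order rules out this ABBA-style deadlock; it says nothing about the other synchronization the real JVMTI code performs.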
------------- Commit messages: - change lock - moved disabler Changes: https://git.openjdk.org/jdk/pull/20776/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20776&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338934 Stats: 8 lines in 3 files changed: 4 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20776.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20776/head:pull/20776 PR: https://git.openjdk.org/jdk/pull/20776 From mgronlun at openjdk.org Thu Aug 29 18:33:34 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 29 Aug 2024 18:33:34 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 17:55:53 GMT, Markus Grönlund wrote: >> narrowKlass is the result of encoding Klass* with CompressedKlassPointers::encode(), which is relative to the compressed base, so if UseCompressedClassPointers is false then the encoding to narrowKlass from some other (CDS?) base isn't valid. >> using_class_space() above doesn't look wrong for 32 bits. It should return false. >> >> With this patch, interface and abstract classes cannot be encoded and decoded to yield a valid compressed narrowKlass, since they're now allocated in the non-class metaspace. Yes, this is now a function of which space the Klass resides in. We do crash for these classes in JFR without this change. > > I mean it's wrong from JFR's perspective not to handle 32-bit outside of the call to using_class_space(), because that call will always be false for 32-bit, although the Klass* will still fit in 4 bytes.
This means trying to represent the traceid (which is always 64-bit) more compactly is skipped on 32-bit. Looks ok from a JFR perspective, Coleen. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736895410 From coleenp at openjdk.org Thu Aug 29 18:47:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 18:47:21 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v4] In-Reply-To: References: <--5KRqr06ENVgu5CvmEG0zpAUvvYWUqnVHzQDL8x488=.2bc136c2-005f-4e7c-ac46-5706f189d20b@github.com> <_XsCkv5395DpEgFGjzEnGOaphwwu6ttYPWBx1dZDIMk=.4fc3a6e0-8935-4d19-b9bc-0cecf687b5a8@github.com> <7w9D5LRW8UJ2Xb9Mm7Wd5kL_T88jo6UjQDup4ntxjmk=.f84b964c-12fc-420f-982a-d86522d7e7d9@github.com> Message-ID: On Thu, 29 Aug 2024 18:30:13 GMT, Markus Grönlund wrote: >> I mean it's wrong from JFR's perspective not to handle 32-bit outside of the call to using_class_space(), because that call will always be false for 32-bit, although the Klass* will still fit in 4 bytes. This means trying to represent the traceid (which is always 64-bit) more compactly is skipped on 32-bit. > > Looks ok from a JFR perspective, Coleen. Thank you, Markus. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1736918572 From coleenp at openjdk.org Thu Aug 29 18:50:42 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 18:50:42 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add in graal flags and a comment.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20719/files - new: https://git.openjdk.org/jdk/pull/20719/files/350f8679..9dc7e551 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=00-01 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20719/head:pull/20719 PR: https://git.openjdk.org/jdk/pull/20719 From coleenp at openjdk.org Thu Aug 29 18:50:42 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 18:50:42 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 16:28:29 GMT, Yudi Zheng wrote: >> src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 484: >> >>> 482: declare_constant(JVMCINMethodData::SPECULATION_LENGTH_BITS) \ >>> 483: \ >>> 484: declare_constant(JVM_ACC_WRITTEN_FLAGS) \ >> >> `JVM_ACC_IS_HIDDEN_CLASS` and `JVM_ACC_IS_VALUE_BASED_CLASS` are actually used in the compiler (see [here](https://github.com/search?q=repo%3Aoracle%2Fgraal+path%3Acompiler+JVM_ACC_IS_HIDDEN_CLASS&type=code) and [there](https://github.com/search?q=repo%3Aoracle%2Fgraal+path%3Acompiler+JVM_ACC_IS_VALUE_BASED_CLASS&type=code)) so i think `KlassFlags::_misc_is_hidden_class` and `KlassFlags::_misc_is_value_based_class` also need to be exposed below along the other 2 bits. 
> > Yes, please add these two symbols as well > > diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > index f0af57f9513..9d65268f0fe 100644 > --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > @@ -727,6 +727,8 @@ > \ > declare_constant(InstanceKlassFlags::_misc_has_nonstatic_concrete_methods) \ > declare_constant(InstanceKlassFlags::_misc_declares_nonstatic_concrete_methods) \ > + declare_constant(KlassFlags::_misc_is_hidden_class) \ > + declare_constant(KlassFlags::_misc_is_value_based_class) \ > declare_constant(KlassFlags::_misc_has_finalizer) \ > declare_constant(KlassFlags::_misc_is_cloneable_fast) \ > \ Ok, I added these flags in like this. I didn't see them in the jdk code. We'll have to coordinate the graal code and this change. Also these flags are in misc_flags_offset() now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1736922958 From coleenp at openjdk.org Thu Aug 29 18:50:42 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 29 Aug 2024 18:50:42 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 16:39:51 GMT, John R Rose wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add in graal flags and a comment. > > src/hotspot/share/oops/klass.hpp line 198: > >> 196: #endif >> 197: >> 198: KlassFlags _misc_flags; > > On the line above (167) where _access_flags is defined, maybe leave a forwarding comment, something like: > > > // Some flags created by the JVM, not in the class file itself, are in _misc_flags below. Added this comment. Thanks. 
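A rough sketch of the layout this PR moves towards, under stated assumptions: class-file access flags stay in a 16-bit field (the ACC_PUBLIC = 0x0001, ACC_INTERFACE = 0x0200 and ACC_ABSTRACT = 0x0400 values are from the JVM specification), while JVM-internal bits such as the hidden-class and value-based-class markers move to a separate, here one-byte, flags field. `AccessFlagsSketch`, `KlassFlagsSketch`, `KlassSketch` and the internal bit positions are hypothetical, not HotSpot's definitions. The practical consequence for JVMCI/Graal, as discussed in this thread, is that those bits have to be exported under their new `KlassFlags::_misc_*` names and read from the misc flags field rather than from the access flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names and internal bit positions are illustrative only.
using u1 = uint8_t;
using u2 = uint16_t;

// Class-file access flags only: these fit in 16 bits.
class AccessFlagsSketch {
  u2 _flags;
 public:
  explicit AccessFlagsSketch(u2 flags) : _flags(flags) {}
  bool is_public() const    { return (_flags & 0x0001) != 0; }  // ACC_PUBLIC (JVMS)
  bool is_interface() const { return (_flags & 0x0200) != 0; }  // ACC_INTERFACE (JVMS)
  bool is_abstract() const  { return (_flags & 0x0400) != 0; }  // ACC_ABSTRACT (JVMS)
};

// JVM-internal bits that are not part of the class file format.
class KlassFlagsSketch {
 public:
  enum : u1 {
    _misc_is_hidden_class      = 1 << 0,
    _misc_is_value_based_class = 1 << 1,
    _misc_has_finalizer        = 1 << 2,
    _misc_is_cloneable_fast    = 1 << 3
  };
 private:
  u1 _flags = 0;
 public:
  void set(u1 bit)          { _flags = static_cast<u1>(_flags | bit); }
  bool is_set(u1 bit) const { return (_flags & bit) != 0; }
};

struct KlassSketch {
  AccessFlagsSketch _access_flags;
  // Some flags created by the JVM, not in the class file itself, are in _misc_flags below.
  KlassFlagsSketch _misc_flags;
};
```

Keeping the two kinds of bits in separate fields is what lets the class-file part shrink to exactly the 16 bits the class file format defines.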
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1736923536 From sspitsyn at openjdk.org Thu Aug 29 19:20:18 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 29 Aug 2024 19:20:18 GMT Subject: RFR: 8338934: vmTestbase/nsk/jvmti/*Field*Watch/TestDescription.java tests timeout intermittently In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:18:12 GMT, Leonid Mesnik wrote: > The tests time out because of a deadlock between the thread that is in transition and the thread changing field watches. > > They use JvmtiThreadState_lock and JvmtiVTMSTransitionDisabler. > > Changing a field watch requires the disabler, but the code attempts to use it only when JvmtiThreadState_lock is already held in > > void > JvmtiEventController::change_field_watch(jvmtiEvent event_type, bool added) { > MutexLocker mu(JvmtiThreadState_lock); > JvmtiEventControllerPrivate::change_field_watch(event_type, added); > } > > > whereas transitions need to be disabled first, and only then should JvmtiThreadState_lock be taken. > I took a quick look, and most JVMTI methods already do it in this order. I also moved the disabler into jvmtiEnv.cpp to be more consistent with other methods. > > > I was able to verify my fix locally in the loom repo, and ran tier1 + tier5-svc testing in jdk. Looks good. Thank you for jumping on this. The fix is what I initially wanted to have. src/hotspot/share/runtime/mutexLocker.cpp line 270: > 268: MUTEX_DEFN(DirectivesStack_lock , PaddedMutex , nosafepoint); > 269: > 270: MUTEX_DEFN(JvmtiVTMSTransition_lock , PaddedMonitor, safepoint); // used for Virtual Thread Mount State transition management Nit: It'd be better to align the comment at the end.
------------- PR Review: https://git.openjdk.org/jdk/pull/20776#pullrequestreview-2269901775 PR Review Comment: https://git.openjdk.org/jdk/pull/20776#discussion_r1736969973 From kbarrett at openjdk.org Thu Aug 29 19:43:19 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 29 Aug 2024 19:43:19 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <-V2cmdnKmAbRe1i7BPDT-Q5WoGRIRiLwDaRn1jEKYxs=.7dce5659-9a70-4986-a32d-e606d3d8304b@github.com> On Thu, 29 Aug 2024 16:17:09 GMT, Stefan Karlsson wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave it for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > I much prefer to see MemType, but I'm warming up to NMTCategory. > > - MemType: Succinct - matches part of the code (e.g. the mt in mtGC) > - MemTypeFlag: Too many words for my preference. > - NMTCat: Meuw. :) > - NMTCategory: Parts of the code call these categories, so I'm not entirely against this. > - NMTGroup: "Group" is a new name for this that currently isn't reflected at all in the code. > - NMT_MemType: I think we should try to get rid of names using this style. > - NMT::MemType: The `::` makes all function declarations noisier for very little benefit, IMO. I continue to mostly agree with @stefank. I think this name shouldn't be considered in isolation. There are already a bunch of "NMT_" prefixed names. That's the common idiom for things like this (often (maybe even usually?) without the "_"). Why are we proposing to adopt a new style? (For just this? That would be weird. Or more broadly? That certainly needs more discussion.) Implementation question: Where does the NMT "namespace" come from?
Presumably the enum definition is going to be wrapped up in an AllStatic class? A similar effect can be achieved using a `namespace`, but HotSpot avoids using those except in some specific cases that don't apply to this type in isolation. But if we're going to consider the broader scope (as we should), then a namespace might well be appropriate. I'd prefer that namespaces (if we were to start using them) use snake_case style naming myself, so "nmt::MemType". ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2318768006 From dholmes at openjdk.org Thu Aug 29 20:42:23 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 20:42:23 GMT Subject: RFR: 8338257: UTF8 lengths should be size_t not int [v7] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 13:06:19 GMT, Thomas Stuefe wrote: >>> > > Many of these translations seem awkward, since they convert to size_t only to then convert back to int. >>> > >>> > Can you be more specific here please? >>> >>> Certainly. For example, this construct: >>> >>> ``` >>> size_t utf8_len = static_cast<size_t>(length); >>> const char* base = UNICODE::as_utf8(position, utf8_len); >>> Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len)); >>> ``` >>> >>> We introduce `utf8_len` as a `size_t` synonym for len, but since it originates from an `int`, its length must be <= INT_MAX. We also assume it is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely on `length` >= 0, we could just as well give it `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a check. The check feels superfluous since `utf8_len` came from an int. >>> >>> I assume this verbosity is for the benefit of the code reader, to make intent clear. Otherwise, we could have just continued to use length, and just cast it on the fly to unsigned or to size_t when calling `UNICODE::as_utf8`.
>> >> This is exactly the case I was referring to. The declaration here is: >> >> template <typename T> static char* as_utf8(const T* base, size_t& length); >> >> where length is an IN/OUT parameter: it is the int array length going in (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. The outgoing utf8 length can theoretically be > INT_MAX, but if that were the case in this code (which expects to be dealing with names that can be symbols, hence < 64K) then that would be a programming error, which the checked_cast would catch. And of course new_symbol checks for < 64K. > >> > > > Many of these translations seem awkward, since they convert to size_t only to then convert back to int. >> > > >> > > Can you be more specific here please? >> > >> > Certainly. For example, this construct: >> > ``` >> > size_t utf8_len = static_cast<size_t>(length); >> > const char* base = UNICODE::as_utf8(position, utf8_len); >> > Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len)); >> > ``` >> > >> > We introduce `utf8_len` as a `size_t` synonym for len, but since it originates from an `int`, its length must be <= INT_MAX. We also assume it is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely on `length` >= 0, we could just as well give it `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a check. The check feels superfluous since `utf8_len` came from an int. >> > I assume this verbosity is for the benefit of the code reader, to make intent clear. Otherwise, we could have just continued to use length, and just cast it on the fly to unsigned or to size_t when calling `UNICODE::as_utf8`. >> >> This is exactly the case I was referring to.
The declaration here is: >> >> ``` >> template static char* as_utf8(const T* base, size_t& length); >> ``` >> >> whether length is an IN/OUT parameter that is the int array length going in (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. The out coming utf8 length can theoretically by > INT_MAX but if that were the case in this code (which expects to be dealing with names that can be symbols hence < 64K) then that would be a programming error which the checked_cast would catch. And of course new_symbol checks for < 64K. > > Oh, I completely missed that! > > I wish we would use pointers instead of references in cases like these since that would make the intent immediately clear when looking at the call site. Thanks for the review and discussion @tstuefe . ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2318932417 From dholmes at openjdk.org Thu Aug 29 20:42:24 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Aug 2024 20:42:24 GMT Subject: Integrated: 8338257: UTF8 lengths should be size_t not int In-Reply-To: References: Message-ID: <0EaYdwWN6e9bYcbVikV2p2jZZNhXGNQOOwyl4BMFeZ8=.6387a2fc-6705-4cf8-9536-60df6a56aab7@github.com> On Tue, 13 Aug 2024 02:20:41 GMT, David Holmes wrote: > This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level API's can still use `int` if they know the strings (eg symbols) are sufficiently constrained in length. 
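The IN/OUT convention and the narrowing back to `int` discussed above can be illustrated with a small self-contained sketch. It is not HotSpot's `UNICODE::as_utf8` (it only counts bytes, and `modified_utf8_length` and `checked_narrow` are hypothetical names); it assumes the JVM's modified UTF-8 rules, where U+0000 takes two bytes and each half of a surrogate pair is encoded separately in three bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Modified UTF-8 byte count for one UTF-16 code unit:
// U+0001..U+007F -> 1 byte; U+0000 and U+0080..U+07FF -> 2 bytes;
// everything else, including each surrogate half -> 3 bytes.
inline size_t modified_utf8_units(uint16_t c) {
  if (c != 0 && c <= 0x7F) return 1;
  if (c <= 0x7FF) return 2;
  return 3;
}

// Hypothetical IN/OUT convention: on entry `length` is the number of UTF-16
// code units in `base` (an int-sized, non-negative count); on exit it holds
// the modified UTF-8 byte length, which may exceed INT_MAX.
inline void modified_utf8_length(const uint16_t* base, size_t& length) {
  size_t bytes = 0;
  for (size_t i = 0; i < length; i++) {
    bytes += modified_utf8_units(base[i]);
  }
  length = bytes;
}

// Hypothetical checked narrowing cast: asserts that the value round-trips.
template <typename To, typename From>
To checked_narrow(From v) {
  To result = static_cast<To>(v);
  assert(static_cast<From>(result) == v);
  return result;
}
```

Because each UTF-16 code unit needs at most three bytes, the outgoing length can reach three times the incoming count, which is why it is carried as `size_t`; narrowing it back to `int` is only safe where the caller knows the string is short, as with symbols.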
See the comments in utf8.hpp that explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of `UNICODE::utf8_length`, we add back `UNICODE::utf8_length_as_int` for it to use. > > Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`, use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts. > > Testing: > - tiers 1-4 > - GHA This pull request has now been integrated. Changeset: a4962ace Author: David Holmes URL: https://git.openjdk.org/jdk/commit/a4962ace4d3afb36e9d6822a4f02a1515fac40ed Stats: 234 lines in 16 files changed: 112 ins; 5 del; 117 mod 8338257: UTF8 lengths should be size_t not int Reviewed-by: stuefe, coleenp, dlong ------------- PR: https://git.openjdk.org/jdk/pull/20560 From matsaave at openjdk.org Thu Aug 29 21:07:20 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 29 Aug 2024 21:07:20 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. x86 and ARM interpreter code looks good, just one potential nit.
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 247: > 245: load_klass(t1, obj); > 246: ldrb(t1, Address(t1, Klass::misc_flags_offset())); > 247: tstw(t1, KlassFlags::_misc_is_value_based_class); Same here src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 694: > 692: load_klass(tmp, obj_reg); > 693: ldrb(tmp, Address(tmp, Klass::misc_flags_offset())); > 694: tstw(tmp, KlassFlags::_misc_is_value_based_class); Should this just be `tst` instead of `tstw`? ------------- PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2270204138 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737188858 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737188530 From dlong at openjdk.org Thu Aug 29 22:08:24 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Aug 2024 22:08:24 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. 
src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp line 76: > 74: load_klass(tmp, Roop); > 75: z_lb(tmp, Address(tmp, Klass::misc_flags_offset())); > 76: testbit(tmp, exact_log2(KlassFlags::_misc_is_value_based_class)); Suggestion: z_tm(Klass::misc_flags_offset(), tmp, KlassFlags::_misc_is_value_based_class); or Suggestion: z_tm(Address(tmp, Klass::misc_flags_offset()), KlassFlags::_misc_is_value_based_class); src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 447: > 445: __ load_klass(klass, Z_ARG1); > 446: __ z_lb(klass, Address(klass, Klass::misc_flags_offset())); > 447: __ testbit(klass, exact_log2(KlassFlags::_misc_has_finalizer)); Use z_tm. See above for example. src/hotspot/cpu/s390/interp_masm_s390.cpp line 1011: > 1009: load_klass(tmp, object); > 1010: z_lb(tmp, Address(tmp, Klass::misc_flags_offset())); > 1011: testbit(tmp, exact_log2(KlassFlags::_misc_is_value_based_class)); Use z_tm. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3511: > 3509: load_klass(temp, oop); > 3510: z_lb(temp, Address(temp, Klass::misc_flags_offset())); > 3511: testbit(temp, exact_log2(KlassFlags::_misc_is_value_based_class)); Use z_tm. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6157: > 6155: load_klass(tmp1, obj); > 6156: z_lb(tmp1, Address(temp, Klass::misc_flags_offset())); > 6157: testbit(tmp1, exact_log2(KlassFlags::_misc_is_value_based_class)); Use z_tm. src/hotspot/cpu/s390/templateTable_s390.cpp line 2325: > 2323: __ load_klass(Rklass, Rthis); > 2324: __ z_lb(Rklass, Address(Rklass, Klass::misc_flags_offset())); > 2325: __ testbit(Rklass, exact_log2(KlassFlags::_misc_has_finalizer)); Use z_tm. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737266627 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737267281 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737268180 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737268350 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737269052 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737269173 From dlong at openjdk.org Thu Aug 29 22:17:22 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Aug 2024 22:17:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. 
src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 62: > 60: load_klass(hdr, obj, rscratch1); > 61: movb(hdr, Address(hdr, Klass::misc_flags_offset())); > 62: testl(hdr, KlassFlags::_misc_is_value_based_class); Suggestion: testb(Address(hdr, Klass::misc_flags_offset()), KlassFlags::_misc_is_value_based_class); src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 1170: > 1168: __ load_klass(t, rax, rscratch1); > 1169: __ movb(t, Address(t, Klass::misc_flags_offset())); > 1170: __ testl(t, KlassFlags::_misc_has_finalizer); Use testb(Address, imm) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 281: > 279: load_klass(tmpReg, objReg, scrReg); > 280: movb(tmpReg, Address(tmpReg, Klass::misc_flags_offset())); > 281: testl(tmpReg, KlassFlags::_misc_is_value_based_class); Use testb(Address, imm) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 600: > 598: if (DiagnoseSyncOnValueBasedClasses != 0) { > 599: load_klass(rax_reg, obj, t); > 600: movb(rax_reg, Address(rax_reg, Klass::misc_flags_offset())); Use testb(Address, imm) src/hotspot/cpu/x86/interp_masm_x86.cpp line 1178: > 1176: if (DiagnoseSyncOnValueBasedClasses != 0) { > 1177: load_klass(tmp_reg, obj_reg, rklass_decode_tmp); > 1178: movb(tmp_reg, Address(tmp_reg, Klass::misc_flags_offset())); Use testb(Address, imm) src/hotspot/cpu/x86/templateTable_x86.cpp line 2582: > 2580: __ movptr(robj, aaddress(0)); > 2581: __ load_klass(rdi, robj, rscratch1); > 2582: __ movb(rdi, Address(rdi, Klass::misc_flags_offset())); Use testb(Address, imm) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737276368 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737276956 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737277139 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737277271 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737277473 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20719#discussion_r1737277597 From dlong at openjdk.org Thu Aug 29 22:27:20 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Aug 2024 22:27:20 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. src/hotspot/share/ci/ciKlass.cpp line 233: > 231: jint ciKlass::misc_flags() { > 232: assert(is_loaded(), "not loaded"); > 233: GUARDED_VM_ENTRY( To Compiler folks: I don't think the VM_ENTRY is necessary, but if it is, then we should consider entering VM mode once and caching/memoizing these immutable flag values in the ciKlass. 
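If the VM entry does turn out to be needed, the caching idea could look roughly like the sketch below. It is a generic illustration, not ciKlass code: `KlassMirrorSketch` and `read_count` are hypothetical, the counter merely stands in for the cost of a VM transition, and the sketch assumes a single compiler thread uses the mirror. Caching is only sound because the flags are write-once before the class is published.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical compiler-side mirror that reads an immutable flags byte from
// the runtime at most once and then answers from its cache.
class KlassMirrorSketch {
 public:
  explicit KlassMirrorSketch(const uint8_t* runtime_flags)
      : _runtime_flags(runtime_flags) {
    assert(_runtime_flags != nullptr);
  }

  uint8_t misc_flags() {
    if (!_cached.has_value()) {
      // Slow path: stands in for entering VM mode to read the Klass.
      _read_count++;
      _cached = *_runtime_flags;
    }
    return *_cached;
  }

  int read_count() const { return _read_count; }

 private:
  const uint8_t* _runtime_flags;
  std::optional<uint8_t> _cached;
  int _read_count = 0;
};
```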
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737286933 From iklam at openjdk.org Thu Aug 29 22:35:22 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 29 Aug 2024 22:35:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <3M4XT4BaiowyWjJhSoFBREh9e-Be2B6L4tHVAXKw5VQ=.7647e788-8d7d-4e05-91f3-509c6fbd0d3c@github.com> On Thu, 29 Aug 2024 09:28:50 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/archiveHeapWriter.cpp line 214: > >> 212: oopDesc::set_mark(mem, markWord::prototype()); >> 213: oopDesc::release_set_klass(mem, k); >> 214: } > > The `UseCompactObjectHeaders` path calls `get_requested_narrow_klass`, while the `else` part directly uses `k`. Is one of these paths incorrect? This seems odd. The original code sets `Universe::objectArrayKlass()` into the object header. This is the value of this class in the current JVM lifetime. Later, `ArchiveHeapWriter::update_header_for_requested_obj()` would change the object's klass to the "requested" address. I.e., where this class will be loaded in a future JVM lifetime when the CDS archive is loaded into memory. It seems the same logic should be used in the `UseCompactObjectHeaders==true` case. BTW (unrelated to this PR) the comment a few lines up is outdated and wrong: Klass* k = Universe::objectArrayKlass(); // already relocated to point to archived klass `k` is the value of the *actual* location of this class in the current JVM lifetime. Please ignore this comment when trying to understand this function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1737294872 From dlong at openjdk.org Thu Aug 29 22:43:24 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Aug 2024 22:43:24 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. src/hotspot/share/ci/ciKlass.cpp line 231: > 229: // ------------------------------------------------------------------ > 230: // ciKlass::misc_flags > 231: jint ciKlass::misc_flags() { Suggestion: u1 ciKlass::misc_flags() { I think this should match what misc_flags() returns. Ideally, I think it should be a typedef like KlassFlags_t so we don't have to make a lot of changes if it grows to u2. src/hotspot/share/ci/ciKlass.hpp line 125: > 123: > 124: // Fetch Klass::misc_flags. > 125: jint misc_flags(); Suggestion: KlassFlags_t misc_flags(); src/hotspot/share/oops/klass.hpp line 436: > 434: #endif > 435: static ByteSize bitmap_offset() { return byte_offset_of(Klass, _bitmap); } > 436: static ByteSize misc_flags_offset() { return byte_offset_of(Klass, _misc_flags); } Suggestion: static ByteSize misc_flags_offset() { return byte_offset_of(Klass, _misc_flags._flags); } src/hotspot/share/oops/klassFlags.hpp line 56: > 54: // These flags are write-once before the class is published and then read-only > 55: // so don't require atomic updates.
> 56: u1 _flags; Suggestion: typedef u1 KlassFlags_t; KlassFlags_t _flags; Can we have a typedef so C++ code that doesn't care about the size doesn't need to change if we later make it u2 or u4? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737291678 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737299082 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737303285 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737294456 From dlong at openjdk.org Thu Aug 29 22:43:24 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Aug 2024 22:43:24 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 22:29:43 GMT, Dean Long wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add in graal flags and a comment. > > src/hotspot/share/ci/ciKlass.cpp line 231: > >> 229: // ------------------------------------------------------------------ >> 230: // ciKlass::misc_flags >> 231: jint ciKlass::misc_flags() { > > Suggestion: > > u1 ciKlass::misc_flags() { > > I think this should match what misc_flags() returns. Ideally, I think it should be a typedef like KlassFlags_t so we don't have to make a lot of changes if it grows to u2. Also, using the correct, narrowed type will help with -Wconversion warnings later, because u1 can be converted to both int and size_t without a cast. > src/hotspot/share/oops/klass.hpp line 436: > >> 434: #endif >> 435: static ByteSize bitmap_offset() { return byte_offset_of(Klass, _bitmap); } >> 436: static ByteSize misc_flags_offset() { return byte_offset_of(Klass, _misc_flags); } > > Suggestion: > > static ByteSize misc_flags_offset() { return byte_offset_of(Klass, _misc_flags._flags); } We probably shouldn't assume the _flags field starts at offset 0.
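The typedef and offset suggestions can be sketched together. Everything here is hypothetical (`KlassFlagsSketch`, `KlassSketch`, `set_flag`, and the bit values are illustrative, not HotSpot's definitions); the point is that callers spell the type as `KlassFlags_t`, and the byte offset is computed from both member offsets rather than assuming `_flags` sits at offset 0 inside the flags object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint8_t u1;  // stand-in for HotSpot's u1

// Callers use KlassFlags_t, so widening the field later (to u2 or u4)
// only changes this typedef.
typedef u1 KlassFlags_t;

class KlassFlagsSketch {
 public:
  // Illustrative bit values only; not HotSpot's actual assignments.
  static constexpr KlassFlags_t _misc_has_finalizer        = 1 << 0;
  static constexpr KlassFlags_t _misc_is_value_based_class = 1 << 1;
  static constexpr KlassFlags_t _misc_is_cloneable_fast    = 1 << 2;

  KlassFlags_t _flags = 0;

  // Write-once before publication: setting a bit twice is a programming error.
  void set_flag(KlassFlags_t f) {
    assert((_flags & f) == 0 && "flag already set");
    _flags = static_cast<KlassFlags_t>(_flags | f);
  }
  bool has_finalizer() const { return (_flags & _misc_has_finalizer) != 0; }
  bool is_value_based() const { return (_flags & _misc_is_value_based_class) != 0; }
};

struct KlassSketch {
  int _other_field = 0;  // placeholder for the fields that precede the flags
  KlassFlagsSketch _misc_flags;
};

// Offset of the flags byte itself, without assuming _flags is the first
// member of KlassFlagsSketch.
constexpr size_t misc_flags_byte_offset() {
  return offsetof(KlassSketch, _misc_flags) + offsetof(KlassFlagsSketch, _flags);
}
```

Widening the typedef would keep this C++ compiling unchanged, but generated code that uses byte-sized loads or tests on the field (such as the `ldrb`, `movb`, and `z_lb` sequences reviewed above) would still need to be updated.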
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737296853 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737306718 From dlong at openjdk.org Thu Aug 29 22:49:24 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Aug 2024 22:49:24 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. src/hotspot/share/classfile/classFileParser.cpp line 5176: > 5174: ik->set_declares_nonstatic_concrete_methods(_declares_nonstatic_concrete_methods); > 5175: > 5176: assert(!_is_hidden || ik->is_hidden(), "must be set already"); Is this an optimization independent of the current change? src/hotspot/share/opto/library_call.cpp line 3774: > 3772: > 3773: // Use this for testing if Klass is_hidden, has_finalizer, and is_cloneable_fast. > 3774: Node* LibraryCallKit::generate_misc_flags_guard(Node* kls, int modifier_mask, int modifier_bits, RegionNode* region) { It looks like we could refactor generate_misc_flags_guard and generate_access_flags_guard with a common generate_klass_accessor_guard that takes the offset and type as parameters. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737314136 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737318524 From dlong at openjdk.org Thu Aug 29 22:53:21 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Aug 2024 22:53:21 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: <9xD4CEuWpS-ZSWTYG-NbsgY1S8ntiVG-UXGUsbWCZfs=.5374f1e7-3167-4397-8f58-6adea43a1f22@github.com> On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 378: > 376: HotSpotVMConfig config = config(); > 377: int miscFlags = UNSAFE.getByte(getKlassPointer() + config.klassMiscFlagsOffset); > 378: return (miscFlags & config().jvmAccHasFinalizer) != 0; Suggestion: return (miscFlags & config.jvmAccHasFinalizer) != 0; src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 1117: > 1115: HotSpotVMConfig config = config(); > 1116: int miscFlags = UNSAFE.getByte(getKlassPointer() + config.klassMiscFlagsOffset); > 1117: return (miscFlags & config().jvmAccIsCloneableFast) != 0; Suggestion: return (miscFlags & config.jvmAccIsCloneableFast) != 0; src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 1118: > 1116: int miscFlags = UNSAFE.getByte(getKlassPointer() + config.klassMiscFlagsOffset); > 1117: return (miscFlags & config().jvmAccIsCloneableFast) != 0; > 1118: } Maybe 
introduce getMiscFlags() helper like existing getAccessFlags()? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737327588 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737331700 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737330084 From gziemski at openjdk.org Thu Aug 29 23:35:19 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 29 Aug 2024 23:35:19 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) After some more internal discussion `MemTag` and `NMT::MemTag` were suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2319445188 From sviswanathan at openjdk.org Thu Aug 29 23:41:22 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 29 Aug 2024 23:41:22 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. 
>> >> >> New vector operators are applicable only to integral types, since their values wrap around in overflowing/underflowing scenarios after setting appropriate status flags. For floating-point types, as per the IEEE 754 spec, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6674: > 6672: // Res = Mask ? Zero : Res > 6673: evmovdqu(etype, ktmp, dst, dst, false, vlen_enc); > 6674: } We could directly do masked evpsubd/evpsubq here with merge as false. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6698: > 6696: // Unsigned values ranges comprise of only +ve numbers, thus there exist only an upper bound saturation.
> 6697: // overflow = ((UMAX - MAX(SRC1 & SRC2)) >> 31 == 1 > 6698: // Res = Signed Add INP1, INP2 The >>> 31 is not coded, so the comment could be improved to match the code. The comment also mixes the SRC1/INP1 terms. Also, could overflow not be implemented based on the much simpler Java scalar algo: Overflow = Res 6714: // > 6715: // Adaptation of unsigned addition overflow detection from hacker's delight > 6716: // section 2-13 : overflow = ((a & b) | ((a | b) & ~(s))) >>> 31 == 1 It is not clear what s is here. I think it is s = a + b. Could you please update the comments to indicate this? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6738: > 6736: XMMRegister xtmp1, XMMRegister xtmp2, XMMRegister xtmp3, > 6737: XMMRegister xtmp4, int vlen_enc) { > 6738: // Res = Signed Add INP1, INP2 Wondering if we could implement overflow here also based on the much simpler Java scalar algo: Overflow = Res 6743: vpcmpeqd(xtmp3, xtmp3, xtmp3, vlen_enc); > 6744: // T2 = ~Res > 6745: vpxor(xtmp2, xtmp3, dst, vlen_enc); Did you mean this to be T3 = ~Res? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6749: > 6747: vpor(xtmp2, xtmp2, src2, vlen_enc); > 6748: // Compute mask for muxing T1 with T3 using SRC1. > 6749: vpsign_extend_dq(etype, xtmp4, src1, vlen_enc); I don't think we need to do the sign extension. The blend instruction uses the most significant bit to do the blend. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6932: > 6930: > 6931: // Sign-extend to compute overflow detection mask. > 6932: vpsign_extend_dq(etype, xtmp3, xtmp2, vlen_enc); Sign extension to the lower bits is not needed, as blend uses only the MSB. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6939: > 6937: > 6938: // Compose saturating min/max vector using first input polarity mask. > 6939: vpsign_extend_dq(etype, xtmp4, src1, vlen_enc); Sign extension to the lower bits is not needed, as blend uses only the MSB.
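For cross-checking the vector code, the scalar semantics of the six new operators on 32-bit lanes can be written down directly. This is an illustrative C++ model with hypothetical function names, not the Java fallback implementation from the PR. `suadd` uses the Hacker's Delight carry formula quoted above, taking `s` to be the wrapping sum `a + b`; the signed variants choose MIN or MAX from the sign of the first input, the scalar counterpart of the "first input polarity mask". Every overflow test depends only on a sign bit.

```cpp
#include <cstdint>
#include <limits>

// SUADD: unsigned saturating add. With s = a + b (wrapping), the carry out of
// bit 31 is the top bit of ((a & b) | ((a | b) & ~s)).
constexpr uint32_t suadd(uint32_t a, uint32_t b) {
  uint32_t s = a + b;
  bool overflow = (((a & b) | ((a | b) & ~s)) >> 31) != 0;
  return overflow ? std::numeric_limits<uint32_t>::max() : s;
}

// SUSUB: unsigned saturating subtract; the only bound is zero.
constexpr uint32_t susub(uint32_t a, uint32_t b) {
  return a < b ? 0u : a - b;
}

// SADD: overflow iff both inputs share a sign and the result's sign differs.
constexpr int32_t sadd(int32_t a, int32_t b) {
  int32_t s = static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
  bool overflow = ((a ^ s) & (b ^ s)) < 0;
  if (!overflow) return s;
  return a < 0 ? std::numeric_limits<int32_t>::min() : std::numeric_limits<int32_t>::max();
}

// SSUB: overflow iff the inputs differ in sign and the result's sign differs from a.
constexpr int32_t ssub(int32_t a, int32_t b) {
  int32_t s = static_cast<int32_t>(static_cast<uint32_t>(a) - static_cast<uint32_t>(b));
  bool overflow = ((a ^ b) & (a ^ s)) < 0;
  if (!overflow) return s;
  return a < 0 ? std::numeric_limits<int32_t>::min() : std::numeric_limits<int32_t>::max();
}

// UMAX / UMIN: compare the lane bit patterns as unsigned values.
constexpr int32_t umax(int32_t a, int32_t b) {
  return static_cast<uint32_t>(a) > static_cast<uint32_t>(b) ? a : b;
}
constexpr int32_t umin(int32_t a, int32_t b) {
  return static_cast<uint32_t>(a) < static_cast<uint32_t>(b) ? a : b;
}
```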
src/hotspot/cpu/x86/x86.ad line 10656: > 10654: match(Set dst (SaturatingSubVI src1 src2)); > 10655: match(Set dst (SaturatingSubVL src1 src2)); > 10656: effect(TEMP ktmp); This needs TEMP dst as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737116841 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737272705 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737306541 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737307396 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737325898 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737338765 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737467234 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737467902 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1737489758 From gziemski at openjdk.org Thu Aug 29 23:46:19 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 29 Aug 2024 23:46:19 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 16:17:09 GMT, Stefan Karlsson wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > I much prefer to see MemType, but I'm warming up to NMTCategory. > > - MemType: Succinct - matches part of the code (E.g. the mt in mtGC) > - MemTypeFlag: Too many words for my preference. > - NMTCat: Meuw. :) > - NMTCategory: Parts of the code call these categories, so I'm not entirely against this. 
> - NMTGroup: "Group" is a new name for this that currently isn't reflected at all in the code. > - NMT_MemType: I think we should try get rid of names using this style. > - NMT::MemType: The `::` makes all function declarations noisier for very little benefit, IMO. > I continue to mostly agree with @stefank. > > I think this name shouldn't be considered in isolation. There are already a bunch of "NMT_" prefixed names. That's the common idiom for things like this (often (maybe even usually?) without the "_"). Why are we proposing to adopt a new style. (For just this? That would be weird. Or more broadly? That certainly needs more discussion.) It is precisely because we already are using `NMT_TrackingStackDepth` and `NMT_TrackingLevel` (and the fact that MemType by itself was too general) that I suggested we adopt `NMT::` Yes, it would be all static class. Why not all lower letter `nmt::`? Again, because we already use `NMT_` elsewhere. My plan was later to switch from `NMT_` to `NMT::` for all of them. We can do `nmt::` too. So how about `MemTag`, `nmt::MemTag` or `NMT::MemTag`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2319470647 From dholmes at openjdk.org Fri Aug 30 02:13:57 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 30 Aug 2024 02:13:57 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths Message-ID: This is the implementation of a new method added to the JNI specification. >From the CSR request: The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. 
But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. Solution Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. --- We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. 
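The worst-case figures quoted from the CSR can be checked with 64-bit arithmetic. The sketch below only restates those bounds as compile-time checks (the constant names and `fits_in_jint` are illustrative); it does not model how `GetStringUTFLength` currently truncates.

```cpp
#include <cstdint>

// Worst-case modified UTF-8 lengths quoted in the CSR, computed in 64 bits.
constexpr int64_t kMaxJavaChars = INT32_MAX;  // Integer.MAX_VALUE

// Without compact strings: at most 3 bytes per UTF-16 code unit.
constexpr int64_t kMaxUtf8BytesUtf16 = 3 * kMaxJavaChars;

// With compact strings, per the CSR: at most 2 * Integer.MAX_VALUE.
constexpr int64_t kMaxUtf8BytesCompact = 2 * kMaxJavaChars;

static_assert(kMaxUtf8BytesUtf16 == 6442450941LL, "3 * 2147483647");
static_assert(kMaxUtf8BytesCompact == 4294967294LL, "2 * 2147483647");

// True if a UTF-8 length can be reported exactly through a jint
// (a 32-bit signed integer).
constexpr bool fits_in_jint(int64_t utf8_len) {
  return utf8_len >= 0 && utf8_len <= INT32_MAX;
}
```

Neither bound passes `fits_in_jint`, which is the gap a `jlong`-returning variant such as `GetStringUTFLengthAsLong` closes.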
Testing: - new test added - tiers 1-3 sanity Thanks ------------- Commit messages: - Test adjustments - 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths - Merge - Initial commit before splitting out UTF8 changes Changes: https://git.openjdk.org/jdk/pull/20784/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20784&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328877 Stats: 203 lines in 7 files changed: 180 ins; 1 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/20784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20784/head:pull/20784 PR: https://git.openjdk.org/jdk/pull/20784 From jzhu at openjdk.org Fri Aug 30 03:03:53 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Fri, 30 Aug 2024 03:03:53 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v3] In-Reply-To: References: Message-ID: > Please review this minor enhancement that skips verify_sve_vector_length after native calls. > It works on SVE micro-architecture that only supports 128-bit vector length. 
Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: Fix mismatch issue in ad m4 file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20724/files - new: https://git.openjdk.org/jdk/pull/20724/files/c0ec5499..d1910858 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20724&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20724&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20724.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20724/head:pull/20724 PR: https://git.openjdk.org/jdk/pull/20724 From cjplummer at openjdk.org Fri Aug 30 04:51:21 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 30 Aug 2024 04:51:21 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. Marked as reviewed by cjplummer (Reviewer). SA changes look good. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2271158683 PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2320037657 From cjplummer at openjdk.org Fri Aug 30 05:14:23 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 30 Aug 2024 05:14:23 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 02:07:54 GMT, David Holmes wrote: > This is the implementation of a new method added to the JNI specification. > > From the CSR request: > > The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. > > **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. > > Solution > > Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. > > --- > > We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni > > There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. 
> > Testing: > - new test added > - tiers 1-3 sanity > > Thanks test/hotspot/jtreg/runtime/jni/checked/TestLargeUTF8Length.java line 27: > 25: * @bug 8328877 > 26: * @summary Test warning for GetStringUTFLength and functionality of GetStringUTFLengthAsLong > 27: * @library /test/lib Shouldn't this test have: `@requires vm.bits == 64 ` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20784#discussion_r1737942541 From dholmes at openjdk.org Fri Aug 30 05:21:54 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 30 Aug 2024 05:21:54 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v2] In-Reply-To: References: Message-ID: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> > This is the implementation of a new method added to the JNI specification. > > From the CSR request: > > The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. > > **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. > > Solution > > Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. 
> > --- > > We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni > > There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. > > Testing: > - new test added > - tiers 1-3 sanity > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Exclude test on 32-bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20784/files - new: https://git.openjdk.org/jdk/pull/20784/files/9a8964b8..73174e64 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20784&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20784&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20784/head:pull/20784 PR: https://git.openjdk.org/jdk/pull/20784 From dholmes at openjdk.org Fri Aug 30 05:21:54 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 30 Aug 2024 05:21:54 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 05:11:30 GMT, Chris Plummer wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Exclude test on 32-bit > > test/hotspot/jtreg/runtime/jni/checked/TestLargeUTF8Length.java line 27: > >> 25: * @bug 8328877 >> 26: * @summary Test warning for GetStringUTFLength and functionality of GetStringUTFLengthAsLong >> 27: * @library /test/lib > > Shouldn't this test have: > > `@requires vm.bits == 64 > ` Thanks for taking a look @plummercj . Yep I suppose it should. 
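The arithmetic behind the CSR text can be checked with a small C++ sketch. It counts modified UTF-8 bytes per UTF-16 code unit (one to three bytes, with U+0000 taking two) and accumulates the total in 64 bits, which is why a `jlong` result is needed. `modified_utf8_length` and `fits_in_jint` are hypothetical helpers for illustration, not the HotSpot implementation of `GetStringUTFLengthAsLong`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Bytes one UTF-16 code unit needs in the JVM's modified UTF-8:
// U+0000 is written as two bytes (0xC0 0x80), and each half of a surrogate
// pair is encoded separately as three bytes, so no code unit needs more than 3.
inline int64_t modified_utf8_bytes(char16_t c) {
  if (c != 0 && c <= 0x7F) return 1;
  if (c <= 0x7FF) return 2;
  return 3;
}

// Hypothetical helper: the encoded length accumulated in 64 bits,
// in the spirit of GetStringUTFLengthAsLong returning a jlong.
int64_t modified_utf8_length(const std::u16string& s) {
  int64_t total = 0;
  for (char16_t c : s) total += modified_utf8_bytes(c);
  return total;
}

// Whether a length could be returned as a jint (a 32-bit signed value).
constexpr bool fits_in_jint(int64_t len) {
  return len <= std::numeric_limits<int32_t>::max();
}

// Worst case quoted in the CSR: 3 bytes per char for Integer.MAX_VALUE chars.
constexpr int64_t kWorstCaseLength = 3LL * std::numeric_limits<int32_t>::max();
static_assert(!fits_in_jint(kWorstCaseLength), "a jint cannot hold every length");
```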
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20784#discussion_r1737947827 From stuefe at openjdk.org Fri Aug 30 07:22:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 30 Aug 2024 07:22:21 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <6W5PVxmEnnjOu0CQnJOBu81Bwm-bA7CzCnn0WrkWhu4=.8fcb5baf-02bb-438f-9f86-3450a5c35860@github.com> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave it for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) I am not against moving NMT into a namespace, but we should be sure that is what we want. Introducing a new namespace has a larger scope than just renaming a single type. For example: should the whole of NMT, including its outside-facing interface and the enum values themselves, go into the new NMT namespace? So, do we do this now: `NMT::MemFlag flag = NMT::mtGC; NMT::MemTracker::record_malloc(p, l, flag); NMT::MemTracker::report();` ? If yes, that affects more than just the type name. Do we omit the namespace qualifier when inside NMT? When outside NMT, do we sprinkle `using namespace NMT` around to avoid getting RSI from typing "NMT::"? That is possible, but now we get: `MemFlag flag = mtGC; MemTracker::record_malloc(p, l, flag);` so the "NMT" prefix is gone. And we have both NMT::MemFlag and MemFlag, so people grepping for stuff need to look for both variants. Or do you plan to just put this single enum into the namespace? As in `namespace NMT { enum MEMFLAGS ... };` ? But having a namespace and keeping 99% of associated code outside of that namespace would be a very weird choice.
And the enum values would have to be in that namespace in any case, so we won't get around having to qualify all "mtXXX" flags as NMT::mtXXX. --- None of these are hard reasons to avoid namespaces. I used namespace "metaspace" back when doing Elastic Metaspace, and I have argued for their selective use in the past. But changes like these tend to fan out a bit more than one initially thinks. I also really prefer namespaces to be lowercase, since that is the usual way they are written ("std", "boost", etc.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2320312099 From stefank at openjdk.org Fri Aug 30 07:30:22 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 07:30:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 17:30:14 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove hashcode leftovers from SA > > src/hotspot/share/gc/serial/serialArguments.cpp line 33: > >> 31: void SerialArguments::initialize_heap_flags_and_sizes() { >> 32: GenArguments::initialize_heap_flags_and_sizes(); >> 33: GCForwarding::initialize_flags(MaxNewSize + MaxOldSize); > > Can one use `MaxHeapSize` here? Good catch. This is actually a bug that is causing the CDS tests to fail. The used variables have not yet been initialized at this point. I tried making the suggested change and that fixed at least one of the CDS failures.
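A minimal sketch of a heap-size check of this kind shows why the argument passed to `GCForwarding::initialize_flags` matters. It assumes the 40-bit offset width from the JEP 450 PR summary and 8-byte heap words. `check_compact_headers`, `max_forwardable_heap_bytes` and the constants are hypothetical. The sketch prints a warning instead of silently clearing the flag, in line with the review feedback in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Illustrative stand-ins; the 40-bit width follows the PR summary's
// "40 bits ... up to 8TB", the other names are hypothetical.
constexpr uint64_t right_n_bits(int n) { return (uint64_t(1) << n) - 1; }
constexpr uint64_t kHeapWordSize = 8;      // bytes per heap word on a 64-bit VM
constexpr int kForwardingOffsetBits = 40;  // word offsets the forwarding encoding can hold

// Largest heap, in bytes, whose word offsets still fit in the encoding (about 8TB).
constexpr uint64_t max_forwardable_heap_bytes() {
  return right_n_bits(kForwardingOffsetBits) * kHeapWordSize;
}

// Returns the value the flag should end up with. Unlike a silent flag reset,
// this sketch reports why the feature was turned off.
bool check_compact_headers(bool use_compact_headers, uint64_t max_heap_size) {
  if (use_compact_headers && max_heap_size > max_forwardable_heap_bytes()) {
    std::fprintf(stderr,
                 "warning: compact object headers need a heap of at most %llu bytes "
                 "(given: %llu); disabling them\n",
                 static_cast<unsigned long long>(max_forwardable_heap_bytes()),
                 static_cast<unsigned long long>(max_heap_size));
    return false;
  }
  return use_compact_headers;
}
```

If the size passed in is still an uninitialized, oversized value, the check fails even when the configured heap is small, so the feature is switched off without the user asking for it.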
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738101667 From stefank at openjdk.org Fri Aug 30 07:30:23 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 07:30:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> Message-ID: On Thu, 22 Aug 2024 19:36:00 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix hash shift for 32 bit builds > > src/hotspot/share/gc/shared/gcForwarding.cpp line 37: > >> 35: size_t max_narrow_heap_size = right_n_bits(NumLowBitsNarrow - Shift); >> 36: if (UseCompactObjectHeaders && max_heap_size > max_narrow_heap_size * HeapWordSize) { >> 37: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > > Maybe a log-info/warning would be nice. Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738104783 From stuefe at openjdk.org Fri Aug 30 07:40:23 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 30 Aug 2024 07:40:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 16:23:19 GMT, Leonid Mesnik wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > make/Images.gmk line 135: > >> 133: # >> 134: # Param1 - VM variant (e.g., server, client, zero, ...) 
>> 135: # Param2 - _nocoops, _coh, _nocoops_coh, or empty > > -XX:+UseCompactObjectHeaders seems to be incompatible with the Zero VM. The Zero VM build started failing while generating the shared archive with +UseCompactObjectHeaders. Generating that archive should be disabled by default for Zero so as not to break the build. No, Zero works with +COH, but a small change is needed. I'll post a suggestion inline. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738119614 From stuefe at openjdk.org Fri Aug 30 07:45:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 30 Aug 2024 07:45:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 07:25:54 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/serial/serialArguments.cpp line 33: >> >>> 31: void SerialArguments::initialize_heap_flags_and_sizes() { >>> 32: GenArguments::initialize_heap_flags_and_sizes(); >>> 33: GCForwarding::initialize_flags(MaxNewSize + MaxOldSize); >> >> Can one use `MaxHeapSize` here? > > Good catch. This is actually a bug that is causing the CDS tests to fail. The used variables have not yet been initialized at this point. I tried making the suggested change and that fixed at least one of the CDS failures. Yes, one must, since MaxNewSize and MaxOldSize still hold their initial values at this point, which are far too large to allow the GC forwarding, and therefore UseCompactObjectHeaders gets automatically disabled for SerialGC. That explains a bunch of the problems @lmesnik saw.
This fixes SerialGC for me: Suggestion: GCForwarding::initialize_flags(MaxHeapSize); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738123826 From stuefe at openjdk.org Fri Aug 30 07:45:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 30 Aug 2024 07:45:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> Message-ID: <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> On Fri, 30 Aug 2024 07:27:45 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/shared/gcForwarding.cpp line 37: >> >>> 35: size_t max_narrow_heap_size = right_n_bits(NumLowBitsNarrow - Shift); >>> 36: if (UseCompactObjectHeaders && max_heap_size > max_narrow_heap_size * HeapWordSize) { >>> 37: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); >> >> Maybe a log-info/warning would be nice. > > Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO. Seems we all had the same thought :) I added: Suggestion: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); warning("Compact object headers require a java heap size smaller than %zu (given: %zu). " "Disabling compact object headers.", max_narrow_heap_size * HeapWordSize, max_heap_size); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738127194 From stefank at openjdk.org Fri Aug 30 08:10:21 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 08:10:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/filemap.cpp line 2507: > 2505: } > 2506: > 2507: if (compact_headers() != UseCompactObjectHeaders) { (Commenting here, but the comment applies to code a bit above) While debugging CDS, it would have been useful to print the value of UseCompactObjectHeaders. Could we change the code to be: log_info(cds)("Archive was created with UseCompressedOops = %d, UseCompressedClassPointers = %d, UseCompactObjectHeaders = %d", compressed_oops(), compressed_class_pointers(), compact_headers()); src/hotspot/share/cds/filemap.cpp line 2508: > 2506: > 2507: if (compact_headers() != UseCompactObjectHeaders) { > 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. 
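The JEP 450 summary above says full-GC forwarding uses an encoding similar to compressed oops, with 40 bits covering up to 8TB, while the upper 22 mark-word bits are preserved for the class pointer. The following C++ sketch shows one way such an encoding can work. The bit positions (lock bits at the bottom, offset in the middle, class bits on top), the `0x3` forwarded tag and all function names are assumptions for illustration, not HotSpot's actual `markWord` layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit mark-word layout used only in this sketch:
//   bits 63..42: 22 class-pointer bits | bits 41..2: 40-bit offset | bits 1..0: lock bits
constexpr int kLockBits = 2;
constexpr int kOffsetBits = 40;
constexpr int kKlassShift = kLockBits + kOffsetBits;  // 42, leaving 22 bits on top
constexpr uint64_t kLockMask = (uint64_t(1) << kLockBits) - 1;
constexpr uint64_t kOffsetMask = ((uint64_t(1) << kOffsetBits) - 1) << kLockBits;
constexpr uint64_t kKlassMask = ~uint64_t(0) << kKlassShift;
constexpr uint64_t kForwardedTag = 0x3;  // assumed "marked" lock pattern
constexpr int kLogHeapWordSize = 3;      // 8-byte heap words

// Store the new location as a word offset from heap_base, keeping the class bits intact.
uint64_t forward_to(uint64_t mark, uint64_t heap_base, uint64_t new_addr) {
  assert(new_addr >= heap_base && ((new_addr - heap_base) & 0x7) == 0);
  uint64_t offset_words = (new_addr - heap_base) >> kLogHeapWordSize;
  assert(offset_words < (uint64_t(1) << kOffsetBits));  // heap must fit in 2^40 words
  return (mark & kKlassMask) | (offset_words << kLockBits) | kForwardedTag;
}

bool is_forwarded(uint64_t mark) { return (mark & kLockMask) == kForwardedTag; }

uint64_t forwardee(uint64_t mark, uint64_t heap_base) {
  uint64_t offset_words = (mark & kOffsetMask) >> kLockBits;
  return heap_base + (offset_words << kLogHeapWordSize);
}

uint64_t klass_bits(uint64_t mark) { return mark >> kKlassShift; }
```

Storing a word offset rather than a raw address is what makes 40 bits enough: 2^40 words of 8 bytes each is 8TB.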
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738164792 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738166832 From rcastanedalo at openjdk.org Fri Aug 30 08:22:43 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 08:22:43 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision: - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' - Remark relation between compiler optimization and barrier filter - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' - Replace 'the null' with 'null' in comment - Remove redundant redefinitions of '__' - Replace 'already dirty' with 'young' in post-barrier fast path comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/daf38d3f..57adcfb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=09-10 Stats: 39 lines in 4 files changed: 27 ins; 6 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Fri Aug 30 08:22:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 08:22:44 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: On Sun, 25 Aug 2024 01:53:30 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP > > src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 218: > >> 216: __ cbz(new_val, done); >> 217: } >> 218: // Storing region crossing non-null, is card already dirty? > > s/already dirty/young/ Done (commit [70c2771](https://github.com/openjdk/jdk/pull/19746/commits/70c2771818834a74a12f8a61de3c77bb69e3e531)), thanks.
> src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 280: > >> 278: >> 279: #undef __ >> 280: #define __ masm-> > > These "changes" to `__` are unnecessary and confusing. We have the same define near the top of > the file, unconditionally. This one is conditional on COMPILER2, but is left in place at the end of the > conditional block, affecting following unconditional code. Removed now (commit [2dc688b](https://github.com/openjdk/jdk/pull/19746/commits/2dc688baf2a8f446c7579fafce7eab3a953e623a)), thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738181093 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738182128 From rcastanedalo at openjdk.org Fri Aug 30 08:22:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 08:22:44 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Wed, 28 Aug 2024 07:50:11 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 160: > >> 158: * To reduce the number of updates to the remembered set, the post-barrier >> 159: * filters out updates to fields in objects located in the Young Generation, the >> 160: * same region as the reference, when the null is being written, or if the card > > s/the null/null/ Done (commit [d1a2349](https://github.com/openjdk/jdk/pull/19746/commits/d1a2349068194ee598cec2b6afe7aa972781b491)), thanks.
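The filter order in that comment can be modeled in plain C++ over a simulated card table. Region size, card size, the card values, `HeapModel` and `post_barrier` are assumptions made for this sketch. Real G1 barriers are emitted as machine code by `G1BarrierSetAssembler`, and the real fast path also needs a StoreLoad fence before re-reading the card, which this single-threaded model only marks with a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative parameters (assumptions, not G1's actual configuration).
constexpr int kLogRegionBytes = 20;  // 1 MB regions
constexpr int kLogCardBytes = 9;     // 512-byte cards
constexpr uint8_t kCleanCard = 0xff;
constexpr uint8_t kDirtyCard = 0x00;
constexpr uint8_t kYoungCard = 0x02;

// Card table for a heap that starts at a region-aligned base address.
struct HeapModel {
  uint64_t base;
  std::vector<uint8_t> cards;       // one byte per card
  std::vector<size_t> dirty_queue;  // cards handed to refinement

  HeapModel(uint64_t heap_base, size_t heap_bytes)
      : base(heap_base), cards(heap_bytes >> kLogCardBytes, kCleanCard) {}

  size_t card_index(uint64_t addr) const {
    return static_cast<size_t>((addr - base) >> kLogCardBytes);
  }
};

// Post-barrier for "*field = new_val"; returns true if a card was enqueued.
bool post_barrier(HeapModel& h, uint64_t field, uint64_t new_val) {
  if (((field ^ new_val) >> kLogRegionBytes) == 0) return false;  // same region
  if (new_val == 0) return false;                                 // storing null
  size_t card = h.card_index(field);
  if (h.cards[card] == kYoungCard) return false;                  // young object
  // A real barrier issues a StoreLoad fence here before re-checking the card.
  if (h.cards[card] == kDirtyCard) return false;                  // already dirty
  h.cards[card] = kDirtyCard;
  h.dirty_queue.push_back(card);
  return true;
}
```

The cheap address and null tests come first, so only cross-region, non-null stores into old objects touch the card table at all.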
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738183062 From rcastanedalo at openjdk.org Fri Aug 30 08:22:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 08:22:44 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Wed, 28 Aug 2024 08:12:36 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 166: >> >>> 164: * post-barrier completely, if it is possible during compile time to prove the >>> 165: * object is newly allocated and that no safepoint exists between the allocation >>> 166: * and the store. >> >> It might be worth saying explicitly that this is a compile-time version of the above mentioned young >> generation filter. > > We can similarly elide the post-barrier if we can prove at compile-time that the value being written > is null. That case isn't handled here though. Instead that's checked for in > `refine_barrier_by_new_val_type` and in `get_store_barrier`. I'm not sure why it's structured > that way. > It might be worth saying explicitly that this is a compile-time version of the above mentioned young generation filter. Done (commit [72a04c4](https://github.com/openjdk/jdk/pull/19746/commits/72a04c4e8046256ee7e811d66934d5d9e24f4c7c)), thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738184612 From rcastanedalo at openjdk.org Fri Aug 30 08:27:22 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 08:27:22 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Fri, 30 Aug 2024 08:19:50 GMT, Roberto Castañeda Lozano wrote: >> We can similarly elide the post-barrier if we can prove at compile-time that the value being written >> is null. That case isn't handled here though. Instead that's checked for in >> `refine_barrier_by_new_val_type` and in `get_store_barrier`. I'm not sure why it's structured >> that way. > >> It might be worth saying explicitly that this is a compile-time version of the above mentioned young > generation filter. > > Done (commit [72a04c4](https://github.com/openjdk/jdk/pull/19746/commits/72a04c4e8046256ee7e811d66934d5d9e24f4c7c)), thanks. > We can similarly elide the post-barrier if we can prove at compile-time that the value being written is null. That case isn't handled here though. Instead that's checked for in refine_barrier_by_new_val_type and in get_store_barrier. I'm not sure why it's structured that way. The reason why the compile-time null check is performed outside of `g1_can_remove_post_barrier` is for consistency with the [current mainline code](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp#L382-L388). The difference between the current and this changeset's `g1_can_remove_post_barrier` function is minimal, but this is unfortunately obscured in the patch by the temporary `G1_LATE_BARRIER_MIGRATION_SUPPORT`-guarded code.
`refine_barrier_by_new_val_type` performs a compile-time null check again at the end of C2's platform-independent optimizations (see https://bugs.openjdk.org/secure/attachment/107747/late-expansion.png) to exploit potentially stronger type information that might be revealed only after applying some optimizations. I have added a new test case that illustrates this scenario (commit [57adcfb](https://github.com/openjdk/jdk/pull/19746/commits/57adcfb04b163ba6744389d6258efe4b2ace534d)). I will study if the check in `get_store_barrier` is superseded by that in `refine_barrier_by_new_val_type`. If I can convince myself that this is the case I will consider removing the former. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738191022 From rcastanedalo at openjdk.org Fri Aug 30 08:27:23 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 08:27:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Wed, 28 Aug 2024 08:17:14 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 229: > >> 227: } >> 228: >> 229: void refine_barrier_by_new_val_type(Node* n) { > > This function should probably be `static`. Done, thanks (I also made its argument `const`, see commit [29d8a89](https://github.com/openjdk/jdk/pull/19746/commits/29d8a89a9a7fd0c1717330609c6d7cb36b0ff174)).
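The refinement Roberto describes for `refine_barrier_by_new_val_type` can be sketched as a pure function over barrier bits. The bit names and the `Nullness` enum are hypothetical stand-ins for C2's barrier data and type lattice. Recording a "never null" fact, so that the post-barrier can skip its own null test, is an assumption of this sketch. The pre-barrier is left untouched because it concerns the overwritten value, not the new one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical barrier-data bits attached to a store in this sketch.
constexpr uint8_t kPreBarrier         = 1 << 0;
constexpr uint8_t kPostBarrier        = 1 << 1;
constexpr uint8_t kPostBarrierNotNull = 1 << 2;  // post-barrier may skip its null test

// Simplified view of what the type system knows about the stored value.
enum class Nullness { kAlwaysNull, kNeverNull, kMaybeNull };

// Refine barrier bits once optimizations may have sharpened the value's type.
uint8_t refine_by_new_val_type(uint8_t barrier_data, Nullness new_val) {
  if ((barrier_data & kPostBarrier) == 0) {
    return barrier_data;  // no post-barrier to refine
  }
  switch (new_val) {
    case Nullness::kAlwaysNull:
      // Storing null cannot create a cross-region reference: drop the post-barrier.
      return static_cast<uint8_t>(barrier_data & ~(kPostBarrier | kPostBarrierNotNull));
    case Nullness::kNeverNull:
      return static_cast<uint8_t>(barrier_data | kPostBarrierNotNull);
    case Nullness::kMaybeNull:
      break;
  }
  return barrier_data;
}
```

Running this once more after the platform-independent optimizations lets it benefit from type information that was not available when the barrier was first attached.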
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738192979 From mdoerr at openjdk.org Fri Aug 30 08:33:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Aug 2024 08:33:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 08:22:43 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. ... Are you planning to merge jdk-24+13? It has a known test bug on PPC64, but that's not a problem. It looks good otherwise. I'll have to rebase the PPC64 implementation after it is merged and I should be able to provide a stable version for this PR afterwards. So, I'd appreciate the update unless @feilongjiang @offamitkumar @snazarkin see any issue on their platforms.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2320483973 From amitkumar at openjdk.org Fri Aug 30 08:53:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 30 Aug 2024 08:53:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: <5FM9bNeaaI0Lcsto0kfzrcrY4u6SODtf3wqDwmlninw=.367c8d65-c059-4726-a10a-6dd616b643af@github.com> On Fri, 30 Aug 2024 08:22:43 GMT, Roberto Castañeda Lozano wrote: > [...] On s390x side, we are good. So I don't have issue with merging jdk-24+13.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2320533252 From ysuenaga at openjdk.org Fri Aug 30 09:22:32 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 30 Aug 2024 09:22:32 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame Message-ID: I attempted to check the stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, but it failed with the following exception. Error occurred during stack walking: java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.<init>(X86Frame.java:163) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) at
jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is _ZTV10UpcallStub) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80) at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:107) FFM upcalls use `UpcallStub` to call Java code from native code; however, the frame size of the stub is implicitly set to zero. It should be set to a valid size. ------------- Commit messages: - 8339307: jhsdb jstack could not trace FFM upcall frame Changes: https://git.openjdk.org/jdk/pull/20789/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20789&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339307 Stats: 72 lines in 7 files changed: 63 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20789/head:pull/20789 PR: https://git.openjdk.org/jdk/pull/20789 From rcastanedalo at openjdk.org Fri Aug 30 09:23:23 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 09:23:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 09:10:10 GMT, Feilong Jiang wrote: >>> Are you planning to rebase it with master? Nothing important, but there were a couple of failures which are fixed after this PR, so it will make the test results a bit cleaner for us.
>> >> Actually, I have refrained from updating to the latest mainline changes to avoid interfering with the porting work while it is in progress, but if there is consensus among the port maintainers I would be happy to update the changeset regularly. @TheRealMDoerr @feilongjiang @offamitkumar @snazarkin what do you prefer? > > [...] > > I have already merged upstream commits on my local branch, so I'm fine with regular updates. > So, I'd appreciate the update unless @feilongjiang @offamitkumar @snazarkin see any issue on their platforms. OK, if there are no objections from @feilongjiang or @snazarkin within a couple of days I will prepare an update to jdk-24+13. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2320618425 From kbarrett at openjdk.org Fri Aug 30 09:56:20 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 30 Aug 2024 09:56:20 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: <6W5PVxmEnnjOu0CQnJOBu81Bwm-bA7CzCnn0WrkWhu4=.8fcb5baf-02bb-438f-9f86-3450a5c35860@github.com> References: <6W5PVxmEnnjOu0CQnJOBu81Bwm-bA7CzCnn0WrkWhu4=.8fcb5baf-02bb-438f-9f86-3450a5c35860@github.com> Message-ID: On Fri, 30 Aug 2024 07:18:47 GMT, Thomas Stuefe wrote: > [...] And the enum values would have to be in that namespace in any case, so we won't get around having to qualify all "mtXXX" flags NMT::mtXXX.
We already have to solve essentially that problem because it's now a scoped enum. We currently define synonym constants at global scope. (C++20 provides `using enum MEMFLAGS;`.) Otherwise, we'd currently be needing MEMFLAGS::mtXXX. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2320702002 From kbarrett at openjdk.org Fri Aug 30 10:06:20 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 30 Aug 2024 10:06:20 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <5Luyt3zjZ69guwvlJVTolw6bSIT3Cv0BLdHoyfMRBOs=.f63fb696-bf3c-4a0c-bd43-211602848c0b@github.com> On Thu, 29 Aug 2024 16:17:09 GMT, Stefan Karlsson wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave it for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > I much prefer to see MemType, but I'm warming up to NMTCategory. > > - MemType: Succinct - matches part of the code (E.g. the mt in mtGC) > - MemTypeFlag: Too many words for my preference. > - NMTCat: Meuw. :) > - NMTCategory: Parts of the code call these categories, so I'm not entirely against this. > - NMTGroup: "Group" is a new name for this that currently isn't reflected at all in the code. > - NMT_MemType: I think we should try to get rid of names using this style. > - NMT::MemType: The `::` makes all function declarations noisier for very little benefit, IMO. > > I continue to mostly agree with @stefank. > > I think this name shouldn't be considered in isolation. There are already a bunch of "NMT_" prefixed names. That's the common idiom for things like this (often (maybe even usually?) without the "_"). Why are we proposing to adopt a new style? (For just this?
That would be weird. Or more broadly? That certainly needs more discussion.) > > It is precisely because we already are using `NMT_TrackingStackDepth` and `NMT_TrackingLevel` (and the fact that MemType by itself was too general) that I suggested we adopt `NMT::`. Yes, it would be an all-static class. > > Why not all lower letter `nmt::`? Again, because we already use `NMT_` elsewhere. > > My plan was later to switch from `NMT_` to `NMT::` for all of them. We can do `nmt::` too. An `NMT_` => `NMT::` change requires using a namespace. An allstatic class is closed for extension, and some of these definitions are in different files. The only way to use an allstatic class instead of a namespace is to go the same route as currently used by the "os" class (over my dead body, and probably @tstuefe too). That previously unstated plan also supports my claim that we shouldn't be considering this one type's name in isolation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2320728486 From alanb at openjdk.org Fri Aug 30 10:07:23 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 30 Aug 2024 10:07:23 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up at runtime whether we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then linking them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which need to be compiled twice, once for the dynamic libraries and once for the static libraries.)
Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly I think the approach is pragmatic and okay. I agree it looks a bit unusual to be testing the image type at runtime, but the cost doesn't seem to be measurable and it is not a concern right now. In the future, I suspect we will have many places in the libraries that will need to test this at runtime for 30+ other reasons. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20666#pullrequestreview-2271829178 From aph at openjdk.org Fri Aug 30 10:18:18 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 30 Aug 2024 10:18:18 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 14:38:12 GMT, Dmitry Chuyko wrote: > This is an implementation of SHA3 intrinsics for AArch64 that operates on GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than the C2-compiled version; on Graviton 3 it is 8-14% faster than the C2-compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than the C2-compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length.
For instance, for Graviton 2:
>
> Benchmark (ops/ms)           (digesterName)  (length)      G2
> MessageDigests.digest        SHA3-256              64  28.28%
> MessageDigests.digest        SHA3-256           16384  53.58%
> MessageDigests.digest        SHA3-512              64  27.97%
> MessageDigests.digest        SHA3-512           16384  43.90%
> MessageDigests.getAndDigest  SHA3-256              64  26.18%
> MessageDigests.getAndDigest  SHA3-256           16384  52.82%
> MessageDigests.getAndDigest  SHA3-512              64  24.73%
> MessageDigests.getAndDigest  SHA3-512           16384  44.31%
>
> (results for intermediate input lengths look like steps) > > The existing intrinsic implementation is put under a flag `UseSIMDForSHA3Intrinsic`, which is on by default where the intrinsic is currently enabled. > > Sanity tests were modified to cover the new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on aarch64 hw. Existing test cases where the intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`; on platforms where the sha3 extension is missing, they are still cut off by the isSHA3IntrinsicAvailable() predicate.
This is an interesting one. My thoughts: Keccak (SHA-3) is still not used much, mostly because it's slow. It was one of the slowest finalists in the SHA-3 competition. The main reason it was chosen is that it was so different from SHA-2, and the goal was to have something ready in case SHA-2 was broken. But SHA-2 is still secure, and still standard. It will be the preferred hash algorithm for the foreseeable future. Keccak's slowness is for a few reasons: software implementations are slow, hardware implementations don't really exist, parallel modes for SHA-3 (https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-185.pdf https://keccak.team/files/Sakura.pdf) are still not standardized, and SHA-3 has a truly humongous (i.e. unnecessary) safety margin. There is hope that one day hardware implementations will become common, because Keccak is very efficient in hardware.
But (I guess) manufacturers are reluctant to spend a lot of gates on this thing people don't much use. The existing vectorized version of SHA-3 in AArch64 HotSpot depends on FEAT_SHA3, which I think is optional, so acceleration is nice to have on cores without FEAT_SHA3. But (as you say) this accelerated version offers a modest speedup over C2-compiled Java code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20422#issuecomment-2320757133 From mcimadamore at openjdk.org Fri Aug 30 10:23:18 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 30 Aug 2024 10:23:18 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 09:14:11 GMT, Yasumasa Suenaga wrote: > [...] You might wait for @JornVernee to weigh in on this, as he's fixing another issue with upcalls: https://bugs.openjdk.org/browse/JDK-8337753 (seems unrelated - but better double check) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2320764337 From ysuenaga at openjdk.org Fri Aug 30 10:28:18 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 30 Aug 2024 10:28:18 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 09:14:11 GMT, Yasumasa Suenaga wrote: > [...] Thanks for your info! I think JDK-8337753 is different from this issue (PR). Anyway I'm waiting for the review & comments!
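The reason a zero frame size matters can be pictured with a toy stack walker. This is a hedged model, not SA or HotSpot code. `ToyBlob`, `ToyStack`, and `walk` are hypothetical names. The model assumes a downward-growing stack and assumes that, as with fixed-size compiled frames, a caller's SP is found by adding the blob's frame size to the current SP. It covers only the frame-size part of the upcall issue and does not model the SA type lookup that raised the exception in the original report.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model (not HotSpot/SA code): the stack grows downward, so a caller's
// frame sits at a higher SP. For fixed-size frames the walker computes
// sender_sp = sp + frame_size_words.
struct ToyBlob {
  std::string name;
  std::size_t frame_size_words;  // 0 mimics the "implicitly zero" case
};

// Maps the SP of each live frame to the blob that owns it (hypothetical).
using ToyStack = std::map<std::size_t, const ToyBlob*>;

// Walks from `sp` toward older frames and returns how many frames were visited.
std::size_t walk(const ToyStack& stack, std::size_t sp) {
  std::size_t visited = 0;
  for (auto it = stack.find(sp); it != stack.end(); it = stack.find(sp)) {
    ++visited;
    std::size_t next = sp + it->second->frame_size_words;
    if (next == sp) break;  // zero frame size: no progress, the walk is stuck
    sp = next;
  }
  return visited;
}
```

With a valid stub frame size, the walk reaches the native caller beyond the stub. With a zero size, it stops at the stub because the computed sender SP equals the stub's own SP.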
------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2320773750 From ihse at openjdk.org Fri Aug 30 10:54:20 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 10:54:20 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Mon, 26 Aug 2024 02:07:39 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Also update build to link properly > > I understand the cost overhead experienced by any individual Java run may be lost in the noise, but it still impacts every single Java run just to save some time/resources for the handful of builders of statically linked VMs. I am not a fan. @dholmes-ora This PR now has three reviewers approving it. You say you are "not a fan". Does this mean you want to veto this change? Or would you be willing to accept it, even if you do not like it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2320832777 From mbaesken at openjdk.org Fri Aug 30 11:05:20 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 30 Aug 2024 11:05:20 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Wed, 28 Aug 2024 16:13:07 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified.
The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 due to the way [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore it is immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Add root check for SystemdMemoryAwarenessTest.java > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Add Whitebox check for host cpu > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support src/hotspot/share/prims/whitebox.cpp line 2507: > 2505: WB_END > 2506: > 2507: // Physical cpus of the host machine (including containers), Linux only. Isn't the comment a bit misleading? From what I see, `os::Linux::active_processor_count()` can use various mechanisms to get the number of processors; if it uses https://linux.die.net/man/2/sched_getaffinity, it gives the 'set of CPUs on which it is eligible to run.' That might be different from what the host has.
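On Linux, you can check directly the distinction Matthias draws between the CPUs the system reports and the set of CPUs a process is eligible to run on. The following is a minimal sketch, not the WhiteBox or `os::Linux` implementation. It assumes glibc, which provides `CPU_COUNT`, and a machine with no more than `CPU_SETSIZE` CPUs. The helper names `online_cpus` and `affinity_cpus` are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // CPU_COUNT and sched_getaffinity are GNU extensions
#endif
#include <cassert>
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT
#include <unistd.h>  // sysconf

// CPUs the kernel reports as online. Inside a container this is usually
// still the host's count.
long online_cpus() { return sysconf(_SC_NPROCESSORS_ONLN); }

// CPUs this process may run on, according to its affinity mask.
// Returns -1 on failure (e.g. a machine with more CPUs than cpu_set_t holds).
int affinity_cpus() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
  return CPU_COUNT(&set);
}
```

Running such a program under `taskset -c 0` would typically report an affinity count of 1 while the online count stays at the machine's value. That gap is the one a comment saying "physical cpus of the host" would hide.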
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1738427318 From mbaesken at openjdk.org Fri Aug 30 11:08:23 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 30 Aug 2024 11:08:23 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Wed, 28 Aug 2024 16:13:07 GMT, Severin Gehwolf wrote: > [...] Looking through the code, it looks more or less okay to me; but if you really need to run it as user 'root', I think we will not have much use for this in our test environments because we use other test users. Not saying that this is a very bad thing; maybe it is just the way it is, that 'root' is needed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2320865740 From stefank at openjdk.org Fri Aug 30 11:10:23 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 11:10:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 17:12:09 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove hashcode leftovers from SA > > src/hotspot/share/gc/serial/defNewGeneration.cpp line 707: > >> 705: } else if (obj->is_forwarded()) { >> 706: // To restore the klass-bits in the header. >> 707: obj->forward_safe_init_mark(); > > I wonder if not modifying successful-forwarded objs is cleaner.
Sth like:
>
> reset_self_forwarded_in_space(space) {
>   cur = space->bottom();
>   top = space->top();
>
>   while (cur < top) {
>     obj = cast_to_oop(cur);
>
>     if (obj->is_self_forwarded()) {
>       obj->unset_self_forwarded();
>       obj_size = obj->size();
>     } else {
>       assert(obj->is_forwarded(), "inv");
>       obj_size = obj->forwardee()->size();
>     }
>
>     cur += obj_size;
>   }
> }
>
> reset_self_forwarded_in_space(eden());
> reset_self_forwarded_in_space(from());

I was thinking the same, but there's a problem with that. If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. Then, when the Full GC scans these regions with dead objects, it will mistakenly think that they have been marked alive because `is_forwarded() == is_gc_marked()`. The code in `phase2_calculate_new_addr` will then break when it looks for `is_gc_marked` objects. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738433303 From stefank at openjdk.org Fri Aug 30 11:18:20 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 11:18:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 11:07:46 GMT, Stefan Karlsson wrote: > [...] FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects. It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers. Because of the code above, the SerialGC clears away the information about what objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE.
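Stefan's point that `is_forwarded() == is_gc_marked()` can be pictured with a toy header in which both predicates test the same bit pattern. This is a hedged model. `ToyObj`, its two-bit tag layout, and the tag values are assumptions. They are not the real `markWord` layout, which also differs under compact headers.

```cpp
#include <cassert>
#include <cstdint>

// Toy object header: the low two bits hold a state tag. In this model, as in
// the situation described in the review, "forwarded" and "GC-marked" share
// one tag value.
constexpr std::uint64_t kTagMask = 0b11;
constexpr std::uint64_t kUnlockedTag = 0b01;
constexpr std::uint64_t kMarkedTag = 0b11;

struct ToyObj {
  std::uint64_t header = kUnlockedTag;

  bool is_forwarded() const { return (header & kTagMask) == kMarkedTag; }
  bool is_gc_marked() const { return (header & kTagMask) == kMarkedTag; }

  // Young GC copies the object and leaves a forwarding pointer behind.
  // Assumes `new_addr` is at least 4-byte aligned so the tag bits are free.
  void forward_to(std::uint64_t new_addr) { header = new_addr | kMarkedTag; }
  std::uint64_t forwardee() const { return header & ~kTagMask; }

  // Resets the header to a plain, unmarked state.
  void init_mark() { header = kUnlockedTag; }
};
```

In this model, a stale forwarding header on a dead eden or from-space object is indistinguishable from a live, marked object to any later pass that asks `is_gc_marked()`. Resetting the header removes the ambiguity, which is the motivation for touching successfully forwarded objects in the quoted code.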
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738444174 From sgehwolf at openjdk.org Fri Aug 30 11:43:21 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 30 Aug 2024 11:43:21 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Fri, 30 Aug 2024 11:02:52 GMT, Matthias Baesken wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Add root check for SystemdMemoryAwarenessTest.java >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Add Whitebox check for host cpu >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comments >> - 8333446: Add tests for hierarchical container support > > src/hotspot/share/prims/whitebox.cpp line 2507: > >> 2505: WB_END >> 2506: >> 2507: // Physical cpus of the host machine (including containers), Linux only. > > Isn't the comment a bit misleading ? From what I see , ` os::Linux::active_processor_count()` can use various mechanisms to get number of processor info, if it uses https://linux.die.net/man/2/sched_getaffinity it gives the 'set of CPUs on which it is eligible to run.' That might be different from what the host has. Yes. See #20768 for an attempt to unify it. I'll change the comment with the update that I have for nested hierarchies. Thanks! 
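As a rough illustration of Matthias's point (a sketch, not HotSpot's `os::Linux` code), the CPU count derived from `sched_getaffinity` describes the CPUs the process is eligible to run on, which can be fewer than the CPUs the machine has, e.g. under `taskset` or a cpuset. Neither number reflects a cgroup CPU quota. The block assumes Linux with glibc; the function names are hypothetical:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for sched_getaffinity and CPU_COUNT in glibc
#endif
#include <sched.h>
#include <unistd.h>
#include <cassert>

// CPUs configured on the machine (typically the host count inside containers).
long configured_cpus() { return sysconf(_SC_NPROCESSORS_CONF); }

// CPUs in this process's affinity mask, i.e. the CPUs it may run on.
// Returns -1 on error, e.g. if the mask does not fit in a fixed-size cpu_set_t.
int affinity_cpus() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) {
    return -1;
  }
  return CPU_COUNT(&set);
}
```

Running such a program under `taskset -c 0` would report an affinity count of 1 regardless of the configured count.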
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1738475745 From sgehwolf at openjdk.org Fri Aug 30 11:49:18 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 30 Aug 2024 11:49:18 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Fri, 30 Aug 2024 11:05:24 GMT, Matthias Baesken wrote: > Not saying that this is a very bad thing, maybe it is just the way it is, that 'root' is needed ? I'll do some more research whether or not that is a hard requirement. Thanks for the comments so far. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2320970507 From sgehwolf at openjdk.org Fri Aug 30 12:54:22 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 30 Aug 2024 12:54:22 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Fri, 30 Aug 2024 11:46:51 GMT, Severin Gehwolf wrote: > > Not saying that this is a very bad thing, maybe it is just the way it is, that 'root' is needed ? > > I'll do some more research whether or not that is a hard requirement. Thanks for the comments so far. It turns out it works on cgroups v2 as non-root. I shall amend the test so that it at least runs OK on non-root and cgroups v2. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2321150231 From rehn at openjdk.org Fri Aug 30 12:56:20 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 30 Aug 2024 12:56:20 GMT Subject: RFR: 8339248: RISC-V: Remove li64 macro assembler routine and related code In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 12:59:40 GMT, Fei Yang wrote: > The macro assembler routine li64 and related code (is_li64_at, patch_imm_in_li64, get_target_of_li64 and check_li64_data_dependency) is unused for now. We should remove these unused code, which will save us some unnecessary runtime checks. We can add them back when needed again someday. > > Testing: > - [x] release & fastdebug build > - [x] Gtest & Tier1 test (release) Great, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20769#pullrequestreview-2272232501 From mli at openjdk.org Fri Aug 30 13:16:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 30 Aug 2024 13:16:19 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Fri, 30 Aug 2024 10:59:55 GMT, Magnus Ihse Bursie wrote: > Basically, under the `$(eval $(call SetupJdkLibrary, BUILD_LIBVECTORMATH` call, you add the line: > > ``` > EXTRA_SRC := libsleef/generated, \ > ``` > > and that should be it. Thanks! > However, I see that you also manipulate compiler flags for individual files. I don't know if that is still needed, or can be removed. Or, conversely, if additional files will need the special flags. OK, let's discuss the details further in that PR later.
>> make/UpdateSleefSource.gmk line 105: >> >>> 103: TARGETS := $(sleef_native_build) >>> 104: >>> 105: $(eval $(call SetupExecute, sleef_cross_config, \ >> >> Not sure if it's still necessary or right to run the 2 steps build the second time, when native and cross-compilation are the same, e.g. build sve for aarch64 on an aarch64 machine. > > As the documentation says, the update make target only supports cross-compilation. I based this on the shell script created by Mikael, but I guess his reasoning for doing it that way is the same as mine: When an update is needed, you are going to have to do it for all supported platforms, and hence it is easier to cross-compile it to all targets. > > Since updating is a very uncommon operation, I prefer to keep the makefile simple. I see, makes sense to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20781#issuecomment-2320895996 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1738445738 From mli at openjdk.org Fri Aug 30 13:16:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 30 Aug 2024 13:16:19 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Thu, 29 Aug 2024 23:07:16 GMT, Magnus Ihse Bursie wrote: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. Thanks a lot for the quick response and effort!
I'm not qualified to review changes in makefiles; I just have one general question below. I also checked the git hash and the generated files; they look good. Can I ask another general question in advance related to [subsequent pr](https://github.com/openjdk/jdk/pull/18605)? As there are several folders (`generated` and `upstream`) under src/jdk.incubator.vector/linux/native/libsleef/ now, what's the recommended way to only include `generated` and skip `upstream` when compile/build the final libsleef.so in that pr? make/UpdateSleefSource.gmk line 105: > 103: TARGETS := $(sleef_native_build) > 104: > 105: $(eval $(call SetupExecute, sleef_cross_config, \ Not sure if it's still necessary or right to run the 2 steps build the second time, when native and cross-compilation are the same, e.g. build sve for aarch64 on an aarch64 machine. ------------- PR Review: https://git.openjdk.org/jdk/pull/20781#pullrequestreview-2271712946 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1738285588 From ihse at openjdk.org Fri Aug 30 13:16:18 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 13:16:18 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 Message-ID: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR.
------------- Commit messages: - Expand tabs to 8 spaces to make jcheck happy - Add generated sources for riscv64 - Remove executable bit - Add generated sources for aarch64 - Add build support for updating sleef sources - Add README - Add sleef license in legal - * Add source of libsleef 3.6.1 Changes: https://git.openjdk.org/jdk/pull/20781/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329816 Stats: 120301 lines in 175 files changed: 120301 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20781/head:pull/20781 PR: https://git.openjdk.org/jdk/pull/20781 From ihse at openjdk.org Fri Aug 30 13:16:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 13:16:19 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Fri, 30 Aug 2024 09:28:59 GMT, Hamlin Li wrote: > As there are several folders (generated and upstream) under src/jdk.incubator.vector/linux/native/libsleef/ now, what's the recommended way to only include generated and skip upstream when compile/build the final libsleef.so in that pr? Basically, under the `$(eval $(call SetupJdkLibrary, BUILD_LIBVECTORMATH` call, you add the line: EXTRA_SRC := libsleef/generated, \ and that should be it. However, I see that you also manipulate compiler flags for individual files. I don't know if that is still needed, or can be removed. Or, conversely, if additional files will need the special flags. > make/UpdateSleefSource.gmk line 105: > >> 103: TARGETS := $(sleef_native_build) >> 104: >> 105: $(eval $(call SetupExecute, sleef_cross_config, \ > > Not sure if it's still necessary or right to run the 2 steps build the second time, when native and cross-compilation are the same, e.g. build sve for aarch64 on an aarch64 machine. 
As the documentation says, the update make target only supports cross-compilation. I based this on the shell script created by Mikael, but I guess his reasoning for doing it that way is the same as mine: When an update is needed, you are going to have to do it for all supported platforms, and hence it is easier to cross-compile it to all targets. Since updating is a very uncommon operation, I prefer to keep the makefile simple. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20781#issuecomment-2320849630 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1738426993 From ihse at openjdk.org Fri Aug 30 13:16:20 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 13:16:20 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Thu, 29 Aug 2024 23:07:16 GMT, Magnus Ihse Bursie wrote: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. Sorry for the force push. The libsleef source code had *tons* of trailing whitespace, and it seems jcheck got stuck on that. The bot had not finished in 12 hours or so. I initially pushed an additional commit that fixed the whitespace, but jcheck goes through the commits one at a time, so that did not help. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20781#issuecomment-2320943487 From fjiang at openjdk.org Fri Aug 30 13:26:23 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 30 Aug 2024 13:26:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 08:22:43 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision: > > - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' > - Remark relation between compiler optimization and barrier filter > - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' > - Replace 'the null' with 'null' in comment > - Remove redundant redefinitions of '__' > - Replace 'already dirty' with 'young' in post-barrier fast path comment risc-v port looks good too.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2321247648 From fjiang at openjdk.org Fri Aug 30 13:31:17 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 30 Aug 2024 13:31:17 GMT Subject: RFR: 8339248: RISC-V: Remove li64 macro assembler routine and related code In-Reply-To: References: Message-ID: <5yCL8X8QqbBZcidgaqrWrjnAh6mGQIIpsPAzypID2Oo=.356ddf06-2605-41cf-8c6f-70d5d94c87e4@github.com> On Thu, 29 Aug 2024 12:59:40 GMT, Fei Yang wrote: > The macro assembler routine li64 and related code (is_li64_at, patch_imm_in_li64, get_target_of_li64 and check_li64_data_dependency) is unused for now. We should remove these unused code, which will save us some unnecessary runtime checks. We can add them back when needed again someday. > > Testing: > - [x] release & fastdebug build > - [x] Gtest & Tier1 test (release) Looks good, thanks for the cleanup! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/20769#pullrequestreview-2272352189 From rcastanedalo at openjdk.org Fri Aug 30 13:43:24 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 13:43:24 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: On Sun, 25 Aug 2024 06:15:20 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP > > src/hotspot/share/opto/memnode.cpp line 3468: > >> 3466: // Capture an unaliased, unconditional, simple store into an initializer. >> 3467: // Or, if it is independent of the allocation, hoist it above the allocation. >> 3468: if (ReduceFieldZeroing && ReduceInitialCardMarks && /*can_reshape &&*/ > > It's not obvious to me how this is related to the late barrier changes.
I agree this change is not obvious and deserves an explanation. With `ReduceInitialCardMarks` disabled, a store to a newly allocated object requires a post-barrier. In current mainline code, the post-barrier is expanded early, which allows the store-capturing transformation (a first step to avoid needless zeroing in object initialization) to move the store and its post-barrier apart: the store goes into the initialization sequence of the recently allocated object, whereas the post-barrier itself remains outside. Here is an example in pseudo-code of this transformation for early-expanded GC barriers:

(before store capturing):

    allocate object o
    start initialization of o
    ...
    o.f <- 0
    ...
    end initialization of o
    memory barrier (store-store)
    o.f <- new-val
    post-barrier of o.f <- new-val

(after store capturing):

    allocate object o
    start initialization of o
    ...
    o.f <- new-val
    ...
    end initialization of o
    memory barrier (store-store)
    post-barrier of o.f <- new-val

In late barrier expansion, however, the post-barrier is an implicit, inseparable part of the store, so stores with post-barriers have to stay outside the initialization section. To enforce this, the change simply disables store-capturing analysis in the `!ReduceInitialCardMarks` case, which is the only case where we might find stores with post-barriers on recently allocated objects. A perhaps more principled solution might be to extend store-capturing analysis to reject stores with late-expanded barriers. I will give it a try.
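The trade-off between disabling capturing wholesale and rejecting individual stores can be sketched with a hypothetical model. `ToyStore`, `capture_blanket` and `capture_per_store` are illustrative names, not C2 code, and the model assumes that only oop stores carry a G1 post-barrier and that `ReduceInitialCardMarks` removes it for stores into a just-allocated object:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model of the decision; not C2 types.
struct ToyStore {
  bool to_fresh_object;  // target was just allocated and is being initialized
  bool is_oop_store;     // stores a reference, so G1 may need a post-barrier
};

// Assumption: ReduceInitialCardMarks elides the post-barrier for stores into
// a just-allocated object; every other oop store keeps it.
bool needs_post_barrier(const ToyStore& s, bool reduce_initial_card_marks) {
  return s.is_oop_store && !(s.to_fresh_object && reduce_initial_card_marks);
}

// Policy in the patch: no capturing at all unless card marks are reduced.
bool capture_blanket(const ToyStore& s, bool rfz, bool ricm) {
  return rfz && ricm && s.to_fresh_object;
}

// Finer policy: refuse only stores whose inseparable barrier would move too.
bool capture_per_store(const ToyStore& s, bool rfz, bool ricm) {
  return rfz && s.to_fresh_object && !needs_post_barrier(s, ricm);
}

template <typename Policy>
std::size_t count_captured(const std::vector<ToyStore>& stores, bool rfz,
                           bool ricm, Policy policy) {
  std::size_t n = 0;
  for (const ToyStore& s : stores) {
    if (policy(s, rfz, ricm)) {
      // Neither policy may move a post-barrier into the initialization.
      assert(!needs_post_barrier(s, ricm));
      ++n;
    }
  }
  return n;
}
```

With `ReduceInitialCardMarks` off, the blanket check captures nothing, while the per-store check still captures a primitive store into the fresh object; with it on, both policies capture both stores.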
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738693695 From yzheng at openjdk.org Fri Aug 30 13:48:20 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 30 Aug 2024 13:48:20 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 274: > 272: nonstatic_field(Klass, _bitmap, uintx) \ > 273: nonstatic_field(Klass, _hash_slot, uint8_t) \ > 274: nonstatic_field(Klass, _misc_flags._flags, u1) \ Can we export `_misc_flags` instead, similar to `_access_flags`? 
diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 9d65268f0fe..6170647186c 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -268,10 +268,10 @@ nonstatic_field(Klass, _java_mirror, OopHandle) \ nonstatic_field(Klass, _modifier_flags, jint) \ nonstatic_field(Klass, _access_flags, AccessFlags) \ + nonstatic_field(Klass, _misc_flags, KlassFlags) \ nonstatic_field(Klass, _class_loader_data, ClassLoaderData*) \ nonstatic_field(Klass, _bitmap, uintx) \ nonstatic_field(Klass, _hash_slot, uint8_t) \ - nonstatic_field(Klass, _misc_flags._flags, u1) \ \ nonstatic_field(LocalVariableTableElement, start_bci, u2) \ nonstatic_field(LocalVariableTableElement, length, u2) \ diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java index 3de4de7d42d..91c9e73b532 100644 --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java @@ -170,6 +170,11 @@ public int getAccessFlags() { return UNSAFE.getInt(getKlassPointer() + config.klassAccessFlagsOffset); } + public int getMiscFlags() { + HotSpotVMConfig config = config(); + return UNSAFE.getInt(getKlassPointer() + config.klassMiscFlagsOffset); + } + @Override public ResolvedJavaType getComponentType() { if (componentType == null) { @@ -373,9 +378,7 @@ public AssumptionResult hasFinalizableSubclass() { @Override public boolean hasFinalizer() { - HotSpotVMConfig config = config(); - int miscFlags = UNSAFE.getByte(getKlassPointer() + config.klassMiscFlagsOffset); - return (miscFlags & config().jvmAccHasFinalizer) != 0; + return (getMiscFlags() & config().jvmAccHasFinalizer) != 0; } @Override @@ -1112,9 +1115,7 @@ public ResolvedJavaField 
resolveField(UnresolvedJavaField unresolvedJavaField, R @Override public boolean isCloneableWithAllocation() { - HotSpotVMConfig config = config(); - int miscFlags = UNSAFE.getByte(getKlassPointer() + config.klassMiscFlagsOffset); - return (miscFlags & config().jvmAccIsCloneableFast) != 0; + return (getMiscFlags() & config().jvmAccIsCloneableFast) != 0; } @Override diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotVMConfig.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotVMConfig.java index 16d9cf3625e..6f1c325ee47 100644 --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotVMConfig.java +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotVMConfig.java @@ -85,6 +85,7 @@ String getHostArchitectureName() { final int javaMirrorOffset = getFieldOffset("Klass::_java_mirror", Integer.class, "OopHandle"); final int klassAccessFlagsOffset = getFieldOffset("Klass::_access_flags", Integer.class, "AccessFlags"); + final int klassMiscFlagsOffset = getFieldOffset("Klass::_misc_flags", Integer.class, "KlassFlags"); final int klassLayoutHelperOffset = getFieldOffset("Klass::_layout_helper", Integer.class, "jint"); final int klassLayoutHelperNeutralValue = getConstant("Klass::_lh_neutral_value", Integer.class); @@ -98,7 +99,6 @@ String getHostArchitectureName() { final int instanceKlassFieldInfoStreamOffset = getFieldOffset("InstanceKlass::_fieldinfo_stream", Integer.class, "Array*"); final int instanceKlassAnnotationsOffset = getFieldOffset("InstanceKlass::_annotations", Integer.class, "Annotations*"); final int instanceKlassMiscFlagsOffset = getFieldOffset("InstanceKlass::_misc_flags._flags", Integer.class, "u2"); - final int klassMiscFlagsOffset = getFieldOffset("Klass::_misc_flags._flags", Integer.class, "u1"); final int klassVtableStartOffset = getFieldValue("CompilerToVM::Data::Klass_vtable_start_offset", Integer.class, "int"); final int klassVtableLengthOffset = 
getFieldValue("CompilerToVM::Data::Klass_vtable_length_offset", Integer.class, "int"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738703831 From rcastanedalo at openjdk.org Fri Aug 30 13:51:23 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 Aug 2024 13:51:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Wed, 28 Aug 2024 08:25:06 GMT, Kim Barrett wrote: > I've only looked at the changes in gc directories (shared and cpu-specific). Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the coming days); please let me know if you have any further questions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2321323461 From coleenp at openjdk.org Fri Aug 30 14:00:44 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 14:00:44 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: - Fix jvmci code. - Some C2 refactoring. - Assembly corrections from Matias and Dean.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20719/files - new: https://git.openjdk.org/jdk/pull/20719/files/9dc7e551..852ca049 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=01-02 Stats: 55 lines in 20 files changed: 4 ins; 21 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/20719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20719/head:pull/20719 PR: https://git.openjdk.org/jdk/pull/20719 From coleenp at openjdk.org Fri Aug 30 14:00:44 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 14:00:44 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add in graal flags and a comment. Dean and Matias, thank you for the improvements to the assembly code. Dean, I took your suggestions, including refactoring the C2 code. Thanks for the thorough look at this change. ------------- PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2272242043 From coleenp at openjdk.org Fri Aug 30 14:00:44 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 14:00:44 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 21:03:09 GMT, Matias Saavedra Silva wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add in graal flags and a comment.
> > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 694: > >> 692: load_klass(tmp, obj_reg); >> 693: ldrb(tmp, Address(tmp, Klass::misc_flags_offset())); >> 694: tstw(tmp, KlassFlags::_misc_is_value_based_class); > > Should this just be `tst` instead of `tstw`? Yes, thanks, and I changed all of these in the aarch64 code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738608738 From coleenp at openjdk.org Fri Aug 30 14:00:45 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 14:00:45 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 22:03:39 GMT, Dean Long wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add in graal flags and a comment. > > src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp line 76: > >> 74: load_klass(tmp, Roop); >> 75: z_lb(tmp, Address(tmp, Klass::misc_flags_offset())); >> 76: testbit(tmp, exact_log2(KlassFlags::_misc_is_value_based_class)); > > Suggestion: > > z_tm(Klass::misc_flags_offset(), tmp, KlassFlags::_misc_is_value_based_class); > > or > Suggestion: > > z_tm(Address(tmp, Klass::misc_flags_offset()), KlassFlags::_misc_is_value_based_class); Thank you for the corrections. I changed all of these in the s390 code. > src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 62: > >> 60: load_klass(hdr, obj, rscratch1); >> 61: movb(hdr, Address(hdr, Klass::misc_flags_offset())); >> 62: testl(hdr, KlassFlags::_misc_is_value_based_class); > > Suggestion: > > testb(Address(hdr, Klass::misc_flags_offset()), KlassFlags::_misc_is_value_based_class); I corrected these in the x86 code as well. This is much better.
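In C++ terms, the byte-sized tests suggested above (`testb` on x86, `z_tm` on s390) compute the same thing as loading the one-byte flags field and masking it. Here is a minimal sketch with a hypothetical `ToyKlassFlags` class (not HotSpot's `KlassFlags`); the assertion in `set` assumes the write-once discipline that the thread describes for these flags:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-byte flags holder, loosely modeled on the discussion.
class ToyKlassFlags {
 public:
  typedef std::uint8_t flags_t;  // callers need not care about the width

  static constexpr flags_t is_value_based_class = 1 << 0;
  static constexpr flags_t has_finalizer        = 1 << 1;
  static constexpr flags_t is_cloneable_fast    = 1 << 2;

  // Assumed write-once: each bit is set at most once, before publication.
  void set(flags_t bit) {
    assert((_flags & bit) == 0 && "flag set twice");
    _flags = static_cast<flags_t>(_flags | bit);
  }

  // Equivalent of "load byte, test mask"; no wider load is needed.
  bool test(flags_t bit) const { return (_flags & bit) != 0; }

  flags_t raw() const { return _flags; }

 private:
  flags_t _flags = 0;
};
```

Keeping the width behind a typedef is one way to let callers stay unchanged if the field later grows to `u2` or `u4`, which is the concern raised about `KlassFlags_t`.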
> > src/hotspot/share/ci/ciKlass.cpp line 233: > >> 231: jint ciKlass::misc_flags() { >> 232: assert(is_loaded(), "not loaded"); >> 233: GUARDED_VM_ENTRY( > > To Compiler folks: I don't think the VM_ENTRY is necessary, but if it is, then we should consider entering VM mode once and caching/memoizing these immutable flag values in the ciKlass. I added a global typedef klass_flags_t because it didn't look confusing vs KlassFlags and KlassFlags_t, and the lowercase convention is something we usually use for typedefs. > src/hotspot/share/classfile/classFileParser.cpp line 5176: > >> 5174: ik->set_declares_nonstatic_concrete_methods(_declares_nonstatic_concrete_methods); >> 5175: >> 5176: assert(!_is_hidden || ik->is_hidden(), "must be set already"); > > Is this an optimization independent of the current change? No, it's needed by the current change because with these Flags.hpp implementations, we assert that they are assigned only once, i.e. they are const after classfile parsing. This was a second assignment. > src/hotspot/share/oops/klassFlags.hpp line 56: > >> 54: // These flags are write-once before the class is published and then read-only >> 55: // so don't require atomic updates. >> 56: u1 _flags; > > Suggestion: > > typedef u1 KlassFlags_t; > KlassFlags_t _flags; > > Can we have a typedef so C++ code that doesn't care about the size doesn't need to change if we later make it u2 or u4? Adding it here makes us refer to it as KlassFlags::KlassFlags_t which is pretty noisy. > src/hotspot/share/opto/library_call.cpp line 3774: > >> 3772: >> 3773: // Use this for testing if Klass is_hidden, has_finalizer, and is_cloneable_fast. >> 3774: Node* LibraryCallKit::generate_misc_flags_guard(Node* kls, int modifier_mask, int modifier_bits, RegionNode* region) { > > It looks like we could refactor generate_misc_flags_guard and generate_access_flags_guard with a common generate_klass_accessor_guard that takes the offset and type as parameters.
Yes, copying this code was a bit painful but it was hard to get right. I refactored the common code into generate_mods_flags_guard, so we pass in the modifier node and eliminate the duplicate code. > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 378: > >> 376: HotSpotVMConfig config = config(); >> 377: int miscFlags = UNSAFE.getByte(getKlassPointer() + config.klassMiscFlagsOffset); >> 378: return (miscFlags & config().jvmAccHasFinalizer) != 0; > > Suggestion: > > return (miscFlags & config.jvmAccHasFinalizer) != 0; Also removed local config variable. > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 1118: > >> 1116: int miscFlags = UNSAFE.getByte(getKlassPointer() + config.klassMiscFlagsOffset); >> 1117: return (miscFlags & config().jvmAccIsCloneableFast) != 0; >> 1118: } > > Maybe introduce getMiscFlags() helper like existing getAccessFlags()? Seems not useful for these two references. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738609365 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738610399 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738611867 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738642433 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738644664 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738647156 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738688415 PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738645913 From coleenp at openjdk.org Fri Aug 30 14:00:45 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 14:00:45 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: <76ATx7XQ4jYsXzlaaKeLtdZu1sDraQv3DO3FgwKjmn8=.fe221bc1-4557-40eb-9e8e-527cc6501bb2@github.com> On 
Thu, 29 Aug 2024 22:40:43 GMT, Dean Long wrote: >> src/hotspot/share/oops/klass.hpp line 436: >> >>> 434: #endif >>> 435: static ByteSize bitmap_offset() { return byte_offset_of(Klass, _bitmap); } >>> 436: static ByteSize misc_flags_offset() { return byte_offset_of(Klass, _misc_flags); } >> >> Suggestion: >> >> static ByteSize misc_flags_offset() { return byte_offset_of(Klass, _misc_flags._flags); } > > We probably shouldn't assume the _flags field starts at offset 0. Yes, this was wrong. Thanks for seeing this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738643187 From erikj at openjdk.org Fri Aug 30 14:04:22 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Fri, 30 Aug 2024 14:04:22 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Thu, 29 Aug 2024 23:07:16 GMT, Magnus Ihse Bursie wrote: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. I replicated the generation using the instructions in the readme. It worked but the generated files end up with large whitespace differences, probably because you generated them before cleaning the whitespace in the original sources. If that is the case, could you update the generated files from the exact upstream sources that were committed? 
make/UpdateSleefSource.gmk line 38: > 36: ################################################################################ > 37: # This file is responsible for updating the generated sleef source code files > 38: # that are checked in to the JDK repo, and that is actually used when building. Suggestion: # that are checked in to the JDK repo, and that are actually used when building. src/jdk.incubator.vector/linux/native/libsleef/README.md line 29: > 27: `https://github.com/shibatch/sleef.git`, and copy all files, except the `docs` > 28: and `.github` directories, into > 29: `src/jdk.incubator.vector/linux/native/libsleef/upstream`. I think you need to add something about the need for whitespace cleanup here. ------------- PR Review: https://git.openjdk.org/jdk/pull/20781#pullrequestreview-2272399884 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1738702398 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1738715361 From coleenp at openjdk.org Fri Aug 30 14:13:26 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 14:13:26 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 13:45:16 GMT, Yudi Zheng wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add in graal flags and a comment. > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 274: > >> 272: nonstatic_field(Klass, _bitmap, uintx) \ >> 273: nonstatic_field(Klass, _hash_slot, uint8_t) \ >> 274: nonstatic_field(Klass, _misc_flags._flags, u1) \ > > Can we export `_misc_flags` instead, similar to `_access_flags`? 
> > diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > index 9d65268f0fe..6170647186c 100644 > --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > @@ -268,10 +268,10 @@ > nonstatic_field(Klass, _java_mirror, OopHandle) \ > nonstatic_field(Klass, _modifier_flags, jint) \ > nonstatic_field(Klass, _access_flags, AccessFlags) \ > + nonstatic_field(Klass, _misc_flags, KlassFlags) \ > nonstatic_field(Klass, _class_loader_data, ClassLoaderData*) \ > nonstatic_field(Klass, _bitmap, uintx) \ > nonstatic_field(Klass, _hash_slot, uint8_t) \ > - nonstatic_field(Klass, _misc_flags._flags, u1) \ > \ > nonstatic_field(LocalVariableTableElement, start_bci, u2) \ > nonstatic_field(LocalVariableTableElement, length, u2) \ > diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.jav... I don't think the JVMCI knows about the type KlassFlags - I used the same code that I used for InstanceKlass::_misc_flags._flags (see above this). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1738754099 From sgehwolf at openjdk.org Fri Aug 30 14:14:05 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 30 Aug 2024 14:14:05 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v7] In-Reply-To: References: Message-ID: <4LqgwosLUnfbgYVsRmdcIln9NOplSeSum_iSX0-ri4w=.f2413f2d-08b5-4053-9795-279838c4226c@github.com> > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. 
Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request incrementally with four additional commits since the last revision: - Fix comment of WB::host_cpus() - Handle non-root + CGv2 - Add nested hierarchy to test framework - Revert "Add root check for SystemdMemoryAwarenessTest.java" This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/7e8d9ed4..a98fd7d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=05-06 Stats: 232 lines in 4 files changed: 167 ins; 35 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From sgehwolf at openjdk.org Fri Aug 30 14:23:20 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 30 Aug 2024 14:23:20 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v7] In-Reply-To: <4LqgwosLUnfbgYVsRmdcIln9NOplSeSum_iSX0-ri4w=.f2413f2d-08b5-4053-9795-279838c4226c@github.com> References: <4LqgwosLUnfbgYVsRmdcIln9NOplSeSum_iSX0-ri4w=.f2413f2d-08b5-4053-9795-279838c4226c@github.com> Message-ID: On Fri, 30 Aug 2024 14:14:05 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. 
The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request incrementally with four additional commits since the last revision: > > - Fix comment of WB::host_cpus() > - Handle non-root + CGv2 > - Add nested hierarchy to test framework > - Revert "Add root check for SystemdMemoryAwarenessTest.java" > > This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. I've updated the patch to: 1. Run tests on cg v2 when non-root. In that case the framework uses the `~/.config/systemd/user` directory, `systemctl --user` and `systemd-run --user`. 2. Set up a lower limit further down the hierarchy. The `jdk_internal.slice.d` directory now has the lowest limit, and the slice files themselves have a higher one. The test asserts that the lower limit is detected correctly. 3. Skip the test in the framework's main entry point (i.e. on cg v1 and non-root). 4. Address the review comments so far. Tested on cg v1 and cg v2, as root and as non-root for each (with the patch of #20646 applied as well). Please let me know what you think.
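Point 2 places a lower limit further down the hierarchy than the slice-level limit, and the test expects the lower one to be detected. This follows from how cgroup v2 applies a memory limit: it covers the whole subtree. The effective limit for a process is therefore the smallest `memory.max` on its path to the root, and the literal value `max` means no limit. Below is a minimal, hypothetical C++ sketch of that rule. The function names and the representation of the path as a vector of strings are assumptions; this is not the JDK's container-detection code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Sentinel for "no limit" (a memory.max value of "max").
const uint64_t kUnlimited = std::numeric_limits<uint64_t>::max();

// Hypothetical parser for one memory.max value.
uint64_t parse_memory_max(const std::string& value) {
  if (value == "max") return kUnlimited;
  return std::stoull(value);
}

// The effective limit is the smallest limit found anywhere on the path
// from the leaf cgroup up to the root. Order does not matter.
uint64_t effective_memory_limit(const std::vector<std::string>& path_values) {
  uint64_t limit = kUnlimited;
  for (const std::string& v : path_values) {
    limit = std::min(limit, parse_memory_max(v));
  }
  return limit;
}
```

A detector that reads only the leaf's `memory.max`, or only the first limit it finds, would miss a tighter limit set at a different level. The nested setup is designed to expose exactly that case.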
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2321413352 From sgehwolf at openjdk.org Fri Aug 30 14:23:22 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 30 Aug 2024 14:23:22 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Fri, 30 Aug 2024 11:40:45 GMT, Severin Gehwolf wrote: >> src/hotspot/share/prims/whitebox.cpp line 2507: >> >>> 2505: WB_END >>> 2506: >>> 2507: // Physical cpus of the host machine (including containers), Linux only. >> >> Isn't the comment a bit misleading? From what I see, `os::Linux::active_processor_count()` can use various mechanisms to get the number of processors. If it uses https://linux.die.net/man/2/sched_getaffinity, it gives the 'set of CPUs on which it is eligible to run', which might be different from what the host has. > > Yes. See #20768 for an attempt to unify it. I'll change the comment with the update that I have for nested hierarchies. Thanks! I've changed the comment. >> test/hotspot/jtreg/containers/systemd/SystemdMemoryAwarenessTest.java line 58: >> >>> 56: SystemdRunOptions opts = SystemdTestUtils.newOpts("HelloSystemd"); >>> 57: // 1 GB memory >>> 58: opts.memoryLimit("1000M"); >> >> Just wondering: is 1G possible here (the comment states 1 GB / 1024M)? > > I should probably fix the comment or change it to `1024M`. Either way, it has to match the assertion where we look for `1048576000` bytes in the output. This should be fixed now.
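The comment and the value disagree because of units. systemd parses an `M` suffix in memory settings with base 1024, so `1000M` is the `1048576000` bytes the test looks for, while 1 GiB would be `1024M`. A tiny compile-time check makes the arithmetic explicit. The helper name is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: an "M" suffix interpreted with base 1024 (MiB).
constexpr std::uint64_t mebibytes(std::uint64_t n) { return n * 1024 * 1024; }

static_assert(mebibytes(1000) == 1048576000ULL, "1000M as used by the test");
static_assert(mebibytes(1024) == 1073741824ULL, "1024M is exactly 1 GiB");
```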
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1738770335 PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1738770983 From aph at openjdk.org Fri Aug 30 14:49:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 30 Aug 2024 14:49:03 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v21] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. 
All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Use post-increment RegSet operator.
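The hashed secondary-supers table described above depends on popcount. The following is a simplified, hypothetical C++ model of the general idea, not HotSpot's actual layout or probing scheme. A 64-bit occupancy bitmap marks which hash slots are used, and the entries are stored in a packed array, so the position of slot `s` is the number of occupied slots below `s`. The model ignores hash collisions, which a real table must resolve. `popcount64` is a standard SWAR bit count, the kind of portable software fallback the description mentions for build targets without a popcount instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable software popcount (SWAR): sums bits in pairs, nibbles, then bytes.
int popcount64(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical packed table: bit s of `bitmap` set means slot s is occupied.
// `entries` holds only the occupied slots, in slot order.
struct PackedSlotTable {
  uint64_t bitmap = 0;
  std::vector<int> entries;

  // Index of slot s in the packed array = number of occupied slots below s.
  int packed_index(unsigned slot) const {
    return popcount64(bitmap & ((uint64_t(1) << slot) - 1));
  }

  // Simplification: assumes the slot is free (no collision handling).
  void insert(unsigned slot, int value) {
    assert(slot < 64 && ((bitmap >> slot) & 1) == 0);
    entries.insert(entries.begin() + packed_index(slot), value);
    bitmap |= uint64_t(1) << slot;
  }

  bool contains(unsigned slot, int value) const {
    assert(slot < 64);
    if (((bitmap >> slot) & 1) == 0) return false;  // fast negative answer
    return entries[packed_index(slot)] == value;
  }
};
```

With this layout, a lookup that misses an empty slot costs one bit test. A hit costs one masked popcount and one array load, which is why the speed of popcount matters on x86-64.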
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/dd42fe93..fe754fb6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=19-20 Stats: 14 lines in 3 files changed: 6 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From lmesnik at openjdk.org Fri Aug 30 14:50:45 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 30 Aug 2024 14:50:45 GMT Subject: RFR: 8338934: vmTestbase/nsk/jvmti/*Field*Watch/TestDescription.java tests timeout intermittently [v2] In-Reply-To: References: Message-ID: > The tests time out because of a deadlock between the thread that is in transition and the thread changing field watches. > > They use JvmtiThreadState_lock and JvmtiVTMSTransitionDisabler. > > Changing a field watch requires the disabler, but the code tries to use the disabler only after JvmtiThreadState_lock is already held in > > void > JvmtiEventController::change_field_watch(jvmtiEvent event_type, bool added) { > MutexLocker mu(JvmtiThreadState_lock); > JvmtiEventControllerPrivate::change_field_watch(event_type, added); > } > > > Instead, transitions must be disabled first, and only then should JvmtiThreadState_lock be acquired. > I took a quick look, and most JVMTI methods already do it this way. I also moved the disabler into jvmtiEnv.cpp to be more consistent with other methods. > > > I was able to verify the fix locally in the loom repo, and ran tier1 + tier5-svc testing in jdk.
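The deadlock described above is a lock-ordering problem. One thread holds `JvmtiThreadState_lock` while it waits to disable transitions. Presumably, a thread in transition needs that same lock before its transition can complete. Below is a toy C++ model of the fix. It uses two `std::mutex` objects as hypothetical stand-ins; the real `JvmtiVTMSTransitionDisabler` is not a mutex. The point is only that every path acquires the two resources in the same order: transitions first, the thread-state lock second.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-ins; the real JvmtiVTMSTransitionDisabler is not a mutex.
std::mutex transition_mutex;   // models "transitions disabled / in transition"
std::mutex thread_state_lock;  // models JvmtiThreadState_lock
int field_watch_count = 0;
int transitions_completed = 0;

// Fixed order: disable transitions first, then take the state lock.
void change_field_watch(bool added) {
  std::lock_guard<std::mutex> no_transitions(transition_mutex);
  std::lock_guard<std::mutex> state(thread_state_lock);
  field_watch_count += added ? 1 : -1;
}

// A thread in transition follows the same order, so neither side can hold
// the second lock while it waits for the first.
void complete_transition() {
  std::lock_guard<std::mutex> in_transition(transition_mutex);
  std::lock_guard<std::mutex> state(thread_state_lock);
  ++transitions_completed;
}
```

If the two `lock_guard` lines are swapped in only one of these functions, the classic ABBA deadlock returns as soon as both functions run concurrently. That is the shape of the original `change_field_watch` bug.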
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20776/files - new: https://git.openjdk.org/jdk/pull/20776/files/89e57b0d..cde6c486 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20776&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20776&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20776.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20776/head:pull/20776 PR: https://git.openjdk.org/jdk/pull/20776 From epeter at openjdk.org Fri Aug 30 15:02:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Aug 2024 15:02:26 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. 
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding descriptive comments I left a few comments, hopefully I can spend some more time on this next week src/hotspot/cpu/x86/matcher_x86.hpp line 215: > 213: } > 214: > 215: static bool vector_indexes_needs_massaging(BasicType ety, int vlen) { The name "massaging" sounds quite vague. Can we have something more expressive / descriptive? Is it the vector that "needs" massaging or the indices that "need" massaging? Why `ety` and not `bt`? Is that not the name we use most often? 
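The PR summary above defines the two-vector `selectFrom` semantics. Lane i reads `v1[idx]` when `idx < VLEN` and `v2[idx - VLEN]` when `VLEN <= idx < 2*VLEN`; any other index is an error. A scalar reference model pins this down. What follows is a hypothetical C++ helper for reasoning about expected results, not the Vector API, and `std::out_of_range` stands in for the Java exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Scalar reference model of a two-vector select (hypothetical helper).
// v1 and v2 together act as one table of 2*VLEN entries.
template <typename T>
std::vector<T> select_from_two(const std::vector<int>& index,
                               const std::vector<T>& v1,
                               const std::vector<T>& v2) {
  assert(v1.size() == index.size() && v2.size() == index.size());
  const int vlen = static_cast<int>(index.size());
  std::vector<T> result(index.size());
  for (int i = 0; i < vlen; ++i) {
    const int idx = index[i];
    if (idx < 0 || idx >= 2 * vlen) {
      throw std::out_of_range("index outside [0, 2*VLEN)");
    }
    result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
  }
  return result;
}
```

For example, with VLEN = 4, the index vector `{0, 5, 2, 7}` picks `v1[0]`, `v2[1]`, `v1[2]` and `v2[3]`.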
src/hotspot/cpu/x86/x86.ad line 10490: > 10488: > 10489: > 10490: instruct selectFromTwoVec_evex(vec dst, vec src1, vec src2) You could rename `dst` -> `mask_and_dst`. That would maybe help the reader to more quickly know that it is an input-mask and output-dst. src/hotspot/share/opto/vectorIntrinsics.cpp line 2716: > 2714: C->set_max_vector_size(MAX2(C->max_vector_size(), (uint)(num_elem * type2aelembytes(elem_bt)))); > 2715: return true; > 2716: } The code in these methods is extremely duplicated. Unboxing and boxing in every method around here. Maybe not your problem in this PR. BTW: your error logging used `v1` in all 3 cases `op1-3`, you probably want to give them useful names. `v1-3` probably? All this copy-pasting makes it easy to miss updating some cases... like it happened here. src/hotspot/share/opto/vectornode.cpp line 2090: > 2088: int num_elem = vect_type()->length(); > 2089: BasicType elem_bt = vect_type()->element_basic_type(); > 2090: if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, num_elem, elem_bt)) { Suggestion: // Keep the node if it is supported, else lower it to other nodes. if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, num_elem, elem_bt)) { src/hotspot/share/opto/vectornode.cpp line 2095: > 2093: Node* index_vec = in(1); > 2094: Node* src1 = in(2); > 2095: Node* src2 = in(3); Suggestion: Node* src1 = in(2); Node* src2 = in(3); unnecessary spaces src/hotspot/share/opto/vectornode.cpp line 2101: > 2099: // (VectorBlend > 2100: // (VectorRearrange SRC1, INDEX) > 2101: // (VectorRearrange SRC2, NORM_INDEX) Suggestion: // (VectorRearrange SRC1 INDEX) // (VectorRearrange SRC2 NORM_INDEX) Either consistently use commas or none at all ;) src/hotspot/share/opto/vectornode.cpp line 2104: > 2102: // MASK) > 2103: // This shall prevent an intrinsification failure and associated argument > 2104: // boxing penalties. A quick comment about how the mask is computed could be nice.
`MASK = INDEX < num_elem` src/hotspot/share/opto/vectornode.cpp line 2126: > 2124: case T_FLOAT: > 2125: return phase->transform(new VectorCastF2XNode(index_vec, TypeVect::make(T_INT, num_elem))); > 2126: break; `break` after `return`? src/hotspot/share/opto/vectornode.cpp line 2141: > 2139: default: return elem_bt; > 2140: } > 2141: }; This is definitely a style question. But it might be nice to make these functions member functions. They now kinda disrupt the flow of the `::Ideal` method. And in some cases you use the captured variables, and in other cases you pass them in explicitly, even though they already exist in the captured scope... consistency would be nice. src/hotspot/share/opto/vectornode.cpp line 2148: > 2146: > 2147: BoolTest::mask pred = BoolTest::lt; > 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? src/hotspot/share/opto/vectornode.cpp line 2149: > 2147: BoolTest::mask pred = BoolTest::lt; > 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); > 2149: Node* lane_cnt = phase->makecon(lane_count_type()); Hmm. I don't like to have different names for the same thing. `num_elem` and `lane_count` and `lane_cnt`. What about a method `make_num_elem_node`, returns a `ConNode*`. Then you pass it around as `num_elem_scalar`, and broadcast it to `num_elem_vector`. src/hotspot/share/opto/vectornode.cpp line 2159: > 2157: > 2158: vmask_type = TypeVect::makemask(elem_bt, num_elem); > 2159: mask = phase->transform(new VectorMaskCastNode(mask, vmask_type)); I would just have two variables, and not overwrite it: `integral_vmask_type` and `vmask_type`. Maybe also `mask` could be split into two variables? src/hotspot/share/opto/vectornode.cpp line 2181: > 2179: default: return elem_bt; > 2180: } > 2181: }; You are now using this twice. Is there not some method that already does this? 
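A scalar C++ model can show how MASK fits into the lowering quoted above, `(VectorBlend (VectorRearrange SRC1 INDEX) (VectorRearrange SRC2 NORM_INDEX) MASK)` with `MASK = INDEX < num_elem`. The helper names are hypothetical. The operand order of `blend` is this sketch's own convention and not necessarily `VectorBlendNode`'s. The sketch also computes NORM_INDEX as `INDEX & (VLEN - 1)` and feeds it to both rearranges. That is an assumption, and it relies on VLEN being a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for a one-vector rearrange:
// result[i] = src[shuffle[i]].
template <typename T>
std::vector<T> rearrange(const std::vector<T>& src, const std::vector<int>& shuffle) {
  std::vector<T> result(shuffle.size());
  for (std::size_t i = 0; i < shuffle.size(); ++i) result[i] = src[shuffle[i]];
  return result;
}

// Hypothetical blend: result[i] = mask[i] ? when_true[i] : when_false[i].
template <typename T>
std::vector<T> blend(const std::vector<bool>& mask, const std::vector<T>& when_true,
                     const std::vector<T>& when_false) {
  std::vector<T> result(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i) {
    result[i] = mask[i] ? when_true[i] : when_false[i];
  }
  return result;
}

// Two-vector select lowered to two one-vector rearranges plus a blend.
// Assumes VLEN is a power of two and every index is already in [0, 2*VLEN).
template <typename T>
std::vector<T> lowered_select_from_two(const std::vector<int>& index,
                                       const std::vector<T>& src1,
                                       const std::vector<T>& src2) {
  const int vlen = static_cast<int>(index.size());
  assert(vlen > 0 && (vlen & (vlen - 1)) == 0);
  std::vector<int> norm_index(index.size());
  std::vector<bool> mask(index.size());
  for (std::size_t i = 0; i < index.size(); ++i) {
    norm_index[i] = index[i] & (vlen - 1);  // idx if idx < VLEN, else idx - VLEN
    mask[i] = index[i] < vlen;              // MASK = INDEX < num_elem
  }
  return blend(mask, rearrange(src1, norm_index), rearrange(src2, norm_index));
}
```

Lanes where MASK is true take the SRC1 rearrange, and the remaining lanes take the SRC2 rearrange. For in-range indices this reproduces the table semantics from the PR summary.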
src/hotspot/share/opto/vectornode.cpp line 2183: > 2181: }; > 2182: // Targets emulating unsupported permutation for certain vector types > 2183: // may need to message the indexes to match the users intent. Suggestion: // may need to massage the indexes to match the users intent. src/hotspot/share/opto/vectornode.hpp line 1272: > 1270: }; > 1271: > 1272: spurious newline src/hotspot/share/opto/vectornode.hpp line 1621: > 1619: public: > 1620: SelectFromTwoVectorNode(Node* in1, Node* in2, Node* in3, const TypeVect* vt) > 1621: : VectorNode(in1, in2, in3, vt) {} I would prefer more expressive variable names and a short specification what the node does. Otherwise one always has to reverse-engineer what inputs are acceptable etc. I mean you could even require `VectorNode*` as inputs. ------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2272308274 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738648483 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738738172 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738759799 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738767466 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738768205 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738765017 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738823939 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738808199 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738781635 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738787762 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738806978 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738814420 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738838073 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20508#discussion_r1738835911 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738729168 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738866021 From epeter at openjdk.org Fri Aug 30 15:02:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Aug 2024 15:02:27 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 13:17:26 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/cpu/x86/matcher_x86.hpp line 215: > >> 213: } >> 214: >> 215: static bool vector_indexes_needs_massaging(BasicType ety, int vlen) { > > The name "massaging" sounds quite vague. Can we have something more expressive / descriptive? Is it the vector that "needs" massaging or the indices that "need" massaging? > > Why `ety` and not `bt`? Is that not the name we use most often? Hmm, I see that `ety` is used in other places here. What does it stand for? > src/hotspot/share/opto/vectornode.cpp line 2183: > >> 2181: }; >> 2182: // Targets emulating unsupported permutation for certain vector types >> 2183: // may need to message the indexes to match the users intent. > > Suggestion: > > // may need to massage the indexes to match the users intent. This optimization for now seems quite specific to your `SelectFromTwoVectorNode::Ideal` lowering code. Can this conversion not be done there already? What is the semantics of `VectorRearrangeNode`? Should its shuffle vector always be bytes, and we now violated that "for a quick second"? Or is it going to be generally the idea to create all sorts of shuffle types and then fix that up? But then why do we need the `vector_indexes_needs_massaging`? 
Can you help me understand the concept/strategy behind this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738714401 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738862138 From ihse at openjdk.org Fri Aug 30 15:51:22 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 15:51:22 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v2] In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Update the generated sources based on the whitespace-fixed upstream source ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20781/files - new: https://git.openjdk.org/jdk/pull/20781/files/ccfe566e..c4c58e6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=00-01 Stats: 603 lines in 3 files changed: 0 ins; 0 del; 603 mod Patch: https://git.openjdk.org/jdk/pull/20781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20781/head:pull/20781 PR: https://git.openjdk.org/jdk/pull/20781 From ihse at openjdk.org Fri Aug 30 15:56:17 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 15:56:17 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v3] In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Fix typo Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20781/files - new: https://git.openjdk.org/jdk/pull/20781/files/c4c58e6c..db12b25b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20781/head:pull/20781 PR: https://git.openjdk.org/jdk/pull/20781 From kvn at openjdk.org Fri Aug 30 16:02:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 30 Aug 2024 16:02:18 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 00:26:27 GMT, Ioi Lam wrote: > This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolLinker`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). > > ----- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20517#pullrequestreview-2272882801 From ihse at openjdk.org Fri Aug 30 16:03:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 16:03:19 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v4] In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: <-zH_fwCKD_PmNXdigApQcfk6KON7z2iggFh9okwXrP0=.4504ce8d-81e4-4eb5-ae12-f40150192ba8@github.com> > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: README fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20781/files - new: https://git.openjdk.org/jdk/pull/20781/files/db12b25b..6992d5b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=02-03 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20781/head:pull/20781 PR: https://git.openjdk.org/jdk/pull/20781 From ihse at openjdk.org Fri Aug 30 16:03:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 16:03:19 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v4] In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: <7X-_fQHq0GcqzB1m7ZieK51JPh6ZhTsWBZM700LyvQQ=.08d243b6-0240-415b-8dfa-34a147bce99b@github.com> On Fri, 30 Aug 2024 13:50:11 GMT, Erik Joelsson wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> README fixes > > src/jdk.incubator.vector/linux/native/libsleef/README.md line 29: > >> 27: `https://github.com/shibatch/sleef.git`, and copy all files, except the `docs` >> 28: and `.github` directories, into >> 29: `src/jdk.incubator.vector/linux/native/libsleef/upstream`. > > I think you need to add something about the need for whitespace cleanup here. Good point. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1739003726 From ihse at openjdk.org Fri Aug 30 16:03:33 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 16:03:33 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v3] In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Fri, 30 Aug 2024 15:56:17 GMT, Magnus Ihse Bursie wrote: >> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. >> >> This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> I have updated the generated files. If you would like to re-try and confirm that you can replicate them exactly this time, that'd be appreciated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20781#issuecomment-2321757498 From luhenry at openjdk.org Fri Aug 30 16:25:19 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 30 Aug 2024 16:25:19 GMT Subject: RFR: 8339248: RISC-V: Remove li64 macro assembler routine and related code In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 12:59:40 GMT, Fei Yang wrote: > The macro assembler routine li64 and related code (is_li64_at, patch_imm_in_li64, get_target_of_li64 and check_li64_data_dependency) is unused for now. We should remove this unused code, which will save us some unnecessary runtime checks.
We can add them back when needed again someday. > > Testing: > - [x] release & fastdebug build > - [x] Gtest & Tier1 test (release) Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20769#pullrequestreview-2272978689 From matsaave at openjdk.org Fri Aug 30 16:34:30 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 30 Aug 2024 16:34:30 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:00:44 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix jvmci code. > - Some C2 refactoring. > - Assembly corrections from Matias and Dean. Interpreter changes look good, thanks! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2273011769 From aph at openjdk.org Fri Aug 30 16:37:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 30 Aug 2024 16:37:05 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v22] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. 
> > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad.
Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs. > > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix s390 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19989/files - new: https://git.openjdk.org/jdk/pull/19989/files/fe754fb6..a726b628 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From gziemski at openjdk.org Fri Aug 30 16:51:20 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 30 Aug 2024 16:51:20 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <3MBOQciBs-c6nqI3dWH7w-bWkqlPDdSH2ztkIu3Den0=.18a52372-c679-4137-a570-ea2fe4312699@github.com> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) Assuming for a second that we agree on the name of new type ( for now `MemTag`). 
Are we sure we want to combine that rename `MEMFLAGS` --> `MemTag` with changing the names of parameters and variables around this change at the same time? This is exactly what I initially did, but after I took a 2nd look I got worried about how big it looked. Just for a reference these are my changes I am talking about, which I abandoned: https://openjdk.github.io/cr/?repo=jdk&pr=20472&range=00 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2321955298 From ihse at openjdk.org Fri Aug 30 16:57:18 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 30 Aug 2024 16:57:18 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v5] In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. 
Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: - Use "whitespace" as an uncountable noun Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> - I suck at English verb forms Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20781/files - new: https://git.openjdk.org/jdk/pull/20781/files/6992d5b6..e5fe681e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20781/head:pull/20781 PR: https://git.openjdk.org/jdk/pull/20781 From erikj at openjdk.org Fri Aug 30 16:57:20 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Fri, 30 Aug 2024 16:57:20 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v4] In-Reply-To: <-zH_fwCKD_PmNXdigApQcfk6KON7z2iggFh9okwXrP0=.4504ce8d-81e4-4eb5-ae12-f40150192ba8@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> <-zH_fwCKD_PmNXdigApQcfk6KON7z2iggFh9okwXrP0=.4504ce8d-81e4-4eb5-ae12-f40150192ba8@github.com> Message-ID: On Fri, 30 Aug 2024 16:03:19 GMT, Magnus Ihse Bursie wrote: >> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. >> >> This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. 
> > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > README fixes Can confirm that I get no diff when running the update target now. src/jdk.incubator.vector/linux/native/libsleef/README.md line 31: > 29: `src/jdk.incubator.vector/linux/native/libsleef/upstream`. > 30: > 31: The libsleef source code do not follow the JDK whitespace rules as enforced by Suggestion: The libsleef source code does not follow the JDK whitespace rules as enforced by src/jdk.incubator.vector/linux/native/libsleef/README.md line 32: > 30: > 31: The libsleef source code do not follow the JDK whitespace rules as enforced by > 32: jcheck. You will need to remove trailing whitespaces, and expand tabs to 8 Suggestion: jcheck. You will need to remove trailing whitespace, and expand tabs to 8 ------------- PR Review: https://git.openjdk.org/jdk/pull/20781#pullrequestreview-2273050769 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1739118570 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1739119374 From coleenp at openjdk.org Fri Aug 30 17:01:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 17:01:21 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <76joSrzfa19YzNpoHpOkmBliI6HwsXz6uzWlk2hleXM=.773eac71-54ed-4aa4-a577-0b3f53717527@github.com> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
> > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) Yes, I think people want the name of the new type, and variables that are declared with this type changed in one PR. Just looking through a bit of this webrev, this looks fine. This isn't very hard to review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2321978633 From erikj at openjdk.org Fri Aug 30 17:04:19 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Fri, 30 Aug 2024 17:04:19 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v5] In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Fri, 30 Aug 2024 16:57:18 GMT, Magnus Ihse Bursie wrote: >> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. >> >> This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Use "whitespace" as an uncountable noun > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> > - I suck at English verb forms > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> Marked as reviewed by erikj (Reviewer). 
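The README change discussed above asks whoever imports the upstream SLEEF sources to remove trailing whitespace and expand tabs to 8 columns so that jcheck accepts them. As an illustration only, here is a minimal C++ sketch of that per-line normalization. `normalize_line` is a hypothetical helper, not part of the JDK build or of the README's instructions, and it assumes ASCII input (multi-byte UTF-8 characters would throw off the column count).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not JDK code): normalizes one line of ASCII source
// text (without its newline) by expanding each tab to the next tab stop and
// then stripping trailing blanks.
std::string normalize_line(const std::string& line, std::size_t tab_width = 8) {
  assert(tab_width > 0);
  std::string out;
  for (char c : line) {
    if (c == '\t') {
      // Pad with spaces up to the next multiple of tab_width.
      out.append(tab_width - out.size() % tab_width, ' ');
    } else {
      out.push_back(c);
    }
  }
  // Tabs are already expanded, so only trailing spaces remain to strip.
  std::size_t last = out.find_last_not_of(' ');
  if (last == std::string::npos) return std::string();
  out.erase(last + 1);
  return out;
}
```

Expanding to the next tab stop, rather than replacing every tab with a fixed run of 8 spaces, keeps columns aligned the way an editor with 8-column tabs would display them.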
------------- PR Review: https://git.openjdk.org/jdk/pull/20781#pullrequestreview-2273084845 From ayang at openjdk.org Fri Aug 30 18:13:23 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 30 Aug 2024 18:13:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 11:15:23 GMT, Stefan Karlsson wrote: >> I was thinking the same, but there's a problem with that. If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. Then when the Full GC scans these regions with dead objects it will mistakenly think that they have been marked alive because `is_forwarded() == is_gc_marked()`. The code in `phase2_calculate_new_addr` will then break when it looks for `is_gc_marked` objects. > > FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects. It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers. > > Because of the code above, the SerialGC clears away the information about what objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE. > If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1739218189 From amenkov at openjdk.org Fri Aug 30 19:17:18 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 30 Aug 2024 19:17:18 GMT Subject: RFR: 8338934: vmTestbase/nsk/jvmti/*Field*Watch/TestDescription.java tests timeout intermittently [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:50:45 GMT, Leonid Mesnik wrote: >> The tests time out because of a deadlock between a thread that is in transition and a thread that is changing field watches. >> >> They use JvmtiThreadState_lock and JvmtiVTMSTransitionDisabler. >> >> Changing a field watch requires the disabler, but the code tries to use it only after the lock has already been taken in >> >> void >> JvmtiEventController::change_field_watch(jvmtiEvent event_type, bool added) { >> MutexLocker mu(JvmtiThreadState_lock); >> JvmtiEventControllerPrivate::change_field_watch(event_type, added); >> } >> >> >> whereas transitions must be disabled first, and only then should JvmtiThreadState_lock be taken. >> I quickly checked, and most JVMTI methods already do it this way. I also moved the disabler into jvmtiEnv.cpp to be more consistent with other methods. >> >> >> I was able to verify my fix in the loom repo locally, and ran tier1 + tier5-svc testing in jdk. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed spaces Marked as reviewed by amenkov (Reviewer).
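The deadlock described above is a lock-ordering problem. One thread holds JvmtiThreadState_lock and waits for transitions to be disabled, while a thread in transition waits for JvmtiThreadState_lock. A minimal C++ sketch of the fixed ordering follows. It models both sides as plain `std::mutex` objects, which is a simplification (the real JvmtiVTMSTransitionDisabler is not a mutex), and `transition_gate`, `state_lock`, `set_field_watch` and `finish_transition` are hypothetical names, not HotSpot code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins, not HotSpot code:
//   transition_gate ~ JvmtiVTMSTransitionDisabler (held by a thread that is
//                     mid-transition, or by a thread disabling transitions)
//   state_lock      ~ JvmtiThreadState_lock
std::mutex transition_gate;
std::mutex state_lock;
int field_watch_count = 0;

// Fixed ordering: disable transitions first, then take state_lock.
// The order described in the review (state_lock first, disabler second)
// can deadlock against finish_transition() below.
void set_field_watch(bool added) {
  std::lock_guard<std::mutex> disabler(transition_gate);  // 1st
  std::lock_guard<std::mutex> mu(state_lock);             // 2nd
  field_watch_count += added ? 1 : -1;
  assert(field_watch_count >= 0);  // never remove more watches than were added
}

// A thread that is mid-transition and needs state_lock to finish.
// It acquires in the same order, so no wait-for cycle can form.
void finish_transition() {
  std::lock_guard<std::mutex> gate(transition_gate);
  std::lock_guard<std::mutex> mu(state_lock);
  // ... update per-thread state here ...
}
```

The design point is simply that every path needing both resources acquires them in one global order. That is also why the fix moves the disabler out to the JVMTI entry points in jvmtiEnv.cpp, ahead of the lock.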
------------- PR Review: https://git.openjdk.org/jdk/pull/20776#pullrequestreview-2273334746 From mli at openjdk.org Fri Aug 30 19:51:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 30 Aug 2024 19:51:19 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v5] In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Fri, 30 Aug 2024 16:57:18 GMT, Magnus Ihse Bursie wrote: >> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. >> >> This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Use "whitespace" as an uncountable noun > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> > - I suck at English verb forms > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> I've applied the generated SLEEF headers to the aarch64/riscv64 patches (`Optimize vector math operations with SLEEF`) and everything looks good: the .so files build successfully, the tests jdk/incubator/vector/(Double|Float)MaxVectorTests.java run successfully, and the output log is correct. I didn't run the JMH tests, but I think it's fine. Thanks @magicus @erikj79 for the efficient co-work! ------------- Marked as reviewed by mli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20781#pullrequestreview-2273399318 From coleenp at openjdk.org Fri Aug 30 20:13:22 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 20:13:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:00:44 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix jvmci code. > - Some C2 refactoring. > - Assembly corrections from Matias and Dean. Thanks Chris and Matias for reviewing parts of this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2322265202 From duke at openjdk.org Fri Aug 30 20:26:05 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 30 Aug 2024 20:26:05 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Add stub initialization and extra tanh tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/79766f1b..4739ad45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=00-01 Stats: 65 lines in 2 files changed: 65 ins; 0 del; 0 mod Patch: 
https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From coleenp at openjdk.org Fri Aug 30 20:26:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 20:26:52 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v4] In-Reply-To: References: Message-ID: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Refactor jvmci code. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20719/files - new: https://git.openjdk.org/jdk/pull/20719/files/852ca049..9e93b2a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=02-03 Stats: 9 lines in 1 file changed: 5 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20719/head:pull/20719 PR: https://git.openjdk.org/jdk/pull/20719 From coleenp at openjdk.org Fri Aug 30 20:26:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 20:26:52 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:11:08 GMT, Coleen Phillimore wrote: >> src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 274: >> >>> 272: nonstatic_field(Klass, _bitmap, uintx) \ >>> 273: nonstatic_field(Klass, _hash_slot, uint8_t) \ >>> 274: nonstatic_field(Klass, _misc_flags._flags, u1) \ >> >> Can we export `_misc_flags` instead, similar to `_access_flags`? 
>> >> diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp >> index 9d65268f0fe..6170647186c 100644 >> --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp >> +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp >> @@ -268,10 +268,10 @@ >> nonstatic_field(Klass, _java_mirror, OopHandle) \ >> nonstatic_field(Klass, _modifier_flags, jint) \ >> nonstatic_field(Klass, _access_flags, AccessFlags) \ >> + nonstatic_field(Klass, _misc_flags, KlassFlags) \ >> nonstatic_field(Klass, _class_loader_data, ClassLoaderData*) \ >> nonstatic_field(Klass, _bitmap, uintx) \ >> nonstatic_field(Klass, _hash_slot, uint8_t) \ >> - nonstatic_field(Klass, _misc_flags._flags, u1) \ >> \ >> nonstatic_field(LocalVariableTableElement, start_bci, u2) \ >> nonstatic_field(LocalVariableTableElement, length, u2) \ >> diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java b/src/jdk.internal.vm.ci/share/classes/jdk/... > > I don't think the JVMCI knows about the type KlassFlags - I used the same code that I used for InstanceKlass::_misc_flags._flags (see above this). I made the change to refactor the getMiscFlags function, but if you want to add knowledge of the KlassFlags class (and InstanceKlassFlags also), you could do that separately from this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739381799 From dlong at openjdk.org Fri Aug 30 20:26:53 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 30 Aug 2024 20:26:53 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: <-dyvOdrDMU8UERNjLmg8NhFNta6ukiqRXrM1oJvyzc4=.f9f80021-8b7a-425f-807a-89b7dab293dc@github.com> On Fri, 30 Aug 2024 14:00:44 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. 
>> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix jvmci code. > - Some C2 refactoring. > - Assembly corrections from Matias and Dean. src/hotspot/share/opto/library_call.cpp line 3777: > 3775: Node* p = basic_plus_adr(kls, in_bytes(Klass::misc_flags_offset())); > 3776: Node* mods = make_load(nullptr, p, TypeInt::UBYTE, T_BOOLEAN, MemNode::unordered); > 3777: return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, region); Suggestion: return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, region, Klass::misc_flags_offset(), TypeInt::UBYTE, T_BOOLEAN); This looks much better, but can't you leave the basic_plus_adr and make_load in generate_mods_flags_guard, and pass in the needed specialization? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739380935 From cjplummer at openjdk.org Fri Aug 30 20:30:19 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 30 Aug 2024 20:30:19 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame In-Reply-To: References: Message-ID: <4Gro4J2pZmpS9XGJes3pw-lSfil0jlJbm6Hf4zg1Xsc=.e08da306-1a6d-4222-9340-0885043345e3@github.com> On Fri, 30 Aug 2024 09:14:11 GMT, Yasumasa Suenaga wrote: > I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. 
> > > Error occurred during stack walking: > java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.<init>(X86Frame.java:163) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) > Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is
_ZTV10UpcallStub) > at jdk.hotspot.agent/sun.jvm.hotspot.run... Thanks for fixing this. The SA changes look fine. You'll need an FFM expert for the hotspot changes. Is it possible to provide a test case? Maybe make a call into some blocking native API and then run jstack on the process. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/UpcallStub.java line 46: > 44: Type type = db.lookupType("UpcallStub"); > 45: > 46: // FIXME: add any needed fields I think you can remove this comment since clearly none of the fields are needed by SA. ------------- PR Review: https://git.openjdk.org/jdk/pull/20789#pullrequestreview-2273442126 PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1739380266 From duke at openjdk.org Fri Aug 30 20:37:19 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 30 Aug 2024 20:37:19 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: <5RUXvY7Tb8B_QYxg0iLNaC5d6fcNMdHUYXdEzyBoQ_U=.19f4d853-620a-459c-acc1-d57bfd6fb7bc@github.com> Message-ID: <8HXosdqMZUFQZ0dFkmNbi2lBcbGSJl-pyeCZ9HsHOPg=.7d55d0d6-5aa6-4b1e-9ed2-138b8654237e@github.com> On Tue, 27 Aug 2024 22:44:43 GMT, Joe Darcy wrote: >>> This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method. >> >> Thank You Joe for the suggestion. Will add more tests. (This PR passes the tier-1 tanh tests in the HyperbolicTests.Java) > >> > This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method. >> >> Thank You Joe for the suggestion. Will add more tests. (This PR passes the tier-1 tanh tests in the HyperbolicTests.Java) > > Yes @vamsi-parasa ; running that test is a good backstop and it is written to be applicable to any implementation of {sinh, cosh, tanh} that meet the general quality-of-implementation criteria for java.lang.Math. 
To be explicit, the WorstCaseTests.java file, and for good measure all the java.lang.Math tests, should also be run too for a change like this. > > For a hypothetical example, if an intrinsic used different polynomials for different ranges of the input, it would be a reasonable regression tests _for that implementation_ to probe around the boundary of the transition between the polynomials to make sure the monotonicity requirements were being met. > > That kind of check could be written to be generally applicable and be suitable for a regression tests in java/lang/Math or could be suitable for a regression test in the HotSpot area. HTH Hi Joe(@jddarcy) and Andrew (@theRealAph) , Please see the updates below: > This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method. > Added 1500 regression tests in HyperbolicTests.java which compare the accuracy of the Math.tanh intrinsic by using StrictMath.tanh (which calls FdLibm.Tanh.compute) as a reference. The tests are passing within 2.5 ulps of the expected result. The tests are fairly exhaustive and also cover the boundary transitions. > Yes @vamsi-parasa ; running that test is a good backstop and it is written to be applicable to any implementation of {sinh, cosh, tanh} that meet the general quality-of-implementation criteria for java.lang.Math. To be explicit, the WorstCaseTests.java file, and for good measure all the java.lang.Math tests, should also be run too for a change like this. > Ran the WorstCaseTests.java and all the tests in java.lang.Math and they're passing on my local machine. > For a hypothetical example, if an intrinsic used different polynomials for different ranges of the input, it would be a reasonable regression tests _for that implementation_ to probe around the boundary of the transition between the polynomials to make sure the monotonicity requirements were being met. 
> Added new tests in HyperbolicTests.java which probe around the various boundaries of transition. 1500 testcases and they passed within 2.5ulps of the reference StrictMath.tanh > That kind of check could be written to be generally applicable and be suitable for a regression tests in java/lang/Math or could be suitable for a regression test in the HotSpot area. HTH Please let me know if anything more needs to be added. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2322295827 From duke at openjdk.org Fri Aug 30 20:37:20 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 30 Aug 2024 20:37:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 13:14:22 GMT, Yudi Zheng wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add stub initialization and extra tanh tests > > src/hotspot/share/jvmci/jvmciCompilerToVM.hpp line 114: > >> 112: static address dcos; >> 113: static address dtan; >> 114: static address dtanh; > > Could you please add the following initializing code? 
> > diff --git a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp > index 9752d7edf99..1db9be70db0 100644 > --- a/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp > +++ b/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp > @@ -259,6 +259,17 @@ void CompilerToVM::Data::initialize(JVMCI_TRAPS) { > SET_TRIGFUNC(dpow); > > #undef SET_TRIGFUNC > + > +#define SET_TRIGFUNC_OR_NULL(name) \ > + if (StubRoutines::name() != nullptr) { \ > + name = StubRoutines::name(); \ > + } else { \ > + name = nullptr; \ > + } > + > + SET_TRIGFUNC_OR_NULL(dtanh); > + > +#undef SET_TRIGFUNC_OR_NULL > } > > static jboolean is_c1_supported(vmIntrinsics::ID id){ > diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > index fea308503cf..189c1465589 100644 > --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > @@ -126,6 +126,7 @@ > static_field(CompilerToVM::Data, dsin, address) \ > static_field(CompilerToVM::Data, dcos, address) \ > static_field(CompilerToVM::Data, dtan, address) \ > + static_field(CompilerToVM::Data, dtanh, address) \ > static_field(CompilerToVM::Data, dexp, address) \ > static_field(CompilerToVM::Data, dlog, address) \ > static_field(CompilerToVM::Data, dlog10, address) \ Thank You Yudi! Please see the code updated with your suggestion. 
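Earlier in the 8338694 thread, Vamsi describes two kinds of regression checks. The first compares the `Math.tanh` intrinsic against `StrictMath.tanh` within 2.5 ulps. The second probes around the points where an implementation switches formulas, to confirm that results never decrease. The sketch below shows how such checks might look. It is hypothetical and is not the `HyperbolicTests.java` code from the PR. The class and method names are invented, and the boundaries used (|x| = 1 and |x| = 22, where fdlibm's tanh changes formula) are an assumption, since the intrinsic may use different breakpoints.

```java
// Hypothetical sketch of ulp-bounded and monotonicity checks for Math.tanh.
// Not the actual HyperbolicTests.java code from the PR.
public final class TanhUlpCheck {
    private TanhUlpCheck() {}

    /** Distance between actual and expected, measured in ulps of expected. */
    static double ulpError(double actual, double expected) {
        return Math.abs(actual - expected) / Math.ulp(expected);
    }

    /** Largest ulp error of Math.tanh vs StrictMath.tanh over evenly spaced samples in [from, to]. */
    static double maxUlpError(double from, double to, int steps) {
        double worst = 0.0;
        double step = (to - from) / steps;
        for (int i = 0; i <= steps; i++) {
            double x = from + i * step;
            worst = Math.max(worst, ulpError(Math.tanh(x), StrictMath.tanh(x)));
        }
        return worst;
    }

    /**
     * Walks ulpsEachSide adjacent doubles on each side of a boundary and checks that
     * Math.tanh never decreases (tanh is increasing, so results must be non-decreasing).
     */
    static boolean isMonotonicAround(double boundary, int ulpsEachSide) {
        double x = boundary;
        for (int i = 0; i < ulpsEachSide; i++) {
            x = Math.nextDown(x);
        }
        double prev = Math.tanh(x);
        for (int i = 0; i < 2 * ulpsEachSide; i++) {
            x = Math.nextUp(x);
            double cur = Math.tanh(x);
            if (cur < prev) {
                return false;
            }
            prev = cur;
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed example boundaries (fdlibm's tanh switches formulas at |x| = 1 and |x| = 22).
        double[] boundaries = {-22.0, -1.0, 1.0, 22.0};
        for (double b : boundaries) {
            System.out.println("monotonic around " + b + ": " + isMonotonicAround(b, 10_000));
        }
        System.out.println("max ulp error on [-20, 20]: " + maxUlpError(-20.0, 20.0, 3000));
    }
}
```

On a JVM without a tanh intrinsic, `Math.tanh` delegates to `StrictMath.tanh`, so the reported error is 0. The checks only become meaningful when an intrinsic is active.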
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1739390171 From cjplummer at openjdk.org Fri Aug 30 20:47:20 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 30 Aug 2024 20:47:20 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v2] In-Reply-To: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> References: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> Message-ID: On Fri, 30 Aug 2024 05:21:54 GMT, David Holmes wrote: >> This is the implementation of a new method added to the JNI specification. >> >> From the CSR request: >> >> The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. >> >> **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. >> >> Solution >> >> Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. 
>> >> --- >> >> We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni >> >> There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. >> >> Testing: >> - new test added >> - tiers 1-3 sanity >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Exclude test on 32-bit Overall it looks good to me, although I don't have experience adding a new JNI API (the dtrace probes were new to me), but it seems you are following what is already in place for other functions, and the testing looks good. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20784#pullrequestreview-2273477700 From vlivanov at openjdk.org Fri Aug 30 21:28:27 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 30 Aug 2024 21:28:27 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: <7In6ieBWvAxAtJQyc1H18Eoeb_x0nmLIdUbLHeP78lo=.c87c2c53-453b-4d0f-8484-4b8f33984684@github.com> On Fri, 30 Aug 2024 12:58:17 GMT, Coleen Phillimore wrote: >> src/hotspot/share/ci/ciKlass.cpp line 233: >> >>> 231: jint ciKlass::misc_flags() { >>> 232: assert(is_loaded(), "not loaded"); >>> 233: GUARDED_VM_ENTRY( >> >> To Compiler folks: I don't think the VM_ENTRY is necessary, but if it is, then we should consider entering VM mode once and caching/memoizing these immutable flag values in the ciKlass. > > I added a global typedef klass_flags_t because it didn't look confusing vs KlassFlags and KlassFlags_t, and the lower case convention is something we usually use for typedefs. I agree with Dean. I don't see why the value can't be eagerly captured as part of `ciKlass` initialization. I'm fine with leaving it as is since it follows the existing pattern in `ciKlass::access_flags()`, so something for a future cleanup. 
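The CSR quoted above for 8328877 notes that `GetStringUTFLength` can overflow a `jint`. The overflow is easy to see if you count bytes the way modified UTF-8 does. The minimal Java sketch below assumes the standard modified UTF-8 rules: U+0000 takes two bytes, and a supplementary character is encoded as two 3-byte surrogates. The class and method names are hypothetical, and this is not the JNI implementation.

```java
// Hypothetical helper that mirrors modified UTF-8 length rules; not the JNI implementation.
public final class ModifiedUtf8Length {
    private ModifiedUtf8Length() {}

    /** Bytes needed to encode one UTF-16 code unit in modified UTF-8. */
    static int bytesFor(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;  // ASCII except NUL
        } else if (c <= 0x07FF) {
            return 2;  // includes U+0000, encoded as 0xC0 0x80
        } else {
            return 3;  // everything else, including each half of a surrogate pair
        }
    }

    /** Modified UTF-8 length of s, accumulated in a long so it cannot overflow. */
    static long length(CharSequence s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += bytesFor(s.charAt(i));
        }
        return total;
    }

    /** Worst case for a string of the given UTF-16 length: 3 bytes per code unit. */
    static long worstCase(int utf16Length) {
        return 3L * utf16Length;
    }

    public static void main(String[] args) {
        System.out.println(length("abc"));           // 3
        System.out.println(length("\0"));            // 2
        System.out.println(length("\u20AC"));        // 3
        System.out.println(length("\uD83D\uDE00"));  // 6
        System.out.println(worstCase(Integer.MAX_VALUE) > Integer.MAX_VALUE);  // true
    }
}
```

The last line shows why a `jlong` return type is needed: the worst case for a maximal string does not fit in an `int`.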
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739425219 From coleenp at openjdk.org Fri Aug 30 21:38:22 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 21:38:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: <-dyvOdrDMU8UERNjLmg8NhFNta6ukiqRXrM1oJvyzc4=.f9f80021-8b7a-425f-807a-89b7dab293dc@github.com> References: <-dyvOdrDMU8UERNjLmg8NhFNta6ukiqRXrM1oJvyzc4=.f9f80021-8b7a-425f-807a-89b7dab293dc@github.com> Message-ID: On Fri, 30 Aug 2024 20:22:38 GMT, Dean Long wrote: >> Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix jvmci code. >> - Some C2 refactoring. >> - Assembly corrections from Matias and Dean. > > src/hotspot/share/opto/library_call.cpp line 3777: > >> 3775: Node* p = basic_plus_adr(kls, in_bytes(Klass::misc_flags_offset())); >> 3776: Node* mods = make_load(nullptr, p, TypeInt::UBYTE, T_BOOLEAN, MemNode::unordered); >> 3777: return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, region); > > Suggestion: > > return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, region, Klass::misc_flags_offset(), TypeInt::UBYTE, T_BOOLEAN); > > This looks much better, but can't you leave the basic_plus_adr and make_load in generate_mods_flags_guard, and pass in the needed specialization? Really, this is better? it adds three parameters. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739430586 From coleenp at openjdk.org Fri Aug 30 21:51:49 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 21:51:49 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. 
> > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add parameters and rename generate_klass_flags_guard. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20719/files - new: https://git.openjdk.org/jdk/pull/20719/files/9e93b2a5..4c3a04dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=03-04 Stats: 12 lines in 2 files changed: 5 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20719/head:pull/20719 PR: https://git.openjdk.org/jdk/pull/20719 From coleenp at openjdk.org Fri Aug 30 22:06:26 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 30 Aug 2024 22:06:26 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: <7In6ieBWvAxAtJQyc1H18Eoeb_x0nmLIdUbLHeP78lo=.c87c2c53-453b-4d0f-8484-4b8f33984684@github.com> References: <7In6ieBWvAxAtJQyc1H18Eoeb_x0nmLIdUbLHeP78lo=.c87c2c53-453b-4d0f-8484-4b8f33984684@github.com> Message-ID: On Fri, 30 Aug 2024 21:25:18 GMT, Vladimir Ivanov wrote: >> I added a global typedef klass_flags_t because it didn't look confusing vs KlassFlags and KlassFlags_t, and the lower case convention is something we usually use for typedefs. > > I agree with Dean. I don't see why the value can't be eagerly captured as part of `ciKlass` initialization. > > I'm fine with leaving it as is since it follows the existing pattern in `ciKlass::access_flags()`, so something for a future cleanup. Yes, I agree, this should be saved in ciKlass somewhere, along with access_flags since they are const as well, but I think someone should do this as a follow on. 
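Vladimir and Coleen suggest a follow-up: capture the immutable flags once, when the `ciKlass` is created, instead of entering the VM on every query. The pattern is sketched below in Java, only for consistency with the other examples here. HotSpot's `ciKlass` is C++, and `CachedKlassMirror`, `VmKlass`, and the VM-entry counter are hypothetical stand-ins for `ciKlass`, `Klass`, and `GUARDED_VM_ENTRY`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of eagerly caching immutable klass flags; not HotSpot code.
public final class CachedKlassMirror {
    /** Stand-in for the VM-side Klass, whose flags are treated as constant once loaded. */
    static final class VmKlass {
        private final int miscFlags;
        VmKlass(int miscFlags) { this.miscFlags = miscFlags; }
        int miscFlags() { return miscFlags; }
    }

    /** Counts simulated VM transitions so the two strategies can be compared. */
    static final AtomicInteger VM_ENTRIES = new AtomicInteger();

    private final VmKlass klass;
    private final int miscFlags;  // captured once, at construction time

    CachedKlassMirror(VmKlass klass) {
        this.klass = klass;
        this.miscFlags = readInVm(klass);
    }

    /** Stands in for a GUARDED_VM_ENTRY block: every call costs one transition. */
    private static int readInVm(VmKlass k) {
        VM_ENTRIES.incrementAndGet();
        return k.miscFlags();
    }

    /** Eager capture: answering the query needs no VM transition. */
    int miscFlags() { return miscFlags; }

    /** The existing pattern, for comparison: one VM transition per query. */
    int miscFlagsPerCall() { return readInVm(klass); }

    public static void main(String[] args) {
        CachedKlassMirror k = new CachedKlassMirror(new VmKlass(0x5));
        for (int i = 0; i < 3; i++) {
            k.miscFlags();
        }
        System.out.println("VM entries after 3 cached queries: " + VM_ENTRIES.get());  // 1
        k.miscFlagsPerCall();
        System.out.println("VM entries after 1 uncached query: " + VM_ENTRIES.get());  // 2
    }
}
```

The trade-off is one extra field per object in exchange for avoiding a VM transition on each query. Caching is only safe because, as Coleen notes, these flags are const.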
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739449411 From dlong at openjdk.org Fri Aug 30 22:28:30 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 30 Aug 2024 22:28:30 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: <-dyvOdrDMU8UERNjLmg8NhFNta6ukiqRXrM1oJvyzc4=.f9f80021-8b7a-425f-807a-89b7dab293dc@github.com> Message-ID: On Fri, 30 Aug 2024 21:35:49 GMT, Coleen Phillimore wrote: >> src/hotspot/share/opto/library_call.cpp line 3777: >> >>> 3775: Node* p = basic_plus_adr(kls, in_bytes(Klass::misc_flags_offset())); >>> 3776: Node* mods = make_load(nullptr, p, TypeInt::UBYTE, T_BOOLEAN, MemNode::unordered); >>> 3777: return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, region); >> >> Suggestion: >> >> return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, region, Klass::misc_flags_offset(), TypeInt::UBYTE, T_BOOLEAN); >> >> This looks much better, but can't you leave the basic_plus_adr and make_load in generate_mods_flags_guard, and pass in the needed specialization? > > Really, this is better? it adds three parameters. I made this change. It reduces duplicate code, which is usually good. Yes, I like it better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739475163 From dlong at openjdk.org Fri Aug 30 22:38:21 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 30 Aug 2024 22:38:21 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 21:51:49 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add parameters and rename generate_klass_flags_guard. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2273596650 From fyang at openjdk.org Sat Aug 31 01:47:25 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 31 Aug 2024 01:47:25 GMT Subject: RFR: 8339248: RISC-V: Remove li64 macro assembler routine and related code In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 12:59:40 GMT, Fei Yang wrote: > The macro assembler routine li64 and related code (is_li64_at, patch_imm_in_li64, get_target_of_li64 and check_li64_data_dependency) is unused for now. We should remove these unused code, which will save us some unnecessary runtime checks. We can add them back when needed again someday. > > Testing: > - [x] release & fastdebug build > - [x] Gtest & Tier1 test (release) Thanks everyone for the review. GHA failure is unrelated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20769#issuecomment-2322708911 From fyang at openjdk.org Sat Aug 31 01:47:25 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 31 Aug 2024 01:47:25 GMT Subject: Integrated: 8339248: RISC-V: Remove li64 macro assembler routine and related code In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 12:59:40 GMT, Fei Yang wrote: > The macro assembler routine li64 and related code (is_li64_at, patch_imm_in_li64, get_target_of_li64 and check_li64_data_dependency) is unused for now. We should remove these unused code, which will save us some unnecessary runtime checks. We can add them back when needed again someday. > > Testing: > - [x] release & fastdebug build > - [x] Gtest & Tier1 test (release) This pull request has now been integrated. 
Changeset: 392bdd57 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/392bdd5734e0ad4e616d52bb7bcafcf85dccbf34 Stats: 111 lines in 2 files changed: 0 ins; 111 del; 0 mod 8339248: RISC-V: Remove li64 macro assembler routine and related code Reviewed-by: rehn, fjiang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/20769 From ysuenaga at openjdk.org Sat Aug 31 04:21:17 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 31 Aug 2024 04:21:17 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2] In-Reply-To: References: Message-ID: > I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. > > > Error occurred during stack walking: > java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) > 
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) > Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is _ZTV10UpcallStub) > at jdk.hotspot.agent/sun.jvm.hotspot.run... Yasumasa Suenaga has updated the pull request incrementally with two additional commits since the last revision: - Add testcase - Remove unnecessary comment from UpcallStub ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20789/files - new: https://git.openjdk.org/jdk/pull/20789/files/63cc67d4..f32ba079 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20789&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20789&range=00-01 Stats: 214 lines in 4 files changed: 212 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20789/head:pull/20789 PR: https://git.openjdk.org/jdk/pull/20789 From ysuenaga at openjdk.org Sat Aug 31 04:21:18 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 31 Aug 2024 04:21:18 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2] In-Reply-To: <4Gro4J2pZmpS9XGJes3pw-lSfil0jlJbm6Hf4zg1Xsc=.e08da306-1a6d-4222-9340-0885043345e3@github.com> References: 
<4Gro4J2pZmpS9XGJes3pw-lSfil0jlJbm6Hf4zg1Xsc=.e08da306-1a6d-4222-9340-0885043345e3@github.com> Message-ID: On Fri, 30 Aug 2024 20:28:11 GMT, Chris Plummer wrote: > Thanks for fixing this. The SA changes look fine. You'll need an FFM expert for the hotspot changes. Thank you! > Is it possible to provide a test case? Maybe make a call into some blocking native API and then run jstack on the process. I added a test for this. It passes the address of the upcall stub to a JNI function, which then calls it. > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/UpcallStub.java line 46: > >> 44: Type type = db.lookupType("UpcallStub"); >> 45: >> 46: // FIXME: add any needed fields > > I think you can remove this comment since clearly none of the fields are needed by SA. Removed this comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2322761384 PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1739599883 From fyang at openjdk.org Sat Aug 31 06:55:18 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 31 Aug 2024 06:55:18 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2] In-Reply-To: References: Message-ID: <5MBzQp3ejBzMjW1wo3Ay1qBHkwNpJucapKnN2bVNSL4=.5146c9df-0d0e-479a-b25b-027873d40a75@github.com> On Sat, 31 Aug 2024 04:21:17 GMT, Yasumasa Suenaga wrote: >> I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception.
>> >> >> Error occurred during stack walking: >> java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) >> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 
(nearest symbol is _ZTV10Upcall... > > Yasumasa Suenaga has updated the pull request incrementally with two additional commits since the last revision: > > - Add testcase > - Remove unnecessary comment from UpcallStub src/hotspot/cpu/x86/upcallLinker_x86_64.cpp line 397: > 395: * and also should include both saved FP and return address > 396: */ > 397: (frame_size / wordSize) + 2); Hi, I witnessed build failures for other CPU ports. Shouldn't all the callsites of `UpcallStub::create` be updated to reflect this change? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1739622883 From sspitsyn at openjdk.org Sat Aug 31 07:35:18 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 31 Aug 2024 07:35:18 GMT Subject: RFR: 8338934: vmTestbase/nsk/jvmti/*Field*Watch/TestDescription.java tests timeout intermittently [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:50:45 GMT, Leonid Mesnik wrote: >> The tests time out because of a deadlock between the thread that is in transition and the thread changing field watches. >> >> They use JvmtiThreadState_lock and JvmtiVTMSTransitionDisabler. >> >> Changing a field watch requires the disabler, but it attempts to use it only after JvmtiThreadState_lock is already held in >> >> void >> JvmtiEventController::change_field_watch(jvmtiEvent event_type, bool added) { >> MutexLocker mu(JvmtiThreadState_lock); >> JvmtiEventControllerPrivate::change_field_watch(event_type, added); >> } >> >> >> whereas it needs to disable transitions first and only then take JvmtiThreadState_lock. >> I quickly checked that most JVMTI methods already do this. I also moved the disabler into jvmtiEnv.cpp to be more consistent with other methods. >> >> >> I was able to verify my fix locally in the loom repo, and ran tier1 + tier5-svc testing in jdk. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed spaces Marked as reviewed by sspitsyn (Reviewer).
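Leonid's description sets an ordering rule: disable transitions before taking `JvmtiThreadState_lock`, never the other way around. The small Java model below illustrates that rule as a hedged analogy, not HotSpot code. A `ReentrantReadWriteLock` stands in for `JvmtiVTMSTransitionDisabler`: transitions hold the read side, and the disabler takes the write side, so it waits for in-flight transitions. A `ReentrantLock` stands in for `JvmtiThreadState_lock`. The assumption that a transitioning thread needs the state lock before it can finish, and all names below, are illustrative.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the lock-ordering fix; names are illustrative, not HotSpot's.
public final class FieldWatchLockOrder {
    // Read side: a thread in transition. Write side: the "transition disabler",
    // which must wait until every in-flight transition has finished.
    private final ReentrantReadWriteLock transitions = new ReentrantReadWriteLock();
    // Stands in for JvmtiThreadState_lock.
    private final ReentrantLock threadStateLock = new ReentrantLock();
    private int fieldWatchCount;

    /** A thread in transition that needs the state lock before it can finish. */
    void doTransition() {
        transitions.readLock().lock();
        try {
            threadStateLock.lock();
            try {
                // ... update per-thread state ...
            } finally {
                threadStateLock.unlock();
            }
        } finally {
            transitions.readLock().unlock();
        }
    }

    /** Fixed order: disable transitions first, then take the state lock. */
    void changeFieldWatch(boolean added) {
        transitions.writeLock().lock();
        try {
            threadStateLock.lock();
            try {
                fieldWatchCount += added ? 1 : -1;
            } finally {
                threadStateLock.unlock();
            }
        } finally {
            transitions.writeLock().unlock();
        }
    }

    int fieldWatchCount() {
        threadStateLock.lock();
        try {
            return fieldWatchCount;
        } finally {
            threadStateLock.unlock();
        }
    }

    /** Runs both paths concurrently; returns true if neither thread hangs. */
    static boolean runStress(int transitionCount, int changes) {
        FieldWatchLockOrder m = new FieldWatchLockOrder();
        Thread transitioner = new Thread(() -> {
            for (int i = 0; i < transitionCount; i++) m.doTransition();
        });
        Thread watcher = new Thread(() -> {
            for (int i = 0; i < changes; i++) m.changeFieldWatch(true);
        });
        transitioner.setDaemon(true);  // a hang must not keep the JVM alive
        watcher.setDaemon(true);
        transitioner.start();
        watcher.start();
        try {
            transitioner.join(TimeUnit.SECONDS.toMillis(10));
            watcher.join(TimeUnit.SECONDS.toMillis(10));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !transitioner.isAlive() && !watcher.isAlive() && m.fieldWatchCount() == changes;
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runStress(10_000, 1_000));
    }
}
```

Swapping the two acquisitions in `changeFieldWatch` can reproduce the reported hang. The watcher then holds the state lock while it waits for the transition to end, and the transitioning thread waits for the state lock.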
------------- PR Review: https://git.openjdk.org/jdk/pull/20776#pullrequestreview-2273782106 From ysuenaga at openjdk.org Sat Aug 31 09:34:09 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 31 Aug 2024 09:34:09 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v3] In-Reply-To: References: Message-ID: > I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. > > > Error occurred during stack walking: > java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) > at 
jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) > Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is _ZTV10UpcallStub) > at jdk.hotspot.agent/sun.jvm.hotspot.run... Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: Add frame size to all of UpcallStub::create() call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20789/files - new: https://git.openjdk.org/jdk/pull/20789/files/f32ba079..90bccf1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20789&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20789&range=01-02 Stats: 28 lines in 4 files changed: 20 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20789/head:pull/20789 PR: https://git.openjdk.org/jdk/pull/20789 From duke at openjdk.org Sat Aug 31 10:26:22 2024 From: duke at openjdk.org (ExE Boss) Date: Sat, 31 Aug 2024 10:26:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 21:51:49 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add parameters and rename generate_klass_flags_guard. src/hotspot/share/opto/library_call.hpp line 161: > 159: Node* generate_mods_flags_guard(Node* kls, > 160: int modifier_mask, int modifier_bits, > 161: RegionNode* region); This method was removed. Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739653189 From duke at openjdk.org Sat Aug 31 11:27:22 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Sat, 31 Aug 2024 11:27:22 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v5] In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Fri, 30 Aug 2024 16:57:18 GMT, Magnus Ihse Bursie wrote: >> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. >> >> This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Use "whitespace" as an uncountable noun > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> > - I suck at English verb forms > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username).
-------------

PR Review: https://git.openjdk.org/jdk/pull/20781#pullrequestreview-2273829185

From dholmes at openjdk.org Sat Aug 31 11:57:19 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 31 Aug 2024 11:57:19 GMT
Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2]
In-Reply-To:
References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com>
Message-ID:

On Fri, 30 Aug 2024 10:51:30 GMT, Magnus Ihse Bursie wrote:

>> I understand the cost overhead experienced by any individual Java run may be lost in the noise, but it still impacts every single Java run just to save some time/resources for the handful of builders of statically linked VMs. I am not a fan.
>
> @dholmes-ora This PR now has three reviewers approving it. You say you are "not a fan". Does this mean you want to veto this change? Or would you be willing to accept it, even if you do not like it?

@magicus There is no "veto" power in OpenJDK. You have your reviewers; I have to accept it even if I don't like it.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2322873797

From ysuenaga at openjdk.org Sat Aug 31 12:28:19 2024
From: ysuenaga at openjdk.org (Yasumasa Suenaga)
Date: Sat, 31 Aug 2024 12:28:19 GMT
Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2]
In-Reply-To: <5MBzQp3ejBzMjW1wo3Ay1qBHkwNpJucapKnN2bVNSL4=.5146c9df-0d0e-479a-b25b-027873d40a75@github.com>
References: <5MBzQp3ejBzMjW1wo3Ay1qBHkwNpJucapKnN2bVNSL4=.5146c9df-0d0e-479a-b25b-027873d40a75@github.com>
Message-ID:

On Sat, 31 Aug 2024 06:52:56 GMT, Fei Yang wrote:

>> Yasumasa Suenaga has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Add testcase
>>  - Remove unnecessary comment from UpcallStub
>
> src/hotspot/cpu/x86/upcallLinker_x86_64.cpp line 397:
>
>> 395:      * and also should include both saved FP and return address
>> 396:      */
>> 397:     (frame_size / wordSize) + 2);
>
> Hi, I witnessed build failures for other CPU ports. Shouldn't all the callsites of `UpcallStub::create` be updated to reflect this change?

Good catch! I fixed all callers of `UpcallStub::create`, and they now pass the build test.
(This PR branch starts with `pr/`, so GHA did not start automatically; I started it manually.)

The test for this PR was run as part of the serviceability tests, and it works fine on both x86_64 and aarch64.
On RISC-V, GHA did not run the test, but I believe it would work fine because the stack structure is similar to x86_64 (the return address and FP appear to be stored on the stack).

I'm not sure about s390 and PPC64.
In PPC64, I checked `UpcallLinker::make_upcall_stub`. It seems to push a frame onto the stack once, as follows, but I'm not sure.

    address start = __ function_entry(); // called by C
    __ save_LR_CR(R0);
    assert((abi._stack_alignment_bytes % 16) == 0, "must be 16 byte aligned");
    // allocate frame (frame_size is also aligned, so stack is still aligned)
    __ push_frame(frame_size, tmp);

s390 also pushes an address onto the stack, but SA does not have an s390 implementation, so we might be able to tackle this later when SA supports s390.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1739708623

From fyang at openjdk.org Sat Aug 31 13:38:19 2024
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 31 Aug 2024 13:38:19 GMT
Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2]
In-Reply-To:
References: <5MBzQp3ejBzMjW1wo3Ay1qBHkwNpJucapKnN2bVNSL4=.5146c9df-0d0e-479a-b25b-027873d40a75@github.com>
Message-ID:

On Sat, 31 Aug 2024 12:25:49 GMT, Yasumasa Suenaga wrote:

>> src/hotspot/cpu/x86/upcallLinker_x86_64.cpp line 397:
>>
>>> 395:      * and also should include both saved FP and return address
>>> 396:      */
>>> 397:     (frame_size / wordSize) + 2);
>>
>> Hi, I witnessed build failures for other CPU ports. Shouldn't all the callsites of `UpcallStub::create` be updated to reflect this change?
>
> Good catch! I fixed all callers of `UpcallStub::create`, and they now pass the build test.
> (This PR branch starts with `pr/`, so GHA did not start automatically; I started it manually.)
>
> The test for this PR was run as part of the serviceability tests, and it works fine on both x86_64 and aarch64.
> On RISC-V, GHA did not run the test, but I believe it would work fine because the stack structure is similar to x86_64 (the return address and FP appear to be stored on the stack).
>
> I'm not sure about s390 and PPC64.
> In PPC64, I checked `UpcallLinker::make_upcall_stub`. It seems to push a frame onto the stack once, as follows, but I'm not sure.
>
>     address start = __ function_entry(); // called by C
>     __ save_LR_CR(R0);
>     assert((abi._stack_alignment_bytes % 16) == 0, "must be 16 byte aligned");
>     // allocate frame (frame_size is also aligned, so stack is still aligned)
>     __ push_frame(frame_size, tmp);
>
>
> s390 also pushes an address onto the stack, but SA does not have an s390 implementation, so we might be able to tackle this later when SA supports s390.

Yeah, it works on RISC-V as well. Thanks for the update.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1739717378
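In the x86_64 change discussed in this thread, the upcall stub's frame size is reported as `(frame_size / wordSize) + 2`: the stub's own frame plus the saved FP and the return address. The following is a minimal C++ sketch of that arithmetic as a stack walker might use it. It assumes a 64-bit x86_64-style layout (8-byte words; `call` pushes the return address, the FP is saved next, then `frame_size` bytes are reserved below it). It also assumes, as the comment in the diff suggests, that `frame_size` excludes those two words. The helper names and `kWordSize` are hypothetical and are not HotSpot or SA APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-bit target where a machine word is 8 bytes.
constexpr int kWordSize = 8;

// Frame size in words as a stack walker sees it: the stub's own frame body
// (frame_size_bytes, excluding saved FP and return address) plus one word
// for the saved frame pointer and one for the return address pushed by the
// call instruction. Mirrors "(frame_size / wordSize) + 2".
int frame_size_in_words(int frame_size_bytes) {
  assert(frame_size_bytes >= 0 && frame_size_bytes % kWordSize == 0);
  const int saved_fp_words = 1;
  const int return_address_words = 1;
  return frame_size_bytes / kWordSize + saved_fp_words + return_address_words;
}

// The caller's SP lies just above the return address:
// stub SP + (frame size in words) * word size.
std::uintptr_t sender_sp(std::uintptr_t sp, int frame_size_bytes) {
  return sp + static_cast<std::uintptr_t>(frame_size_in_words(frame_size_bytes)) * kWordSize;
}

// The return address occupies the word just below the sender's SP ...
std::uintptr_t return_address_slot(std::uintptr_t sp, int frame_size_bytes) {
  return sender_sp(sp, frame_size_bytes) - kWordSize;
}

// ... and the saved frame pointer the word below that.
std::uintptr_t saved_fp_slot(std::uintptr_t sp, int frame_size_bytes) {
  return sender_sp(sp, frame_size_bytes) - 2 * kWordSize;
}
```

With a 96-byte frame body, for example, the walker counts 12 + 2 = 14 words, so the caller's SP is at `sp + 112`. Omitting the `+ 2` would leave such a walker two words short, at the saved-FP slot rather than at the caller's SP.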