From fyang at openjdk.org Fri Sep 1 04:49:42 2023
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Sep 2023 04:49:42 GMT
Subject: RFR: 8315069: Relativize extended_sp in interpreter frames [v3]
In-Reply-To:
References:

Message-ID: <93UXfaU7Gw00ZYENi88x8cMbtPPrI0t9sKIfq4uASPo=.c9deba87-678c-4fdb-898d-82ee5f8692df@github.com>

On Thu, 31 Aug 2023 12:05:40 GMT, Fredrik Bredberg wrote:

>> Implementation of relativized extended_sp in interpreter frames for AArch64 and RISC-V.
>>
>> By changing the "extended_sp" member in interpreter frames from an absolute address into an offset relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is because we might freeze and thaw from and to different worker threads, so the absolute address of locals might change, but the offset from the frame pointer stays constant.
>>
>> This subtask only handles "extended_sp". The relativization of other interpreter frame members is handled in other subtasks of JDK-8289296.
>>
>> Tested tier1-tier7 on aarch64. RISC-V was sanity tested using QEMU.
>
> Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Merge branch 'master' into 8315069_relativize_extended_sp
>  - Updated aarch64 after review
>  - 8315069: Relativize extended_sp in interpreter frames

LGTM. This has passed hotspot_loom, jdk_loom and tier1-3 on my RISC-V board.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15468#pullrequestreview-1606204713

From fyang at openjdk.org Fri Sep 1 07:41:48 2023
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Sep 2023 07:41:48 GMT
Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v10]
In-Reply-To:
References:

Message-ID:

On Tue, 29 Aug 2023 08:28:42 GMT, Ilya Gavrilin wrote:

>> Please review these changes to the RISC-V double rounding intrinsics.
>>
>> On RISC-V, intrinsics for rounding doubles with a mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for this conversion, so a two-step conversion is used: double -> long -> double (using fcvt.l.d and fcvt.d.l).
>>
>> We also have to supply a rounding mode to the fcvt.x.x instructions.
>>
>> Rounding mode selection for ceil (floor and rint are similar), according to the Math.ceil requirements:
>>
>>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>>
>> For double -> long we choose rup (round towards +inf) mode, to get an integer that is greater than or equal to the input value.
>> For long -> double we choose rdn (round towards -inf) mode, to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
>>
>> For inf, NaN, or values whose magnitude is at least 2^63, the input value is returned (a double of that magnitude is guaranteed to be an integer).
>> When storing the result, we also copy the sign of the input value. This is needed because, for example, ceil of a value in (-1.0, 0.0) must return -0.0.
>>
>> We have observed significant improvements on HiFive and T-Head boards.
>>
>> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>>
>> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>>
>> Without intrinsic:
>>
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>>
>> With intrinsic:
>>
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix typo in c2_MacroAssembler_riscv.cpp

Thanks for the update. Several nits remain. Otherwise LGTM.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1380:

> 1378: // round out-of-range values to the nearest max or min value), therefore special
> 1379: // handling is needed by NaN, +/-Infinity, +/-0.
> 1380: void C2_MacroAssembler::round_double_mode(FloatRegister dst, FloatRegister src, int round_mode, Register tmp1, Register tmp2, Register tmp3) {

Start a new line for the three temporary register parameters.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1383:

> 1381:
> 1382: assert_different_registers(dst, src);
> 1383: assert_different_registers(tmp1, tmp2, tmp3);

Suggestion: `assert_different_registers(dst, src, tmp1, tmp2, tmp3);`

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1385:

> 1383: assert_different_registers(tmp1, tmp2, tmp3);
> 1384:
> 1385: // setting rounding mode for conversions

Suggestion: s/setting/Set/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1388:

> 1386: // here we use similar modes to double->long and long->double conversions
> 1387: // different mode for long->double conversion matter only if long value was not representable as double
> 1388: // we got long value as a result of double->long conversion so it is defenitely representable

Typo: s/defenitely/definitely/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1409:

> 1407: Label done, bad_val;
> 1408:
> 1409: // generating constant (tmp2)

Suggestion: s/generating/Generate/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1412:

> 1410: // tmp2 = 100...0000
> 1411: addi(tmp2, zr, 1);
> 1412: slli(tmp2, tmp2, 63);

Better to move these two lines with code comments after `fcvt_l_d(tmp1, src, rm);`

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1416:

> 1414: fcvt_l_d(tmp1, src, rm);
> 1415:
> 1416: // preparing converted long (tmp1)

Suggestion: s/preparing/Prepare/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1422:

> 1420: addi(tmp3, tmp1, 1);
> 1421: andi(tmp3, tmp3, -2);
> 1422: beq(tmp3, tmp2, bad_val);

Please start a new line here after `beq`;

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1427:

> 1425: // add sign of input value to result for +/- 0 cases
> 1426: fsgnj_d(dst, dst, src);
> 1427: j(done);

Please start a new line here after `j(done);`

-------------

Changes requested by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1606389159
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312661362
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312659940
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312662917
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312662397
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312666632
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312665535
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312671271
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312669272
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312663760

From alanb at openjdk.org Fri Sep 1 07:56:48 2023
From: alanb at openjdk.org (Alan Bateman)
Date: Fri, 1 Sep 2023 07:56:48 GMT
Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v9]
In-Reply-To: <6hreBEM3qw8FZmOCseR6hgu4-avV-C-2oK7PlOs-IYU=.b3345812-391b-4ed1-b7a2-cdb0e63e2be6@github.com>
References: <6hreBEM3qw8FZmOCseR6hgu4-avV-C-2oK7PlOs-IYU=.b3345812-391b-4ed1-b7a2-cdb0e63e2be6@github.com>
Message-ID:

On Thu, 31 Aug 2023 17:09:40 GMT, Mandy Chung wrote:

>> 8268829: Provide an optimized way to walk the stack with Class object only
>>
>> `StackWalker::walk` creates one `StackFrame` per frame, and the current implementation
>> allocates one `StackFrameInfo` object and one `MemberName` object per frame. Some frameworks,
>> such as logging, may only be interested in the Class object and not in the method name or the BCI,
>> for example to filter out their implementation classes to find the caller class. This is
>> similar to `StackWalker::getCallerClass` but allows a predicate to filter out elements.
>>
>> This PR proposes to add an `Option::DROP_METHOD_INFO` enum constant that requests dropping the method information. If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO`
>> can be used instead. Such a stack walker saves the overhead of extracting the method information
>> and the memory used for the stack walking.
>>
>> New factory methods are defined that take a parameter specifying the kind of stack walker to be created.
>> This provides a simple way for existing code, for example logging frameworks, to take advantage of
>> this enhancement with minimal changes, because it can keep the existing function for traversing
>> `StackFrame`s.
>>
>> For example, to find the first caller while filtering out a known list of implementation classes,
>> existing code can create a stack walker instance with the `DROP_METHOD_INFO` option:
>>
>> StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE);
>> Optional<Class<?>> callerClass = walker.walk(s ->
>>     s.map(StackFrame::getDeclaringClass)
>>      .filter(Predicate.not(implClasses::contains))
>>      .findFirst());
>>
>> If method information is accessed on the `StackFrame`s produced by this stack walker, for example through
>> `StackFrame::getMethodName`, an `UnsupportedOperationException` will be thrown.
>>
>> #### Javadoc & specdiff
>>
>> https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html
>> https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html
>>
>> #### Alternatives Considered
>> One alternative is to provide a new API:
>> `<T> T walkClass(Function<? super Stream<Class<?>>, ? extends T> function)`
>>
>> In this case, the caller would need to pass a function that takes a stream
>> of `Class` objects instead of `StackFrame`s. Existing code would have to
>> change both its calls from the `walk` method to `walkClass` and the function body.
>>
>> ### Implementation Details
>>
>> A `StackWalker` configured with `DROP_METHOD_INFO` ...
>
> Mandy Chung has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits:
>
>  - Merge
>  - Remove the new getInstance method taking varargs
>  - update mode to be int rather than long
>  - update tests
>  - Review feedback on javadoc
>  - Revised the API change. Add Option::DROP_METHOD_INFO
>  - Review feedback from Remi
>  - fixup javadoc
>  - Review feedback: move JLIA to ClassFrameInfo
>  - review feedback and javadoc clean up
>  - ... and 19 more: https://git.openjdk.org/jdk/compare/c8acab1d...111661bc

The API changes in the current update (111661bc) look good.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15370#issuecomment-1702328042

From fbredberg at openjdk.org Fri Sep 1 08:30:44 2023
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Fri, 1 Sep 2023 08:30:44 GMT
Subject: RFR: 8315069: Relativize extended_sp in interpreter frames [v3]
In-Reply-To:
References:

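The reason a frame-pointer-relative `extended_sp` survives freeze and thaw, while an absolute address does not, can be shown with a small model. This is a hypothetical sketch only. The `Frame` layout, the slot indices, and the helper names (`set_extended_sp_relative`, `move_frame`, and so on) are illustrative assumptions, not HotSpot's `frame` API. Thawing on another worker thread is modelled as a raw copy of the frame's words to a different buffer, with no fix-up of the words inside it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified interpreter frame: a fixed block of stack words.
// Lower index = lower address, so extended_sp lies below the frame pointer.
constexpr int kFrameWords = 8;
constexpr int kFpSlot = 6;          // slot the frame pointer points at (assumed)
constexpr int kExtendedSpSlot = 5;  // slot that records extended_sp (assumed)

struct Frame {
  intptr_t words[kFrameWords];
  intptr_t* fp() { return &words[kFpSlot]; }
};

// Old scheme: record extended_sp as an absolute address.
void set_extended_sp_absolute(Frame& f, intptr_t* esp) {
  f.words[kExtendedSpSlot] = reinterpret_cast<intptr_t>(esp);
}
intptr_t* extended_sp_absolute(Frame& f) {
  return reinterpret_cast<intptr_t*>(f.words[kExtendedSpSlot]);
}

// Relativized scheme: record extended_sp as a word offset from fp.
void set_extended_sp_relative(Frame& f, intptr_t* esp) {
  f.words[kExtendedSpSlot] = esp - f.fp();  // negative: esp is below fp
}
intptr_t* extended_sp_relative(Frame& f) {
  return f.fp() + f.words[kExtendedSpSlot];
}

// "Freeze" then "thaw" somewhere else: copy the raw words, no fix-up.
void move_frame(const Frame& from, Frame& to) {
  std::memcpy(to.words, from.words, sizeof(from.words));
}
```

After `move_frame`, the relative form resolves to the matching slot in the new copy. The absolute form still points into the old buffer and would need patching on every freeze and thaw.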
Message-ID:

On Thu, 31 Aug 2023 12:05:40 GMT, Fredrik Bredberg wrote:

>> Implementation of relativized extended_sp in interpreter frames for AArch64 and RISC-V.
>>
>> By changing the "extended_sp" member in interpreter frames from an absolute address into an offset relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is because we might freeze and thaw from and to different worker threads, so the absolute address of locals might change, but the offset from the frame pointer stays constant.
>>
>> This subtask only handles "extended_sp". The relativization of other interpreter frame members is handled in other subtasks of JDK-8289296.
>>
>> Tested tier1-tier7 on aarch64. RISC-V was sanity tested using QEMU.
>
> Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Merge branch 'master' into 8315069_relativize_extended_sp
>  - Updated aarch64 after review
>  - 8315069: Relativize extended_sp in interpreter frames

Thank you guys for the review comments and for the help with testing. If no one else has anything to add, I'll integrate (as soon as I can convince a sponsor).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15468#issuecomment-1702370013

From fbredberg at openjdk.org Fri Sep 1 08:38:52 2023
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Fri, 1 Sep 2023 08:38:52 GMT
Subject: Integrated: 8315069: Relativize extended_sp in interpreter frames
In-Reply-To:
References:
Message-ID: <1S3CbnDYA-Rpd50Oy9SUdqDEXDp2o_-r8JdT8nKKXeA=.fca7ac75-bec9-4b0d-acf2-9967874e1ed2@github.com>

On Tue, 29 Aug 2023 12:08:49 GMT, Fredrik Bredberg wrote:

> Implementation of relativized extended_sp in interpreter frames for AArch64 and RISC-V.
>
> By changing the "extended_sp" member in interpreter frames from an absolute address into an offset relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is because we might freeze and thaw from and to different worker threads, so the absolute address of locals might change, but the offset from the frame pointer stays constant.
>
> This subtask only handles "extended_sp". The relativization of other interpreter frame members is handled in other subtasks of JDK-8289296.
>
> Tested tier1-tier7 on aarch64. RISC-V was sanity tested using QEMU.

This pull request has now been integrated.

Changeset: 033f311a
Author: Fredrik Bredberg
Committer: Andrew Haley
URL: https://git.openjdk.org/jdk/commit/033f311abccc45567230c69c6e0f6d1746f3c7e4
Stats: 46 lines in 10 files changed: 32 ins; 0 del; 14 mod

8315069: Relativize extended_sp in interpreter frames

Reviewed-by: haosun, aph, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/15468

From mgronlun at openjdk.org Fri Sep 1 12:15:25 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Fri, 1 Sep 2023 12:15:25 GMT
Subject: RFR: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native
Message-ID:

Greetings,

This change set fixes the issue of taking a JFR stack trace in the wrong thread state for the NativeLibraryLoad and NativeLibraryUnload events.

A follow-up change set, [JDK-8315364](https://bugs.openjdk.org/browse/JDK-8315364), will add assertions to the JFR stack trace code to help find similar issues earlier.

There are a few additional improvements:

The event declaration in metadata.xml now includes the generating thread, since a stack trace without the generating thread is subpar.

In os_linux.cpp, the NativeLibraryLoad event was located after the call to dlopen(). As a result, the event, which is declared durational, failed to capture the duration of the call.
Finally, the test is extended to validate the captured stack trace.

Testing: jdk_jfr, stress testing

Thanks
Markus

-------------

Commit messages:
 - restore lf
 - 8315220

Changes: https://git.openjdk.org/jdk/pull/15535/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15535&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8315220
Stats: 384 lines in 10 files changed: 232 ins; 114 del; 38 mod
Patch: https://git.openjdk.org/jdk/pull/15535.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15535/head:pull/15535

PR: https://git.openjdk.org/jdk/pull/15535

From ayang at openjdk.org Fri Sep 1 15:55:02 2023
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 1 Sep 2023 15:55:02 GMT
Subject: RFR: 8315550: G1: Fix -Wconversion warnings in g1NUMA
Message-ID:

Simple `int` to `uint` change for the NUMA node id.

Possibly, `numa_get_leaf_groups` should accept `uint[]`. I will attempt that in another PR, as that change will be mostly runtime, not G1-specific.

-------------

Commit messages:
 - g1-numa

Changes: https://git.openjdk.org/jdk/pull/15541/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15541&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8315550
Stats: 32 lines in 8 files changed: 0 ins; 1 del; 31 mod
Patch: https://git.openjdk.org/jdk/pull/15541.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15541/head:pull/15541

PR: https://git.openjdk.org/jdk/pull/15541

From duke at openjdk.org Fri Sep 1 20:03:17 2023
From: duke at openjdk.org (Ilya Gavrilin)
Date: Fri, 1 Sep 2023 20:03:17 GMT
Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v11]
In-Reply-To:
References:
Message-ID: <9w-_hzgOoGqBivVcABpGKZUKxWjMb3TdFTGLxUGBSUE=.3a71ea38-db0a-4698-bed3-164aff3fee7c@github.com>

> Please review these changes to the RISC-V double rounding intrinsics.
>
> On RISC-V, intrinsics for rounding doubles with a mode (like Math.ceil/floor/rint) were missing.
> RISC-V has no dedicated instruction for this conversion, so a two-step conversion is used: double -> long -> double (using fcvt.l.d and fcvt.d.l).
>
> We also have to supply a rounding mode to the fcvt.x.x instructions.
>
> Rounding mode selection for ceil (floor and rint are similar), according to the Math.ceil requirements:
>
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>
> For double -> long we choose rup (round towards +inf) mode, to get an integer that is greater than or equal to the input value.
> For long -> double we choose rdn (round towards -inf) mode, to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
>
> For inf, NaN, or values whose magnitude is at least 2^63, the input value is returned (a double of that magnitude is guaranteed to be an integer).
> When storing the result, we also copy the sign of the input value. This is needed because, for example, ceil of a value in (-1.0, 0.0) must return -0.0.
>
> We have observed significant improvements on HiFive and T-Head boards.
>
> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>
> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:

  Fixes in code style

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14991/files
 - new: https://git.openjdk.org/jdk/pull/14991/files/09ad14aa..77e0537a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=09-10

Stats: 22 lines in 2 files changed: 6 ins; 3 del; 13 mod
Patch: https://git.openjdk.org/jdk/pull/14991.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991

PR: https://git.openjdk.org/jdk/pull/14991

From dcubed at openjdk.org Sun Sep 3 13:50:43 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Sun, 3 Sep 2023 13:50:43 GMT
Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v5]
In-Reply-To: <5JqYLMqwPshUm2wyR1rQHndI08xNyDDkj1EqrJwIl3k=.11918918-a906-48ca-8053-e8fa8648d8cd@github.com>
References: <5JqYLMqwPshUm2wyR1rQHndI08xNyDDkj1EqrJwIl3k=.11918918-a906-48ca-8053-e8fa8648d8cd@github.com>
Message-ID:

On Thu, 24 Aug 2023 07:55:01 GMT, Aleksey Shipilev wrote:

>> As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass a nullptr `Mutex` to `MutexLocker` by accident, which would just silently skip the lock.
>>
>> There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety.
>>
>> More thorough testing with different GC/JIT combinations is running now; we might find more issues there. Meanwhile, please comment on the approach.
>>
>> Additional testing:
>>  - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits
>>  - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits
>>  - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>
>  - Merge branch 'master' into JDK-8313202-mutexlocker-nulls
>  - Merge branch 'master' into JDK-8313202-mutexlocker-nulls
>  - Accept one more potentially nullptr mutex
>  - Merge branch 'master' into JDK-8313202-mutexlocker-nulls
>  - Replace ReentrantMutexLocker with ConditionalMutexLocker
>  - Workaround for JDK-8313210
>  - Fixing CodeCache analytics
>  - Initial work

Thumbs up, but I do have some questions about some of the new ConditionalMutexLocker uses.

src/hotspot/share/classfile/classLoader.cpp line 941:

> 939:
> 940: void ClassLoader::release_load_zip_library() {
> 941: ConditionalMutexLocker locker(Zip_lock, Zip_lock != nullptr, Monitor::_no_safepoint_check_flag);

Why is this one now `ConditionalMutexLocker`?

src/hotspot/share/code/stubs.cpp line 241:

> 239:
> 240: void StubQueue::print() {
> 241: ConditionalMutexLocker lock(_mutex, _mutex != nullptr, Mutex::_no_safepoint_check_flag);

Why is this one now a `ConditionalMutexLocker`?

src/hotspot/share/runtime/mutexLocker.hpp line 274:

> 272: public:
> 273: MonitorLocker(Monitor* monitor, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :
> 274: MutexLocker(monitor, flag), _flag(flag) {}

The assert will now be: `"null mutex is not allowed"` instead of `"null monitor not allowed"`. Not really a problem, just trying to make it clear.
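The distinction under review can be illustrated with a standalone analogue built on `std::mutex`. An unconditional locker treats a null mutex as a bug. A conditional locker makes the caller state when locking is skipped, as in the `Zip_lock != nullptr` call site quoted above. This is a hypothetical sketch. The class names echo HotSpot's, but HotSpot's real lockers take a `Mutex*` plus a safepoint-check flag, and neither is modelled here. The `is_locked()` accessor is an assumption added for illustration.

```cpp
#include <cassert>
#include <mutex>

// Unconditional locker (sketch): a null mutex is a programming error,
// not a silent "don't lock".
class MutexLocker {
 public:
  explicit MutexLocker(std::mutex* m) : _mutex(m) {
    assert(m != nullptr && "null mutex is not allowed");
    _mutex->lock();
  }
  ~MutexLocker() { _mutex->unlock(); }
  MutexLocker(const MutexLocker&) = delete;
  MutexLocker& operator=(const MutexLocker&) = delete;

 private:
  std::mutex* _mutex;
};

// Conditional locker (sketch): the caller says explicitly when to skip locking.
class ConditionalMutexLocker {
 public:
  ConditionalMutexLocker(std::mutex* m, bool condition)
      : _mutex(condition ? m : nullptr) {
    assert((!condition || m != nullptr) && "condition requires a mutex");
    if (_mutex != nullptr) _mutex->lock();
  }
  ~ConditionalMutexLocker() {
    if (_mutex != nullptr) _mutex->unlock();
  }
  bool is_locked() const { return _mutex != nullptr; }  // hypothetical helper
  ConditionalMutexLocker(const ConditionalMutexLocker&) = delete;
  ConditionalMutexLocker& operator=(const ConditionalMutexLocker&) = delete;

 private:
  std::mutex* _mutex;  // null when locking was skipped
};
```

With this split, an accidental null passed to the plain locker fails loudly in debug builds. The optional-lock case stays readable at the call site, as in `ConditionalMutexLocker locker(m, m != nullptr)`.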
src/hotspot/share/runtime/mutexLocker.hpp line 277:

> 275:
> 276: MonitorLocker(Thread* thread, Monitor* monitor, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :
> 277: MutexLocker(thread, monitor, flag), _flag(flag) {}

The assert will now be: `"null mutex is not allowed"` instead of `"null monitor not allowed"`. Not really a problem, just trying to make it clear.

-------------

Marked as reviewed by dcubed (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15043#pullrequestreview-1608528700
PR Review Comment: https://git.openjdk.org/jdk/pull/15043#discussion_r1314262278
PR Review Comment: https://git.openjdk.org/jdk/pull/15043#discussion_r1314262163
PR Review Comment: https://git.openjdk.org/jdk/pull/15043#discussion_r1314260745
PR Review Comment: https://git.openjdk.org/jdk/pull/15043#discussion_r1314260860

From vkempik at openjdk.org Sun Sep 3 17:24:39 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Sun, 3 Sep 2023 17:24:39 GMT
Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v10]
In-Reply-To:
References:

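The two-step conversion described in the 8312569 cover letter can be modelled in portable C++ with `<cfenv>`. `std::llrint` honours the current rounding mode, so `FE_UPWARD`, `FE_DOWNWARD` and `FE_TONEAREST` stand in for the rup, rdn and rne modes passed to `fcvt.l.d`. This is an illustrative sketch under stated assumptions, not the RISC-V intrinsic:

- The mode mapping (ceil to upward, floor to downward, rint to nearest-even) is an assumption.
- `llrint` gives an unspecified result out of range, so the sketch checks the range up front instead of inspecting a saturated result.
- `round_via_long` and `is_saturated` are hypothetical names. `is_saturated` separately reproduces the `addi`/`andi`/`beq` sequence quoted in the review of line 1422, which relies on `fcvt.l.d` saturating out-of-range inputs to `INT64_MIN`/`INT64_MAX`.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>

// Mirrors: addi tmp3, tmp1, 1; andi tmp3, tmp3, -2; beq tmp3, tmp2, bad_val
// with tmp2 = 1 << 63. Adding 1 and clearing bit 0 maps exactly INT64_MAX and
// INT64_MIN (the saturated fcvt.l.d results) onto 0x8000000000000000.
bool is_saturated(int64_t v) {
  uint64_t t = (static_cast<uint64_t>(v) + 1) & ~UINT64_C(1);
  return t == (UINT64_C(1) << 63);
}

// Hypothetical model of round_double_mode: double -> long -> double in the
// given rounding mode, then copy the input's sign (like fsgnj.d).
// mode: FE_UPWARD (ceil), FE_DOWNWARD (floor) or FE_TONEAREST (rint).
double round_via_long(double x, int mode) {
  // NaN, +/-Inf and |x| >= 2^63 are returned unchanged; any finite double
  // of that magnitude is already an integer.
  if (!(std::fabs(x) < 9223372036854775808.0)) return x;

  const int saved = std::fegetround();
  volatile double in = x;  // volatile: keep the compiler from folding llrint
  std::fesetround(mode);
  volatile long long l = std::llrint(in);  // rounds in the current mode
  std::fesetround(saved);

  // Exact conversion: either |l| <= 2^52 or l == x, so the long -> double
  // rounding mode does not matter here.
  double r = static_cast<double>(l);
  // Copy the sign so that e.g. ceil(-0.5) and ceil(-0.0) give -0.0.
  return std::copysign(r, x);
}
```

For example, `round_via_long(-0.5, FE_UPWARD)` converts to the integer 0 and then gets its sign back from the input, producing -0.0 as `Math.ceil` requires.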
Message-ID:

On Fri, 1 Sep 2023 07:13:57 GMT, Fei Yang wrote:

>> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix typo in c2_MacroAssembler_riscv.cpp
>
> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1383:
>
>> 1381:
>> 1382: assert_different_registers(dst, src);
>> 1383: assert_different_registers(tmp1, tmp2, tmp3);
>
> Suggestion: `assert_different_registers(dst, src, tmp1, tmp2, tmp3);`

I doubt we can use assert_different_registers with different types of register classes (Register and FloatRegister here).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1314291535

From duke at openjdk.org Sun Sep 3 20:35:05 2023
From: duke at openjdk.org (Ilya Gavrilin)
Date: Sun, 3 Sep 2023 20:35:05 GMT
Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v12]
In-Reply-To:
References:
Message-ID: <9nCNQfV7XquuDD8AediLi2kMcoXNFOLzXypxcn7tpEQ=.d96b9da3-a611-4eb1-9b7c-3e60f65c6b25@github.com>

> Please review these changes to the RISC-V double rounding intrinsics.
>
> On RISC-V, intrinsics for rounding doubles with a mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for this conversion, so a two-step conversion is used: double -> long -> double (using fcvt.l.d and fcvt.d.l).
>
> We also have to supply a rounding mode to the fcvt.x.x instructions.
>
> Rounding mode selection for ceil (floor and rint are similar), according to the Math.ceil requirements:
>
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>
> For double -> long we choose rup (round towards +inf) mode, to get an integer that is greater than or equal to the input value.
> For long -> double we choose rdn (round towards -inf) mode, to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
>
> For inf, NaN, or values whose magnitude is at least 2^63, the input value is returned (a double of that magnitude is guaranteed to be an integer).
> When storing the result, we also copy the sign of the input value. This is needed because, for example, ceil of a value in (-1.0, 0.0) must return -0.0.
>
> We have observed significant improvements on HiFive and T-Head boards.
>
> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>
> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:

  Fixes in assertion in c2 macroassembler

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14991/files
 - new: https://git.openjdk.org/jdk/pull/14991/files/77e0537a..b597ddd2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=10-11

Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14991.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991

PR: https://git.openjdk.org/jdk/pull/14991

From fyang at openjdk.org Mon Sep 4 00:26:41 2023
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 4 Sep 2023 00:26:41 GMT
Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v10]
In-Reply-To:
References:

Message-ID:

On Sun, 3 Sep 2023 17:22:07 GMT, Vladimir Kempik wrote:

>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1383:
>>
>>> 1381:
>>> 1382: assert_different_registers(dst, src);
>>> 1383: assert_different_registers(tmp1, tmp2, tmp3);
>>
>> Suggestion: `assert_different_registers(dst, src, tmp1, tmp2, tmp3);`
>
> I doubt we can use assert_different_registers with different types of register classes (Register and FloatRegister here)

That's right. Thanks for pointing this out.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1314342742

From fyang at openjdk.org Mon Sep 4 00:29:43 2023
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 4 Sep 2023 00:29:43 GMT
Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v10]
In-Reply-To:
References:

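Vladimir's objection can be illustrated with a hypothetical model in which each register class is just a small integer encoding. In this model, `x10` and `f10` would both carry encoding 10, so comparing across classes would report a false conflict between registers that live in different register files. A check restricted to one class, enforced here with a `static_assert`, rejects mixed calls at compile time. The `Register`/`FloatRegister` structs, `all_same` and `all_different` are illustrative assumptions. HotSpot's actual `assert_different_registers` overloads are not reproduced.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical register handles: each class is just an encoding, so x10 and
// f10 share encoding 10 although they name different register files.
struct Register      { int encoding; };
struct FloatRegister { int encoding; };

// all_same<R, Rs...>::value is true when every type in the pack equals R.
template <typename R, typename... Rs>
struct all_same : std::true_type {};
template <typename R, typename R2, typename... Rs>
struct all_same<R, R2, Rs...>
    : std::integral_constant<bool, std::is_same<R, R2>::value &&
                                       all_same<R2, Rs...>::value> {};

// Returns true if all registers are pairwise distinct. Every argument must
// belong to the same register class; mixing classes does not compile.
template <typename R, typename... Rs>
bool all_different(R first, Rs... rest) {
  static_assert(all_same<R, Rs...>::value,
                "cannot compare registers of different classes");
  const R regs[] = {first, rest...};
  const int n = static_cast<int>(sizeof...(rest)) + 1;
  for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
      if (regs[i].encoding == regs[j].encoding) return false;
    }
  }
  return true;
}
```

Under this model, the `FloatRegister` pair (`dst`, `src`) and the three `Register` temporaries naturally need separate checks.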
Message-ID: <-C8RR1jLN3Ap5fErAjS9cPZEQ2-nvVWl_RKMZcNosEc=.4d90dd53-1786-4a35-800f-ca4c5fcdcedb@github.com>

On Fri, 1 Sep 2023 07:20:10 GMT, Fei Yang wrote:

>> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix typo in c2_MacroAssembler_riscv.cpp
>
> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1412:
>
>> 1410: // tmp2 = 100...0000
>> 1411: addi(tmp2, zr, 1);
>> 1412: slli(tmp2, tmp2, 63);
>
> Better to move these two lines with code comments after `fcvt_l_d(tmp1, src, rm);`

In fact, what I mean is: move these two lines after `fcvt_l_d(tmp1, src, rm);`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1314343375

From pli at openjdk.org Mon Sep 4 01:30:03 2023
From: pli at openjdk.org (Pengfei Li)
Date: Mon, 4 Sep 2023 01:30:03 GMT
Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2]
In-Reply-To:
References:

Message-ID:

On Tue, 11 Jul 2023 10:02:52 GMT, Pengfei Li wrote:

>> Yes, you can remove the old code first, and work on the new implementation after that.
>
> Thanks @vnkozlov and @eme64, I just created https://github.com/openjdk/jdk/pull/14824 for the legacy code cleanup.

> @pfustc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

This pull request is not dead. I'm currently doing some refactoring and part of this work in separate pull requests. I will come back to it after those.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1704480439

From jwaters at openjdk.org Mon Sep 4 04:26:52 2023
From: jwaters at openjdk.org (Julian Waters)
Date: Mon, 4 Sep 2023 04:26:52 GMT
Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v4]
In-Reply-To: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com>
References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com>
Message-ID: <7ZF55XhZ0rnp3kL1VVR73r7_FINgAixdnoWg-xX0T8E=.8f6282f0-3de7-4ba3-8fa8-08bf2e190a7a@github.com>

On Thu, 17 Aug 2023 08:38:01 GMT, Julian Waters wrote:

>> We should set the -permissive- flag for the Microsoft Visual C compiler, as was requested by the now backed-out [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). Doing so makes the Visual C compiler much less accepting of ill-formed code, which will improve code quality on Windows in the future.
>> It can be done with some effort, given that the significantly stricter gcc can now compile an experimental Windows JDK as of 2023, and it will serve to significantly cut down on monstrosities in ancient Windows code.
>
> Julian Waters has updated the pull request incrementally with one additional commit since the last revision:
>
>   Document changes in awt_DnDDS.cpp

Bumping?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15096#issuecomment-1704589862

From shade at openjdk.org Mon Sep 4 06:44:00 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 4 Sep 2023 06:44:00 GMT
Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v6]
In-Reply-To:
References:
Message-ID:

> As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass a nullptr `Mutex` to `MutexLocker` by accident, which would just silently skip the lock.
>
> There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety.
>
> More thorough testing with different GC/JIT combinations is running now; we might find more issues there. Meanwhile, please comment on the approach.
>
> Additional testing:
>  - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits
>  - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits
>  - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Accept one more potentially nullptr mutex - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Replace ReentrantMutexLocker with ConditionalMutexLocker - Workaround for JDK-8313210 - Fixing CodeCache analytics - Initial work ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15043/files - new: https://git.openjdk.org/jdk/pull/15043/files/7a11505f..8dc9cde7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15043&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15043&range=04-05 Stats: 13700 lines in 325 files changed: 9233 ins; 2313 del; 2154 mod Patch: https://git.openjdk.org/jdk/pull/15043.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15043/head:pull/15043 PR: https://git.openjdk.org/jdk/pull/15043 From shade at openjdk.org Mon Sep 4 07:31:46 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Sep 2023 07:31:46 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v5] In-Reply-To: References: <5JqYLMqwPshUm2wyR1rQHndI08xNyDDkj1EqrJwIl3k=.11918918-a906-48ca-8053-e8fa8648d8cd@github.com> Message-ID: On Sun, 3 Sep 2023 13:47:09 GMT, Daniel D. Daugherty wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8313202-mutexlocker-nulls >> - Merge branch 'master' into JDK-8313202-mutexlocker-nulls >> - Accept one more potentially nullptr mutex >> - Merge branch 'master' into JDK-8313202-mutexlocker-nulls >> - Replace ReentrantMutexLocker with ConditionalMutexLocker >> - Workaround for JDK-8313210 >> - Fixing CodeCache analytics >> - Initial work > > src/hotspot/share/classfile/classLoader.cpp line 941: > >> 939: >> 940: void ClassLoader::release_load_zip_library() { >> 941: ConditionalMutexLocker locker(Zip_lock, Zip_lock != nullptr, Monitor::_no_safepoint_check_flag); > > Why is this one now `ConditionalMutexLocker`? This one we know about: https://bugs.openjdk.org/browse/JDK-8313210 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15043#discussion_r1314541152 From jvernee at openjdk.org Mon Sep 4 08:16:21 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 4 Sep 2023 08:16:21 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v6] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. 
A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 22 commits: - Merge branch 'master' into JEP22 - Merge branch 'master' into JEP22 - remove spurious imports - enable fallback linker on linux x86 in GHA - make Arena::allocate abstract - 8313894: Rename isTrivial linker option to critical Reviewed-by: pminborg, mcimadamore - 8313680: Disallow combining caputreCallState with isTrivial Reviewed-by: mcimadamore - Merge branch 'master' into JEP22 - use immutable map for fallback linker canonical layouts - 8313265: Move the FFM API out of preview Reviewed-by: mcimadamore - ... and 12 more: https://git.openjdk.org/jdk/compare/adfc1d6c...fd0512f8 ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=05 Stats: 2839 lines in 233 files changed: 1257 ins; 894 del; 688 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Mon Sep 4 08:25:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 4 Sep 2023 08:25:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v7] In-Reply-To: References: Message-ID: <-iK5sthUj7XUrkiRP38qwQ3mknsNaR70ipgnvOhdykY=.0093ab55-a85a-492f-95c7-2fab747eb551@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. 
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: add test for unmodifiable canonical layouts map ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/fd0512f8..efc5ef4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=05-06 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From aph at openjdk.org Mon Sep 4 09:30:40 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 4 Sep 2023 09:30:40 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v4] In-Reply-To: References:

Message-ID: On Thu, 31 Aug 2023 18:14:23 GMT, Thomas Stuefe wrote:
>> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`).
>>
>> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>>
>>
>> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8
>> 8b7b69: 0f b6 00 movzbl (%rax),%eax
>> 8b7b6c: 84 c0 test %al,%al
>> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
>> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
>> 8b7b7e: 8b 0a mov (%rdx),%ecx
>> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
>> 8b7b87: 48 d3 e7 shl %cl,%rdi
>> 8b7b8a: 48 03 3a add (%rdx),%rdi
>>
>>
>> Patched: one load loads all three. Since the shift occupies the lowest 8 bits, compiled code uses an 8-bit register; ditto the UseCompressedOops flag.
>>
>>
>> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
>> 8ba309: 48 8b 08 mov (%rax),%rcx
>> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
>> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
>> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
>> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
>> 8ba31e: 48 01 cf add %rcx,%rdi # add base
>> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>>
>> ---
>>
>> Performance measurements:
>>
>> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>>
>> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%.
Still, in general, numbers seemed to go down rather than up. >> >> --- >> >> Future extensions: >> >> This patch uses the fact that the encoding base is aligned to metaspace reser... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ > - APH feedback > - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ > - fix -UseCCP case > - use 16 bit alignment > - with raw bit ops So, I was wondering if there is some reason to do all this manually? It looks like an obvious candidate for bitfields. ------------- PR Review: https://git.openjdk.org/jdk/pull/15389#pullrequestreview-1609141760 From shade at openjdk.org Mon Sep 4 09:44:11 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Sep 2023 09:44:11 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v7] In-Reply-To: References: Message-ID: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> > As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock. > > There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety. > > More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach.
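A minimal sketch of the approach described above for JDK-8313202, written against `std::mutex` rather than HotSpot's `Mutex`. The class names below are hypothetical stand-ins, not HotSpot's `MutexLocker`, `MutexLockerImpl` or `ConditionalMutexLocker`, and the real classes take extra arguments (a `Thread*`, a safepoint-check flag) that the sketch omits. The default locker rejects a null mutex up front, while the conditional variant makes the "maybe skip the lock" case explicit at the call site, in the spirit of `ConditionalMutexLocker locker(Zip_lock, Zip_lock != nullptr, ...)` from the review.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch (not HotSpot code): a shared RAII base that locks a
// std::mutex if one is given, plus two front-ends with different null policies.
class LockerImplSketch {
 public:
  LockerImplSketch(const LockerImplSketch&) = delete;
  LockerImplSketch& operator=(const LockerImplSketch&) = delete;

  ~LockerImplSketch() {
    if (_mutex != nullptr) _mutex->unlock();  // release only what we took
  }

  bool holds_lock() const { return _mutex != nullptr; }

 protected:
  explicit LockerImplSketch(std::mutex* m) : _mutex(m) {
    if (_mutex != nullptr) _mutex->lock();
  }

 private:
  std::mutex* const _mutex;  // nullptr means "no lock was taken"
};

// Default locker: a null mutex is treated as a programming error
// (checked before the base class tries to lock anything).
class MutexLockerSketch : public LockerImplSketch {
 public:
  explicit MutexLockerSketch(std::mutex* m) : LockerImplSketch(require(m)) {}

 private:
  static std::mutex* require(std::mutex* m) {
    assert(m != nullptr && "null mutex is not allowed");
    return m;
  }
};

// Conditional locker: the caller states explicitly whether to lock.
class ConditionalMutexLockerSketch : public LockerImplSketch {
 public:
  ConditionalMutexLockerSketch(std::mutex* m, bool condition)
      : LockerImplSketch(condition ? require(m) : nullptr) {}

 private:
  static std::mutex* require(std::mutex* m) {
    assert(m != nullptr && "condition is true but mutex is null");
    return m;
  }
};
```

With this split, an accidental `MutexLockerSketch l(nullptr)` trips the assertion in debug builds instead of silently running unlocked, while `ConditionalMutexLockerSketch c(m, false)` documents at the call site that skipping the lock is intentional.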
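The single-load decoding described in JDK-8314890 can be sketched in plain C++ as below. This is an illustration under stated assumptions, not HotSpot's `CompressedKlassPointers` code: all names are hypothetical, and the bit layout (shift in bits 0-7, enable flag in bit 8, encoding base in the upper bits with 64 KiB alignment) is inferred from the patched disassembly, where `shl %cl` takes the shift from the low byte, `test $0x1,%ch` checks bit 8, and `xor %cx,%cx` clears the low 16 bits before the base is added.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a packed "combo" word (not HotSpot's actual code):
//   bits 0..7    shift
//   bit  8       "compressed class pointers in use" flag
//   bits 16..63  encoding base, required to be 64 KiB aligned
namespace combo_sketch {

using std::uint32_t;
using std::uint64_t;

constexpr uint64_t kShiftMask = 0xFF;
constexpr uint64_t kFlagBit   = uint64_t(1) << 8;
constexpr uint64_t kLowBits   = 0xFFFF;  // shift + flag live here

inline uint64_t pack(uint64_t base, unsigned shift, bool enabled) {
  assert((base & kLowBits) == 0 && "base must be 64 KiB aligned");
  assert(shift < 64 && "shift must be a valid 64-bit shift count");
  return base | shift | (enabled ? kFlagBit : 0);
}

inline bool is_enabled(uint64_t combo) { return (combo & kFlagBit) != 0; }

// Decode a narrow value using an already-loaded combo word: the low byte is
// the shift, and clearing the low 16 bits recovers the base.
inline uint64_t decode_with(uint64_t combo, uint32_t narrow) {
  const unsigned shift = static_cast<unsigned>(combo & kShiftMask);
  const uint64_t base = combo & ~kLowBits;
  return (static_cast<uint64_t>(narrow) << shift) + base;
}

// The global that startup code would set once; decode() reads it a single
// time, so base, shift and flag all come from one read.
static uint64_t g_combo = 0;

inline uint64_t decode(uint32_t narrow) { return decode_with(g_combo, narrow); }

}  // namespace combo_sketch
```

For example, packing a base of `0x800000000` with shift 3 and decoding the narrow value `0x10` gives `0x800000080`. The review question above suggests bitfields instead of manual masks; a bitfield struct would read more naturally, but the C++ standard leaves bit-field allocation within a class implementation-defined, which is one possible reason to prefer explicit masks when exact bit positions matter (as they do for the `%cl`/`%ch` tricks in the disassembly).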
> > Additional testing: > - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits > - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah` Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - Touchup whitespace - Rewrite jvmtiManageCapabilities lock usage - Re-instate old asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15043/files - new: https://git.openjdk.org/jdk/pull/15043/files/8dc9cde7..3676fa71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15043&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15043&range=05-06 Stats: 35 lines in 3 files changed: 15 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/15043.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15043/head:pull/15043 PR: https://git.openjdk.org/jdk/pull/15043 From shade at openjdk.org Mon Sep 4 09:44:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Sep 2023 09:44:15 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v5] In-Reply-To: References: <5JqYLMqwPshUm2wyR1rQHndI08xNyDDkj1EqrJwIl3k=.11918918-a906-48ca-8053-e8fa8648d8cd@github.com> Message-ID: On Sun, 3 Sep 2023 13:46:13 GMT, Daniel D. Daugherty wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8313202-mutexlocker-nulls >> - Merge branch 'master' into JDK-8313202-mutexlocker-nulls >> - Accept one more potentially nullptr mutex >> - Merge branch 'master' into JDK-8313202-mutexlocker-nulls >> - Replace ReentrantMutexLocker with ConditionalMutexLocker >> - Workaround for JDK-8313210 >> - Fixing CodeCache analytics >> - Initial work > > src/hotspot/share/code/stubs.cpp line 241: > >> 239: >> 240: void StubQueue::print() { >> 241: ConditionalMutexLocker lock(_mutex, _mutex != nullptr, Mutex::_no_safepoint_check_flag); > > Why is this one now a `ConditionalMutexLocker`? I think there was a test failure that indicated we enter here from some error path? Let me reproduce it again. > src/hotspot/share/runtime/mutexLocker.hpp line 277: > >> 275: >> 276: MonitorLocker(Thread* thread, Monitor* monitor, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) : >> 277: MutexLocker(thread, monitor, flag), _flag(flag) {} > > The assert will now be: `"null mutex is not allowed"` > instead of `"null monitor not allowed"`. Not really a > problem, just trying to make it clear. Right. I think there is no reason to subclass `MutexLocker` and try to save the line of code there. Instead, let's subclass `MutexLockerImpl`, and do the proper asserts. New commit reinstates the asserts and some adjacent comments in `MonitorLocking`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15043#discussion_r1314693244 PR Review Comment: https://git.openjdk.org/jdk/pull/15043#discussion_r1314692724 From jvernee at openjdk.org Mon Sep 4 10:56:09 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 4 Sep 2023 10:56:09 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v8] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22.
The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. 
https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: fix TestIllegalLink on x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/efc5ef4a..470fcb9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=06-07 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From lkorinth at openjdk.org Mon Sep 4 11:04:44 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Mon, 4 Sep 2023 11:04:44 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v3] In-Reply-To: References:

Message-ID: <5o5B7LbCQN_C9xzd1EvrvTp04-6Atr0gih5WH69LeK4=.3a977034-8fe9-4da8-a167-f5dad3a97d75@github.com> On Wed, 30 Aug 2023 09:23:55 GMT, Leo Korinth wrote: >> Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. >> >> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` >> >> Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold saying: >> >> /** >> * Create ProcessBuilder using the java launcher from the jdk to >> * be tested. >> * >> *

> 151: * The following table shows some examples of how C types are modelled in Linux/x64 (all the examples provided > 152: * here will assume these platform-dependent mappings): Up to you, but it might be useful to link to the ABI specifications if the links are considered stable. src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 439: > 437: * > 438: *

> 439: * If the provided layout path {@code P} contains no dereference elements, then the offset of the access operation is Suggestion: * If the provided layout path {@code P} contains no dereference elements, then the offset {@code O} of the access operation is src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 443: > 441: * > 442: * {@snippet lang = "java": > 443: * offset = this.offsetHandle(P).invokeExact(B, I1, I2, ... In); Suggestion: * O = this.offsetHandle(P).invokeExact(B, I1, I2, ... In); To align with the use of `O` later on. src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 536: > 534: * > 535: *

> 536: * The offset of the returned segment is computed as if by a call to a Suggestion: * The offset {@code O} of the returned segment is computed as if by a call to a src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 154: > 152: * MemoryLayout.sequenceLayout(4, ValueLayout.JAVA_INT).withName("data") // array of 4 elements > 153: * ); > 154: * VarHandle intHandle = segmentLayout.varHandle(MemoryLayout.PathElemenet.groupElement("data"), Suggestion: * VarHandle intHandle = segmentLayout.varHandle(MemoryLayout.PathElement.groupElement("data"), src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 8027: > 8025: * @since 19 > 8026: */ > 8027: @PreviewFeature(feature=PreviewFeature.Feature.FOREIGN) Unused import of `PreviewFeature`, and possibly others too. src/java.base/share/classes/jdk/internal/foreign/StringSupport.java line 45: > 43: case DOUBLE_BYTE -> readFast_short(segment, offset, charset); > 44: case QUAD_BYTE -> readFast_int(segment, offset, charset); > 45: default -> throw new UnsupportedOperationException("Unsupported charset: " + charset); Is this necessary, since the switch expression should be exhaustive over all the enum values?
------------- PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1611924844 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316401360 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316402470 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316409959 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316410079 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316414805 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316437803 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316457079 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1316444767 From msheppar at openjdk.org Tue Sep 5 22:25:39 2023 From: msheppar at openjdk.org (Mark Sheppard) Date: Tue, 5 Sep 2023 22:25:39 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v3] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 09:23:55 GMT, Leo Korinth wrote: >> Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. >> >> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` >> >> Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: >> >> /** >> * Create ProcessBuilder using the java launcher from the jdk to >> * be tested. >> * >> *

>> 151: * The following table shows some examples of how C types are modelled in Linux/x64 (all the examples provided >> 152: * here will assume these platform-dependent mappings): > > Up to you, but it might be useful to link to the ABI specifications if the links are considered stable. There is this: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf I'm not sure how stable this is. I don't think that website is an authoritative source (at least, not to the degree it is for e.g. AArch64: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#595homogeneous-aggregates). Note also that this is a draft. I think the final version is paywalled. Alternatively, we could refer to the name only: "System V Application Binary Interface - AMD64 Architecture Processor Supplement" (or "x86-64 psABI"). Then people can google for themselves and find it. For instance, [this SO question](https://stackoverflow.com/a/40348010) points to a gitlab repo with a more up-to-date version: https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1317100703 From jvernee at openjdk.org Wed Sep 6 11:20:18 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 6 Sep 2023 11:20:18 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v13] In-Reply-To: References: Message-ID: <97cANIvE2FMHvs_WDQ4BXFxfnjgtJ7fLfbLfM_aCONA=.a832d87f-5847-492b-9843-2b88346090f1@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1.
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - PPC linker changes - Merge branch 'master' into JEP22 - Paul's review comments - Fix typo in doc Co-authored-by: Paul Sandoz - 8315096: Allowed access modes in memory segment should depend on layout alignment Reviewed-by: psandoz - Add missing @implSpec to AddressLayout Reviewed-by: pminborg - Fix misc typos in FFM API javadoc Reviewed-by: jvernee - Clarify javadoc w.r.t. exceptions thrown by a memory access var handle (part two) Reviewed-by: pminborg - Clarify javadoc w.r.t. exceptions thrown by a memory access var handle Reviewed-by: jvernee - remove unsupported linker test - ... and 24 more: https://git.openjdk.org/jdk/compare/62a953f4...b8bb791f ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=12 Stats: 3176 lines in 238 files changed: 1480 ins; 951 del; 745 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From rehn at openjdk.org Wed Sep 6 11:47:40 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Sep 2023 11:47:40 GMT Subject: RFR: 8315743: RISC-V: Cleanup masm lr()/sc() methods In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 09:16:31 GMT, Fei Yang wrote: > That looks more reasonable. Thanks for the cleanup. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/15578#issuecomment-1708184670 From dfuchs at openjdk.org Wed Sep 6 11:54:38 2023 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Wed, 6 Sep 2023 11:54:38 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Tue, 5 Sep 2023 22:49:41 GMT, Erik Joelsson wrote: > There are a number of files in the `test` directory that have an incorrect copyright header, which includes the "classpath" exception text. This patch removes that text from all test files that I could find it in. I did this using a combination of `sed` and `grep`. Reviewing this patch is probably easier using the raw patch file or a suitable webrev format. > > It's my assumption that these headers were introduced by mistake as it's quite easy to copy the wrong template when creating new files. jmx, jndi, and net changes LGTM ------------- PR Review: https://git.openjdk.org/jdk/pull/15573#pullrequestreview-1613147541 From jvernee at openjdk.org Wed Sep 6 12:01:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 6 Sep 2023 12:01:27 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v14] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. 
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: remove reference to allocateArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/b8bb791f..52df58f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=12-13 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From volker.simonis at gmail.com Wed Sep 6 13:02:55 2023 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 6 Sep 2023 15:02:55 +0200 Subject: Question on why sun.management MBeans are not exported? Message-ID: Hi, I recently looked for an easy way to get the CPU time spent by JIT-compiler and GC threads in Java (e.g. as exported by IBM J9's JvmCpuMonitorMXBean [0]). An easy way to achieve this in OpenJDK is by using the "sun.management.HotspotInternal" MBean which exports the "sun.management:type=HotspotThreading" MBean with the attributes InternalThreadCount/InternalThreadCpuTimes (among other useful HotSpot internal counters and metrics). Up until JDK 16/17, using the "sun.management.HotspotInternal" MBean was straightforward, although since JDK 9 it resulted in an illegal reflective access warning. Since JDK 17 its usage requires the "--add-exports java.management/sun.management=ALL-UNNAMED" option for the monitored JVM. I wonder why "sun.management" was encapsulated in the first place? I understand that it is not an "officially supported" API, but I find it still quite useful. If we really don't want to export it, I wonder why we are maintaining it at all (we even have JTreg tests for it under "test/jdk/sun/management"), because in the current configuration it isn't particularly useful.
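The encapsulation described above can be observed directly from application code through `java.lang.Module`. The following is a minimal sketch (the class and method names are illustrative, not part of any JDK API) that asks the `java.management` module whether a package is exported to all modules, and whether it is exported to the caller's own module. It assumes JDK 9 or later and code running from the class path, that is, in the unnamed module.

```java
import java.lang.management.ManagementFactory;

public class ExportCheck {
    // The module that contains java.lang.management, i.e. java.management.
    private static final Module MANAGEMENT = ManagementFactory.class.getModule();

    // True if java.management exports the package unconditionally to all modules.
    static boolean exportedToAll(String pkg) {
        return MANAGEMENT.isExported(pkg);
    }

    // True if the package is exported to this class's module. For code on the class
    // path this is what "--add-exports java.management/<pkg>=ALL-UNNAMED" changes.
    static boolean exportedToMe(String pkg) {
        return MANAGEMENT.isExported(pkg, ExportCheck.class.getModule());
    }

    public static void main(String[] args) {
        System.out.println("java.lang.management exported to all: " + exportedToAll("java.lang.management"));
        System.out.println("sun.management exported to all:       " + exportedToAll("sun.management"));
        System.out.println("sun.management exported to me:        " + exportedToMe("sun.management"));
    }
}
```

Run without extra flags, only the first line reports `true`; the qualified `exports sun.management to jdk.jconsole, ...` in `java.management` does not make the package visible to the unnamed module.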
Notice that jconsole supports the "sun.management.HotspotInternal" MBean if started with "-J-Djconsole.showUnsupported=true", and the "java.management" module kind of tries to support jconsole with the following exports:

exports sun.management to jdk.jconsole, jdk.management, jdk.management.agent;

However, that doesn't help even if jconsole monitors itself (in the general case, where jconsole monitors a different JVM process, it can't work anyway). With JDK 16 and "--illegal-access=debug" we can see why:

WARNING: Illegal reflective access by sun.reflect.misc.Trampoline to method sun.management.HotspotThreadMBean.getInternalThreadCount()
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
    at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260)
    at java.management/com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at java.management/com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at java.management/com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at java.management/com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
    at java.management/com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
    at java.management/com.sun.jmx.mbeanserver.MBeanSupport.getAttributes(MBeanSupport.java:213)
    at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes(DefaultMBeanServerInterceptor.java:701)
    at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes(JmxMBeanServer.java:705)

Notice that 'sun.reflect.misc.Trampoline' is loaded by a custom class loader ('java.base/sun.reflect.misc.MethodUtil') and not considered part of the base module. So to cut a long story short, I see several options: 1. Publicly export sun.management and restore the JDK 8 (or pre JDK 16) behavior. This would certainly require some polishing (e.g.
some of the corresponding JVM functionality has already been removed [1]) but I think it could still be quite useful. 2. Port the useful functionality from the "sun.management" MBeans to corresponding "com.sun.management" MBeans and remove the "sun.management" MBeans. 3. Remove the "sun.management" MBeans without substitution. What do you think? Thank you and best regards, Volker [0] https://www.ibm.com/docs/en/sdk-java-technology/8?topic=interfaces-language-management [1] https://bugs.openjdk.org/browse/JDK-8134607 From aivanov at openjdk.org Wed Sep 6 13:09:38 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Wed, 6 Sep 2023 13:09:38 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Tue, 5 Sep 2023 22:49:41 GMT, Erik Joelsson wrote: > There are a number of files in the `test` directory that have an incorrect copyright header, which includes the "classpath" exception text. This patch removes that text from all test files that I could find it in. I did this using a combination of `sed` and `grep`. Reviewing this patch is probably easier using the raw patch file or a suitable webrev format. > > It's my assumption that these headers were introduced by mistake as it's quite easy to copy the wrong template when creating new files. Client changes look good. I've looked through all the files, other files look good too. ------------- Marked as reviewed by aivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15573#pullrequestreview-1613295743 From Alan.Bateman at oracle.com Wed Sep 6 13:47:23 2023 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 6 Sep 2023 14:47:23 +0100 Subject: Question on why sun.management MBeans are not exported? In-Reply-To: References: Message-ID: <1535aa6e-6865-d885-3930-df7f9ebcb4b4@oracle.com> On 06/09/2023 14:02, Volker Simonis wrote: > : > > I wonder why "sun.management" was encapsulated in the first place? 
I > understand that it is not an "officially supported" API, but I find it > still quite useful. sun.management.* is JDK internal so not something for code outside the JDK to use directly. The only sun.* packages that are exported to all modules are the "critical internal APIs" in the jdk.unsupported module. JEP 260 has the details. > : > > So to cut a long story short, I see several options: > > 1. Publicly export sun.management and restore the JDK 8 (or pre JDK > 16) behavior. This would certainly require some polishing (e.g. some > of the corresponding JVM functionality has already been removed [1]) > but I think it could still be quite useful. > 2. Port the useful functionality from the "sun.management" MBeans to > corresponding "com.sun.management" MBeans and remove the > "sun.management" MBeans. > 3. Remove the "sun.management" MBeans without substitution. > > What do you think? If there are JDK-specific or HotSpot VM-specific features where there is a compelling case for a management interface, then com.sun.management is a good place to prototype new APIs. You may already be familiar with com.sun.management.HotSpotDiagnosticMXBean. -Alan From shade at openjdk.org Wed Sep 6 14:37:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 6 Sep 2023 14:37:40 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v7] In-Reply-To: <9G9U6OIj_Ju9ptIA9wF4Ys3NClzKxis3nQJyzIsXaPU=.fc5cdb06-11e8-44b5-9774-7cfe262bff81@github.com> References: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> <9G9U6OIj_Ju9ptIA9wF4Ys3NClzKxis3nQJyzIsXaPU=.fc5cdb06-11e8-44b5-9774-7cfe262bff81@github.com> Message-ID: On Tue, 5 Sep 2023 21:40:42 GMT, Daniel D. Daugherty wrote: > Thumbs up. What kind of testing has been done? See "Additional testing" in PR body. I ran a matrix of tiers and different GCs.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15043#issuecomment-1708495753 From volker.simonis at gmail.com Wed Sep 6 15:17:26 2023 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 6 Sep 2023 17:17:26 +0200 Subject: Question on why sun.management MBeans are not exported? In-Reply-To: <1535aa6e-6865-d885-3930-df7f9ebcb4b4@oracle.com> References: <1535aa6e-6865-d885-3930-df7f9ebcb4b4@oracle.com> Message-ID: On Wed, Sep 6, 2023 at 3:47 PM Alan Bateman wrote: > > On 06/09/2023 14:02, Volker Simonis wrote: > > : > > > > I wonder why "sun.management" was encapsulated in the first place? I > > understand that it is not an "officially supported" API, but I find it > > still quite useful. > sun.management.* is JDK internal so not something for code outside the > JDK to use directly. The only sun.* packages that are exported to all > modules are the "critical internal APIs" in the jdk.unsupported module. > JEP 260 has the details. I'm familiar with JEP 260. But wouldn't you agree that an "encapsulated" monitoring API is an oxymoron? A monitoring API is by design intended for external usage and completely useless to the platform itself. There's not a single usage of the "sun.management" MBeans in the JDK itself (except for jconsole, where the encapsulation broke it). My assumption is that the corresponding MBeans in "sun.management" are there for historic reasons (added in JDK 1.5) and would have made much more sense in the "com.sun.management" package. But I doubt that they can be classified in the "internal implementation details of the JDK and never intended for external use" category of JEP 260. Anyway, if you classify the MBeans in "sun.management" as non-critical internal APIs (with respect to JEP 260) but without any "internal" usage, then we should really remove them, right, because an internal API without any internal usage doesn't make any sense?
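For comparison with the encapsulated `sun.management` MBeans, the supported, JDK-specific `com.sun.management.HotSpotDiagnosticMXBean` interface can be used from application code without any `--add-exports` flag, because the `jdk.management` module exports `com.sun.management`. A minimal sketch, assuming a HotSpot-based JDK (the class and helper names are illustrative):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class DiagnosticOptionReader {
    // Look up the platform HotSpotDiagnosticMXBean and return the current value
    // of the named VM option as a string.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the option does not exist.
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        System.out.println("MaxHeapSize = " + vmOption("MaxHeapSize"));
    }
}
```

New functionality ported from `sun.management`, as proposed in option 2, would presumably follow the same pattern: a public interface in `com.sun.management` that is obtained through `ManagementFactory`.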
I'll then try to come up with a proposal to port some of the more useful MBeans functionality in "sun.management" to "com.sun.management". Thank you and best regards, Volker > > > > : > > > > So to cut a long story short, I see several options: > > > > 1. Publicly export sun.management and restore the JDK 8 (or pre JDK > > 16) behavior. This would certainly require some polishing (e.g. some > > of the corresponding JVM functionality has already been removed [1]) > > but I think it could still be quite useful. > > 2. Port the useful functionality from the "sun.management" MBeans to > > corresponding "com.sun.management" MBeans and remove the > > "sun.management" MBeans. > > 3. Remove the "sun.management" MBeans without substitution. > > > > What do you think? > If there are JDK-specific or HotSpot VM specific features where there is > a compelling case for a management interface then com.sun.management is > good place to prototype new APIs. You may already be familiar with > com.sun.management.HotSpotDiagnosticMBean. > > -Alan From azafari at openjdk.org Wed Sep 6 15:29:22 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 6 Sep 2023 15:29:22 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v4] In-Reply-To: References: Message-ID: > The `find` method now is > ```C++ > template <typename T> > int find(T* token, bool f(T*, E)) const { > ... > > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: changed the `E` param of find methods to `const E&`.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15418/files - new: https://git.openjdk.org/jdk/pull/15418/files/266f6feb..d70f6141 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=02-03 Stats: 15 lines in 9 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From iris at openjdk.org Wed Sep 6 16:02:41 2023 From: iris at openjdk.org (Iris Clark) Date: Wed, 6 Sep 2023 16:02:41 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Tue, 5 Sep 2023 22:49:41 GMT, Erik Joelsson wrote: > There are a number of files in the `test` directory that have an incorrect copyright header, which includes the "classpath" exception text. This patch removes that text from all test files that I could find it in. I did this using a combination of `sed` and `grep`. Reviewing this patch is probably easier using the raw patch file or a suitable webrev format. > > It's my assumption that these headers were introduced by mistake as it's quite easy to copy the wrong template when creating new files. Thanks for fixing! ------------- Marked as reviewed by iris (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15573#pullrequestreview-1613704179 From psandoz at openjdk.org Wed Sep 6 16:06:47 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 6 Sep 2023 16:06:47 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v10] In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 10:50:59 GMT, Jorn Vernee wrote: >> src/java.base/share/classes/java/lang/foreign/Linker.java line 152: >> >>> 150: *

>>> 151: * The following table shows some examples of how C types are modelled in Linux/x64 (all the examples provided >>> 152: * here will assume these platform-dependent mappings): >> >> Up to you, but it might be useful to link to the ABI specifications if the links are considered stable. > > [This SO question](https://stackoverflow.com/a/40348010) points to a gitlab repo that seems to have the latest version: https://gitlab.com/x86-psABIs/x86-64-ABI But, I'm not sure how stable that is, or if that's an authoritative source. > > Alternatively, we could refer to the name only: "System V Application Binary Interface - AMD64 Architecture Processor Supplement" (or "x86-64 psABI"). Then people can google for themselves and find it. Yeah, it's hard to find the official and latest version. Referring to the full title will help. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1317517319 From erikj at openjdk.org Wed Sep 6 16:09:41 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 6 Sep 2023 16:09:41 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Tue, 5 Sep 2023 23:12:51 GMT, Chris Plummer wrote: > I wonder if this is the right thing to do for the hprof files. I believe they originated from some hprof tools that we no longer ship. 3rd parties might choose to integrate them into their own tools. Do you think I should revert them? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15573#issuecomment-1708676439 From lkorinth at openjdk.org Wed Sep 6 16:24:45 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 6 Sep 2023 16:24:45 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v3] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 09:23:55 GMT, Leo Korinth wrote: >> Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm.
>> >> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` >> >> Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: >> >> /** >> * Create ProcessBuilder using the java launcher from the jdk to >> * be tested. >> * >> *

Please observe that you likely should use >> * createTestJvm() instead of this method because createTestJvm() >> * will add JVM options from "test.vm.opts" and "test.java.opts" >> * and this method will not do that. >> * >> * @param command Arguments to pass to the java command. >> * @return The ProcessBuilder instance representing the java command. >> */ >> >> >> I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... >> >> I have run tier 1 testing, and I have started more exhaustive testing. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > fix static import I think you are missing the point. If you take a look at [the parent bug of the sub task](https://bugs.openjdk.org/browse/JDK-8314823) you can see that the problem described is *not* that people are using `createTestJvm` in error. The problem is that they are (or possibly are) using `createJavaProcessBuilder` in error. Thus renaming `createTestJvm` might help a little at most for this specific problem. Renaming `createJavaProcessBuilder` most probably helps *more*. I guess the alternative of forcing the user to make a choice using an enum value will help even more. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1708705105 From simonis at openjdk.org Wed Sep 6 16:34:46 2023 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 6 Sep 2023 16:34:46 GMT Subject: RFR: 8307352: AARCH64: Improve itable_stub [v10] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 15:56:50 GMT, Boris Ulasevich wrote: >> This is a change for AARCH similar to https://github.com/openjdk/jdk/pull/13460 >> >> The change replaces two separate iterations over the itable with a new algorithm consisting of two loops. 
First, we look for a match with resolved_klass, checking for a match with holder_klass along the way. Then we continue iterating (not starting over) the itable using the second loop, checking only for a match with holder_klass.
>>
>> InterfaceCalls openjdk benchmark performance results on A53, A72, Neoverse N1 and V1 micro-architectures:
>>
>>
>> Cortex-A53 (Pi 3 Model B Rev 1.2)
>>
>> test1stInt2Types    37.5       37.358     0.38
>> test1stInt3Types    160.166    148.04     8.19
>> test1stInt5Types    158.131    147.955    6.88
>> test2ndInt2Types    52.634     53.291     -1.23
>> test2ndInt3Types    201.39     181.603    10.90
>> test2ndInt5Types    195.722    176.707    10.76
>> testIfaceCall       157.453    140.498    12.07
>> testIfaceExtCall    175.46     154.351    13.68
>> testMonomorphic     32.052     32.039     0.04
>> AVG: 6.85
>>
>> Cortex-A72 (Pi 4 Model B Rev 1.2)
>>
>> test1stInt2Types    27.4796    27.4738    0.02
>> test1stInt3Types    66.0085    64.9374    1.65
>> test1stInt5Types    67.9812    66.2316    2.64
>> test2ndInt2Types    32.0581    32.062     -0.01
>> test2ndInt3Types    68.2715    65.6643    3.97
>> test2ndInt5Types    68.1012    65.8024    3.49
>> testIfaceCall       64.0684    64.1811    -0.18
>> testIfaceExtCall    91.6226    81.5867    12.30
>> testMonomorphic     26.7161    26.7142    0.01
>> AVG: 2.66
>>
>> Neoverse N1 (m6g.metal)
>>
>> test1stInt2Types    2.9104     2.9086     0.06
>> test1stInt3Types    10.9642    10.2909    6.54
>> test1stInt5Types    10.9607    10.2856    6.56
>> test2ndInt2Types    3.3410     3.3478     -0.20
>> test2ndInt3Types    12.3291    11.3089    9.02
>> test2ndInt5Types    12.328     11.2704    9.38
>> testIfaceCall       11.0598    10.3657    6.70
>> testIfaceExtCall    13.0692    11.2826    15.84
>> testMonomorphic     2.2354     2.2341     0.06
>> AVG: 6.00
>>
>> Neoverse V1 (c7g.2xlarge)
>>
>> test1stInt2Types    2.2317     2.2320     -0.01
>> test1stInt3Types    6.6884     6.1911     8.03
>> test1stInt5Types    6.7334     6.2193     8.27
>> test2ndInt2Types    2.4002     2.4013     -0.04
>> test2ndInt3Types    7.9603     7.0372     13.12
>> test2ndInt5Types    7.9532     7.0474     12.85
>> testIfaceCall       6.7028     6.3272     5.94
>> testIfaceExtCall    8.3253     6.941...
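The two-loop search described in the PR can be modelled in plain Java to make the iteration order concrete. This is a rough sketch, not the generated AArch64 stub: it assumes the itable is simply an array of interface names (no null terminator, no offset table), and the class and method names are hypothetical. A result of -1 stands in for the case where the real stub would throw `IncompatibleClassChangeError`.

```java
public class ItableScan {
    // Returns the index of holder in itable, or -1 if resolved (or holder) is absent.
    static int lookup(String[] itable, String resolved, String holder) {
        int holderIdx = -1;
        int i = 0;
        // Loop 1: search for resolved, remembering holder if we pass it on the way.
        for (; i < itable.length; i++) {
            if (holderIdx < 0 && itable[i].equals(holder)) {
                holderIdx = i;
            }
            if (itable[i].equals(resolved)) {
                break;
            }
        }
        if (i == itable.length) {
            return -1; // resolved interface not implemented
        }
        // Loop 2: continue after the resolved entry (no restart), looking only for holder.
        for (i = i + 1; holderIdx < 0 && i < itable.length; i++) {
            if (itable[i].equals(holder)) {
                holderIdx = i;
            }
        }
        return holderIdx;
    }

    public static void main(String[] args) {
        String[] itable = {"A", "B", "C", "D"};
        System.out.println(lookup(itable, "C", "A")); // 0: holder found during loop 1
        System.out.println(lookup(itable, "B", "D")); // 3: holder found during loop 2
        System.out.println(lookup(itable, "X", "A")); // -1: resolved not present
    }
}
```

The point of the design is that each itable entry is visited at most once in total, whereas the previous approach scanned the itable twice from the start.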
> > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > Address base_plus_offset_reg encoding: assert->guarantee for shift() == size check Still good. ------------- Marked as reviewed by simonis (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13792#pullrequestreview-1613767370 From cjplummer at openjdk.org Wed Sep 6 16:52:42 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 6 Sep 2023 16:52:42 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 16:49:39 GMT, Erik Joelsson wrote: > > I wonder if this is the right thing to do for the hprof files. I believe they originated from some hprof tools that we no longer ship. 3rd parties might choose to integrate them into their own tools. > > Do you think I should revert them? I'm not sure. I think you need to consult someone with expertise in this area. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15573#issuecomment-1708757719 From mchung at openjdk.org Wed Sep 6 16:53:31 2023 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 6 Sep 2023 16:53:31 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v10] In-Reply-To: References: Message-ID: > 8268829: Provide an optimized way to walk the stack with Class object only > > `StackWalker::walk` creates one `StackFrame` per frame and the current implementation > allocates one `StackFrameInfo` and one `MemberName` object per frame. Some frameworks > like logging may only be interested in the Class object and not in the method name or the BCI, > for example, to filter out their implementation classes to find the caller class. It's > similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element. > > This PR proposes to add an `Option::DROP_METHOD_INFO` enum constant that requests dropping the method information.
If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` > can be used instead, and such a stack walker will save the overhead of extracting the method information > and the memory used for the stack walking. > > New factory methods that take a parameter specifying the kind of stack walker to be created are defined. > This provides a simple way for existing code, for example logging frameworks, to take advantage of > this enhancement with the least change, as it can keep the existing function for traversing > `StackFrame`s. > > For example: to find the first caller, filtering out a known list of implementation classes, > existing code can create a stack walker instance with the `DROP_METHOD_INFO` option: > > > StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); > Optional<Class<?>> callerClass = walker.walk(s -> > s.map(StackFrame::getDeclaringClass) > .filter(Predicate.not(implClasses::contains)) > .findFirst()); > > > If method information is accessed on the `StackFrame`s produced by this stack walker, such as > `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. > > #### Javadoc & specdiff > > https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html > https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html > > #### Alternatives Considered > One alternative is to provide a new API: > `<T> T walkClass(Function<? super Stream<Class<?>>, ? extends T> function)` > > In this case, the caller would need to pass a function that takes a stream > of `Class` objects instead of `StackFrame`s. Existing code would have to > change calls to the `walk` method to `walkClass`, as well as the function body. > > ### Implementation Details > > A `StackWalker` configured with the `DROP_METHOD_INFO` option creates a `ClassFrameInfo[]` > buffer that is filled by the VM during stack walking. `Sta...
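The caller-filtering pattern from the PR description can be tried on any JDK 11+ runtime using the existing `StackWalker.getInstance(Set<StackWalker.Option>)` factory. The sketch below is illustrative: the class names and the set of "implementation classes" are hypothetical. On a JDK that provides `Option.DROP_METHOD_INFO` (the option added by this PR), that option could be added to the set so that no method information is collected. `RETAIN_CLASS_REFERENCE` is still required for `StackFrame::getDeclaringClass`.

```java
import java.lang.StackWalker.Option;
import java.lang.StackWalker.StackFrame;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class CallerFinder {
    // Hypothetical "implementation" classes to skip, e.g. a logging framework's own classes.
    static final Set<Class<?>> IMPL_CLASSES = Set.of(CallerFinder.class);

    // RETAIN_CLASS_REFERENCE is needed for getDeclaringClass(); on a JDK with
    // Option.DROP_METHOD_INFO, that option could be added to this set as well.
    private static final StackWalker WALKER =
        StackWalker.getInstance(Set.of(Option.RETAIN_CLASS_REFERENCE));

    // Returns the first class on the stack, starting at the caller of walk(),
    // that is not one of the implementation classes.
    static Optional<Class<?>> findCaller() {
        return WALKER.walk(s -> s.map(StackFrame::getDeclaringClass)
                                 .filter(Predicate.not(IMPL_CLASSES::contains))
                                 .findFirst());
    }
}

class Client {
    // The frame for this method is the first non-implementation frame.
    static Optional<Class<?>> whoCalled() {
        return CallerFinder.findCaller();
    }
}
```

Only the options passed to the factory would change when adopting the new option; the function given to `walk` stays the same, which is the migration path the PR aims for.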
Mandy Chung has updated the pull request incrementally with two additional commits since the last revision: - Generify StackFrameInfo to replace duplicated code - replace href with @linkplain ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15370/files - new: https://git.openjdk.org/jdk/pull/15370/files/111661bc..a623b9dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15370&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15370&range=08-09 Stats: 146 lines in 2 files changed: 22 ins; 97 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/15370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15370/head:pull/15370 PR: https://git.openjdk.org/jdk/pull/15370 From mchung at openjdk.org Wed Sep 6 16:56:50 2023 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 6 Sep 2023 16:56:50 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v9] In-Reply-To: References: <6hreBEM3qw8FZmOCseR6hgu4-avV-C-2oK7PlOs-IYU=.b3345812-391b-4ed1-b7a2-cdb0e63e2be6@github.com> Message-ID: <-IQ8wGpAOQ7BHe2A-anvu-YcxARJWCMUFPdEV4xwj2U=.45bc706a-155b-4185-a6e7-e336e562352b@github.com> On Thu, 31 Aug 2023 23:17:02 GMT, Brent Christian wrote: >> Mandy Chung has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: >> >> - Merge >> - Remove the new getInstance method taking varargs >> - update mode to be int rather than long >> - update tests >> - Review feedback on javadoc >> - Revised the API change. Add Option::DROP_METHOD_INFO >> - Review feedback from Remi >> - fixup javadoc >> - Review feedback: move JLIA to ClassFrameInfo >> - review feedback and javadoc clean up >> - ... 
and 19 more: https://git.openjdk.org/jdk/compare/c8acab1d...111661bc > > src/java.base/share/classes/java/lang/StackStreamFactory.java line 657: > >> 655: static final class ClassFrameBuffer extends FrameBuffer { >> 656: final StackWalker walker; >> 657: ClassFrameInfo[] classFrames; // caller class for fast path > > Maybe I missed it, but I don't see any differences between `ClassFramesBuffer` and `StackFrameBuffer` other than the `ClassFrameInfo`/`StackFrameInfo` types. Could a single, generified Buffer class serve for both? Good observation. There is some performance difference when the buffer is created via core reflection vs via bytecode invocation. StackFrameTraverser and LiveStackFrameTraverser can use the generified version. As `getCallerClass` is performance sensitive, need to keep `ClassFrameBuffer`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1317577196 From mandy.chung at oracle.com Wed Sep 6 17:13:45 2023 From: mandy.chung at oracle.com (mandy.chung at oracle.com) Date: Wed, 6 Sep 2023 10:13:45 -0700 Subject: Question on why sun.management MBeans are not exported? In-Reply-To: References: <1535aa6e-6865-d885-3930-df7f9ebcb4b4@oracle.com> Message-ID: On 9/6/23 8:17 AM, Volker Simonis wrote: > Anyway, if you classify the MBeans in "sun.management" as non-critical > internal APIs (with respect to JEP 260) but without any "internal" > usage, than we should really remove them, right, because an internal > API without any internal usage doesn't make any sense? We added these HotSpot internal MBeans in JDK 5 to expose the internal metrics.? Most of these internal metrics are exposed via jstat tool too.?? We didn't receive much feedback regarding these HotSpot internal MBeans.??? Removing them is fine and good cleanup effort. Mandy > > I'll then try to come up with a proposal to port some of the more > useful MBeans functionality in "sun.management" to > "com.sun.management". 
> > Thank you and best regards, > Volker > >> >>> : >>> >>> So to cut a long story short, I see several options: >>> >>> 1. Publicly export sun.management and restore the JDK 8 (or pre JDK >>> 16) behavior. This would certainly require some polishing (e.g. some >>> of the corresponding JVM functionality has already been removed [1]) >>> but I think it could still be quite useful. >>> 2. Port the useful functionality from the "sun.management" MBeans to >>> corresponding "com.sun.management" MBeans and remove the >>> "sun.management" MBeans. >>> 3. Remove the "sun.management" MBeans without substitution. >>> >>> What do you think? >> If there are JDK-specific or HotSpot VM specific features where there is >> a compelling case for a management interface then com.sun.management is >> good place to prototype new APIs. You may already be familiar with >> com.sun.management.HotSpotDiagnosticMBean. >> >> -Alan From alanb at openjdk.org Wed Sep 6 17:45:41 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Sep 2023 17:45:41 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 16:49:39 GMT, Chris Plummer wrote: > > I wonder if this is the right thing to do for the hprof files. I believe they originated from some hprof tools that we no longer ship. 3rd parties might choose to integrate them into their own tools. > > Do you think I should revert them? They are test classes now. If someone does want to copy them into their own repo then I assume they can take it from an old repo, maybe from when the "hat" tool existed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15573#issuecomment-1708828207 From Alan.Bateman at oracle.com Wed Sep 6 17:50:42 2023 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 6 Sep 2023 18:50:42 +0100 Subject: Question on why sun.management MBeans are not exported? 
In-Reply-To: References: <1535aa6e-6865-d885-3930-df7f9ebcb4b4@oracle.com> Message-ID: <70186c9a-79ba-e63d-7ed9-1033dece525c@oracle.com> On 06/09/2023 16:17, Volker Simonis wrote: > : > I'm familiar with JEP 260. But wouldn't you agree that an > "encapsulated" monitoring API is an oxymoron? A monitoring API is by > design intended for external usage and completely useless to the > platform itself. There's no single usage of the "sun.management" > MBeans in the JDK itself (except for jconsole where the encapsulation > broke it). My assumption is that the corresponding MBeans in > "sun.management" are there for historic reasons (added in JDK 1.5) and > would have made much more sense in "com.sun.management" package. But I > doubt that they can be classified in the "internal implementation > details of the JDK and never intended for external use" category of > JEP 260. It's left over from experiments on exposing some internal metrics, I think during JDK 5. It's code that should probably have been removed a long time ago. -Alan From simonis at openjdk.org Wed Sep 6 17:57:46 2023 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 6 Sep 2023 17:57:46 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 22:58:41 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > address dholmes@ comments First of all, thanks for this PR. I think it is useful and once it's done I think we're interested in implementing the corresponding counters for Shenandoah and also for the JIT compiler threads.
First some general comments: - Instead of `PerfVariable` I'd use `PerfCounter` because they represent a "[*data value that can (should) be modified in a monotonic manner*](https://github.com/openjdk/jdk/blob/bd477810b176696e0fd043f5594663ebcf9884cf/src/hotspot/share/runtime/perfData.hpp#L419-L427)" rather than "*being modified in an unrestricted manner*". - Please put the counters into the appropriate namespace (e.g. `sun.gc.collector..cpu_time`). This fits better with the existing counters (we currently don't have counters outside the `java.` or `sun.` namespaces) and makes it easier for follow up changes to implement the corresponding counters for additional GCs. - Can you please aggregate all the different CPU time counters from the different GC phases into one GC CPU time counter (e.g. `sun.gc.cpu_time`). This would be the same for different GCs and would simplify the monitoring of GC overhead independently of the used GC algorithm. - Please add a test for the new counters. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 461: > 459: > 460: _g1_concurrent_mark_threads_cpu_time = > 461: PerfDataManager::create_variable(NULL_NS, "g1_conc_mark_thread_time", See my general comment about name spaces. The name should be something like `*.cpu_time`. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 119: > 117: EXCEPTION_MARK; > 118: _g1_concurrent_refine_threads_cpu_time = > 119: PerfDataManager::create_variable(NULL_NS, "g1_conc_refine_thread_time", See my general comment about name spaces. The name should be something like `*.cpu_time`. src/hotspot/share/gc/shared/collectedHeap.cpp line 279: > 277: > 278: _perf_parallel_gc_threads_cpu_time = > 279: PerfDataManager::create_variable(NULL_NS, "par_gc_thread_time", See my general comment about name spaces. The name should be something like `*.cpu_time`. src/hotspot/share/gc/shared/collectedHeap.hpp line 147: > 145: // Perf counters for CPU time of parallel GC threads. 
Defined here in order to > 146: // be reused for all collectors. > 147: PerfVariable* _perf_parallel_gc_threads_cpu_time; If this is intended to be reused for other GCs then rename to something more generic like `_perf_gc_threads_cpu_time`. src/hotspot/share/gc/shared/collectedHeap.hpp line 562: > 560: // hsperfdata counter. > 561: > 562: class ThreadTotalCPUTimeClosure: public ThreadClosure { This class is not really related to GC (and you are already using it to get the CPU time of the `VMThread`) so I wonder if we should put this in a more generic location (maybe `thread.hpp` or something similar). This will become even more relevant if we implement CPU time for JIT compiler threads because I don't think we want to have a dependency on `collectedHeap.hpp` there. src/hotspot/share/runtime/vmThread.cpp line 141: > 139: PerfData::U_Ticks, CHECK); > 140: _perf_vm_thread_cpu_time = > 141: PerfDataManager::create_variable(NULL_NS, "vm_thread_time", Please move to the corresponding namespace and rename to something like `*.cpu_time`. ------------- Changes requested by simonis (Reviewer). 
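Volker's first and third points (a counter that only moves forward, and one aggregate GC CPU-time figure) can be illustrated with a small, self-contained C++ sketch. It does not use HotSpot's `PerfCounter`, `ThreadClosure` or `os::thread_cpu_time()`: `MonotonicCounter`, `ThreadSample` and `sum_cpu_time_ns` are hypothetical stand-ins, and the `set_if_larger` update policy is an assumption chosen so that a thread exiting between samples cannot make the published total go backwards.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a perf counter whose published value may only
// grow, in the spirit of PerfCounter as opposed to PerfVariable.
class MonotonicCounter {
 public:
  std::int64_t get() const { return value_; }

  // Publishes a freshly computed total, ignoring it if it is smaller than
  // what was already published (e.g. because a sampled thread has exited).
  void set_if_larger(std::int64_t total) {
    if (total > value_) {
      value_ = total;
    }
  }

 private:
  std::int64_t value_ = 0;
};

// Hypothetical per-thread CPU time sample, in nanoseconds.
struct ThreadSample {
  std::int64_t cpu_time_ns;
};

// Visits every thread of one group and accumulates its CPU time, playing the
// role a ThreadClosure-style visitor would play inside the VM.
std::int64_t sum_cpu_time_ns(const std::vector<ThreadSample>& threads) {
  std::int64_t total = 0;
  for (const ThreadSample& t : threads) {
    assert(t.cpu_time_ns >= 0);  // CPU time is never negative
    total += t.cpu_time_ns;
  }
  return total;
}
```

For example, summing two hypothetical groups (worker threads at 100 ns and 250 ns, a concurrent-mark thread at 50 ns) publishes 400; a later, smaller sum of 300 leaves the counter at 400. Keeping per-group counters alongside such an aggregate is not ruled out by this shape: the aggregate is just one more consumer of the same per-thread sums.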
PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1613868256 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317612034 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317613690 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317629008 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317630761 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317637000 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317639739 From eastigeevich at openjdk.org Wed Sep 6 18:50:46 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 6 Sep 2023 18:50:46 GMT Subject: RFR: 8307352: AARCH64: Improve itable_stub [v10] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 15:56:50 GMT, Boris Ulasevich wrote: >> This is a change for AARCH similar to https://github.com/openjdk/jdk/pull/13460 >> >> The change replaces two separate iterations over the itable with a new algorithm consisting of two loops. First, we look for a match with resolved_klass, checking for a match with holder_klass along the way. Then we continue iterating (not starting over) the itable using the second loop, checking only for a match with holder_klass. 
>>
>> InterfaceCalls openjdk benchmark performance results on A53, A72, Neoverse N1 and V1 micro-architectures:
>>
>>
>> Cortex-A53 (Pi 3 Model B Rev 1.2)
>>
>> test1stInt2Types  37.5      37.358    0.38
>> test1stInt3Types  160.166   148.04    8.19
>> test1stInt5Types  158.131   147.955   6.88
>> test2ndInt2Types  52.634    53.291    -1.23
>> test2ndInt3Types  201.39    181.603   10.90
>> test2ndInt5Types  195.722   176.707   10.76
>> testIfaceCall     157.453   140.498   12.07
>> testIfaceExtCall  175.46    154.351   13.68
>> testMonomorphic   32.052    32.039    0.04
>> AVG: 6.85
>>
>> Cortex-A72 (Pi 4 Model B Rev 1.2)
>>
>> test1stInt2Types  27.4796   27.4738   0.02
>> test1stInt3Types  66.0085   64.9374   1.65
>> test1stInt5Types  67.9812   66.2316   2.64
>> test2ndInt2Types  32.0581   32.062    -0.01
>> test2ndInt3Types  68.2715   65.6643   3.97
>> test2ndInt5Types  68.1012   65.8024   3.49
>> testIfaceCall     64.0684   64.1811   -0.18
>> testIfaceExtCall  91.6226   81.5867   12.30
>> testMonomorphic   26.7161   26.7142   0.01
>> AVG: 2.66
>>
>> Neoverse N1 (m6g.metal)
>>
>> test1stInt2Types  2.9104    2.9086    0.06
>> test1stInt3Types  10.9642   10.2909   6.54
>> test1stInt5Types  10.9607   10.2856   6.56
>> test2ndInt2Types  3.3410    3.3478    -0.20
>> test2ndInt3Types  12.3291   11.3089   9.02
>> test2ndInt5Types  12.328    11.2704   9.38
>> testIfaceCall     11.0598   10.3657   6.70
>> testIfaceExtCall  13.0692   11.2826   15.84
>> testMonomorphic   2.2354    2.2341    0.06
>> AVG: 6.00
>>
>> Neoverse V1 (c7g.2xlarge)
>>
>> test1stInt2Types  2.2317    2.2320    -0.01
>> test1stInt3Types  6.6884    6.1911    8.03
>> test1stInt5Types  6.7334    6.2193    8.27
>> test2ndInt2Types  2.4002    2.4013    -0.04
>> test2ndInt3Types  7.9603    7.0372    13.12
>> test2ndInt5Types  7.9532    7.0474    12.85
>> testIfaceCall     6.7028    6.3272    5.94
>> testIfaceExtCall  8.3253    6.941...
>
> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
>
> Address base_plus_offset_reg encoding: assert->guarantee for shift() == size check

Overall lgtm

-------------

Marked as reviewed by eastigeevich (Committer).
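The two-loop lookup described in the PR can be modeled in plain C++ over a null-terminated table of (interface, offset) entries. This is an illustrative sketch under stated assumptions, not the AArch64 stub itself: `Klass`, `ItableEntry`, `ItableLookup` and `itable_lookup` are hypothetical, and the real stub works on registers and raw itable offsets rather than structs.

```cpp
#include <cassert>

// Hypothetical stand-in for an interface Klass.
struct Klass {
  int id;
};

// Hypothetical itable entry: an implemented interface and the offset of its
// method block. A nullptr klass terminates the table.
struct ItableEntry {
  const Klass* klass;
  int offset;
};

struct ItableLookup {
  bool found;           // both interfaces located
  int resolved_offset;  // -1 if resolved_klass is not implemented
  int holder_offset;    // -1 if holder_klass is not implemented
};

ItableLookup itable_lookup(const ItableEntry* itable,
                           const Klass* resolved_klass,
                           const Klass* holder_klass) {
  assert(itable != nullptr);
  ItableLookup r{false, -1, -1};
  const ItableEntry* e = itable;

  // Loop 1: search for resolved_klass, noting holder_klass along the way.
  for (; e->klass != nullptr; ++e) {
    if (e->klass == holder_klass) {
      r.holder_offset = e->offset;
    }
    if (e->klass == resolved_klass) {
      r.resolved_offset = e->offset;
      ++e;  // loop 2 resumes after this entry
      break;
    }
  }
  if (r.resolved_offset < 0) {
    return r;  // receiver does not implement resolved_klass
  }

  // Loop 2: continue (not restart) the scan, looking only for holder_klass,
  // and only if loop 1 has not already seen it.
  for (; r.holder_offset < 0 && e->klass != nullptr; ++e) {
    if (e->klass == holder_klass) {
      r.holder_offset = e->offset;
    }
  }
  r.found = r.holder_offset >= 0;
  return r;
}
```

For example, with a table `{A, B, C}`, looking up `resolved_klass = C` and `holder_klass = A` records `A` during the first loop, so the second loop does no work; looking up `resolved_klass = B` and `holder_klass = C` resumes after `B` and finds `C` in the second loop. In neither case is an entry visited twice.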
PR Review: https://git.openjdk.org/jdk/pull/13792#pullrequestreview-1613999051 From mchung at openjdk.org Wed Sep 6 20:17:46 2023 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 6 Sep 2023 20:17:46 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v8] In-Reply-To: References: <6LniLY2k906SYWewwmwQoGQrJ2kI9MM88pqUPgjl7b8=.8522ef13-0571-47aa-a088-0f57e2e7ab8d@github.com> Message-ID: On Wed, 30 Aug 2023 22:01:28 GMT, Brent Christian wrote: >> Mandy Chung has updated the pull request incrementally with three additional commits since the last revision: >> >> - update mode to be int rather than long >> - update tests >> - Review feedback on javadoc > > src/java.base/share/classes/java/lang/ClassFrameInfo.java line 37: > >> 35: int flags; // updated by VM to set hidden and caller-sensitive bits >> 36: >> 37: ClassFrameInfo(StackWalker walker) { > > The StackFrameInfo constructor has a comment. Maybe add one here, too? sure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1317783076 From amenkov at openjdk.org Wed Sep 6 20:43:44 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 6 Sep 2023 20:43:44 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> On Tue, 29 Aug 2023 10:09:21 GMT, Serguei Spitsyn wrote: > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing mechanism
of the JVMTI event management does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created only when it is really needed, e.g., if any of the thread-filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at thread start. > > This fix introduces a new function `JvmtiExport::get_jvmti_thread_state()` which checks if the thread is virtual and there is a thread-filtered event enabled globally, and if so, forces the creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. A new function `JvmtiEventController::recompute_thread_filtered()` was introduced to make the necessary corrections.
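The lazy-creation rule described above can be modeled in a few lines of C++. This is a hypothetical sketch, not HotSpot code: `ThreadState`, `Thread`, `g_thread_filtered_event_enabled` and `get_thread_state` only mirror the roles that the PR description gives `JvmtiThreadState`, `JavaThread` and `JvmtiExport::get_jvmti_thread_state()`; locking and platform threads are not modeled.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for JvmtiThreadState (per-thread event bookkeeping).
struct ThreadState {
  // enabled-event bits etc. would live here
};

// Hypothetical stand-in for a thread; its state is created lazily.
struct Thread {
  bool is_virtual = false;
  std::unique_ptr<ThreadState> state;
};

// Hypothetical global flag: true while any thread-filtered event is enabled.
bool g_thread_filtered_event_enabled = false;

// Returns the thread's state, creating it on demand only for a virtual
// thread and only while a thread-filtered event is enabled globally.
// Otherwise returns whatever already exists (possibly nullptr), so virtual
// threads that no agent is observing carry no per-thread state.
ThreadState* get_thread_state(Thread& t) {
  if (t.state == nullptr && t.is_virtual && g_thread_filtered_event_enabled) {
    t.state = std::make_unique<ThreadState>();
  }
  return t.state.get();
}
```

In this model an unmounted virtual thread pays nothing until an agent actually enables a thread-filtered event; after that, the first query creates the state and later queries return the same object.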
> > Testing: > - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all passed test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 91: > 89: > 90: try (ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor()) { > 91: for (int tCnt = 0; tCnt < TCNT1; tCnt++) { Could you please add a comment before each test group creation block describing the expected state? test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 100: > 98: mready.await(); > 99: try { > 100: // timeout is big enough to keep mounted untill interrupted The comment is misleading. The 1st group of threads is expected to be unmounted during the attach and mounted after the threads are interrupted. test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 136: > 134: ready1.await(); > 135: mready.decr(); > 136: VirtualMachine vm = VirtualMachine.attach(String.valueOf(ProcessHandle.current().pid())); I think a sleep is needed here so that threads which should be unmounted have time to unmount before the attach. test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 141: > 139: log("main: completedNo: " + completedNo); > 140: attached = true; > 141: for (Thread t : threads) { AFAIU, threads in the 3rd group (TCNT3) should be unmounted (with LockSupport.parkNanos) before they are interrupted, so we need a sleep here. test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 149: > 147: for (int sleepNo = 0; sleepNo < 10 && threadEndCount() < THREAD_CNT; sleepNo++) { > 148: log("main: wait iter: " + sleepNo); > 149: Thread.sleep(100); sleep(1000)?
(the comment before the loop mentions 10 secs) test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 37: > 35: > 36: namespace { > 37: std::mutex lock; This mutex is only used to make access to the counters atomic. It would be clearer to make the counters `std::atomic` and remove the mutex. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1317795256 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1317794334 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1317796891 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1317811234 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1317804389 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1317802305 From matsaave at openjdk.org Wed Sep 6 21:07:02 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 6 Sep 2023 21:07:02 GMT Subject: RFR: 8292692: Move MethodCounters inline functions out of method.hpp Message-ID: The inline functions related to MethodCounters in method.hpp can be moved to the inline file to reduce the number of includes. Verified with tier 1-5 tests.
Below is a comparison of the old and new include statistics:

Old
----
scanning 836 methodCounters.hpp
  2 found 836 method.hpp

scanning 837 invocationCounter.hpp
  2 found 836 method.hpp
  3 found 836 methodCounters.hpp
  4 found 649 interp_masm_x86.hpp
  5 found 0 interp_masm_aarch64.hpp
  6 found 0 interp_masm_arm.hpp
  7 found 0 interp_masm_ppc.hpp
  8 found 0 interp_masm_riscv.hpp
  9 found 0 interp_masm_s390.hpp
  10 found 0 interp_masm_zero.hpp

scanning 298 method.inline.hpp
  2 found 286 continuationEntry_x86.inline.hpp
  3 found 0 continuationEntry_aarch64.inline.hpp
  4 found 0 continuationEntry_ppc.inline.hpp
  5 found 0 continuationEntry_riscv.inline.hpp

New
-----
scanning 304 methodCounters.hpp
  2 found 299 method.inline.hpp

scanning 476 invocationCounter.hpp
  2 found 304 methodCounters.hpp
  3 found 257 methodData.hpp
  4 found 0 interp_masm_aarch64.hpp
  5 found 0 interp_masm_ppc.hpp
  6 found 0 interp_masm_riscv.hpp
  7 found 0 interp_masm_s390.hpp
  8 found 0 interp_masm_zero.hpp

scanning 299 method.inline.hpp
  2 found 286 continuationEntry_x86.inline.hpp
  3 found 0 continuationEntry_aarch64.inline.hpp
  4 found 0 continuationEntry_ppc.inline.hpp
  5 found 0 continuationEntry_riscv.inline.hpp

------------- Commit messages: - 8292692: Move MethodCounters inline functions out of method.hpp Changes: https://git.openjdk.org/jdk/pull/15094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15094&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292692 Stats: 187 lines in 25 files changed: 106 ins; 64 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/15094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15094/head:pull/15094 PR: https://git.openjdk.org/jdk/pull/15094 From iklam at openjdk.org Wed Sep 6 21:24:36 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 6 Sep 2023 21:24:36 GMT Subject: RFR: 8292692: Move MethodCounters inline functions out of method.hpp In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 20:12:19 GMT, Matias Saavedra Silva wrote: > The inline
functions related to MethodCounters in method.hpp can be moved to the inline file to reduce the number of includes. Verified with tier 1-5 tests. > > Below is a comparison of the old and new include statistics: > > Old > ---- > scanning 836 methodCounters.hpp > 2 found 836 method.hpp > > scanning 837 invocationCounter.hpp > 2 found 836 method.hpp > 3 found 836 methodCounters.hpp > 4 found 649 interp_masm_x86.hpp > 5 found 0 interp_masm_aarch64.hpp > 6 found 0 interp_masm_arm.hpp > 7 found 0 interp_masm_ppc.hpp > 8 found 0 interp_masm_riscv.hpp > 9 found 0 interp_masm_s390.hpp > 10 found 0 interp_masm_zero.hpp > > scanning 298 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp > > > > New > ----- > scanning 304 methodCounters.hpp > 2 found 299 method.inline.hpp > > scanning 476 invocationCounter.hpp > 2 found 304 methodCounters.hpp > 3 found 257 methodData.hpp > 4 found 0 interp_masm_aarch64.hpp > 5 found 0 interp_masm_ppc.hpp > 6 found 0 interp_masm_riscv.hpp > 7 found 0 interp_masm_s390.hpp > 8 found 0 interp_masm_zero.hpp > > scanning 299 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp LGTM. ------------- Marked as reviewed by iklam (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15094#pullrequestreview-1614235698 From bchristi at openjdk.org Wed Sep 6 21:38:43 2023 From: bchristi at openjdk.org (Brent Christian) Date: Wed, 6 Sep 2023 21:38:43 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v9] In-Reply-To: References: <6hreBEM3qw8FZmOCseR6hgu4-avV-C-2oK7PlOs-IYU=.b3345812-391b-4ed1-b7a2-cdb0e63e2be6@github.com> Message-ID: On Tue, 5 Sep 2023 17:52:44 GMT, Mandy Chung wrote: >> test/micro/org/openjdk/bench/java/lang/StackWalkBench.java line 64: >> >>> 62: default -> throw new IllegalArgumentException(name); >>> 63: }; >>> 64: } >> >> The previous `WALKER_DEFAULT` would not have retained the Class reference, but the new `default` will? > > Some benchmarks need the Class reference but some do not. For simplicity, use only walkers that retain the Class reference so that all benchmarks can run with the default walker. In my mind, a "default" StackWalker (obtained from the no-arg `StackWalker.getInstance()`) does not retain the Class instance. I think this will be confusing when the "default" Param value is reported in JMH results. I like running the benchmarks with both sets of StackWalker options, but I think the `default` Param value should be changed to something like `class+methods`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1317859222 From manc at openjdk.org Wed Sep 6 21:59:39 2023 From: manc at openjdk.org (Man Cao) Date: Wed, 6 Sep 2023 21:59:39 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v2] In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 17:25:32 GMT, Volker Simonis wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> address dholmes@ comments > > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 461: > >> 459: >> 460: _g1_concurrent_mark_threads_cpu_time = >> 461: PerfDataManager::create_variable(NULL_NS, "g1_conc_mark_thread_time", > > See my general comment about name spaces. The name should be something like `*.cpu_time`. +1. We should probably avoid using `NULL_NS` as well, so perhaps something like `PerfDataManager::create_variable(SUN_THREADS, "g1_conc_mark_cpu_time", ...`. There is only one existing hsperf counter under `SUN_THREADS`: `sun.threads.vmOperationTime`. I think it is appropriate to add all these thread CPU time counters under `SUN_THREADS`. > src/hotspot/share/gc/shared/collectedHeap.hpp line 147: > >> 145: // Perf counters for CPU time of parallel GC threads. Defined here in order to >> 146: // be reused for all collectors. >> 147: PerfVariable* _perf_parallel_gc_threads_cpu_time; > > If this is intended to be reused for other GCs then rename to something more generic like `_perf_gc_threads_cpu_time`. The intention is for parallel GC worker threads, not for all GC threads, and not to be confused with threads for `ParallelGC`. Perhaps `_perf_parallel_worker_threads_cpu_time` is better?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317869103 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317865168 From manc at openjdk.org Wed Sep 6 21:59:42 2023 From: manc at openjdk.org (Man Cao) Date: Wed, 6 Sep 2023 21:59:42 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 22:58:41 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > address dholmes@ comments src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 171: > 169: // the primary thread is started last and stopped first, so it will not risk > 170: // reading CPU time of a terminated worker thread. > 171: assert(Thread::current() == _threads[0], This assert could be changed to call `assert_current_thread_is_primary_refinement_thread()`. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 85: > 83: if (UsePerfData && os::is_thread_cpu_time_supported() && is_primary()) { > 84: _cr->update_concurrent_refine_threads_cpu_time(); > 85: } There are two classes for primary thread and secondary refinement thread: `G1PrimaryConcurrentRefineThread` and `G1SecondaryConcurrentRefineThread`. It is probably cleaner to move this part inside `G1PrimaryConcurrentRefineThread` and add a virtual method in `G1ConcurrentRefineThread`. We can get rid of the `is_primary()` check as well. 
    class G1ConcurrentRefineThread {
      virtual void possibly_update_threads_cpu_time() {}
    };

    void G1PrimaryConcurrentRefineThread::possibly_update_threads_cpu_time() {
      if (UsePerfData && os::is_thread_cpu_time_supported()) {
        _cr->update_concurrent_refine_threads_cpu_time();
      }
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317853674 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1317862507 From manc at openjdk.org Wed Sep 6 22:20:42 2023 From: manc at openjdk.org (Man Cao) Date: Wed, 6 Sep 2023 22:20:42 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v2] In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 17:54:23 GMT, Volker Simonis wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> address dholmes@ comments > > First of all thanks for this PR. I think it is useful and once it's done I think we're interested in implementing the corresponding counters for Shenandoah and also for the JIT compiler threads. > > First some general comments: > - Instead of `PerfVariable` I'd use `PerfCounter` because they represent a "[*data value that can (should) be modified in a monotonic manner*](https://github.com/openjdk/jdk/blob/bd477810b176696e0fd043f5594663ebcf9884cf/src/hotspot/share/runtime/perfData.hpp#L419-L427)" rather than "*being modified in an unrestricted manner*". > - Please put the counters into the appropriate namespace (e.g. `sun.gc.collector..cpu_time`). This fits better with the existing counters (we currently don't have counters outside the `java.` or `sun.` namespaces) and makes it easier for follow up changes to implement the corresponding counters for additional GCs. > - Can you please aggregate all the different CPU time counters from the different GC phases into one GC CPU time counter (e.g. `sun.gc.cpu_time`).
This would be the same for different GCs and would simplify the monitoring of GC overhead independently of the used GC algorithm. > - Please add a test for the new counters. Responding to some comments from @simonis: > Please put the counters into the appropriate namespace (e.g. sun.gc.collector..cpu_time). I suggested using the `SUN_THREADS` namespace instead, because: - Counters such as those for the VM thread and the String Dedup thread do not belong to any GC collector namespace. - It is hard to classify some counters into either `sun.gc.collector.0` or `sun.gc.collector.1`. E.g. `_perf_parallel_gc_threads_cpu_time` is used by both the incremental and full collectors for G1. It is possible to put them under `sun.gc` directly, though. > Can you please aggregate all the different CPU time counters from the different GC phases into one GC CPU time counter (e.g. sun.gc.cpu_time). Our experience with monitoring metrics is that it is best to provide individual metrics for each subcomponent. Monitoring tools can perform aggregation for these fine-grained metrics if they need to. If the JVM provides an aggregate metric, it is impossible to un-aggregate this metric. It is also hard to tell which subcomponent is the culprit if there's a regression. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1709199392 From ccheung at openjdk.org Wed Sep 6 22:31:38 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 6 Sep 2023 22:31:38 GMT Subject: RFR: 8292692: Move MethodCounters inline functions out of method.hpp In-Reply-To: References: Message-ID: <0SHLJ8tNH-hNEprnvIPpOuGrRKltxiZC76JwzHsK3Rs=.93cbd8c1-5bb7-45ff-8545-c04c52700c7b@github.com> On Mon, 31 Jul 2023 20:12:19 GMT, Matias Saavedra Silva wrote: > The inline functions related to MethodCounters in method.hpp can be moved to the inline file to reduce the number of includes. Verified with tier 1-5 tests.
> > Below is a comparison of the old and new include statistics: > > Old > ---- > scanning 836 methodCounters.hpp > 2 found 836 method.hpp > > scanning 837 invocationCounter.hpp > 2 found 836 method.hpp > 3 found 836 methodCounters.hpp > 4 found 649 interp_masm_x86.hpp > 5 found 0 interp_masm_aarch64.hpp > 6 found 0 interp_masm_arm.hpp > 7 found 0 interp_masm_ppc.hpp > 8 found 0 interp_masm_riscv.hpp > 9 found 0 interp_masm_s390.hpp > 10 found 0 interp_masm_zero.hpp > > scanning 298 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp > > > > New > ----- > scanning 304 methodCounters.hpp > 2 found 299 method.inline.hpp > > scanning 476 invocationCounter.hpp > 2 found 304 methodCounters.hpp > 3 found 257 methodData.hpp > 4 found 0 interp_masm_aarch64.hpp > 5 found 0 interp_masm_ppc.hpp > 6 found 0 interp_masm_riscv.hpp > 7 found 0 interp_masm_s390.hpp > 8 found 0 interp_masm_zero.hpp > > scanning 299 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp Looks good. Just one nit. src/hotspot/share/oops/method.inline.hpp line 142: > 140: } > 141: } > 142: #endif Suggestion: ` #endif // INCLUDE_JVMTI` ------------- Marked as reviewed by ccheung (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15094#pullrequestreview-1614304034 PR Review Comment: https://git.openjdk.org/jdk/pull/15094#discussion_r1317890673 From kvn at openjdk.org Wed Sep 6 22:51:39 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 6 Sep 2023 22:51:39 GMT Subject: RFR: 8292692: Move MethodCounters inline functions out of method.hpp In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 20:12:19 GMT, Matias Saavedra Silva wrote: > The inline functions related to MethodCounters in method.hpp can be moved to the inline file to reduce the number of includes. Verified with tier 1-5 tests. > > Below is a comparison of the old and new include statistics: > > Old > ---- > scanning 836 methodCounters.hpp > 2 found 836 method.hpp > > scanning 837 invocationCounter.hpp > 2 found 836 method.hpp > 3 found 836 methodCounters.hpp > 4 found 649 interp_masm_x86.hpp > 5 found 0 interp_masm_aarch64.hpp > 6 found 0 interp_masm_arm.hpp > 7 found 0 interp_masm_ppc.hpp > 8 found 0 interp_masm_riscv.hpp > 9 found 0 interp_masm_s390.hpp > 10 found 0 interp_masm_zero.hpp > > scanning 298 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp > > > > New > ----- > scanning 304 methodCounters.hpp > 2 found 299 method.inline.hpp > > scanning 476 invocationCounter.hpp > 2 found 304 methodCounters.hpp > 3 found 257 methodData.hpp > 4 found 0 interp_masm_aarch64.hpp > 5 found 0 interp_masm_ppc.hpp > 6 found 0 interp_masm_riscv.hpp > 7 found 0 interp_masm_s390.hpp > 8 found 0 interp_masm_zero.hpp > > scanning 299 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp Looks good. 
I submitted tier1 testing, which includes different builds (I don't see a link to testing in the RFE). ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15094#pullrequestreview-1614324947 From mchung at openjdk.org Wed Sep 6 22:54:45 2023 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 6 Sep 2023 22:54:45 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v9] In-Reply-To: References: <6hreBEM3qw8FZmOCseR6hgu4-avV-C-2oK7PlOs-IYU=.b3345812-391b-4ed1-b7a2-cdb0e63e2be6@github.com> Message-ID: <6pJF7vbeCEhPHQcNx9kqyEwR35OI6f8b94hOy1ka1k4=.6aaf275a-fae1-45ad-a6b9-15328999b4f4@github.com> On Wed, 6 Sep 2023 21:35:49 GMT, Brent Christian wrote: >> Some benchmarks need the Class reference but some do not. For simplicity, use only walkers that retain the Class reference so that all benchmarks can run with the default walker. > > In my mind, a "default" StackWalker (obtained from the no-arg `StackWalker.getInstance()`) does not retain the Class instance. I think this will be confusing when the "default" Param value is reported in JMH results. > > I like running the benchmarks with both sets of StackWalker options, but I think the `default` Param value should be changed to something like `class+methods`. OK, I can rename it. For benchmarking purposes, it does not matter whether it retains the Class instance or not.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1317907061 From sspitsyn at openjdk.org Thu Sep 7 06:33:29 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 7 Sep 2023 06:33:29 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing mechanism of the JVMTI event management does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created only when it is really needed, e.g., if any of the thread-filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at thread start. > > This fix introduces a new function `JvmtiExport::get_jvmti_thread_state()` which checks if the thread is virtual and there is a thread-filtered event enabled globally, and if so, forces the creation of the `JvmtiThreadState`.
Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. A new function `JvmtiEventController::recompute_thread_filtered()` was introduced to make the necessary corrections. > > Testing: > - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all passed Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge - 8312174: missing JVMTI events from vthreads parked during JVMTI attach ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15467/files - new: https://git.openjdk.org/jdk/pull/15467/files/dbe3a64a..dd97dacc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=00-01 Stats: 16961 lines in 424 files changed: 12615 ins; 2655 del; 1691 mod Patch: https://git.openjdk.org/jdk/pull/15467.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15467/head:pull/15467 PR: https://git.openjdk.org/jdk/pull/15467 From azafari at openjdk.org Thu Sep 7 07:06:42 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 7 Sep 2023 07:06:42 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 02:58:11 GMT, Quan Anh Mai wrote: > This is similar to `std::find_if` and should be just: > > ``` > template > int find_if(UnaryPredicate p) const { > for (int i = 0; i < _len; i++) { > if (p(_data[i])) { > return i; > } > } > return -1; > } > ``` > > Regarding the current approach, the comparator should take a `const E&` instead of an `E`, and the token passed in should be `const` also. Thank you @merykitty for this comment.
It is fixed now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1709587729 From azafari at openjdk.org Thu Sep 7 07:06:43 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 7 Sep 2023 07:06:43 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v4] In-Reply-To: References: <068Gqd9adw6k8nrLAJoEMDmbw2s3RMpV0KPmWDS0OdI=.1d8171a3-012a-425c-bea6-44f538a64106@github.com> <2kgLlH__lH2LiM7VdVePqCgW8Uck-AvrO-klTB8yzq4=.66ad8590-4661-47f2-b206-5979c8dba063@github.com> Message-ID: On Tue, 29 Aug 2023 07:04:15 GMT, Serguei Spitsyn wrote: >> Also, why isn't this change also being applied to `find_from_end` > > There can be a confusion related to selection of type names T and E: > T is intuitively treated as a table and E as an element. > No pressure but I wonder if using D instead of T would be better. > Also, why isn't this change also being applied to `find_from_end` Thank you @kimbarrett, the function is also changed accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1318170737 From azafari at openjdk.org Thu Sep 7 07:09:37 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 7 Sep 2023 07:09:37 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v4] In-Reply-To: References: <068Gqd9adw6k8nrLAJoEMDmbw2s3RMpV0KPmWDS0OdI=.1d8171a3-012a-425c-bea6-44f538a64106@github.com> <2kgLlH__lH2LiM7VdVePqCgW8Uck-AvrO-klTB8yzq4=.66ad8590-4661-47f2-b206-5979c8dba063@github.com> Message-ID: On Thu, 7 Sep 2023 07:03:44 GMT, Afshin Zafari wrote: >> There can be a confusion related to selection of type names T and E: >> T is intuitively treated as a table and E as an element. >> No pressure but I wonder if using D instead of T would be better. > >> Also, why isn't this change also being applied to `find_from_end` > > Thank you @kimbarrett, the function is also changed accordingly. 
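Quan Anh Mai's `find_if` shape, together with the `const E&` point, can be tried outside HotSpot on a toy container. `ToyArray` and `find_from_end_if` are hypothetical names for this sketch, not GrowableArray's actual API. Because the predicate is a template parameter, a capturing lambda can be passed without `std::function`.

```cpp
#include <cassert>

// Toy fixed-capacity container standing in for GrowableArray (hypothetical, not HotSpot code).
template <typename E>
class ToyArray {
  E _data[16];
  int _len = 0;

 public:
  void append(const E& e) {
    assert(_len < 16);
    _data[_len++] = e;
  }

  // std::find_if-style search: index of the first element satisfying p, or -1.
  // In a const member function _data[i] is a const lvalue, so a predicate taking
  // const E& sees the element without copying it and cannot modify it.
  template <typename UnaryPredicate>
  int find_if(UnaryPredicate p) const {
    for (int i = 0; i < _len; i++) {
      if (p(_data[i])) {
        return i;
      }
    }
    return -1;
  }

  // The same search from the back, in the spirit of find_from_end.
  template <typename UnaryPredicate>
  int find_from_end_if(UnaryPredicate p) const {
    for (int i = _len - 1; i >= 0; i--) {
      if (p(_data[i])) {
        return i;
      }
    }
    return -1;
  }
};
```

For example, with `int key = 3;`, the call `a.find_if([&](const int& e) { return e == key; })` returns the index of the first 3. A plain function pointer would also work as the predicate, but it could not capture `key`.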
> We could just as well do a capturing lambda here, yes. Then we'd have: > > ```c++ > template <typename F> > int find(F finder); > ``` > > It'd be a template instead of a function pointer since it's a capturing lambda and `std::function` is not permitted in Hotspot AFAIK. > > As an aside, to clarify for readers: There's a `&` missing in the capture list of your examples. It would be nice to have the finder as a templated function. However, I think it is better to keep the changes small and manageable for this PR. Thanks for the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1318173657 From bulasevich at openjdk.org Thu Sep 7 07:58:10 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 7 Sep 2023 07:58:10 GMT Subject: RFR: 8307352: AARCH64: Improve itable_stub [v10] In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 18:48:20 GMT, Evgeny Astigeevich wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Address base_plus_offset_reg encoding: assert->guarantee for shift() == size check > > Overall lgtm @eastig, @simonis, @shipilev, @theRealAph Thanks for reviewing! If there are no more comments, I will integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13792#issuecomment-1709652280 From bulasevich at openjdk.org Thu Sep 7 07:58:10 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 7 Sep 2023 07:58:10 GMT Subject: RFR: 8307352: AARCH64: Improve itable_stub [v11] In-Reply-To: References: Message-ID: > This is a change for AARCH similar to https://github.com/openjdk/jdk/pull/13460 > > The change replaces two separate iterations over the itable with a new algorithm consisting of two loops. First, we look for a match with resolved_klass, checking for a match with holder_klass along the way. Then we continue iterating (not starting over) the itable using the second loop, checking only for a match with holder_klass.
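The two-loop lookup described in the quoted PR text can be sketched in plain C++. This models only the search order, not the AArch64 stub: the itable is a hypothetical null-terminated array of `(interface, offset)` pairs, and -1 stands in for the stub's slow path.

```cpp
#include <cassert>

// Hypothetical itable model: (interface, offset) entries ending with a null interface.
struct ToyItableEntry {
  const void* interface_klass;   // nullptr terminates the table
  int offset;                    // where this interface's methods live in the receiver
};

// Returns holder_klass's offset, or -1 where the real stub would take its slow path
// (the receiver does not implement resolved_klass, or holder_klass is missing).
int lookup_holder_offset(const ToyItableEntry* itable,
                         const void* resolved_klass, const void* holder_klass) {
  bool holder_found = false;
  int holder_offset = -1;
  const ToyItableEntry* e = itable;
  // Loop 1: look for resolved_klass, checking for holder_klass along the way.
  for (; e->interface_klass != nullptr; e++) {
    if (e->interface_klass == holder_klass) {
      holder_found = true;
      holder_offset = e->offset;
    }
    if (e->interface_klass == resolved_klass) break;
  }
  if (e->interface_klass == nullptr) return -1;
  // Loop 2: continue from the current entry (no restart), checking only for holder_klass.
  for (; !holder_found && e->interface_klass != nullptr; e++) {
    if (e->interface_klass == holder_klass) {
      holder_found = true;
      holder_offset = e->offset;
    }
  }
  return holder_found ? holder_offset : -1;
}
```

Because the second loop resumes where the first stopped, the table is walked at most once end to end. The old approach scanned it twice from the start.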
> > InterfaceCalls openjdk benchmark performance results on A53, A72, Neoverse N1 and V1 micro-architectures: > > > Cortex-A53 (Pi 3 Model B Rev 1.2) > > test1stInt2Types 37.5 37.358 0.38 > test1stInt3Types 160.166 148.04 8.19 > test1stInt5Types 158.131 147.955 6.88 > test2ndInt2Types 52.634 53.291 -1.23 > test2ndInt3Types 201.39 181.603 10.90 > test2ndInt5Types 195.722 176.707 10.76 > testIfaceCall 157.453 140.498 12.07 > testIfaceExtCall 175.46 154.351 13.68 > testMonomorphic 32.052 32.039 0.04 > AVG: 6.85 > > Cortex-A72 (Pi 4 Model B Rev 1.2) > > test1stInt2Types 27.4796 27.4738 0.02 > test1stInt3Types 66.0085 64.9374 1.65 > test1stInt5Types 67.9812 66.2316 2.64 > test2ndInt2Types 32.0581 32.062 -0.01 > test2ndInt3Types 68.2715 65.6643 3.97 > test2ndInt5Types 68.1012 65.8024 3.49 > testIfaceCall 64.0684 64.1811 -0.18 > testIfaceExtCall 91.6226 81.5867 12.30 > testMonomorphic 26.7161 26.7142 0.01 > AVG: 2.66 > > Neoverse N1 (m6g.metal) > > test1stInt2Types 2.9104 2.9086 0.06 > test1stInt3Types 10.9642 10.2909 6.54 > test1stInt5Types 10.9607 10.2856 6.56 > test2ndInt2Types 3.3410 3.3478 -0.20 > test2ndInt3Types 12.3291 11.3089 9.02 > test2ndInt5Types 12.328 11.2704 9.38 > testIfaceCall 11.0598 10.3657 6.70 > testIfaceExtCall 13.0692 11.2826 15.84 > testMonomorphic 2.2354 2.2341 0.06 > AVG: 6.00 > > Neoverse V1 (c7g.2xlarge) > > test1stInt2Types 2.2317 2.2320 -0.01 > test1stInt3Types 6.6884 6.1911 8.03 > test1stInt5Types 6.7334 6.2193 8.27 > test2ndInt2Types 2.4002 2.4013 -0.04 > test2ndInt3Types 7.9603 7.0372 13.12 > test2ndInt5Types 7.9532 7.0474 12.85 > testIfaceCall 6.7028 6.3272 5.94 > testIfaceExtCall 8.3253 6.9416 19.93 > testMonomorphic 1.2446 1.2544 -0.79 > AVG: 7.48 > > > Testing... 
Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: assert message update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13792/files - new: https://git.openjdk.org/jdk/pull/13792/files/1687c4bd..11287294 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13792&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13792&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13792.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13792/head:pull/13792 PR: https://git.openjdk.org/jdk/pull/13792 From fyang at openjdk.org Thu Sep 7 07:59:50 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 7 Sep 2023 07:59:50 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg In-Reply-To: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: On Wed, 6 Sep 2023 06:14:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > As described in jbs, this handles both cases with a rough solution by having two strings. > Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. > > Tested tier1 on qemu rv. Hi, I have a small question about the JBS description. Where does the comma in the string we are regexping come from if we use plain space here? I am also wondering if we could do the separation for the original `_features_string`. This would help eliminate changes to the shared code. I guess it might not be a big issue for other places. src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 162: > 160: > 161: _features_string = os::strdup(buf); > 162: _parsable_features_string = os::strdup(buf); Shouldn't this be `buf_pfs` instead of `buf`? ------------- Changes requested by fyang (Reviewer).
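The separator question can be made concrete. The space-separated feature string has no commas of its own. Commas appear only when the string is split into a list and that list is rendered Java-style, which, per Robbin Ehn's reply in this thread, is what CPUInfo.java and `Arrays.toString()` do before jtreg sees the text. The C++ model of that round trip below is a sketch, and its helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a space-separated feature string into tokens (what a List built from it would hold).
std::vector<std::string> split_features(const std::string& s) {
  std::istringstream in(s);
  std::vector<std::string> out;
  for (std::string tok; in >> tok;) {
    out.push_back(tok);
  }
  return out;
}

// Render tokens the way java.util.List.toString() / Arrays.toString() does: "[a, b, c]".
std::string java_list_to_string(const std::vector<std::string>& v) {
  std::string r = "[";
  for (std::size_t i = 0; i < v.size(); i++) {
    if (i > 0) r += ", ";
    r += v[i];
  }
  return r + "]";
}
```

Splitting `"rv64 i g c v zicbop"` and rendering it this way yields `"[rv64, i, g, c, v, zicbop]"`. A pattern applied to the jtreg-visible string therefore has to account for the `, ` separators, even though the HotSpot string itself uses spaces.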
PR Review: https://git.openjdk.org/jdk/pull/15579#pullrequestreview-1614780830 PR Review Comment: https://git.openjdk.org/jdk/pull/15579#discussion_r1318212928 From rehn at openjdk.org Thu Sep 7 08:39:40 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Sep 2023 08:39:40 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg In-Reply-To: References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: On Thu, 7 Sep 2023 07:41:43 GMT, Fei Yang wrote: >> Hi, please consider. >> >> As described in jbs, this handles both cases with a rough solution by having two strings. >> Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. >> >> Tested tier1 on qemu rv. > > src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 162: > >> 160: >> 161: _features_string = os::strdup(buf); >> 162: _parsable_features_string = os::strdup(buf); > > Shouldn't this be `buf_pfs` instead of `buf`? Yes, this looks wrong. I'll investigate why this was working in testing. It seems like it should not have. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15579#discussion_r1318278253 From aph at openjdk.org Thu Sep 7 08:51:44 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 7 Sep 2023 08:51:44 GMT Subject: RFR: 8307352: AARCH64: Improve itable_stub [v11] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 07:58:10 GMT, Boris Ulasevich wrote: >> This is a change for AARCH similar to https://github.com/openjdk/jdk/pull/13460 >> >> The change replaces two separate iterations over the itable with a new algorithm consisting of two loops. First, we look for a match with resolved_klass, checking for a match with holder_klass along the way. Then we continue iterating (not starting over) the itable using the second loop, checking only for a match with holder_klass.
>> >> InterfaceCalls openjdk benchmark performance results on A53, A72, Neoverse N1 and V1 micro-architectures: >> >> >> Cortex-A53 (Pi 3 Model B Rev 1.2) >> >> test1stInt2Types 37.5 37.358 0.38 >> test1stInt3Types 160.166 148.04 8.19 >> test1stInt5Types 158.131 147.955 6.88 >> test2ndInt2Types 52.634 53.291 -1.23 >> test2ndInt3Types 201.39 181.603 10.90 >> test2ndInt5Types 195.722 176.707 10.76 >> testIfaceCall 157.453 140.498 12.07 >> testIfaceExtCall 175.46 154.351 13.68 >> testMonomorphic 32.052 32.039 0.04 >> AVG: 6.85 >> >> Cortex-A72 (Pi 4 Model B Rev 1.2) >> >> test1stInt2Types 27.4796 27.4738 0.02 >> test1stInt3Types 66.0085 64.9374 1.65 >> test1stInt5Types 67.9812 66.2316 2.64 >> test2ndInt2Types 32.0581 32.062 -0.01 >> test2ndInt3Types 68.2715 65.6643 3.97 >> test2ndInt5Types 68.1012 65.8024 3.49 >> testIfaceCall 64.0684 64.1811 -0.18 >> testIfaceExtCall 91.6226 81.5867 12.30 >> testMonomorphic 26.7161 26.7142 0.01 >> AVG: 2.66 >> >> Neoverse N1 (m6g.metal) >> >> test1stInt2Types 2.9104 2.9086 0.06 >> test1stInt3Types 10.9642 10.2909 6.54 >> test1stInt5Types 10.9607 10.2856 6.56 >> test2ndInt2Types 3.3410 3.3478 -0.20 >> test2ndInt3Types 12.3291 11.3089 9.02 >> test2ndInt5Types 12.328 11.2704 9.38 >> testIfaceCall 11.0598 10.3657 6.70 >> testIfaceExtCall 13.0692 11.2826 15.84 >> testMonomorphic 2.2354 2.2341 0.06 >> AVG: 6.00 >> >> Neoverse V1 (c7g.2xlarge) >> >> test1stInt2Types 2.2317 2.2320 -0.01 >> test1stInt3Types 6.6884 6.1911 8.03 >> test1stInt5Types 6.7334 6.2193 8.27 >> test2ndInt2Types 2.4002 2.4013 -0.04 >> test2ndInt3Types 7.9603 7.0372 13.12 >> test2ndInt5Types 7.9532 7.0474 12.85 >> testIfaceCall 6.7028 6.3272 5.94 >> testIfaceExtCall 8.3253 6.941... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > assert message update Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13792#pullrequestreview-1614912890 From rehn at openjdk.org Thu Sep 7 08:53:40 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Sep 2023 08:53:40 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg In-Reply-To: References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: <2nAeCsWDMD4eCmc9su6i8O6QDrUJIijx0X5jXYiO-u0=.9fd22174-3de5-4c13-a2c9-04ff58c5494a@github.com> On Thu, 7 Sep 2023 07:56:40 GMT, Fei Yang wrote: > Hi, I have a small question about the JBS description. Where does the comma in the string we are regexping come from if we use plain space here? I am also wondering if we could do the separation for the original `_features_string`. This would help eliminate changes to the shared code. I guess it might not be a big issue for other places. CPUInfo.java splits the feature string into a List, so the string supplied to the jtreg requires annotation is produced by Arrays.toString(). So e.g. "rv64 i g c v zicbop z..." becomes a List with the strings "rv64", "i", "g", "c", "v", "zicbop", ... and toString() returns: "[rv64, i, g, c, v, zicbop, z...]". Yes, we could do the separation from the original string, but adding a method to parse out everything didn't seem better. I think it is actually easier to do it the other way around: keep the feature flags with separators and then remove them for the 'pretty' print string. But it is not clear to me that it would be an improvement, i.e. it means having more code dealing with strings. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15579#issuecomment-1709745846 From luhenry at openjdk.org Thu Sep 7 09:07:01 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 09:07:01 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support Message-ID: With the Ztso extension [1], some hardware will support TSO on RISC-V.
That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 ------------- Commit messages: - 8315841: RISC-V: Check for hardware TSO support Changes: https://git.openjdk.org/jdk/pull/15613/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315841 Stats: 14 lines in 4 files changed: 13 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From luhenry at openjdk.org Thu Sep 7 09:40:42 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 09:40:42 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v13] In-Reply-To: References: Message-ID: On Mon, 4 Sep 2023 16:44:10 GMT, Ilya Gavrilin wrote: >> Please review these changes to the risc-v double rounding intrinsics. >> >> On risc-v, intrinsics for rounding doubles with a rounding mode (like Math.ceil/floor/rint) were missing. On risc-v we don't have a special instruction for such a conversion, so a two-step conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). >> >> Also, we have to provide a rounding mode to the fcvt.x.x instructions. >> >> Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements: >> >>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). >> >> For double -> long int we choose the rup (round towards +inf) mode to get the integer that is greater than or equal to the input value. >> For long int -> double we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
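The ceil recipe in the quoted description can be checked with portable C++. The recipe rounds to int64 toward +inf, converts back, passes NaN, infinities and very large values through, and finally restores the sign. This is a sketch, not the RISC-V intrinsic: the `fcvt.l.d`/rup step is emulated by truncating and stepping up, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Emulates fcvt.l.d with rounding mode rup: double -> int64 rounded toward +infinity.
// Precondition: x is finite and |x| < 2^63, so the int64 conversion below is well defined.
int64_t to_int64_round_up(double x) {
  assert(std::fabs(x) < 9223372036854775808.0);   // 2^63; also rejects NaN and inf
  int64_t t = static_cast<int64_t>(x);             // C++ conversion truncates toward zero
  if (static_cast<double>(t) < x) t += 1;          // positive fractional part: step up
  return t;
}

// Sketch of the ceil recipe: pass special values through, round via int64, restore the sign.
double ceil_via_int64(double x) {
  // NaN, +/-inf and |x| >= 2^63 are returned unchanged; such finite doubles are already integers.
  if (!(std::fabs(x) < 9223372036854775808.0)) return x;
  // int64 -> double is exact for every value reachable here, so the rounding mode of this
  // step (rdn in the PR) does not change the result in this emulation.
  double r = static_cast<double>(to_int64_round_up(x));
  return std::copysign(r, x);   // keeps -0.0 for inputs in (-1.0, -0.0]
}
```

The final `copysign` works for floor and rint as well, because each of these results is either zero or has the same sign as the input.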
>> >> For inf, NaN, or values greater than 2^63, the input value is returned (a double greater than 2^63 is guaranteed to be an integer). >> Also, when storing the result we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)). >> >> We have observed a significant improvement on hifive and thead boards. >> >> testing: tier1, tier2 and hotspot:tier3 on hifive >> >> Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint): >> >> Without intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms >> >> With intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ± 0.071 ops/ms > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in c2_macroassembler Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1615021429 From vkempik at openjdk.org Thu Sep 7 10:28:38 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 7 Sep 2023 10:28:38 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support In-Reply-To: References: Message-ID: <6wE08qQdv2gj4DmwvFK-vd51-pdsrCw1sC-_gUQ4EZo=.08020fe2-5894-4498-9212-c7111eea4fdf@github.com> On Thu, 7 Sep 2023 09:00:50 GMT, Ludovic Henry wrote: > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO.
>> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Hello Ludovic, will this (and hardware support for TSO) help reduce the overhead from getfield isDone? It is sometimes where the CPU spends most of its time in some JMH tests: 62.78% 0x0000003fdcfb6670: fence ir,iorw ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.java.lang.jmh_generated.MathBench_unsignedMultiplyHighLongLong_jmhTest::unsignedMultiplyHighLongLong_thrpt_jmhStub at 30 (line 121) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15613#issuecomment-1709903422 From vkempik at openjdk.org Thu Sep 7 10:42:51 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 7 Sep 2023 10:42:51 GMT Subject: RFR: 8313322: RISC-V: implement MD5 intrinsic [v2] In-Reply-To: <6s2kCKs3adXbeECg3R1ous5l_eb6oen2b-pEfJgF2oY=.e171ce83-5280-430e-bba7-4b8663aab966@github.com> References: <6zXzQDEH7fxazbf7vwFL4AebesePv4uPofa62bcpQDU=.91c008b8-d743-4a08-a5e3-c89259756023@github.com> <6s2kCKs3adXbeECg3R1ous5l_eb6oen2b-pEfJgF2oY=.e171ce83-5280-430e-bba7-4b8663aab966@github.com> Message-ID: On Wed, 2 Aug 2023 13:16:03 GMT, Antonios Printezis wrote: >>> Thanks, looks good to me! >>> >>> You also have some tests here: test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5xxx >>> >>> I'll take it for a spin. >> >> Thumbs up! > > Thanks to @robehn for doing a performance evaluation with the jmh md5 microbenchmarks on his VisionFive2 board! > > -UseMD5Intrinsic: > > > MessageDigests.digest md5 64 DEFAULT avgt 6 2568.244 ± 842.423 ns/op > MessageDigests.digest md5 16384 DEFAULT avgt 6 217455.589 ± 30984.729 ns/op > MessageDigests.getAndDigest md5 64 DEFAULT avgt 6 3181.132 ± 677.752 ns/op > MessageDigests.getAndDigest md5 16384 DEFAULT avgt 6 230630.983 ± 34108.072 ns/op > > > +UseMD5Intrinsic: > > > MessageDigests.digest md5 64 DEFAULT avgt 6 1930.057 ± 106.178 ns/op > MessageDigests.digest md5 16384 DEFAULT avgt 6 162308.240 ±
2042.715 ns/op > MessageDigests.getAndDigest md5 64 DEFAULT avgt 6 2721.418 ± 567.045 ns/op > MessageDigests.getAndDigest md5 16384 DEFAULT avgt 6 164660.082 ± 1976.401 ns/op > > > +UseMD5Intrinsic +UseZbb: > > > MessageDigests.digest md5 64 DEFAULT avgt 6 1835.246 ± 252.071 ns/op > MessageDigests.digest md5 16384 DEFAULT avgt 6 145386.522 ± 444.446 ns/op > MessageDigests.getAndDigest md5 64 DEFAULT avgt 6 2555.515 ± 639.491 ns/op > MessageDigests.getAndDigest md5 16384 DEFAULT avgt 6 149045.631 ± 6658.545 ns/op Hello @gctony, do you intend to backport this to 21 LTS? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15090#issuecomment-1709922458 From eastigeevich at openjdk.org Thu Sep 7 11:18:46 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 7 Sep 2023 11:18:46 GMT Subject: RFR: 8307352: AARCH64: Improve itable_stub [v11] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 07:58:10 GMT, Boris Ulasevich wrote: >> This is a change for AARCH similar to https://github.com/openjdk/jdk/pull/13460 >> >> The change replaces two separate iterations over the itable with a new algorithm consisting of two loops. First, we look for a match with resolved_klass, checking for a match with holder_klass along the way. Then we continue iterating (not starting over) the itable using the second loop, checking only for a match with holder_klass.
>> >> InterfaceCalls openjdk benchmark performance results on A53, A72, Neoverse N1 and V1 micro-architectures: >> >> >> Cortex-A53 (Pi 3 Model B Rev 1.2) >> >> test1stInt2Types 37.5 37.358 0.38 >> test1stInt3Types 160.166 148.04 8.19 >> test1stInt5Types 158.131 147.955 6.88 >> test2ndInt2Types 52.634 53.291 -1.23 >> test2ndInt3Types 201.39 181.603 10.90 >> test2ndInt5Types 195.722 176.707 10.76 >> testIfaceCall 157.453 140.498 12.07 >> testIfaceExtCall 175.46 154.351 13.68 >> testMonomorphic 32.052 32.039 0.04 >> AVG: 6.85 >> >> Cortex-A72 (Pi 4 Model B Rev 1.2) >> >> test1stInt2Types 27.4796 27.4738 0.02 >> test1stInt3Types 66.0085 64.9374 1.65 >> test1stInt5Types 67.9812 66.2316 2.64 >> test2ndInt2Types 32.0581 32.062 -0.01 >> test2ndInt3Types 68.2715 65.6643 3.97 >> test2ndInt5Types 68.1012 65.8024 3.49 >> testIfaceCall 64.0684 64.1811 -0.18 >> testIfaceExtCall 91.6226 81.5867 12.30 >> testMonomorphic 26.7161 26.7142 0.01 >> AVG: 2.66 >> >> Neoverse N1 (m6g.metal) >> >> test1stInt2Types 2.9104 2.9086 0.06 >> test1stInt3Types 10.9642 10.2909 6.54 >> test1stInt5Types 10.9607 10.2856 6.56 >> test2ndInt2Types 3.3410 3.3478 -0.20 >> test2ndInt3Types 12.3291 11.3089 9.02 >> test2ndInt5Types 12.328 11.2704 9.38 >> testIfaceCall 11.0598 10.3657 6.70 >> testIfaceExtCall 13.0692 11.2826 15.84 >> testMonomorphic 2.2354 2.2341 0.06 >> AVG: 6.00 >> >> Neoverse V1 (c7g.2xlarge) >> >> test1stInt2Types 2.2317 2.2320 -0.01 >> test1stInt3Types 6.6884 6.1911 8.03 >> test1stInt5Types 6.7334 6.2193 8.27 >> test2ndInt2Types 2.4002 2.4013 -0.04 >> test2ndInt3Types 7.9603 7.0372 13.12 >> test2ndInt5Types 7.9532 7.0474 12.85 >> testIfaceCall 6.7028 6.3272 5.94 >> testIfaceExtCall 8.3253 6.941... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > assert message update lgtm ------------- Marked as reviewed by eastigeevich (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/13792#pullrequestreview-1615197810 From luhenry at openjdk.org Thu Sep 7 11:20:39 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 11:20:39 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 09:00:50 GMT, Ludovic Henry wrote: > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 `fence ir,iorw` would not match `(predecessor & w) && (successor & r)`, leading to not generating the fence. In the end, only the following fences would be generated: `rw,r`, `rw,rw` `w,r`, `w,rw`. Some of the most common cases of fences that are going to be generated are `fence w,r` (used for `sun.misc.Unsafe::fullFence`) and `fence rw,rw` (generated for `MacroAssembler::AnyAny`).
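Ludovic's rule can be written as a small, testable predicate. This is a sketch only. The bit values follow the order of the RISC-V FENCE instruction's predecessor/successor fields, which is i, o, r, w from the high bit to the low bit. The enum and function names are hypothetical, and this is not HotSpot's `MacroAssembler::fence()`.

```cpp
#include <cassert>
#include <cstdint>

// Access-type bits in the order of the RISC-V FENCE pred/succ fields (i, o, r, w, high to low).
// Illustrative only; HotSpot's own constants may be named differently.
enum FenceBits : uint32_t { W = 1u << 0, R = 1u << 1, O = 1u << 2, I = 1u << 3 };

// Under Ztso, ordinary memory accesses are already ordered except an earlier write against
// a later read, so a fence only has to be emitted when it orders writes before reads.
bool fence_needed_under_ztso(uint32_t predecessor, uint32_t successor) {
  return (predecessor & W) != 0 && (successor & R) != 0;
}
```

With this predicate, `fence ir,iorw` is elided. That is the fence in the `getfield isDone` profile quoted earlier in the thread. The fences `rw,r`, `rw,rw`, `w,r` and `w,rw` are still emitted.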
------------- PR Comment: https://git.openjdk.org/jdk/pull/15613#issuecomment-1709970098 From vkempik at openjdk.org Thu Sep 7 11:24:42 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 7 Sep 2023 11:24:42 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 09:00:50 GMT, Ludovic Henry wrote: > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 381: > 379: void fence(uint32_t predecessor, uint32_t successor) { > 380: if (UseZtso) { > 381: // do not emit fence if it's not at least a StoreLoad fence Could you improve the comment with some examples? The code that follows is hard to read: "if (not (a and b)) then ..." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318460146 From vkempik at openjdk.org Thu Sep 7 11:30:39 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 7 Sep 2023 11:30:39 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 09:00:50 GMT, Ludovic Henry wrote: > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 IIRC, TSO ELF files have a special bit set in the ELF header.
So we would have to build the whole JDK for TSO mode; then UseZtso could be a static compile-time option. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15613#issuecomment-1709982309 From jvernee at openjdk.org Thu Sep 7 11:39:30 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 7 Sep 2023 11:39:30 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v15] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout.
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - add name of SysV ABI - Fix javadoc issues in MemorySegment::copy Reviewed-by: jvernee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/52df58f5..a48a77bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=13-14 Stats: 10 lines in 2 files changed: 3 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From luhenry at openjdk.org Thu Sep 7 11:39:36 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 11:39:36 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v2] In-Reply-To: References: Message-ID: <5uVZSvWhCPe5ocu89MpxKXI8x-EBvyg-kKVkx5pbFCw=.57b5b003-4d57-4c05-a6bc-d532401820b9@github.com> > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! 
8315841: RISC-V: Check for hardware TSO support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15613/files - new: https://git.openjdk.org/jdk/pull/15613/files/2a10b8c0..1c397ad5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=00-01 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From jvernee at openjdk.org Thu Sep 7 11:39:32 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 7 Sep 2023 11:39:32 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v10] In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 16:03:40 GMT, Paul Sandoz wrote: >> [This SO question](https://stackoverflow.com/a/40348010) points to a gitlab repo that seems to have the latest version: https://gitlab.com/x86-psABIs/x86-64-ABI But, I'm not sure how stable that is, or if that's an authoritative source. >> >> Alternatively, we could refer to the name only: "System V Application Binary Interface - AMD64 Architecture Processor Supplement" (or "x86-64 psABI") Then people can google for themselves and find it. > > Yeah, its hard to find the official and latest version. Referring to the full title will help. 
I've added the name now: https://github.com/openjdk/jdk/pull/15103/commits/a48a77bcdadda65a81ad30abf00e6da46a56b933 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1318472039 From luhenry at openjdk.org Thu Sep 7 11:41:38 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 11:41:38 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support In-Reply-To: References: Message-ID: <-QEhs0Na8NnyBylAs9Ff3EXCc7ioiX4lxig5UuqIuKA=.118ed167-c28d-4227-a5a0-e4a1f09e7374@github.com> On Thu, 7 Sep 2023 11:27:25 GMT, Vladimir Kempik wrote: > So we would have to build whole java for TSO mode, then UseZtso could be a static compile-time option We don't have to build hotspot for TSO in order to use it in code generated for Java, given the RVTSO memory model is strictly stronger than RVWMO. We could indeed set `UseZtso` statically in case hotspot is compiled with TSO. ~I don't know how to check the ELF-compiled options, would you know how we do it elsewhere?~ (you just updated your comment with how to do it) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15613#issuecomment-1709998888 From luhenry at openjdk.org Thu Sep 7 11:48:16 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 11:48:16 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v3] In-Reply-To: References: Message-ID: <5CBrYFF5tSxXOsg9pmG0gNcAkeUApRLiWhpxJU9J5OU=.827ff564-0f10-4306-bd3c-3fe609efdf47@github.com> > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! 
8315841: RISC-V: Check for hardware TSO support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15613/files - new: https://git.openjdk.org/jdk/pull/15613/files/1c397ad5..e9691f3f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=01-02 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From luhenry at openjdk.org Thu Sep 7 12:00:30 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 12:00:30 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: References: Message-ID: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! 
8315841: RISC-V: Check for hardware TSO support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15613/files - new: https://git.openjdk.org/jdk/pull/15613/files/e9691f3f..5f80c7ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From vkempik at openjdk.org Thu Sep 7 12:06:42 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 7 Sep 2023 12:06:42 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: On Thu, 7 Sep 2023 12:00:30 GMT, Ludovic Henry wrote: >> With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. >> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! 8315841: RISC-V: Check for hardware TSO support Marked as reviewed by vkempik (Committer). 
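A minimal C++ sketch of the two ideas raised in the review of PR 15613, under stated assumptions: decoding the RISC-V ELF `e_flags` word (per the RISC-V ELF psABI, the "Flags: 0x15, RVC, double-float ABI, TSO" example is 0x01 RVC | 0x04 double-float ABI | 0x10 TSO), and deciding whether a fence must be emitted when UseZtso is set. The helper names `describe_riscv_e_flags`, `elf_is_tso` and `fence_required` are hypothetical, and the one-bit-per-ordering encoding is illustrative only; it does not match HotSpot's `pred_succ_to_membar_mask` or `Membar_mask_bits`. Reading the header of the running binary is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// RISC-V ELF e_flags bits, as defined in the RISC-V ELF psABI.
constexpr uint32_t EF_RISCV_RVC            = 0x0001;
constexpr uint32_t EF_RISCV_FLOAT_ABI_MASK = 0x0006;
constexpr uint32_t EF_RISCV_TSO            = 0x0010;

// Hypothetical helper: render e_flags in the style readelf uses.
std::string describe_riscv_e_flags(uint32_t flags) {
  std::string out;
  auto add = [&out](const char* part) {
    if (!out.empty()) out += ", ";
    out += part;
  };
  if (flags & EF_RISCV_RVC) add("RVC");
  switch (flags & EF_RISCV_FLOAT_ABI_MASK) {
    case 0x0: add("soft-float ABI");   break;
    case 0x2: add("single-float ABI"); break;
    case 0x4: add("double-float ABI"); break;
    case 0x6: add("quad-float ABI");   break;
  }
  if (flags & EF_RISCV_TSO) add("TSO");
  return out;
}

// Hypothetical helper for option A: the binary itself was built for Ztso.
bool elf_is_tso(uint32_t flags) {
  return (flags & EF_RISCV_TSO) != 0;
}

// Illustrative one-bit-per-ordering encoding (NOT HotSpot's Membar_mask_bits).
enum Ordering : uint32_t {
  LoadLoad   = 1u << 0,
  LoadStore  = 1u << 1,
  StoreLoad  = 1u << 2,
  StoreStore = 1u << 3,
};

// Returns true if a fence instruction must be emitted for the requested orderings.
bool fence_required(bool use_ztso, uint32_t orderings) {
  if (use_ztso) {
    // TSO already forbids every reordering except a later load passing an
    // earlier store, so only a request that includes StoreLoad needs a fence.
    return (orderings & StoreLoad) != 0;
  }
  // RVWMO guarantees none of these orderings: always emit the fence.
  return true;
}
```

With this sketch, `describe_riscv_e_flags(0x15)` yields "RVC, double-float ABI, TSO", and `fence_required(true, LoadLoad | StoreStore)` is false while `fence_required(false, LoadLoad)` is true, so the non-TSO path never loses its fence.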
------------- PR Review: https://git.openjdk.org/jdk/pull/15613#pullrequestreview-1615273578 From rehn at openjdk.org Thu Sep 7 12:29:03 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Sep 2023 12:29:03 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg [v2] In-Reply-To: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: > Hi, please consider. > > As described in jbs, this handles both cases with a rough solution by having two strings. > Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. > > Tested tier1 on qemu rv. Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Wrong buffer copied ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15579/files - new: https://git.openjdk.org/jdk/pull/15579/files/8ed5e590..1ca70f8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15579&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15579&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15579.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15579/head:pull/15579 PR: https://git.openjdk.org/jdk/pull/15579 From rehn at openjdk.org Thu Sep 7 12:29:04 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Sep 2023 12:29:04 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg [v2] In-Reply-To: References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: On Thu, 7 Sep 2023 08:36:46 GMT, Robbin Ehn wrote: >> src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 162: >> >>> 160: >>> 161: _features_string = os::strdup(buf); >>> 162: _parsable_features_string = os::strdup(buf); >> >> 
Shouldn't this be `buf_pfs` instead of `buf`? > > Yes, this looks wrong. I'll investigate why this was working in testing. It seems like it should not have... Yes, this was not working at all. I found the issue in my testing. Good catch! Thanks a lot! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15579#discussion_r1318527157 From aph at openjdk.org Thu Sep 7 12:33:42 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 7 Sep 2023 12:33:42 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: On Thu, 7 Sep 2023 12:00:30 GMT, Ludovic Henry wrote: >> With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. >> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! 8315841: RISC-V: Check for hardware TSO support src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 392: > 390: Assembler::fence(predecessor, successor); > 391: } > 392: Suggestion: void fence(uint32_t predecessor, uint32_t successor) { if (UseZtso) { if (pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) { // TSO allows for stores to be reordered after loads. When the compiler // generates a fence to disallow that, we are required to generate the // fence for correctness. Assembler::fence(predecessor, successor); } else { // TSO guarantees other orderings already.
} } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318533009 From vkempik at openjdk.org Thu Sep 7 12:39:39 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 7 Sep 2023 12:39:39 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: <1ICZgbo_GA4-OYUVOStzlY1ZmLnJfU-OqV23GPv29-I=.c2090136-4359-4bdd-a8b6-4f6b92f40f18@github.com> On Thu, 7 Sep 2023 12:29:15 GMT, Andrew Haley wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> fixup! 8315841: RISC-V: Check for hardware TSO support > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 392: > >> 390: Assembler::fence(predecessor, successor); >> 391: } >> 392: > > Suggestion: > > void fence(uint32_t predecessor, uint32_t successor) { > if (UseZtso) { > if (pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) { > // TSO allows for stores to be reordered after loads. When the compiler > // generates a fence to disallow that, we are required to generate the > // fence for correctness. > Assembler::fence(predecessor, successor); > } else { > // TSO guarantees other orderings already. 
> } > } > } @theRealAph this way we will miss fence completely when UseZtso is false ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318541727 From rehn at openjdk.org Thu Sep 7 12:50:44 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Sep 2023 12:50:44 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: On Thu, 7 Sep 2023 12:00:30 GMT, Ludovic Henry wrote: >> With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. >> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! 8315841: RISC-V: Check for hardware TSO support Thanks, looks good (minus a nit). src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 381: > 379: void fence(uint32_t predecessor, uint32_t successor) { > 380: if (UseZtso) { > 381: if (pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) { This should be "(pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) == 0". src/hotspot/cpu/riscv/vm_version_riscv.cpp line 213: > 211: } > 212: > 213: #if defined(TARGET_ZTSO) && TARGET_ZTSO If someone compiles with "CXXFLAGS=-marchrv64....ztso..", we need to try to parse the supplied flags, that doesn't seem like fun. Instead I suggest we add code to read-out the elf flags, i.e: "Flags: 0x15, RVC, double-float ABI, TSO" And set UseZtso: A: If this is a TSO elf. B: If hwprobe says this TSO hardware (either directly or via vendor). 
C: If someone set flag, I guess your idea was to have a flag like --enable-tso which sets TARGET_TSO ? If we have that or not I still like above to happen. (I'm not saying you should do any of this in this PR, I can file new ones) ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15613#pullrequestreview-1615312026 PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318530105 PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318549490 From matsaave at openjdk.org Thu Sep 7 12:53:24 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 7 Sep 2023 12:53:24 GMT Subject: RFR: 8292692: Move MethodCounters inline functions out of method.hpp [v2] In-Reply-To: References: Message-ID: > The inline functions related to MethodCounters in method.hpp can be moved to the inline file to reduce the number of includes. Verified with tier 1-5 tests. > > Below is a comparison of the old and new include statistics: > > Old > ---- > scanning 836 methodCounters.hpp > 2 found 836 method.hpp > > scanning 837 invocationCounter.hpp > 2 found 836 method.hpp > 3 found 836 methodCounters.hpp > 4 found 649 interp_masm_x86.hpp > 5 found 0 interp_masm_aarch64.hpp > 6 found 0 interp_masm_arm.hpp > 7 found 0 interp_masm_ppc.hpp > 8 found 0 interp_masm_riscv.hpp > 9 found 0 interp_masm_s390.hpp > 10 found 0 interp_masm_zero.hpp > > scanning 298 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp > > > > New > ----- > scanning 304 methodCounters.hpp > 2 found 299 method.inline.hpp > > scanning 476 invocationCounter.hpp > 2 found 304 methodCounters.hpp > 3 found 257 methodData.hpp > 4 found 0 interp_masm_aarch64.hpp > 5 found 0 interp_masm_ppc.hpp > 6 found 0 interp_masm_riscv.hpp > 7 found 0 interp_masm_s390.hpp > 8 found 0 interp_masm_zero.hpp > > 
scanning 299 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Calvin comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15094/files - new: https://git.openjdk.org/jdk/pull/15094/files/dcffac79..e1ba0319 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15094&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15094&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15094/head:pull/15094 PR: https://git.openjdk.org/jdk/pull/15094 From rehn at openjdk.org Thu Sep 7 12:56:39 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Sep 2023 12:56:39 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: <1ICZgbo_GA4-OYUVOStzlY1ZmLnJfU-OqV23GPv29-I=.c2090136-4359-4bdd-a8b6-4f6b92f40f18@github.com> References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> <1ICZgbo_GA4-OYUVOStzlY1ZmLnJfU-OqV23GPv29-I=.c2090136-4359-4bdd-a8b6-4f6b92f40f18@github.com> Message-ID: On Thu, 7 Sep 2023 12:36:54 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 392: >> >>> 390: Assembler::fence(predecessor, successor); >>> 391: } >>> 392: >> >> Suggestion: >> >> void fence(uint32_t predecessor, uint32_t successor) { >> if (UseZtso) { >> if (pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) { >> // TSO allows for stores to be reordered after loads. When the compiler >> // generates a fence to disallow that, we are required to generate the >> // fence for correctness. 
>> Assembler::fence(predecessor, successor); >> } else { >> // TSO guarantees other orderings already. >> } >> } >> } > > @theRealAph this way we will miss fence completely when UseZtso is false I prefer: { if (UseZtso) { .... return; } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318559284 From luhenry at openjdk.org Thu Sep 7 12:56:42 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 12:56:42 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: On Thu, 7 Sep 2023 12:44:09 GMT, Robbin Ehn wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> fixup! 8315841: RISC-V: Check for hardware TSO support > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 213: > >> 211: } >> 212: >> 213: #if defined(TARGET_ZTSO) && TARGET_ZTSO > > If someone compiles with "CXXFLAGS=-marchrv64....ztso..", we need to try to parse the supplied flags, that doesn't seem like fun. > Instead I suggest we add code to read-out the elf flags, i.e: > "Flags: 0x15, RVC, double-float ABI, TSO" > > And set UseZtso: > A: If this is a TSO elf. > B: If hwprobe says this TSO hardware (either directly or via vendor). > C: If someone set flag, > > I guess your idea was to have a flag like --enable-tso which sets TARGET_TSO ? > If we have that or not I still like above to happen. > > (I'm not saying you should do any of this in this PR, I can file new ones) `TARGET_TSO` is set by gcc directly. 
See https://www.mail-archive.com/gcc-patches at gcc.gnu.org/msg281514.html ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318559665 From luhenry at openjdk.org Thu Sep 7 13:04:05 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 13:04:05 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v5] In-Reply-To: References: Message-ID: <3aqRu6uqKceznkcdnvIBWGH_CPshz2uOvVdVZuRVkuw=.6baad5d7-c405-4171-8abc-8d6f15e9ac7d@github.com> > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: - fixup! 8315841: RISC-V: Check for hardware TSO support - fixup! 8315841: RISC-V: Check for hardware TSO support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15613/files - new: https://git.openjdk.org/jdk/pull/15613/files/5f80c7ca..c41c3e74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=03-04 Stats: 6 lines in 1 file changed: 4 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From luhenry at openjdk.org Thu Sep 7 13:04:08 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 13:04:08 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: On Thu, 7 Sep 2023 12:26:51 GMT, Robbin Ehn wrote: >> Ludovic Henry has updated the pull request incrementally with one additional 
commit since the last revision: >> >> fixup! 8315841: RISC-V: Check for hardware TSO support > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 381: > >> 379: void fence(uint32_t predecessor, uint32_t successor) { >> 380: if (UseZtso) { >> 381: if (pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) { > > This should be "(pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) == 0". Not AFAIU, as we would then _not_ generate `StoreLoad` barriers which are the only one TSO doesn't guarantee. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318565637 From luhenry at openjdk.org Thu Sep 7 13:04:09 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 13:04:09 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> <1ICZgbo_GA4-OYUVOStzlY1ZmLnJfU-OqV23GPv29-I=.c2090136-4359-4bdd-a8b6-4f6b92f40f18@github.com> Message-ID: On Thu, 7 Sep 2023 12:52:53 GMT, Robbin Ehn wrote: >> @theRealAph this way we will miss fence completely when UseZtso is false > > I prefer: > > { > if (UseZtso) { > .... > return; > } > > } We still need to generate the fence in case of non-TSO, so with that code structure, it would look like that: void fence(uint32_t predecessor, uint32_t successor) { if (UseZtso) { if (pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) { // TSO allows for stores to be reordered after loads. When the compiler // generates a fence to disallow that, we are required to generate the // fence for correctness. Assembler::fence(predecessor, successor); } else { // TSO guarantees other fences already. 
} } else { Assembler::fence(predecessor, successor); } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318564436 From jvernee at openjdk.org Thu Sep 7 13:07:50 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 7 Sep 2023 13:07:50 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v16] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. 
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Add support for sliced allocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/a48a77bc..55296527 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=14-15 Stats: 539 lines in 9 files changed: 413 ins; 56 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Thu Sep 7 13:08:15 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 7 Sep 2023 13:08:15 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v15] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 11:39:30 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. 
https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - add name of SysV ABI > - Fix javadoc issues in MemorySegment::copy > > Reviewed-by: jvernee After discussing with Maurizio, I've added one more non-trivial change to this patch, brought over from the panama-foreign repo.
This adds a new `SegmentAllocator::allocateFrom` overload which accepts a MemorySegment as an initializer. See the original PR: https://github.com/openjdk/panama-foreign/pull/878 The commit I've added also includes the changes from https://github.com/openjdk/panama-foreign/pull/855 which were required by the first patch. I had to massage the code a bit since the javadoc in the mainline slightly deviates from the one in the panama-foreign repo. Please take a look at the changes here: https://github.com/openjdk/jdk/pull/15103/commits/55296527a029b80dd78a7d1aecb429e793d7d32e ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1710117041 From luhenry at openjdk.org Thu Sep 7 13:14:23 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 13:14:23 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v6] In-Reply-To: References: Message-ID: > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! 
8315841: RISC-V: Check for hardware TSO support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15613/files - new: https://git.openjdk.org/jdk/pull/15613/files/c41c3e74..b40a28bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From vkempik at openjdk.org Thu Sep 7 13:14:27 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 7 Sep 2023 13:14:27 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v5] In-Reply-To: <3aqRu6uqKceznkcdnvIBWGH_CPshz2uOvVdVZuRVkuw=.6baad5d7-c405-4171-8abc-8d6f15e9ac7d@github.com> References: <3aqRu6uqKceznkcdnvIBWGH_CPshz2uOvVdVZuRVkuw=.6baad5d7-c405-4171-8abc-8d6f15e9ac7d@github.com> Message-ID: On Thu, 7 Sep 2023 13:04:05 GMT, Ludovic Henry wrote: >> With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. >> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 > > Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: > > - fixup! 8315841: RISC-V: Check for hardware TSO support > - fixup! 
8315841: RISC-V: Check for hardware TSO support Maybe this way: still needs a good comment for cases when we don't generate fence

    if (UseZtso) {
      if ((pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) == 0) {
        return;
      }
    }
    // always generate fence for RVWMO
    Assembler::fence(predecessor, successor);

------------- PR Comment: https://git.openjdk.org/jdk/pull/15613#issuecomment-1710123100 From rehn at openjdk.org Thu Sep 7 13:14:32 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Sep 2023 13:14:32 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: On Thu, 7 Sep 2023 12:58:12 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 381: >> >>> 379: void fence(uint32_t predecessor, uint32_t successor) { >>> 380: if (UseZtso) { >>> 381: if (pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) { >> >> This should be "(pred_succ_to_membar_mask(predecessor, successor) & StoreLoad) == 0". > > Not AFAIU, as we would then _not_ generate `StoreLoad` barriers which are the only one TSO doesn't guarantee. No, sorry I mean != 0. We don't if on integer values. So that if should be a boolean expression. >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 213: >> >>> 211: } >>> 212: >>> 213: #if defined(TARGET_ZTSO) && TARGET_ZTSO >> >> If someone compiles with "CXXFLAGS=-marchrv64....ztso..", we need to try to parse the supplied flags, that doesn't seem like fun. >> Instead I suggest we add code to read-out the elf flags, i.e: >> "Flags: 0x15, RVC, double-float ABI, TSO" >> >> And set UseZtso: >> A: If this is a TSO elf. >> B: If hwprobe says this TSO hardware (either directly or via vendor). >> C: If someone set flag, >> >> I guess your idea was to have a flag like --enable-tso which sets TARGET_TSO ? >> If we have that or not I still like above to happen.
>> >> (I'm not saying you should do any of this in this PR, I can file new ones) > > `TARGET_TSO` is set by gcc directly. See https://www.mail-archive.com/gcc-patches at gcc.gnu.org/msg281514.html Ah, it is not set by LLVM what I can see at least (running a couple of weeks old tip). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318570723 PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1318578037 From mbaesken at openjdk.org Thu Sep 7 14:05:41 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 7 Sep 2023 14:05:41 GMT Subject: RFR: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native In-Reply-To: References: Message-ID: On Fri, 1 Sep 2023 12:03:39 GMT, Markus Grönlund wrote: > Greetings, > > This change set fixes the issue of taking a JFR stack trace in the wrong thread state for the NativeLibraryLoad and NativeLibraryUnload events. > > A follow-up change set, [JDK-8315364](https://bugs.openjdk.org/browse/JDK-8315364) will add assertions to the JFR stack trace code to help find similar issues earlier. > > There are a few additional improvements: > > The event declaration in metadata.xml now includes the generating thread since a stack trace without the generating thread is subpar. > > In os_linux.cpp, the NativeLibraryLoad event was located after the call to dlopen(), which means that the event, declared durational, fails to capture the duration of the call. > > Finally, the test is extended to validate the captured stack trace. > > Testing: jdk_jfr, stress testing > > Thanks > Markus LGTM (we tested it also for some days in our internal infrastructure) ------------- Marked as reviewed by mbaesken (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15535#pullrequestreview-1615524142 From fyang at openjdk.org Thu Sep 7 14:21:40 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 7 Sep 2023 14:21:40 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg [v2] In-Reply-To: References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: On Thu, 7 Sep 2023 12:29:03 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> As described in jbs, this handles both cases with a rough solution by having two strings. >> Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. >> >> Tested tier1 on qemu rv. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Wrong buffer copied > > Hi, I have a small question about the JBS description. Where does the comma in the string we are regexping comes from if we use plain space here? I am also wondering if we could do the separation for the original `_features_string`. This would help eliminate changes to the shared code. I guess it might not be a big issue for other places. > > CPUInfo.java splits the feature string into a List, so the string supplied to jtreg required annotation is Arrays.toString(). So e.g. "rv64 i g c v zicbop z..." is in a List with strings: "rv64", "i", "g", "c", "v", "zicbop,".... toString() returns: "[rv64, i, g, c, v, zicbop, z...]" Thanks for the explanation. > Yes, we could do the separation from the original string but adding a method to parse out everything didn't seem better. I think it is actually easier to do the other way around having the feature flags with separator and then remove them for 'pretty' print string. But not clear to me that it would be an improvement, i.e. having more code dealing with strings.
In fact, I mean simply keeping the feature flags with space separators in `_features_string` without removing them for 'pretty' print string. As you mentioned on JBS, then we would have CPU info/desc like: "CPU: total 16 (initial active 16) rv64 i m a f d c v zicbom zicboz zicbop zba zbb zbs zicsr zifencei zic64b zihintpause" This seems acceptable to me. Please consider. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15579#issuecomment-1710239845 From mgronlun at openjdk.org Thu Sep 7 14:56:24 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 7 Sep 2023 14:56:24 GMT Subject: RFR: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native [v2] In-Reply-To: References: Message-ID: > Greetings, > > This change set fixes the issue of taking a JFR stack trace in the wrong thread state for the NativeLibraryLoad and NativeLibraryUnload events. > > A follow-up change set, [JDK-8315364](https://bugs.openjdk.org/browse/JDK-8315364) will add assertions to the JFR stack trace code to help find similar issues earlier. > > There are a few additional improvements: > > The event declaration in metadata.xml now includes the generating thread since a stack trace without the generating thread is subpar. > > In os_linux.cpp, the NativeLibraryLoad event was located after the call to dlopen(), which means that the event, declared durational, fails to capture the duration of the call. > > Finally, the test is extended to validate the captured stack trace.
> > Testing: jdk_jfr, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: renaming and commentary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15535/files - new: https://git.openjdk.org/jdk/pull/15535/files/aa4bc48a..b0094b7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15535&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15535&range=00-01 Stats: 32 lines in 2 files changed: 11 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/15535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15535/head:pull/15535 PR: https://git.openjdk.org/jdk/pull/15535 From mgronlun at openjdk.org Thu Sep 7 14:59:39 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 7 Sep 2023 14:59:39 GMT Subject: RFR: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native [v2] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 14:02:50 GMT, Matthias Baesken wrote: > LGTM (we tested it also for some days in our internal infrastructure) Thank you, Matthias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15535#issuecomment-1710303606 From egahlin at openjdk.org Thu Sep 7 15:10:41 2023 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 7 Sep 2023 15:10:41 GMT Subject: RFR: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native [v2] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 14:56:24 GMT, Markus Grönlund wrote: >> Greetings, >> >> This change set fixes the issue of taking a JFR stack trace in the wrong thread state for the NativeLibraryLoad and NativeLibraryUnload events. >> >> A follow-up change set, [JDK-8315364](https://bugs.openjdk.org/browse/JDK-8315364) will add assertions to the JFR stack trace code to help find similar issues earlier.
>> >> There are a few additional improvements: >> >> The event declaration in metadata.xml now includes the generating thread since a stack trace without the generating thread is subpar. >> >> In os_linux.cpp, the NativeLibraryLoad event was located after the call to dlopen(), which means that the event, declared durational, fails to capture the duration of the call. >> >> Finally, the test is extended to validate the captured stack trace. >> >> Testing: jdk_jfr, stress testing >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > renaming and commentary Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15535#pullrequestreview-1615668304 From mgronlun at openjdk.org Thu Sep 7 15:10:43 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 7 Sep 2023 15:10:43 GMT Subject: RFR: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native [v2] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 15:06:25 GMT, Erik Gahlin wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> renaming and commentary > > Marked as reviewed by egahlin (Reviewer). Thank you @egahlin and @MBaesken, for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15535#issuecomment-1710322305 From duke at openjdk.org Thu Sep 7 15:18:45 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Thu, 7 Sep 2023 15:18:45 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v13] In-Reply-To: References: Message-ID: On Mon, 4 Sep 2023 16:44:10 GMT, Ilya Gavrilin wrote: >> Please review this changes into risc-v double rounding intrinsic. >> >> On risc-v intrinsics for rounding doubles with mode (like Math.ceil/floor/rint) were missing.
On risc-v we don't have special instruction for such conversion, so two times conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). >> >> Also, we should provide some rounding mode to fcvt.x.x instruction. >> >> Rounding mode selection on ceil (similar for floor and rint): according to Math.ceil requirements: >> >>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). >> >> For double -> long int we choose rup (round towards +inf) mode to get the integer that more than or equal to the input value. >> For long int -> double we choose rdn (rounds towards -inf) mode to get the smallest (closest to -inf) representation of integer that we got after conversion. >> >> For cases when we got inf, nan, or value more than 2^63 return input value (double value which more than 2^63 is guaranteed integer). >> As well when we store result we copy sign from input value (need for cases when for (-1.0, 0.0) ceil need to return -0.0). >> >> We have observed significant improvement on hifive and thead boards. >> >> testing: tier1, tier2 and hotspot:tier3 on hifive >> >> Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint): >> >> Without intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms >> >> With intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ±
0.071 ops/ms > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in c2_macroassembler Tier1 tests are good ------------- PR Comment: https://git.openjdk.org/jdk/pull/14991#issuecomment-1710336835 From duke at openjdk.org Thu Sep 7 15:29:54 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Thu, 7 Sep 2023 15:29:54 GMT Subject: Integrated: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint In-Reply-To: References: Message-ID: <3cE8MKtGd3gLpmpbHCRmTj1Q3AuAo59IB8I07fZ8Tt8=.ab30bcae-2b3f-413d-92ec-a43a8c68bdb3@github.com> On Mon, 24 Jul 2023 08:22:52 GMT, Ilya Gavrilin wrote: > Please review this changes into risc-v double rounding intrinsic. > > On risc-v intrinsics for rounding doubles with mode (like Math.ceil/floor/rint) were missing. On risc-v we don't have special instruction for such conversion, so two times conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > Also, we should provide some rounding mode to fcvt.x.x instruction. > > Rounding mode selection on ceil (similar for floor and rint): according to Math.ceil requirements: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). > > For double -> long int we choose rup (round towards +inf) mode to get the integer that more than or equal to the input value. > For long int -> double we choose rdn (rounds towards -inf) mode to get the smallest (closest to -inf) representation of integer that we got after conversion. > > For cases when we got inf, nan, or value more than 2^63 return input value (double value which more than 2^63 is guaranteed integer). > As well when we store result we copy sign from input value (need for cases when for (-1.0, 0.0) ceil need to return -0.0). > > We have observed significant improvement on hifive and thead boards.
> > testing: tier1, tier2 and hotspot:tier3 on hifive > > Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint): > > Without intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms > > With intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ± 0.071 ops/ms This pull request has now been integrated. Changeset: 8557205a Author: Ilya Gavrilin Committer: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/8557205a8279287e00f012b82f0f29bc76789002 Stats: 83 lines in 4 files changed: 81 ins; 0 del; 2 mod 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint Reviewed-by: luhenry, fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/14991 From luhenry at openjdk.org Thu Sep 7 15:51:09 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 15:51:09 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v7] In-Reply-To: References: Message-ID: <6GcuR5PGOmPetTP9UShn5CxZZuv8EweymonnD40_UcE=.07aae442-9652-4651-9197-b1f82fede399@github.com> > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup!
8315841: RISC-V: Check for hardware TSO support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15613/files - new: https://git.openjdk.org/jdk/pull/15613/files/b40a28bf..ec30b9a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From mandy.chung at oracle.com Thu Sep 7 16:16:57 2023 From: mandy.chung at oracle.com (mandy.chung at oracle.com) Date: Thu, 7 Sep 2023 09:16:57 -0700 Subject: Question on why sun.management MBeans are not exported? In-Reply-To: References: <1535aa6e-6865-d885-3930-df7f9ebcb4b4@oracle.com> <70186c9a-79ba-e63d-7ed9-1033dece525c@oracle.com> Message-ID: <181c2756-aced-8656-f45a-d78d5ccc7865@oracle.com> What we're referring to is to remove sun.management.Hotspot*, the internal MBeans which are never exposed and registered in the platform MBeanServer. The internal metrics in HotSpot VM should be retained as they are exposed through other ways like jstat, GC logs, etc. Mandy On 9/6/23 11:27 PM, Kirk Pepperdine wrote: > Hi, > > It would be a shame to lose these metrics because many of them have been very useful over time and some would be even more useful with some modifications. For example, the CPU breakouts found in GC logs has been incredibly useful as a proxy measure in helping sort out other issues in systems. So much so that I have analytics built specifically around this in my tooling. > > Kind regards, > Kirk Pepperdine > > >> On Sep 6, 2023, at 10:50 AM, Alan Bateman wrote: >> >> On 06/09/2023 16:17, Volker Simonis wrote: >>> : >>> I'm familiar with JEP 260. But wouldn't you agree that an >>> "encapsulated" monitoring API is an oxymoron?
A monitoring API is by >>> design intended for external usage and completely useless to the >>> platform itself. There's no single usage of the "sun.management" >>> MBeans in the JDK itself (except for jconsole where the encapsulation >>> broke it). My assumption is that the corresponding MBeans in >>> "sun.management" are there for historic reasons (added in JDK 1.5) and >>> would have made much more sense in "com.sun.management" package. But I >>> doubt that they can be classified in the "internal implementation >>> details of the JDK and never intended for external use" category of >>> JEP 260. >> It's left over from experiments on exposing some internal metrics, I think during JDK 5. It's code that should probably have been removed a long time ago. >> >> -Alan From mgronlun at openjdk.org Thu Sep 7 16:16:59 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 7 Sep 2023 16:16:59 GMT Subject: Integrated: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native In-Reply-To: References: Message-ID: On Fri, 1 Sep 2023 12:03:39 GMT, Markus Grönlund wrote: > Greetings, > > This change set fixes the issue of taking a JFR stack trace in the wrong thread state for the NativeLibraryLoad and NativeLibraryUnload events. > > A follow-up change set, [JDK-8315364](https://bugs.openjdk.org/browse/JDK-8315364) will add assertions to the JFR stack trace code to help find similar issues earlier. > > There are a few additional improvements: > > The event declaration in metadata.xml now includes the generating thread since a stack trace without the generating thread is subpar. > > In os_linux.cpp, the NativeLibraryLoad event was located after the call to dlopen(), which means that the event, declared durational, fails to capture the duration of the call. > > Finally, the test is extended to validate the captured stack trace.
> > Testing: jdk_jfr, stress testing > > Thanks > Markus This pull request has now been integrated. Changeset: 1cae0f53 Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/1cae0f53a9d37fbae9471bd942f7157429a85cd1 Stats: 395 lines in 10 files changed: 243 ins; 114 del; 38 mod 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native Reviewed-by: mbaesken, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/15535 From jvernee at openjdk.org Thu Sep 7 16:26:48 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 7 Sep 2023 16:26:48 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v17] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4.
https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Split long throws clauses in `MemorySegment` javadoc Reviewed-by: jvernee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/55296527..2f50adbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=15-16 Stats: 41 lines in 1 file changed: 3 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From matsaave at openjdk.org Thu Sep 7 16:57:40 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 7 Sep 2023 16:57:40 GMT Subject: RFR: 8292692: Move MethodCounters inline functions out of method.hpp [v2] In-Reply-To: <0SHLJ8tNH-hNEprnvIPpOuGrRKltxiZC76JwzHsK3Rs=.93cbd8c1-5bb7-45ff-8545-c04c52700c7b@github.com> References: <0SHLJ8tNH-hNEprnvIPpOuGrRKltxiZC76JwzHsK3Rs=.93cbd8c1-5bb7-45ff-8545-c04c52700c7b@github.com> Message-ID: On Wed, 6 Sep 2023 22:29:06 GMT, Calvin Cheung wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Calvin comment > > Looks good. Just one nit. Thank you for the reviews @calvinccheung, @iklam, and @vnkozlov! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15094#issuecomment-1710490575 From matsaave at openjdk.org Thu Sep 7 17:19:57 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 7 Sep 2023 17:19:57 GMT Subject: Integrated: 8292692: Move MethodCounters inline functions out of method.hpp In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 20:12:19 GMT, Matias Saavedra Silva wrote: > The inline functions related to MethodCounters in method.hpp can be moved to the inline file to reduce the number of includes. 
Verified with tier 1-5 tests. > > Below is a comparison of the old and new include statistics: > > Old > ---- > scanning 836 methodCounters.hpp > 2 found 836 method.hpp > > scanning 837 invocationCounter.hpp > 2 found 836 method.hpp > 3 found 836 methodCounters.hpp > 4 found 649 interp_masm_x86.hpp > 5 found 0 interp_masm_aarch64.hpp > 6 found 0 interp_masm_arm.hpp > 7 found 0 interp_masm_ppc.hpp > 8 found 0 interp_masm_riscv.hpp > 9 found 0 interp_masm_s390.hpp > 10 found 0 interp_masm_zero.hpp > > scanning 298 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp > > > > New > ----- > scanning 304 methodCounters.hpp > 2 found 299 method.inline.hpp > > scanning 476 invocationCounter.hpp > 2 found 304 methodCounters.hpp > 3 found 257 methodData.hpp > 4 found 0 interp_masm_aarch64.hpp > 5 found 0 interp_masm_ppc.hpp > 6 found 0 interp_masm_riscv.hpp > 7 found 0 interp_masm_s390.hpp > 8 found 0 interp_masm_zero.hpp > > scanning 299 method.inline.hpp > 2 found 286 continuationEntry_x86.inline.hpp > 3 found 0 continuationEntry_aarch64.inline.hpp > 4 found 0 continuationEntry_ppc.inline.hpp > 5 found 0 continuationEntry_riscv.inline.hpp This pull request has now been integrated. 
Changeset: 683672c0 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/683672c0bbb7f4e3290bffa0df271da7d2539f8b Stats: 187 lines in 25 files changed: 106 ins; 64 del; 17 mod 8292692: Move MethodCounters inline functions out of method.hpp Reviewed-by: iklam, ccheung, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15094 From luhenry at openjdk.org Thu Sep 7 17:24:28 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 7 Sep 2023 17:24:28 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: References: Message-ID: > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! 8315841: RISC-V: Check for hardware TSO support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15613/files - new: https://git.openjdk.org/jdk/pull/15613/files/ec30b9a2..98f485ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15613&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15613/head:pull/15613 PR: https://git.openjdk.org/jdk/pull/15613 From mchung at openjdk.org Thu Sep 7 18:22:40 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 7 Sep 2023 18:22:40 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v11] In-Reply-To: References: Message-ID: > 8268829: Provide an optimized way to walk the stack with Class object only > > `StackWalker::walk` creates one `StackFrame` per frame and the current implementation > allocates 
one `StackFrameInfo` and one `MemberName` objects per frame. Some frameworks > like logging may only interest in the Class object but not the method name nor the BCI, > for example, filters out its implementation classes to find the caller class. It's > similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element. > > This PR proposes to add `Option::DROP_METHOD_INFO` enum that requests to drop the method information. If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` > can be used instead and such stack walker will save the overhead of extracting the method information > and the memory used for the stack walking. > > New factory methods to take a parameter to specify the kind of stack walker to be created are defined. > This provides a simple way for existing code, for example logging frameworks, to take advantage of > this enhancement with the least change as it can keep the existing function for traversing > `StackFrame`s. > > For example: to find the first caller filtering a known list of implementation class, > existing code can create a stack walker instance with `DROP_METHOD_INFO` option: > > > StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); > Optional<Class<?>> callerClass = walker.walk(s -> > s.map(StackFrame::getDeclaringClass) > .filter(Predicate.not(implClasses::contains)) > .findFirst()); > > > If method information is accessed on the `StackFrame`s produced by this stack walker such as > `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. > > #### Javadoc & specdiff > > https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html > https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html > > #### Alternatives Considered > One alternative is to provide a new API: > `<T> T walkClass(Function<? super Stream<Class<?>>, ? extends T> function)`
> > In this case, the caller would need to pass a function that takes a stream > of `Class` object instead of `StackFrame`. Existing code would have to > modify calls to the `walk` method to `walkClass` and the function body. > > ### Implementation Details > > A `StackWalker` configured with `DROP_METHOD_INFO` option creates `ClassFrameInfo[]` > buffer that is filled by the VM during stack walking. `Sta... Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15370/files - new: https://git.openjdk.org/jdk/pull/15370/files/a623b9dc..0e6abc42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15370&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15370&range=09-10 Stats: 22 lines in 3 files changed: 21 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15370/head:pull/15370 PR: https://git.openjdk.org/jdk/pull/15370 From bchristi at openjdk.org Thu Sep 7 18:37:44 2023 From: bchristi at openjdk.org (Brent Christian) Date: Thu, 7 Sep 2023 18:37:44 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v11] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 18:22:40 GMT, Mandy Chung wrote: >> 8268829: Provide an optimized way to walk the stack with Class object only >> >> `StackWalker::walk` creates one `StackFrame` per frame and the current implementation >> allocates one `StackFrameInfo` and one `MemberName` objects per frame. Some frameworks >> like logging may only interest in the Class object but not the method name nor the BCI, >> for example, filters out its implementation classes to find the caller class. It's >> similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element.
>> >> This PR proposes to add `Option::DROP_METHOD_INFO` enum that requests to drop the method information. If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` >> can be used instead and such stack walker will save the overhead of extracting the method information >> and the memory used for the stack walking. >> >> New factory methods to take a parameter to specify the kind of stack walker to be created are defined. >> This provides a simple way for existing code, for example logging frameworks, to take advantage of >> this enhancement with the least change as it can keep the existing function for traversing >> `StackFrame`s. >> >> For example: to find the first caller filtering a known list of implementation class, >> existing code can create a stack walker instance with `DROP_METHOD_INFO` option: >> >> >> StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); >> Optional<Class<?>> callerClass = walker.walk(s -> >> s.map(StackFrame::getDeclaringClass) >> .filter(Predicate.not(implClasses::contains)) >> .findFirst()); >> >> >> If method information is accessed on the `StackFrame`s produced by this stack walker such as >> `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. >> >> #### Javadoc & specdiff >> >> https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html >> https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html >> >> #### Alternatives Considered >> One alternative is to provide a new API: >> `<T> T walkClass(Function<Stream<Class<?>>, ? extends T> function)` >> >> In this case, the caller would need to pass a function that takes a stream >> of `Class` object instead of `StackFrame`. Existing code would have to >> modify calls to the `walk` method to `walkClass` and the function body. >> >> ### Implementation Details >> >> A `StackWalker` configured with `DROP_METHOD_INFO` ...
> > Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: > > review feedback Looks great - thanks! ------------- Marked as reviewed by bchristi (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15370#pullrequestreview-1616029850 From dcubed at openjdk.org Thu Sep 7 18:40:43 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 7 Sep 2023 18:40:43 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v7] In-Reply-To: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> References: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> Message-ID: On Mon, 4 Sep 2023 09:44:11 GMT, Aleksey Shipilev wrote: >> As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock. >> >> There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety. >> >> More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach. >> >> Additional testing: >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Touchup whitespace > - Rewrite jvmtiManageCapabilities lock usage > - Re-instate old asserts I ran v06 thru Mach5 Tier[1-3] testing. 
See: https://bugs.openjdk.org/browse/JDK-8313202?focusedCommentId=14609581&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609581 Test results for those three tiers look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15043#issuecomment-1710610429 From dfuchs at openjdk.org Thu Sep 7 18:56:51 2023 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Thu, 7 Sep 2023 18:56:51 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v11] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 18:22:40 GMT, Mandy Chung wrote: >> 8268829: Provide an optimized way to walk the stack with Class object only >> >> `StackWalker::walk` creates one `StackFrame` per frame and the current implementation >> allocates one `StackFrameInfo` and one `MemberName` objects per frame. Some frameworks >> like logging may only interest in the Class object but not the method name nor the BCI, >> for example, filters out its implementation classes to find the caller class. It's >> similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element. >> >> This PR proposes to add `Option::DROP_METHOD_INFO` enum that requests to drop the method information. If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` >> can be used instead and such stack walker will save the overhead of extracting the method information >> and the memory used for the stack walking. >> >> New factory methods to take a parameter to specify the kind of stack walker to be created are defined. >> This provides a simple way for existing code, for example logging frameworks, to take advantage of >> this enhancement with the least change as it can keep the existing function for traversing >> `StackFrame`s. 
>> >> For example: to find the first caller filtering a known list of implementation class, >> existing code can create a stack walker instance with `DROP_METHOD_INFO` option: >> >> >> StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); >> Optional<Class<?>> callerClass = walker.walk(s -> >> s.map(StackFrame::getDeclaringClass) >> .filter(Predicate.not(implClasses::contains)) >> .findFirst()); >> >> >> If method information is accessed on the `StackFrame`s produced by this stack walker such as >> `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. >> >> #### Javadoc & specdiff >> >> https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html >> https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html >> >> #### Alternatives Considered >> One alternative is to provide a new API: >> `<T> T walkClass(Function<Stream<Class<?>>, ? extends T> function)` >> >> In this case, the caller would need to pass a function that takes a stream >> of `Class` object instead of `StackFrame`. Existing code would have to >> modify calls to the `walk` method to `walkClass` and the function body. >> >> ### Implementation Details >> >> A `StackWalker` configured with `DROP_METHOD_INFO` ... > > Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: > > review feedback Changes requested by dfuchs (Reviewer). test/micro/org/openjdk/bench/java/lang/StackWalkBench.java line 60: > 58: static StackWalker walker(String name) { > 59: return switch (name) { > 60: case "class+method" -> WALKER; don't you need to also change "default" into "class+method" in the `@Param({"default", "class_only"})` annotations below?
test/micro/org/openjdk/bench/java/lang/StackWalkBench.java line 78: > 76: public int mark = 4; > 77: > 78: @Param({"default", "class_only"}) (I mean here) ------------- PR Review: https://git.openjdk.org/jdk/pull/15370#pullrequestreview-1616052545 PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1318997721 PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1318999463 From jvernee at openjdk.org Thu Sep 7 19:01:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 7 Sep 2023 19:01:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v18] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. 
This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
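Several of the FFM changes listed above can be illustrated with a minimal sketch. It assumes the `java.lang.foreign` API as finalized in JDK 22 and will not compile on earlier releases. The class `FfmSketch` and the helper `readSecondInt` are hypothetical names, not code from the PR. The sketch does three things:

- looks up a canonical C layout (item 3);
- reads through a value-layout var handle whose coordinates now include a `long` base offset (item 4);
- allocates an initialized segment with `allocateFrom` (item 6).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class FfmSketch {
    // Item 4: the handle's coordinates are (MemorySegment, long baseOffset).
    static final VarHandle INT_HANDLE = ValueLayout.JAVA_INT.varHandle();

    // Hypothetical helper: allocate {10, 20, 30}, then read element 1
    // by passing its byte offset as the extra base-offset coordinate.
    static int readSecondInt() {
        try (Arena arena = Arena.ofConfined()) {
            // Item 6: allocateFrom allocates and initializes in one call.
            MemorySegment ints = arena.allocateFrom(ValueLayout.JAVA_INT, 10, 20, 30);
            long baseOffset = ValueLayout.JAVA_INT.byteSize(); // skip element 0
            return (int) INT_HANDLE.get(ints, baseOffset);
        }
    }

    public static void main(String[] args) {
        // Item 3: platform-specific mapping from C type names to layouts.
        MemoryLayout cInt = Linker.nativeLinker().canonicalLayouts().get("int");
        System.out.println("C int layout: " + cInt);
        System.out.println(readSecondInt()); // prints 20
    }
}
```

Because the base offset is an explicit coordinate, the same handle can be combined with offsets computed elsewhere. This is the 'dynamic' layout use-case described in item 4.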
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: add code snippet ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/2f50adbf..86a7e227 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=16-17 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From mchung at openjdk.org Thu Sep 7 19:07:52 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 7 Sep 2023 19:07:52 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v11] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 18:51:44 GMT, Daniel Fuchs wrote: >> Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: >> >> review feedback > > test/micro/org/openjdk/bench/java/lang/StackWalkBench.java line 60: > >> 58: static StackWalker walker(String name) { >> 59: return switch (name) { >> 60: case "class+method" -> WALKER; > > don't you need to also change "default" into "class+method" in the `@Param({"default", "class_only"})` annotations below? thanks for catching it. will fix. 
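Daniel's point can be seen in a JMH-free reduction of the benchmark helper. This is a hypothetical sketch. Only lines 58-60 of `StackWalkBench.java` appear in the review, so the `CLASS_ONLY` field, the `default` branch and the class name are assumptions, and the real benchmark may handle unknown names differently. `DROP_METHOD_INFO` requires JDK 22 or later. Once the case label is renamed from "default" to "class+method", a stale `@Param` value no longer matches any case.

```java
import java.util.List;
import java.util.Set;

public class WalkerParamSketch {
    static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);
    // Assumption: the class-only variant drops method info (JDK 22+).
    static final StackWalker CLASS_ONLY = StackWalker.getInstance(
            Set.of(StackWalker.Option.DROP_METHOD_INFO,
                   StackWalker.Option.RETAIN_CLASS_REFERENCE));

    static StackWalker walker(String name) {
        return switch (name) {
            case "class+method" -> WALKER;
            case "class_only"   -> CLASS_ONLY;
            // Hypothetical: fail loudly if a @Param value has no matching label.
            default -> throw new IllegalArgumentException("unknown walker: " + name);
        };
    }

    public static void main(String[] args) {
        // The values @Param must list after the rename.
        for (String name : List.of("class+method", "class_only")) {
            System.out.println(name + " -> " + walker(name));
        }
        try {
            walker("default"); // the stale name from the old @Param
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "unknown walker: default"
        }
    }
}
```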
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1319009375 From mchung at openjdk.org Thu Sep 7 19:27:14 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 7 Sep 2023 19:27:14 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v12] In-Reply-To: References: Message-ID: > 8268829: Provide an optimized way to walk the stack with Class object only > > `StackWalker::walk` creates one `StackFrame` per frame and the current implementation > allocates one `StackFrameInfo` and one `MemberName` objects per frame. Some frameworks > like logging may only interest in the Class object but not the method name nor the BCI, > for example, filters out its implementation classes to find the caller class. It's > similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element. > > This PR proposes to add `Option::DROP_METHOD_INFO` enum that requests to drop the method information. If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` > can be used instead and such stack walker will save the overhead of extracting the method information > and the memory used for the stack walking. > > New factory methods to take a parameter to specify the kind of stack walker to be created are defined. > This provides a simple way for existing code, for example logging frameworks, to take advantage of > this enhancement with the least change as it can keep the existing function for traversing > `StackFrame`s. 
> > For example: to find the first caller filtering a known list of implementation class, > existing code can create a stack walker instance with `DROP_METHOD_INFO` option: > > > StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); > Optional<Class<?>> callerClass = walker.walk(s -> > s.map(StackFrame::getDeclaringClass) > .filter(Predicate.not(implClasses::contains)) > .findFirst()); > > > If method information is accessed on the `StackFrame`s produced by this stack walker such as > `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. > > #### Javadoc & specdiff > > https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html > https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html > > #### Alternatives Considered > One alternative is to provide a new API: > `<T> T walkClass(Function<Stream<Class<?>>, ? extends T> function)` > > In this case, the caller would need to pass a function that takes a stream > of `Class` object instead of `StackFrame`. Existing code would have to > modify calls to the `walk` method to `walkClass` and the function body. > > ### Implementation Details > > A `StackWalker` configured with `DROP_METHOD_INFO` option creates `ClassFrameInfo[]` > buffer that is filled by the VM during stack walking. `Sta...
Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: Fix @Param due to the rename from default to class+method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15370/files - new: https://git.openjdk.org/jdk/pull/15370/files/0e6abc42..c3746f5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15370&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15370&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15370/head:pull/15370 PR: https://git.openjdk.org/jdk/pull/15370 From dfuchs at openjdk.org Thu Sep 7 19:30:47 2023 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Thu, 7 Sep 2023 19:30:47 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v12] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 19:27:14 GMT, Mandy Chung wrote: >> 8268829: Provide an optimized way to walk the stack with Class object only >> >> `StackWalker::walk` creates one `StackFrame` per frame and the current implementation >> allocates one `StackFrameInfo` and one `MemberName` objects per frame. Some frameworks >> like logging may only interest in the Class object but not the method name nor the BCI, >> for example, filters out its implementation classes to find the caller class. It's >> similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element. >> >> This PR proposes to add `Option::DROP_METHOD_INFO` enum that requests to drop the method information. If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` >> can be used instead and such stack walker will save the overhead of extracting the method information >> and the memory used for the stack walking. >> >> New factory methods to take a parameter to specify the kind of stack walker to be created are defined. 
>> This provides a simple way for existing code, for example logging frameworks, to take advantage of >> this enhancement with the least change as it can keep the existing function for traversing >> `StackFrame`s. >> >> For example: to find the first caller filtering a known list of implementation class, >> existing code can create a stack walker instance with `DROP_METHOD_INFO` option: >> >> >> StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); >> Optional<Class<?>> callerClass = walker.walk(s -> >> s.map(StackFrame::getDeclaringClass) >> .filter(Predicate.not(implClasses::contains)) >> .findFirst()); >> >> >> If method information is accessed on the `StackFrame`s produced by this stack walker such as >> `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. >> >> #### Javadoc & specdiff >> >> https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html >> https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html >> >> #### Alternatives Considered >> One alternative is to provide a new API: >> `<T> T walkClass(Function<Stream<Class<?>>, ? extends T> function)` >> >> In this case, the caller would need to pass a function that takes a stream >> of `Class` object instead of `StackFrame`. Existing code would have to >> modify calls to the `walk` method to `walkClass` and the function body. >> >> ### Implementation Details >> >> A `StackWalker` configured with `DROP_METHOD_INFO` ... > > Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: > > Fix @Param due to the rename from default to class+method Marked as reviewed by dfuchs (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/15370#pullrequestreview-1616112038 From coleenp at openjdk.org Thu Sep 7 20:04:42 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Sep 2023 20:04:42 GMT Subject: RFR: 8308479: [s390x] Implement alternative fast-locking scheme [v8] In-Reply-To: References: Message-ID: <0OUTu5q5D3Cofz2eoihwbCukQKWudzM9h-YtSu8ua14=.024196a2-2f8d-4c4d-b659-1d1cb3f2f1f8@github.com> On Fri, 23 Jun 2023 05:44:04 GMT, Amit Kumar wrote: >> This PR implements new fast-locking scheme for s390x. Additionally few parameters have been renamed to be in sync with PPC. >> >> Testing done (for release, fastdebug and slowdebug build): >> All `test/jdk/java/util/concurrent` test with parameters: >> * LockingMode=2 >> * LockingMode=2 with -Xint >> * LockingMode=2 with -XX:TieredStopAtLevel=1 >> * LockingMode=2 with -XX:-TieredCompilation >> >> Result is consistently similar to Aarch(MacOS) and PPC, All of 124 tests are passing except `MapLoops.java` because in the 2nd part for this testcase, jvm starts with `HeavyMonitors` which conflict with `LockingMode=2` >> >> BenchMark Result for Renaissance-jmh: >> >> | Benchmark | Without fastLock (ms/op) | With fastLock (ms/op) | Improvement | >> |------------------------------------------|-------------------------|----------------------|-------------| >> | o.r.actors.JmhAkkaUct.runOperation | 1565.080 | 1365.877 | 12.70% | >> | o.r.actors.JmhReactors.runOperation | 9316.972 | 10592.982 | -13.70% | >> | o.r.jdk.concurrent.JmhFjKmeans.runOperation | 1257.183 | 1235.530 | 1.73% | >> | o.r.jdk.concurrent.JmhFutureGenetic.runOperation | 1925.158 | 2073.066 | -7.69% | >> | o.r.jdk.streams.JmhParMnemonics.runOperation | 2746.664 | 2836.085 | -3.24% | >> | o.r.jdk.streams.JmhScrabble.runOperation | 76.774 | 74.239 | 3.31% | >> | o.r.rx.JmhRxScrabble.runOperation | 162.270 | 167.061 | -2.96% | >> | o.r.scala.sat.JmhScalaDoku.runOperation | 3333.711 | 3271.078 | 1.88% | >> | 
o.r.scala.stdlib.JmhScalaKmeans.runOperation | 182.746 | 182.153 | 0.33% | >> | o.r.scala.stm.JmhPhilosophers.runOperation | 15003.329 | 13396.921 | 10.57% | >> | o.r.scala.stm.JmhScalaStmBench7.runOperation | 1669.090 | 1579.900 | 5.34% | >> | o.r.twitter.finagle.JmhFinagleChirper.runOperation | 9601.963 | 10034.404 | -4.52% | >> | o.r.twitter.finagle.JmhFinagleHttp.runOperation | 4403.725 | 4746.707 | -7.79% | >> >> >> DaCapo Benchmark Result: >> >> | Benchmark | Without fast lock (msec) | With fast lock (msec) | Improvement | >> |--... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestions from Martin We are considering making Fast Locking on by default for Oracle supported platforms. Have these performance concerns been addressed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14414#issuecomment-1710701169 From mchung at openjdk.org Thu Sep 7 21:40:57 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 7 Sep 2023 21:40:57 GMT Subject: Integrated: 8268829: Provide an optimized way to walk the stack with Class object only In-Reply-To: References: Message-ID: <-WTzPHcQRUICXBu_u26_37O8oRcACfyZkwquhz8dg9Y=.91014937-6a56-4c06-be60-47f933d94c14@github.com> On Mon, 21 Aug 2023 20:07:20 GMT, Mandy Chung wrote: > 8268829: Provide an optimized way to walk the stack with Class object only > > `StackWalker::walk` creates one `StackFrame` per frame and the current implementation > allocates one `StackFrameInfo` and one `MemberName` objects per frame. Some frameworks > like logging may only interest in the Class object but not the method name nor the BCI, > for example, filters out its implementation classes to find the caller class. It's > similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element. > > This PR proposes to add `Option::DROP_METHOD_INFO` enum that requests to drop the method information. 
If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` > can be used instead and such stack walker will save the overhead of extracting the method information > and the memory used for the stack walking. > > New factory methods to take a parameter to specify the kind of stack walker to be created are defined. > This provides a simple way for existing code, for example logging frameworks, to take advantage of > this enhancement with the least change as it can keep the existing function for traversing > `StackFrame`s. > > For example: to find the first caller filtering a known list of implementation class, > existing code can create a stack walker instance with `DROP_METHOD_INFO` option: > > > StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); > Optional<Class<?>> callerClass = walker.walk(s -> > s.map(StackFrame::getDeclaringClass) > .filter(Predicate.not(implClasses::contains)) > .findFirst()); > > > If method information is accessed on the `StackFrame`s produced by this stack walker such as > `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. > > #### Javadoc & specdiff > > https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html > https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html > > #### Alternatives Considered > One alternative is to provide a new API: > `<T> T walkClass(Function<Stream<Class<?>>, ? extends T> function)` > > In this case, the caller would need to pass a function that takes a stream > of `Class` object instead of `StackFrame`. Existing code would have to > modify calls to the `walk` method to `walkClass` and the function body. > > ### Implementation Details > > A `StackWalker` configured with `DROP_METHOD_INFO` option creates `ClassFrameInfo[]` > buffer that is filled by the VM during stack walking. `Sta... This pull request has now been integrated.
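The caller-class lookup described in the PR can be exercised end to end with a small sketch, assuming JDK 22 or later. It uses the `StackWalker.getInstance(Set<Option>)` factory. The nested `Logger` and `App` classes and the `IMPL_CLASSES` set are hypothetical stand-ins for a logging framework and its implementation classes. The sketch also checks that asking for method information on a `DROP_METHOD_INFO` walker throws `UnsupportedOperationException`, as the description states.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class CallerClassSketch {
    // DROP_METHOD_INFO (JDK 22+) plus RETAIN_CLASS_REFERENCE for getDeclaringClass().
    static final StackWalker CLASS_ONLY = StackWalker.getInstance(
            Set.of(StackWalker.Option.DROP_METHOD_INFO,
                   StackWalker.Option.RETAIN_CLASS_REFERENCE));

    // Hypothetical "implementation classes" that a logging facade skips.
    static final Set<Class<?>> IMPL_CLASSES = Set.of(CallerClassSketch.class, Logger.class);

    static final class Logger {
        static Optional<Class<?>> findCaller() {
            // The stream starts at the frame that called walk(), i.e. findCaller.
            return CLASS_ONLY.walk(s -> s
                    .map(StackWalker.StackFrame::getDeclaringClass)
                    .filter(Predicate.not(IMPL_CLASSES::contains))
                    .findFirst());
        }
    }

    static final class App {
        static Optional<Class<?>> log() {
            return Logger.findCaller();
        }
    }

    // Method information was dropped, so getMethodName() is expected to throw.
    static boolean methodNameUnsupported() {
        try {
            CLASS_ONLY.walk(s -> s.findFirst().map(StackWalker.StackFrame::getMethodName));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(App.log().orElseThrow() == App.class); // true
        System.out.println(methodNameUnsupported());               // true
    }
}
```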
Changeset: 111ecdba Author: Mandy Chung URL: https://git.openjdk.org/jdk/commit/111ecdbaf58e5c0b3a64e0eca8a291df295e71b0 Stats: 1340 lines in 34 files changed: 718 ins; 358 del; 264 mod 8268829: Provide an optimized way to walk the stack with Class object only 8210375: StackWalker::getCallerClass throws UnsupportedOperationException Reviewed-by: coleenp, dfuchs, bchristi ------------- PR: https://git.openjdk.org/jdk/pull/15370 From jjoo at openjdk.org Thu Sep 7 22:32:27 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 7 Sep 2023 22:32:27 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v3] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: address remainder of dholmes' comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/557dbfa6..eb911ee1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=01-02 Stats: 19 lines in 7 files changed: 8 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From svkamath at openjdk.org Thu Sep 7 23:25:38 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 7 Sep 2023 23:25:38 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions In-Reply-To: References: Message-ID: On Mon, 4 Sep 2023 15:00:23 GMT, Ferenc Rakoczi wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. 
>> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26
>> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 590: > >> 588: private static int implGCMCrypt(byte[] in, int inOfs, int inLen, byte[] ct, >> 589: int ctOfs, byte[] out, int outOfs, >> 590: GCTR gctr, GHASH ghash, boolean encryption) { > > It looks to me that you don't need to introduce this "boolean encryption" here as it is simply (ct == out), which can easily be calculated in the intrinsics and that saves a lot of code change. @ferakocz Thank you for your comment. I will make the change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1319203520 From svkamath at openjdk.org Thu Sep 7 23:32:37 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 7 Sep 2023 23:32:37 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 23:23:13 GMT, Smita Kamath wrote: >> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 590: >> >>> 588: private static int implGCMCrypt(byte[] in, int inOfs, int inLen, byte[] ct, >>> 589: int ctOfs, byte[] out, int outOfs, >>> 590: GCTR gctr, GHASH ghash, boolean encryption) { >> >> It looks to me that you don't need to introduce this "boolean encryption" here as it is simply (ct == out), which can easily be calculated in the intrinsics and that saves a lot of code change. > > @ferakocz Thank you for your comment. I will make the change. @ascarpino Apologies for the delay in responding; I was away on vacation. Fewer registers are available to the AVX2 algorithm than to the AVX512 one. That's why it's essential for the intrinsic to know whether it is encrypting or decrypting this time around. I will implement Ferenc's suggestion and remove the boolean variable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1319206174 From haosun at openjdk.org Fri Sep 8 00:57:10 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 8 Sep 2023 00:57:10 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2.
However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the "PreserveFramePointer" option, which would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. > > 5. Virtual threads will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid re-signing on continuation thaw, because the fast path in stack copying doesn't walk the frames. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered using the (relative) stack pointer SP, the thread ID, PACStack [7] and the value zero as candidate modifiers. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access from the JIT side. We would need to 1) resolve the address of the current thread (see [8] as an example), and 2) load the tid field the way java_lang_Thread::thread_id() does. I suspect this would introduce significant performance overhead.
Could we then use the "rthread" register (the JavaThread object address) as the modifier? Unfortunately, it is not invariant for virtual threads, so PAC re-signing would still be needed. > > 3. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier would have to be saved somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of... Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Revert to the implementation with zero as the PAC modifier - Merge branch 'master' into jdk-8287325 - Update aarch64.ad and jvmci AArch64TestAssembler.java Before this patch, rscratch1 is clobbered. With this patch, we use the rscratch1 register after we save it on the stack. In this way, the code would be consistent with macroAssembler_aarch64.cpp. - Merge branch 'master' into jdk-8287325 - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp - Use relative SP as the PAC modifier - Merge branch 'master' into jdk-8287325 - Merge branch 'master' into jdk-8287325 - Rename return_pc_at and patch_pc_at Rename return_pc_at to return_address_at. Rename patch_pc_at to patch_return_address_at. - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret * Background 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the "PreserveFramePointer" option, which would slow down virtual threads by ~5-20x. 4. 
As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. 5. Virtual threads will become a permanent feature in JDK 21 [5][6]. * Goal This patch aims to make PAC-RET compatible with virtual threads. * Requirements of virtual threads R-1: Option "PreserveFramePointer" should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken. R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid re-signing with PAC on continuation thaw, as the fast path in stack copying doesn't walk the frame. Note that more details can be found in the discussion [3]. * Investigation We considered the (relative) stack pointer SP, the thread ID, PACStack [7] and the value zero as candidate modifiers. 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. 2. thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access it from the JIT side. We would need to 1) first resolve the address of the current thread (see [8] as an example), and 2) read the tid field in the same way as java_lang_Thread::thread_id(). I suppose this would introduce significant performance overhead. Could we then use the "rthread" register (the JavaThread object address) as the modifier? Unfortunately, it is not invariant for virtual threads, so PAC re-signing would still be needed. 3. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier would have to be saved somewhere around the frame record.
Inevitably, FP should be preserved to make it easy to find this modifier in case of some exception scenarios (Recall the reason why we fail to use SP as the modifier). Finally, we choose to use value zero as the modifier. Trivially, it's compatible with virtual threads. However, compared to FP modifier, this solution would reduce the strength of PAC-RET protection to some extent. E.g., you get the same authentication code for each call to the function, whereas using FP gives you different codes as long as the stack depth is different. * Implementation of Zero modifier Here list the key updates of this patch. 1. vm_version_aarch64.cpp Remove the constraint on "enable-preview" and "PreserveFramePointer". 2. macroAssembler_aarch64.cpp For utility protect_return_address(), 1) use PACIAZ/PACIZA instructions directly. 2) argument "temp_reg" is removed since all functions use the same modifier. 3) all the use sites are updated accordingly. This involves the updates in many files. Similar updates are done to utility authenticate_return_address(). Besides, aarch64.ad and AArch64TestAssembler.java are updated accordingly. 3. pauth_linux_aarch64.inline.hpp For utilities pauth_sign_return_address() and pauth_authenticate_return_address(), remove the second argument and pass value zero to r16 register. Similarly, all the use sites are updated as well. This involves the updates in many files. 4. continuationHelper_aarch64.inline.hpp Introduce return_pc_at() and patch_pc_at() to avoid directly reading the saved PC or writing new signed PC on the stack in shared code. 5. Minor updates 1) sharedRuntime_aarch64.cpp: Add the missing authenticate_return_address() use for function gen_continuation_enter(). In functions generate_deopt_blob() and generate_uncommon_trap_blob(), remove the authentication on the caller (3) frame since the return address is not used. 2) stubGenerator_aarch64.cpp: Add the missing authenticate_return_address() use for function generate_cont_thaw(). 
3) runtime.cpp: enable the authentication. * Test 1. Cross compilations on arm32/s390/ppc/riscv passed. 2. zero build and x86 build passed. 3. tier1~3 passed on Linux/AArch64 w/ and w/o PAC-RET. Co-Developed-by: Nick Gasson [1] https://bugs.openjdk.org/browse/JDK-8277204 [2] https://openjdk.org/jeps/425 [3] https://github.com/openjdk/jdk/pull/9067 [4] https://bugs.openjdk.org/browse/JDK-8288023 [5] https://bugs.openjdk.org/browse/JDK-8301819 [6] https://openjdk.org/jeps/444 [7] https://www.usenix.org/conference/usenixsecurity21/presentation/liljestrand [8] https://github.com/openjdk/jdk/pull/10441 ------------- Changes: https://git.openjdk.org/jdk/pull/13322/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=05 Stats: 193 lines in 29 files changed: 75 ins; 28 del; 90 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From lmesnik at openjdk.org Fri Sep 8 03:14:53 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 8 Sep 2023 03:14:53 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> Message-ID: <9NY499Z6epRjQ-ZvDrbxS6weL8QG-7djNWJN-o9SCmc=.6d58bb4a-2e26-4a96-ae1b-dbe6682ebe8f@github.com> On Wed, 6 Sep 2023 20:26:51 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 136: > >> 134: ready1.await(); >> 135: mready.decr(); >> 136: VirtualMachine vm = VirtualMachine.attach(String.valueOf(ProcessHandle.current().pid())); > > I think a sleep is needed here so that threads which should be unmounted have time to unmount before the attach. Would it also make sense to check that the thread state is TIMED_WAITING? It should be set to TIMED_WAITING for unmounted threads: https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/Thread.State.html#TIMED_WAITING ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319305453 From lmesnik at openjdk.org Fri Sep 8 03:14:51 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 8 Sep 2023 03:14:51 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: On Thu, 7 Sep 2023 06:33:29 GMT, Serguei Spitsyn wrote: >> This update fixes two important issues: >> - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach >> - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads >> >> The issue is tricky to fix because the existing JVMTI event management mechanism does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` >> which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`.
The `JvmtiThreadState` of each thread is created only when it is really needed, e.g., if any of the thread filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create a `JvmtiThreadState` for each virtual thread eagerly at the thread starting point. >> >> This fix introduces a new function `JvmtiExport::get_jvmti_thread_state()` which checks if the thread is virtual and a thread filtered event is enabled globally, and if so, forces the creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. A new function `JvmtiEventController::recompute_thread_filtered()` was introduced to make the necessary corrections. >> >> Testing: >> - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` >> - ran mach5 tiers 1-6: all passed > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge > - 8312174: missing JVMTI events from vthreads parked during JVMTI attach Changes requested by lmesnik (Reviewer). test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 86: > 84: log("WARNING: test expects at least 8 processors."); > 85: } > 86: Counter ready1 = new Counter(THREAD_CNT); I think a CountDownLatch should work fine here, and there is no need to use a custom Counter. test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 30: > 28: #include "jvmti_common.h" > 29: > 30: #ifdef _WIN32 Do we need it here?
test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 44: > 42: > 43: void JNICALL VirtualThreadEnd(jvmtiEnv *jvmti, JNIEnv* jni, jthread virtual_thread) { > 44: std::lock_guard lockGuard(lock); An atomic would be better for the counters. It guarantees that the counter is always protected. test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 62: > 60: > 61: void > 62: check_jvmti_err(jvmtiError err, const char* msg) { This function could be moved into jvmti_common.h. ------------- PR Review: https://git.openjdk.org/jdk/pull/15467#pullrequestreview-1616597260 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319303393 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319295540 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319299148 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319295250 From jjoo at openjdk.org Fri Sep 8 03:27:12 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Fri, 8 Sep 2023 03:27:12 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v4] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: address partial comments from Volker and Man ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/eb911ee1..74b6db2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=02-03 Stats: 47 lines in 15 files changed: 11 ins; 6 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Fri Sep 8 03:27:13 2023 From:
jjoo at openjdk.org (Jonathan Joo) Date: Fri, 8 Sep 2023 03:27:13 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v2] In-Reply-To: References: Message-ID: <-gy8KZn5ExyO91XaFXsBfDil-whwnhOhiHGayvXBMq4=.5c122640-6eae-443e-94ce-acf7b87b7df8@github.com> On Wed, 6 Sep 2023 21:40:15 GMT, Man Cao wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> address dholmes@ comments > > src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 85: > >> 83: if (UsePerfData && os::is_thread_cpu_time_supported() && is_primary()) { >> 84: _cr->update_concurrent_refine_threads_cpu_time(); >> 85: } > > There are two classes for primary thread and secondary refinement thread: `G1PrimaryConcurrentRefineThread` and `G1SecondaryConcurrentRefineThread`. It is probably cleaner to move this part inside `G1PrimaryConcurrentRefineThread` and add a virtual method in `G1ConcurrentRefineThread`. We can get rid of the `is_primary()` check as well. > > > class G1ConcurrentRefineThread { > virtual void possibly_update_threads_cpu_time() {}; > } > > void G1PrimaryConcurrentRefineThread::possibly_update_threads_cpu_time() { > if (UsePerfData && os::is_thread_cpu_time_supported()) { > _cr->update_concurrent_refine_threads_cpu_time(); > } > } Sounds good, let me know if this change is how you envisioned it! 
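Man Cao's suggestion above replaces the `is_primary()` check with a virtual hook that only the primary refinement thread overrides. The following is a minimal, self-contained model of that pattern, not the actual HotSpot change: the class names are simplified stand-ins for `G1ConcurrentRefineThread` and its subclasses, and `UsePerfData`, `thread_cpu_time_supported()` and the counting `ConcurrentRefine` struct are hypothetical placeholders for the real flag, `os::is_thread_cpu_time_supported()` and `G1ConcurrentRefine`.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's UsePerfData flag and
// os::is_thread_cpu_time_supported(); not the real declarations.
static bool UsePerfData = true;
static bool thread_cpu_time_supported() { return true; }

// Simplified stand-in for G1ConcurrentRefine: it only counts how often
// the CPU-time counter would have been refreshed.
struct ConcurrentRefine {
  int cpu_time_updates = 0;
  void update_concurrent_refine_threads_cpu_time() { ++cpu_time_updates; }
};

class ConcurrentRefineThread {
 protected:
  ConcurrentRefine* _cr;

 public:
  explicit ConcurrentRefineThread(ConcurrentRefine* cr) : _cr(cr) {
    assert(cr != nullptr);
  }
  virtual ~ConcurrentRefineThread() = default;

  // Default hook: secondary threads do nothing.
  virtual void possibly_update_threads_cpu_time() {}

  // Shared per-iteration work calls the hook unconditionally,
  // so no is_primary() check is needed here.
  void run_iteration() { possibly_update_threads_cpu_time(); }
};

class PrimaryConcurrentRefineThread : public ConcurrentRefineThread {
 public:
  using ConcurrentRefineThread::ConcurrentRefineThread;

  // Only the primary thread refreshes the shared CPU-time counter.
  void possibly_update_threads_cpu_time() override {
    if (UsePerfData && thread_cpu_time_supported()) {
      _cr->update_concurrent_refine_threads_cpu_time();
    }
  }
};

class SecondaryConcurrentRefineThread : public ConcurrentRefineThread {
 public:
  using ConcurrentRefineThread::ConcurrentRefineThread;
};

// Runs one iteration on a primary thread and two on a secondary thread,
// then returns how many counter updates happened.
int count_cpu_time_updates() {
  ConcurrentRefine cr;
  PrimaryConcurrentRefineThread primary(&cr);
  SecondaryConcurrentRefineThread secondary(&cr);
  primary.run_iteration();
  secondary.run_iteration();
  secondary.run_iteration();
  return cr.cpu_time_updates;
}
```

With `UsePerfData` set, `count_cpu_time_updates()` returns 1, because only the primary thread's override touches the counter; the caller no longer needs to know which kind of thread it is running on.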
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1319310921 From jjoo at openjdk.org Fri Sep 8 03:44:16 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Fri, 8 Sep 2023 03:44:16 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v5] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: rename counters to be *.cpu_time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/74b6db2d..d2e48676 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=03-04 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Fri Sep 8 03:55:24 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Fri, 8 Sep 2023 03:55:24 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v6] In-Reply-To: References: Message-ID: <25MiQkF4JLnVq2fcJCshVnf_SRrOVx0kHW3Fk33yczs=.5e463d24-87e0-4f5f-8f18-8d3b419ce3ae@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Properly initialize concurrent dedup thread counter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/d2e48676..bcfe1516 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=04-05 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: 
https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From rehn at openjdk.org Fri Sep 8 04:56:37 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 8 Sep 2023 04:56:37 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg [v2] In-Reply-To: References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: <7FpkUoSuvOEH5HS3_zsS_7oqMZ3Xbq_Wq5GZnGv-T6s=.83afa7a3-e525-4a26-950b-16c0726e4e2f@github.com> On Thu, 7 Sep 2023 14:19:09 GMT, Fei Yang wrote: > In fact, I mean simply keeping the feature flags with space separators in `_features_string` without removing them for 'pretty' print string. As you mentioned on JBS, then we would have CPU info/desc like: "CPU: total 16 (initial active 16) rv64 i m a f d c v zicbom zicboz zicbop zba zbb zbs zicsr zifencei zic64b zihintpause" This seems acceptable to me. Please consider. Yes, sure I'll change! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15579#issuecomment-1711072588 From rehn at openjdk.org Fri Sep 8 05:02:41 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 8 Sep 2023 05:02:41 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 17:24:28 GMT, Ludovic Henry wrote: >> With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. >> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! 8315841: RISC-V: Check for hardware TSO support LGTM ! ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15613#pullrequestreview-1616675906 From rehn at openjdk.org Fri Sep 8 05:02:44 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 8 Sep 2023 05:02:44 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v4] In-Reply-To: References: <0XwNfOt464hFdux7jasXngWvxwiPYHQ7dnhKFCunePw=.a9f5b6d8-119d-476e-9d97-370b682f065e@github.com> Message-ID: On Thu, 7 Sep 2023 12:53:12 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 213: >> >>> 211: } >>> 212: >>> 213: #if defined(TARGET_ZTSO) && TARGET_ZTSO >> >> If someone compiles with "CXXFLAGS=-marchrv64....ztso..", we need to try to parse the supplied flags, that doesn't seem like fun. >> Instead I suggest we add code to read-out the elf flags, i.e: >> "Flags: 0x15, RVC, double-float ABI, TSO" >> >> And set UseZtso: >> A: If this is a TSO elf. >> B: If hwprobe says this TSO hardware (either directly or via vendor). >> C: If someone set flag, >> >> I guess your idea was to have a flag like --enable-tso which sets TARGET_TSO ? >> If we have that or not I still like above to happen. >> >> (I'm not saying you should do any of this in this PR, I can file new ones) > > `TARGET_TSO` is set by gcc directly. See https://www.mail-archive.com/gcc-patches at gcc.gnu.org/msg281514.html Both llvm/gcc should define __riscv_ztso if this is compiled with tso. (@luhenry already updated PR) Which means point A look in elf would never be needed, B and C are already done in this PR. Thanks for the update @luhenry ! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1319355722 From haosun at openjdk.org Fri Sep 8 05:23:42 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 8 Sep 2023 05:23:42 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 00:57:10 GMT, Hao Sun wrote: >> ### Background >> >> 1. 
PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. 
We would need to 1) first resolve the address of the current thread (see [8] as an example), and 2) read the tid field in the same way as java_lang_Thread::thread_id(). I suppose this would introduce significant performance overhead. Could we then use the "rthread" register (the JavaThread object address) as the modifier? Unfortunately, it is not invariant for virtual threads, so PAC re-signing would still be needed. >> >> 3. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier would have to be saved somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Revert to the implementation with zero as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Update aarch64.ad and jvmci AArch64TestAssembler.java > > Before this patch, rscratch1 is clobbered. > With this patch, we use the rscratch1 register after we save it on the > stack. > > In this way, the code would be consistent with > macroAssembler_aarch64.cpp. > - Merge branch 'master' into jdk-8287325 > - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp > - Use relative SP as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Merge branch 'master' into jdk-8287325 > - Rename return_pc_at and patch_pc_at > > Rename return_pc_at to return_address_at. > Rename patch_pc_at to patch_return_address_at. > - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret > > * Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 > in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], > mainly because the continuation freeze/thaw mechanism would trigger > stack copying to/from memory, whereas the saved and signed LR on the > stack doesn't get re-signed accordingly. > > 3. 
PR-9067 [3] tried to implement the re-sign part, but it was not > accepted because option "PreserveFramePointer" is always turned on by > PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview > language features are enabled. Note that virtual thread is one preview > feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > * Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > * Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, > PAC-RET implementation should not rely on frame pointer FP. Otherwise, > the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as > to avoid the PAC re-sign for continuation thaw, as the fast path in > stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > * Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack > [7] ... In the latest commit, I have reverted to the PAC-RET implementation using `zero` as the modifier. Tests: - Cross compilations on arm32/s390/ppc/riscv passed. - zero build and x86 build passed. - tier1~3 passed on Linux/AArch64 w/ and w/o PAC-RET. @theRealAph Could you help take another look at it when you have spare time? Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1711090186 From fyang at openjdk.org Fri Sep 8 06:18:40 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Sep 2023 06:18:40 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 17:24:28 GMT, Ludovic Henry wrote: >> With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. 
>> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! 8315841: RISC-V: Check for hardware TSO support src/hotspot/cpu/riscv/vm_version_riscv.cpp line 213: > 211: } > 212: > 213: #ifdef __riscv_ztso May I ask where this `__riscv_ztso` macro is defined / specified? I tried searching for it in the gcc user manual and the gcc source code, but I found nothing about it. [1] https://gcc.gnu.org/onlinedocs/gcc.pdf ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1319404817 From rkennke at amazon.de Fri Sep 8 06:31:13 2023 From: rkennke at amazon.de (Kennke, Roman) Date: Fri, 8 Sep 2023 06:31:13 +0000 Subject: RFR: 8308479: [s390x] Implement alternative fast-locking scheme [v8] In-Reply-To: <0OUTu5q5D3Cofz2eoihwbCukQKWudzM9h-YtSu8ua14=.024196a2-2f8d-4c4d-b659-1d1cb3f2f1f8@github.com> References: , <0OUTu5q5D3Cofz2eoihwbCukQKWudzM9h-YtSu8ua14=.024196a2-2f8d-4c4d-b659-1d1cb3f2f1f8@github.com> Message-ID: <9088E539-DE1C-4A1F-AC0A-4A73983D4DF8@amazon.de> FWIW, I found many of the Renaissance benchmarks quite noisy. It might be worth watching them closely and considering ramping up the number of iterations *and* forks. (Also, I found the JMH-wrapped version much easier to deal with: with it, you can get the increased iterations and forks with the simple -i, -wi and -f options.) > On 07.09.2023 at 22:10, Coleen Phillimore wrote: > > On Fri, 23 Jun 2023 05:44:04 GMT, Amit Kumar wrote: > >>> This PR implements a new fast-locking scheme for s390x. Additionally, a few parameters have been renamed to be in sync with PPC.
>>>
>>> Testing done (for release, fastdebug and slowdebug build):
>>> All `test/jdk/java/util/concurrent` test with parameters:
>>> * LockingMode=2
>>> * LockingMode=2 with -Xint
>>> * LockingMode=2 with -XX:TieredStopAtLevel=1
>>> * LockingMode=2 with -XX:-TieredCompilation
>>>
>>> Result is consistently similar to Aarch(MacOS) and PPC, All of 124 tests are passing except `MapLoops.java` because in the 2nd part for this testcase, jvm starts with `HeavyMonitors` which conflict with `LockingMode=2`
>>>
>>> BenchMark Result for Renaissance-jmh:
>>>
>>> | Benchmark | Without fastLock (ms/op) | With fastLock (ms/op) | Improvement |
>>> |------------------------------------------|-------------------------|----------------------|-------------|
>>> | o.r.actors.JmhAkkaUct.runOperation | 1565.080 | 1365.877 | 12.70% |
>>> | o.r.actors.JmhReactors.runOperation | 9316.972 | 10592.982 | -13.70% |
>>> | o.r.jdk.concurrent.JmhFjKmeans.runOperation | 1257.183 | 1235.530 | 1.73% |
>>> | o.r.jdk.concurrent.JmhFutureGenetic.runOperation | 1925.158 | 2073.066 | -7.69% |
>>> | o.r.jdk.streams.JmhParMnemonics.runOperation | 2746.664 | 2836.085 | -3.24% |
>>> | o.r.jdk.streams.JmhScrabble.runOperation | 76.774 | 74.239 | 3.31% |
>>> | o.r.rx.JmhRxScrabble.runOperation | 162.270 | 167.061 | -2.96% |
>>> | o.r.scala.sat.JmhScalaDoku.runOperation | 3333.711 | 3271.078 | 1.88% |
>>> | o.r.scala.stdlib.JmhScalaKmeans.runOperation | 182.746 | 182.153 | 0.33% |
>>> | o.r.scala.stm.JmhPhilosophers.runOperation | 15003.329 | 13396.921 | 10.57% |
>>> | o.r.scala.stm.JmhScalaStmBench7.runOperation | 1669.090 | 1579.900 | 5.34% |
>>> | o.r.twitter.finagle.JmhFinagleChirper.runOperation | 9601.963 | 10034.404 | -4.52% |
>>> | o.r.twitter.finagle.JmhFinagleHttp.runOperation | 4403.725 | 4746.707 | -7.79% |
>>>
>>>
>>> DaCapo Benchmark Result:
>>>
>>> | Benchmark | Without fast lock (msec) | With fast lock (msec) | Improvement |
>>> |--...
>> >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestions from Martin > > We are considering making Fast Locking on by default for Oracle supported platforms. Have these performance concerns been addressed? > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/14414#issuecomment-1710701169 From rehn at openjdk.org Fri Sep 8 06:51:44 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 8 Sep 2023 06:51:44 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 06:08:14 GMT, Fei Yang wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> fixup! 8315841: RISC-V: Check for hardware TSO support > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 213: > >> 211: } >> 212: >> 213: #ifdef __riscv_ztso > > May I ask where is this `__riscv_ztso` macro defined / specified? I tried to search it in the gcc user manual [1] and gcc source code, but I found nothing about it.
> > [1] https://gcc.gnu.org/onlinedocs/gcc.pdf Good question, I don't find docs in the compilers for this either, but: https://github.com/riscv-non-isa/riscv-toolchain-conventions#cc-preprocessor-definitions ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1319439617 From luhenry at openjdk.org Fri Sep 8 07:34:42 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 8 Sep 2023 07:34:42 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: References: Message-ID: <0SxTsRMH8gVoxpI7ky6nPbs1NgVFgniF83yj7y5w7UA=.1e5ee042-90cc-42eb-aadb-bc7ea5d7b571@github.com> On Fri, 8 Sep 2023 06:48:37 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 213: >> >>> 211: } >>> 212: >>> 213: #ifdef __riscv_ztso >> >> May I ask where is this `__riscv_ztso` macro defined / specified? I tried to search it in the gcc user manual [1] and gcc source code, but I found nothing about it. >> >> [1] https://gcc.gnu.org/onlinedocs/gcc.pdf > > Good question, I don't find docs in the compilers for this either, but: > https://github.com/riscv-non-isa/riscv-toolchain-conventions#cc-preprocessor-definitions We checked for it with [this simple 
snippet](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:2,lang:c%2B%2B,selection:(endColumn:1,endLineNumber:8,positionColumn:1,positionLineNumber:8,selectionStartColumn:1,selectionStartLineNumber:8,startColumn:1,startLineNumber:8),source:'int+main(void)+%7B%0A%23ifdef+__riscv_ztso%0A++++return+0%3B%0A%23else%0A++++return+1%3B%0A%23endif%0A%7D%0A'),l:'5',n:'0',o:'C%2B%2B+source+%232',t:'0')),k:51.29770473331974,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:rv64-gcctrunk,deviceViewOpen:'1',filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:c%2B%2B,libs:!(),options:'-O2+-march%3Drv64gcv_ztso0p1',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:2),l:'5',n:'0',o:'+RISC-V+(64-bits)+gcc+(trunk)+(Editor+%232)',t:'0'),(h:compiler,i:(compiler:rv64-clang,deviceViewOpen:'1',filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:3,lang:c%2B%2B,libs:!(),options:'-O2+-march%3Drv64gcv_ztso0p1+-menable-experimental-extensions',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:2),l:'5',n:'0',o:'+RISC-V+rv64gc+clang+(trunk)+(Editor+%232)',t:'0')),header:(),k:48.70229526668027,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4). You can see that in both cases it returns `0`, which means `__riscv_ztso` is defined, i.e. TSO is enabled.
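For readers who cannot open the Compiler Explorer link above: its source is a `main` that returns `0` under `#ifdef __riscv_ztso` and `1` otherwise, built with `-O2 -march=rv64gcv_ztso0p1` for GCC (trunk) and additionally with `-menable-experimental-extensions` for Clang (trunk). The same check can be restated as a compile-time function; this is a minimal sketch that relies only on the macro name from the toolchain-conventions page cited above, and the function name `compiled_with_ztso` is hypothetical.

```cpp
// Sketch of the check performed by the Compiler Explorer snippet: the
// toolchain is expected to define __riscv_ztso when the target ISA string
// includes the Ztso extension (e.g. -march=rv64gcv_ztso0p1).
// compiled_with_ztso is a hypothetical name, not a HotSpot function.
constexpr bool compiled_with_ztso() {
#ifdef __riscv_ztso
  return true;   // the snippet's main() returns 0 on this path
#else
  return false;  // the snippet's main() returns 1 on this path
#endif
}
```

Note that the macro only reflects what the compiler was told to target at build time; it says nothing about the CPU the binary eventually runs on.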
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1319481853 From rehn at openjdk.org Fri Sep 8 07:38:14 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 8 Sep 2023 07:38:14 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg [v3] In-Reply-To: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: > Hi, please consider. > > As described in jbs, this handles both cases with a rough solution by having two strings. > Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. > > Tested tier1 on qemu rv. Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: One features string ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15579/files - new: https://git.openjdk.org/jdk/pull/15579/files/1ca70f8f..7247c9a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15579&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15579&range=01-02 Stats: 33 lines in 4 files changed: 3 ins; 27 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15579.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15579/head:pull/15579 PR: https://git.openjdk.org/jdk/pull/15579 From shade at openjdk.org Fri Sep 8 08:42:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 8 Sep 2023 08:42:42 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v7] In-Reply-To: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> References: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> Message-ID: On Mon, 4 Sep 2023 09:44:11 GMT, Aleksey Shipilev wrote: >> As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly 
easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock. >> >> There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety. >> >> More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach. >> >> Additional testing: >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Touchup whitespace > - Rewrite jvmtiManageCapabilities lock usage > - Re-instate old asserts @alexmenkov, I rewrote the `jvmtiCapabilities` lock usage introduced by #15219, want to take a look as well? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15043#issuecomment-1711297469 From lucy at openjdk.org Fri Sep 8 09:38:47 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 8 Sep 2023 09:38:47 GMT Subject: RFR: 8308479: [s390x] Implement alternative fast-locking scheme [v8] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 05:44:04 GMT, Amit Kumar wrote: >> This PR implements new fast-locking scheme for s390x. Additionally few parameters have been renamed to be in sync with PPC. 
>>
>> Testing done (for release, fastdebug and slowdebug build):
>> All `test/jdk/java/util/concurrent` test with parameters:
>> * LockingMode=2
>> * LockingMode=2 with -Xint
>> * LockingMode=2 with -XX:TieredStopAtLevel=1
>> * LockingMode=2 with -XX:-TieredCompilation
>>
>> Result is consistently similar to Aarch(MacOS) and PPC, All of 124 tests are passing except `MapLoops.java` because in the 2nd part for this testcase, jvm starts with `HeavyMonitors` which conflict with `LockingMode=2`
>>
>> BenchMark Result for Renaissance-jmh:
>>
>> | Benchmark | Without fastLock (ms/op) | With fastLock (ms/op) | Improvement |
>> |------------------------------------------|-------------------------|----------------------|-------------|
>> | o.r.actors.JmhAkkaUct.runOperation | 1565.080 | 1365.877 | 12.70% |
>> | o.r.actors.JmhReactors.runOperation | 9316.972 | 10592.982 | -13.70% |
>> | o.r.jdk.concurrent.JmhFjKmeans.runOperation | 1257.183 | 1235.530 | 1.73% |
>> | o.r.jdk.concurrent.JmhFutureGenetic.runOperation | 1925.158 | 2073.066 | -7.69% |
>> | o.r.jdk.streams.JmhParMnemonics.runOperation | 2746.664 | 2836.085 | -3.24% |
>> | o.r.jdk.streams.JmhScrabble.runOperation | 76.774 | 74.239 | 3.31% |
>> | o.r.rx.JmhRxScrabble.runOperation | 162.270 | 167.061 | -2.96% |
>> | o.r.scala.sat.JmhScalaDoku.runOperation | 3333.711 | 3271.078 | 1.88% |
>> | o.r.scala.stdlib.JmhScalaKmeans.runOperation | 182.746 | 182.153 | 0.33% |
>> | o.r.scala.stm.JmhPhilosophers.runOperation | 15003.329 | 13396.921 | 10.57% |
>> | o.r.scala.stm.JmhScalaStmBench7.runOperation | 1669.090 | 1579.900 | 5.34% |
>> | o.r.twitter.finagle.JmhFinagleChirper.runOperation | 9601.963 | 10034.404 | -4.52% |
>> | o.r.twitter.finagle.JmhFinagleHttp.runOperation | 4403.725 | 4746.707 | -7.79% |
>>
>>
>> DaCapo Benchmark Result:
>>
>> | Benchmark | Without fast lock (msec) | With fast lock (msec) | Improvement |
>> |--...
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestions from Martin Looks good functionally. Performance regression should be further analyzed in a separate task. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14414#pullrequestreview-1617075351 From lucy at openjdk.org Fri Sep 8 09:38:49 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 8 Sep 2023 09:38:49 GMT Subject: RFR: 8308479: [s390x] Implement alternative fast-locking scheme [v8] In-Reply-To: <0OUTu5q5D3Cofz2eoihwbCukQKWudzM9h-YtSu8ua14=.024196a2-2f8d-4c4d-b659-1d1cb3f2f1f8@github.com> References: <0OUTu5q5D3Cofz2eoihwbCukQKWudzM9h-YtSu8ua14=.024196a2-2f8d-4c4d-b659-1d1cb3f2f1f8@github.com> Message-ID: On Thu, 7 Sep 2023 20:02:17 GMT, Coleen Phillimore wrote: > We are considering making Fast Locking on by default for Oracle supported platforms. Have these performance concerns been addressed? The performance regression which is observed in some tests is still not fully understood. I will approve the PR despite that. According to all our testing, it is functionally correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14414#issuecomment-1711371003 From bulasevich at openjdk.org Fri Sep 8 10:02:53 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 8 Sep 2023 10:02:53 GMT Subject: Integrated: 8307352: AARCH64: Improve itable_stub In-Reply-To: References: Message-ID: On Thu, 4 May 2023 07:36:43 GMT, Boris Ulasevich wrote: > This is a change for AARCH similar to https://github.com/openjdk/jdk/pull/13460 > > The change replaces two separate iterations over the itable with a new algorithm consisting of two loops. First, we look for a match with resolved_klass, checking for a match with holder_klass along the way. Then we continue iterating (not starting over) the itable using the second loop, checking only for a match with holder_klass.
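The two-loop lookup described above can be modeled in plain C++. This is a hypothetical sketch, not the AArch64 stub itself: the `ItableEntry` layout, the null terminator, and the name `lookup_holder_offset` are illustrative assumptions, and the real stub is generated assembly that branches to an error path instead of returning an empty `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified itable: each entry pairs an implemented interface
// (an opaque pointer here) with the offset of that interface's method block.
// A nullptr interface terminates the table (a stand-in for the real layout).
struct ItableEntry {
  const void* iface;
  int offset;
};

// Returns holder_klass's offset when both resolved_klass and holder_klass are
// present in the table, std::nullopt otherwise.
std::optional<int> lookup_holder_offset(const std::vector<ItableEntry>& itable,
                                        const void* resolved_klass,
                                        const void* holder_klass) {
  assert(resolved_klass != nullptr && holder_klass != nullptr);
  std::optional<int> holder_offset;
  bool resolved_found = false;
  std::size_t i = 0;

  // Loop 1: scan for resolved_klass, recording holder_klass if we pass it.
  for (; i < itable.size() && itable[i].iface != nullptr; ++i) {
    if (!holder_offset && itable[i].iface == holder_klass) {
      holder_offset = itable[i].offset;
    }
    if (itable[i].iface == resolved_klass) {
      resolved_found = true;
      ++i;  // loop 2 resumes at the next entry instead of starting over
      break;
    }
  }
  if (!resolved_found) {
    return std::nullopt;
  }

  // Loop 2: continue where loop 1 stopped, now checking only holder_klass.
  for (; !holder_offset && i < itable.size() && itable[i].iface != nullptr; ++i) {
    if (itable[i].iface == holder_klass) {
      holder_offset = itable[i].offset;
    }
  }
  return holder_offset;
}
```

Because the second loop resumes after the `resolved_klass` entry rather than rescanning from the start, each entry is visited at most once in this model.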
> > InterfaceCalls openjdk benchmark performance results on A53, A72, Neoverse N1 and V1 micro-architectures:
>
>
> Cortex-A53 (Pi 3 Model B Rev 1.2)
>
> test1stInt2Types  37.5      37.358    0.38
> test1stInt3Types  160.166   148.04    8.19
> test1stInt5Types  158.131   147.955   6.88
> test2ndInt2Types  52.634    53.291    -1.23
> test2ndInt3Types  201.39    181.603   10.90
> test2ndInt5Types  195.722   176.707   10.76
> testIfaceCall     157.453   140.498   12.07
> testIfaceExtCall  175.46    154.351   13.68
> testMonomorphic   32.052    32.039    0.04
> AVG: 6.85
>
> Cortex-A72 (Pi 4 Model B Rev 1.2)
>
> test1stInt2Types  27.4796   27.4738   0.02
> test1stInt3Types  66.0085   64.9374   1.65
> test1stInt5Types  67.9812   66.2316   2.64
> test2ndInt2Types  32.0581   32.062    -0.01
> test2ndInt3Types  68.2715   65.6643   3.97
> test2ndInt5Types  68.1012   65.8024   3.49
> testIfaceCall     64.0684   64.1811   -0.18
> testIfaceExtCall  91.6226   81.5867   12.30
> testMonomorphic   26.7161   26.7142   0.01
> AVG: 2.66
>
> Neoverse N1 (m6g.metal)
>
> test1stInt2Types  2.9104    2.9086    0.06
> test1stInt3Types  10.9642   10.2909   6.54
> test1stInt5Types  10.9607   10.2856   6.56
> test2ndInt2Types  3.3410    3.3478    -0.20
> test2ndInt3Types  12.3291   11.3089   9.02
> test2ndInt5Types  12.328    11.2704   9.38
> testIfaceCall     11.0598   10.3657   6.70
> testIfaceExtCall  13.0692   11.2826   15.84
> testMonomorphic   2.2354    2.2341    0.06
> AVG: 6.00
>
> Neoverse V1 (c7g.2xlarge)
>
> test1stInt2Types  2.2317    2.2320    -0.01
> test1stInt3Types  6.6884    6.1911    8.03
> test1stInt5Types  6.7334    6.2193    8.27
> test2ndInt2Types  2.4002    2.4013    -0.04
> test2ndInt3Types  7.9603    7.0372    13.12
> test2ndInt5Types  7.9532    7.0474    12.85
> testIfaceCall     6.7028    6.3272    5.94
> testIfaceExtCall  8.3253    6.9416    19.93
> testMonomorphic   1.2446    1.2544    -0.79
> AVG: 7.48
>
>
> Testing...

This pull request has now been integrated.
Changeset: c664f1ca Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/c664f1ca660adea934f099de8595b6ec10d3a824 Stats: 133 lines in 4 files changed: 113 ins; 15 del; 5 mod 8307352: AARCH64: Improve itable_stub Reviewed-by: simonis, eastigeevich, aph ------------- PR: https://git.openjdk.org/jdk/pull/13792 From fyang at openjdk.org Fri Sep 8 10:57:40 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Sep 2023 10:57:40 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg [v3] In-Reply-To: References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: On Fri, 8 Sep 2023 07:38:14 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> As described in jbs, this handles both cases with a rough solution by having two strings. >> Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. >> >> Tested tier1 on qemu rv. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > One features string Updated change LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15579#pullrequestreview-1617205044 From duke at openjdk.org Fri Sep 8 11:24:48 2023 From: duke at openjdk.org (ExE Boss) Date: Fri, 8 Sep 2023 11:24:48 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v16] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 13:07:50 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. 
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Add support for sliced allocation src/java.base/share/classes/jdk/internal/foreign/NativeMemorySegmentImpl.java line 154: > 152: return UNSAFE.allocateMemory(size); > 153: } catch (IllegalArgumentException ex) { > 154: throw new OutOfMemoryError(); `OutOfMemoryError` should be updated to have the `Throwable`-accepting constructor overloads, so that this can include the cause: Suggestion: throw new OutOfMemoryError(ex); See also: https://github.com/openjdk/panama-foreign/pull/855#discussion_r1285058300 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1319732046 From mgronlun at openjdk.org Fri Sep 8 11:48:56 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Fri, 8 Sep 2023 11:48:56 GMT Subject: RFR: 8315930: Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" Message-ID: Greetings, [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) hit an issue, [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892), so we need to backout [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) This change set is the git revert.
Thanks Markus ------------- Commit messages: - Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" Changes: https://git.openjdk.org/jdk/pull/15635/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15635&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315930 Stats: 395 lines in 10 files changed: 114 ins; 243 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/15635.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15635/head:pull/15635 PR: https://git.openjdk.org/jdk/pull/15635 From egahlin at openjdk.org Fri Sep 8 12:23:41 2023 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 8 Sep 2023 12:23:41 GMT Subject: RFR: 8315930: Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 11:42:05 GMT, Markus Grönlund wrote: > Greetings, > > [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) hit an issue, [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892), so we need to backout [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) > > This change set is the git revert. > > Testing: jdk_jfr > > Thanks > Markus Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15635#pullrequestreview-1617364734 From aph at openjdk.org Fri Sep 8 12:23:49 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 8 Sep 2023 12:23:49 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 00:57:10 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2.
However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. 
Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. >> >> 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Revert to the implementation with zero as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Update aarch64.ad and jvmci AArch64TestAssembler.java > > Before this patch, rscratch1 is clobbered. > With this patch, we use the rscratch1 register after we save it on the > stack. > > In this way, the code would be consistent with > macroAssembler_aarch64.cpp. > - Merge branch 'master' into jdk-8287325 > - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp > - Use relative SP as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Merge branch 'master' into jdk-8287325 > - Rename return_pc_at and patch_pc_at > > Rename return_pc_at to return_address_at. > Rename patch_pc_at to patch_return_address_at. > - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret > > * Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 > in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], > mainly because the continuation freeze/thaw mechanism would trigger > stack copying to/from memory, whereas the saved and signed LR on the > stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not > accepted because option "PreserveFramePointer" is always turned on by > PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. 
As a workaround, JDK-8288023 [4] disables PAC-RET when preview > language features are enabled. Note that virtual thread is one preview > feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > * Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > * Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, > PAC-RET implementation should not rely on frame pointer FP. Otherwise, > the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as > to avoid the PAC re-sign for continuation thaw, as the fast path in > stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > * Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack > [7] ... src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp line 64: > 62: *(address*)sp = pc; > 63: } > 64: Is it possible to make put methods in the superclass, and override them only for AArch64? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1319799674 From aph at openjdk.org Fri Sep 8 12:26:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 8 Sep 2023 12:26:45 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 00:57:10 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3.
PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. >> >> 5. 
PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Revert to the implementation with zero as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Update aarch64.ad and jvmci AArch64TestAssembler.java > > Before this patch, rscratch1 is clobbered. > With this patch, we use the rscratch1 register after we save it on the > stack. > > In this way, the code would be consistent with > macroAssembler_aarch64.cpp. > - Merge branch 'master' into jdk-8287325 > - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp > - Use relative SP as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Merge branch 'master' into jdk-8287325 > - Rename return_pc_at and patch_pc_at > > Rename return_pc_at to return_address_at. > Rename patch_pc_at to patch_return_address_at. > - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret > > * Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 > in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], > mainly because the continuation freeze/thaw mechanism would trigger > stack copying to/from memory, whereas the saved and signed LR on the > stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not > accepted because option "PreserveFramePointer" is always turned on by > PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview > language features are enabled. Note that virtual thread is one preview > feature then. > > 5. 
Virtual thread will become a permanent feature in JDK-21 [5][6]. > > * Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > * Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, > PAC-RET implementation should not rely on frame pointer FP. Otherwise, > the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as > to avoid the PAC re-sign for continuation thaw, as the fast path in > stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > * Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack > [7] ... > In the latest commit, I have reverted to the PAC-RET implementation using `zero` as the modifier. > @theRealAph Could you help take another look at it when you have spare time? Thanks Looking good. One more nit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1711587153 From mgronlun at openjdk.org Fri Sep 8 12:29:56 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Fri, 8 Sep 2023 12:29:56 GMT Subject: Integrated: 8315930: Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 11:42:05 GMT, Markus Grönlund wrote: > Greetings, > > [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) hit an issue, [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892), so we need to backout [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) > > This change set is the git revert. > > Testing: jdk_jfr > > Thanks > Markus This pull request has now been integrated.
Changeset: b3dfc399 Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/b3dfc399dae714958f22624daf76831c6ec2dfe0 Stats: 395 lines in 10 files changed: 114 ins; 243 del; 38 mod 8315930: Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/15635 From sspitsyn at openjdk.org Fri Sep 8 13:28:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Sep 2023 13:28:42 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> Message-ID: On Wed, 6 Sep 2023 20:23:57 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 100: > >> 98: mready.await(); >> 99: try { >> 100: // timeout is big enough to keep mounted untill interrupted > > The comment is misleading. 1st group of threads are expected to be unmounted during attach and mounted after the threads are interrupted. Thanks! It was a typo in the original comment. Fixed now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319870711 From sspitsyn at openjdk.org Fri Sep 8 13:42:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Sep 2023 13:42:41 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> Message-ID: On Wed, 6 Sep 2023 20:24:59 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 91: > >> 89: >> 90: try (ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor()) { >> 91: for (int tCnt = 0; tCnt < TCNT1; tCnt++) { > > Could you please add a comment before each test group creation block about expected state Good suggestion. Fixed now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319886373 From duke at openjdk.org Fri Sep 8 13:48:39 2023 From: duke at openjdk.org (JoKern65) Date: Fri, 8 Sep 2023 13:48:39 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX In-Reply-To: References: Message-ID: <8iYLsc9A5_530-soJPf_LQFQcXSD5f_IHDeR5mG9ndY=.6d6aaa02-244b-4f75-8e80-6e960f8abd22@github.com> On Wed, 6 Sep 2023 08:18:45 GMT, JoKern65 wrote: > After the push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478), the following test started to fail on AIX: > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try at a fix. > A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correcting the root cause. > The root cause is that dlopen() on AIX returns a different handle every time, even if you load the same library twice. There is no official AIX API available to get this information in a different way. > My proposal is to use the stat64x API with the fields st_device and st_inode. After a dlopen(), the stat64x() API is additionally called to get this information, which is then stored alongside the library handle in the jvmtiAgent. On AIX we can then compare these values instead of the library handle and get the same functionality as on Linux. Why not have a solution for AIX anyway?
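The proposal identifies a loaded library by the device and inode of its file rather than by the handle dlopen() returns. As a Java-level analog only (not the HotSpot C++ change, and with a hypothetical class name), `BasicFileAttributes.fileKey()` exposes a comparable file identity on Unix-like platforms; since it may be null on some platforms, this sketch falls back to comparing real paths:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class SameLibraryFile {
    private SameLibraryFile() {}

    // Compares two paths by the platform file key (on Unix-like systems
    // typically the device and inode pair) instead of by any per-load handle.
    static boolean sameFile(Path a, Path b) throws IOException {
        Object keyA = Files.readAttributes(a, BasicFileAttributes.class).fileKey();
        Object keyB = Files.readAttributes(b, BasicFileAttributes.class).fileKey();
        if (keyA != null && keyB != null) {
            return keyA.equals(keyB);
        }
        // Some platforms expose no file key; fall back to resolved real paths.
        return a.toRealPath().equals(b.toRealPath());
    }
}
```

The standard `Files.isSameFile(a, b)` performs a comparable check; the explicit `fileKey()` comparison is shown only because it mirrors the device/inode idea.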
------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1711701269 From sspitsyn at openjdk.org Fri Sep 8 13:56:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Sep 2023 13:56:41 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> Message-ID: <8Mt2R6DmHFqu51U9duEytc-VE-_K1DC_qDR-BmKonKk=.29bc3e3c-121a-40e8-97fc-aa10ec028ff1@github.com> On Wed, 6 Sep 2023 20:33:46 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 149: > >> 147: for (int sleepNo = 0; sleepNo < 10 && threadEndCount() < THREAD_CNT; sleepNo++) { >> 148: log("main: wait iter: " + sleepNo); >> 149: Thread.sleep(100); > > sleep(1000)? (comment before the loop tells about 10 secs) Good catch. Leonid suggested making the waiting loop unbounded, so that the test timeout takes effect instead.
Made it like below:

    // wait until all VirtualThreadEnd events are sent
    for (int sleepNo = 1; threadEndCount() < THREAD_CNT; sleepNo++) {
        Thread.sleep(100);
        if (sleepNo % 100 == 0) { // 10 sec period of waiting
            log("main: wait iter: " + sleepNo);
        }
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319903293 From sspitsyn at openjdk.org Fri Sep 8 14:02:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Sep 2023 14:02:41 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> Message-ID: On Wed, 6 Sep 2023 20:39:59 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 141: > >> 139: log("main: completedNo: " + completedNo); >> 140: attached = true; >> 141: for (Thread t : threads) { > > AFAIU threads in 3rd group (TCNT3) should be unmounted (with LockSupport.parkNanos) before they are interrupted. > Then we need sleep here Not sure I understand the suggestion. The only interrupted threads are those in the TCNT1 group, as only these threads are added to the threads list. Please see line 95. Do you still think an extra timeout is needed here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1319910696 From jvernee at openjdk.org Fri Sep 8 14:35:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 8 Sep 2023 14:35:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v19] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. 
https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray` by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: - Merge branch 'master' into JEP22 - add code snippet - Split long throws clauses in `MemorySegment` javadoc Reviewed-by: jvernee - Add support for sliced allocation - add name of SysV ABI - Fix javadoc issues in MemorySegment::copy Reviewed-by: jvernee - remove reference to allocateArray - PPC linker changes - Merge branch 'master' into JEP22 - Paul's review comments - ...
and 31 more: https://git.openjdk.org/jdk/compare/e409d07a...dbf3eec6 ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=18 Stats: 3753 lines in 244 files changed: 1897 ins; 1000 del; 856 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Fri Sep 8 14:40:47 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 8 Sep 2023 14:40:47 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v16] In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 11:20:39 GMT, ExE Boss wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> Add support for sliced allocation > > src/java.base/share/classes/jdk/internal/foreign/NativeMemorySegmentImpl.java line 154: > >> 152: return UNSAFE.allocateMemory(size); >> 153: } catch (IllegalArgumentException ex) { >> 154: throw new OutOfMemoryError(); > > `OutOfMemoryError` should be updated to have the `Throwable`-accepting constructor overloads, so that this can include the cause: > Suggestion: > > throw new OutOfMemoryError(ex); > > > See also: https://github.com/openjdk/panama-foreign/pull/855#discussion_r1285058300 I think enhancing `OutOfMemoryError` is a discussion to be had separately, not as part of this JEP.
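Until `OutOfMemoryError` gains cause-accepting constructors, the cause can still be attached with `Throwable.initCause`. A minimal sketch with hypothetical names; `allocate` merely stands in for a call such as `UNSAFE.allocateMemory`, which the quoted `NativeMemorySegmentImpl` code expects to throw `IllegalArgumentException` on failure:

```java
final class OomWithCause {
    private OomWithCause() {}

    // OutOfMemoryError offers only () and (String) constructors, so the
    // underlying exception is attached afterwards with Throwable.initCause.
    static OutOfMemoryError withCause(String message, Throwable cause) {
        OutOfMemoryError error = new OutOfMemoryError(message);
        error.initCause(cause); // allowed once, since no cause was set yet
        return error;
    }

    // Hypothetical stand-in for a native allocation that reports an invalid
    // size with IllegalArgumentException; it returns the size as a fake address.
    static long allocate(long size) {
        try {
            if (size < 0) {
                throw new IllegalArgumentException("invalid size: " + size);
            }
            return size;
        } catch (IllegalArgumentException ex) {
            throw withCause("Unable to allocate " + size + " bytes", ex);
        }
    }
}
```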
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1319957862 From jjoo at openjdk.org Fri Sep 8 21:31:18 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Fri, 8 Sep 2023 21:31:18 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v7] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Move ThreadTotalCPUTimeClosure to thread.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/bcfe1516..780dfd34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=05-06 Stats: 51 lines in 2 files changed: 25 ins; 26 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Fri Sep 8 21:40:26 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Fri, 8 Sep 2023 21:40:26 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v8] In-Reply-To: References: Message-ID: <1VlRygD5Ma7S7PwUsJdwi-gPVo_T30nwA7Tv0BAemNA=.47e3cfc0-0f89-4a9a-bd41-31689db67bd1@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Fix includes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/780dfd34..2f44b814 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=06-07 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff 
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From sspitsyn at openjdk.org Fri Sep 8 23:32:40 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Sep 2023 23:32:40 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: <6T7nzXoS2bDHcS0JLhLiPxupus3ai5Tcb-XlqRvKivw=.5736c18a-890b-4907-b7f9-c96e85fa6d0a@github.com> On Fri, 8 Sep 2023 03:03:56 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 86: > >> 84: log("WARNING: test expects at least 8 processors."); >> 85: } >> 86: Counter ready1 = new Counter(THREAD_CNT); > > I think that CountDownLatch should work fine here, and there is no need to use a custom Counter. Thanks. Replaced it with a CountDownLatch now.
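A `CountDownLatch` created with the thread count replaces a hand-written counter: each task counts down once and the main thread awaits zero. A minimal sketch with a hypothetical class name; the real test uses `Executors.newVirtualThreadPerTaskExecutor()`, while this sketch uses a small fixed pool so it also runs on JDKs that predate virtual threads:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ReadyLatch {
    private ReadyLatch() {}

    // Starts `count` tasks and blocks until each has counted down once,
    // the role the custom Counter(THREAD_CNT) played in the test.
    static long startAndAwaitReady(int count) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(count);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, Math.min(count, 4)));
        try {
            for (int i = 0; i < count; i++) {
                pool.execute(ready::countDown); // each task signals "I have started"
            }
            ready.await(); // returns once the count reaches zero
            return ready.getCount();
        } finally {
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```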
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1320422749 From sspitsyn at openjdk.org Fri Sep 8 23:32:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Sep 2023 23:32:42 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <9NY499Z6epRjQ-ZvDrbxS6weL8QG-7djNWJN-o9SCmc=.6d58bb4a-2e26-4a96-ae1b-dbe6682ebe8f@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> <9NY499Z6epRjQ-ZvDrbxS6weL8QG-7djNWJN-o9SCmc=.6d58bb4a-2e26-4a96-ae1b-dbe6682ebe8f@github.com> Message-ID: <-viFMo-IWbIoWm7DBpH5yISHqfmVdSy-UFwroc6O1-w=.b961f69b-71b1-431e-a19f-a5880292ec12@github.com> On Fri, 8 Sep 2023 03:08:33 GMT, Leonid Mesnik wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 136: >> >>> 134: ready1.await(); >>> 135: mready.decr(); >>> 136: VirtualMachine vm = VirtualMachine.attach(String.valueOf(ProcessHandle.current().pid())); >> >> I think sleep is needed here so threads which should be unmounted have time to unmount before attach. > > Would it also make sense to check that the thread state is TIMED_WAITING? It should be set to TIMED_WAITING for unmounted threads https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/Thread.State.html#TIMED_WAITING It is strange that the tested vthreads in sleep(timeout) have state WAITING, not TIMED_WAITING. It could be a bug in the implementation. I've decided to add a short sleep. Checking states looks a little overcomplicated.
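Leonid's alternative to a fixed sleep is to poll each thread's state before attaching. A minimal sketch with hypothetical names; it accepts both `WAITING` and `TIMED_WAITING`, since Serguei observed `WAITING` for the sleeping virtual threads, and it returns a boolean rather than failing, so a caller can proceed even when the deadline passes:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

final class AwaitWaitingState {
    private AwaitWaitingState() {}

    static boolean isWaiting(Thread.State state) {
        return state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING;
    }

    // Polls each thread until it reports WAITING or TIMED_WAITING, giving up
    // at the deadline. Returns true only if every thread reached such a state.
    static boolean awaitWaiting(List<Thread> threads, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Thread t : threads) {
            while (!isWaiting(t.getState())) {
                if (System.nanoTime() - deadline >= 0) {
                    return false;
                }
                Thread.sleep(10);
            }
        }
        return true;
    }
}
```

As with the short sleep, a `false` result would only mean that the unmounted case was not exercised; it need not be a test failure.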
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1320422072 From manc at openjdk.org Sat Sep 9 01:04:43 2023 From: manc at openjdk.org (Man Cao) Date: Sat, 9 Sep 2023 01:04:43 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v8] In-Reply-To: <1VlRygD5Ma7S7PwUsJdwi-gPVo_T30nwA7Tv0BAemNA=.47e3cfc0-0f89-4a9a-bd41-31689db67bd1@github.com> References: <1VlRygD5Ma7S7PwUsJdwi-gPVo_T30nwA7Tv0BAemNA=.47e3cfc0-0f89-4a9a-bd41-31689db67bd1@github.com> Message-ID: <_-ekt02un2mj-y6Z_v-4HiLjDoassXhZirdsu3ZmmUA=.fca59c04-7ddf-44fc-9fc2-575e915509dd@github.com> On Fri, 8 Sep 2023 21:40:26 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Fix includes Have you enabled GitHub Actions on your personal fork? I don't see the OpenJDK pre-submit build and tests being executed. See https://wiki.openjdk.org/display/SKARA/Testing#Testing-Configuringworkflowstorun, https://bugs.openjdk.org/browse/SKARA-846 src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 461: > 459: > 460: _g1_concurrent_mark_threads_cpu_time = > 461: PerfDataManager::create_counter(SUN_THREADS, "g1_conc_mark_thread.cpu_time", I find it a bit odd to have "." in the name. "." should be the namespace separator, not part of a counter name. I think @simonis's suggestion about `sun.gc.collector..cpu_time` or `sun.gc.cpu_time` is to have a single, aggregated counter named `cpu_time`. If we don't do such aggregation, the names should probably be `g1_conc_mark_cpu_time`, `parallel_gc_workers_cpu_time`, etc.
src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 168: > 166: // the primary thread is always woken up first from being blocked on a monitor > 167: // when there is refinement work to do (see comment in > 168: // G1ConcurrentRefineThread's constructor); The comment in G1ConcurrentRefineThread's constructor no longer exists, so probably just remove the reference in parentheses. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.hpp line 87: > 85: void report_inactive(const char* reason, const G1ConcurrentRefineStats& stats) const; > 86: > 87: // G1ConcurrentRefineThreadControl::update_threads_cpu_time() relies on the `is_primary()` is unused now, so it can be removed. src/hotspot/share/runtime/thread.hpp line 663: > 661: // hsperfdata counter. > 662: class ThreadTotalCPUTimeClosure: public ThreadClosure { > 663: private: It might be preferable to move this class to share/memory/iterator.hpp (where `ThreadClosure` is defined) or runtime/perfData.hpp (where a similar class `PerfTraceTime` is defined). Also we probably want to move the bodies of `do_thread()` and `~ThreadTotalCPUTimeClosure()` to the corresponding .cpp file, to minimize new include statements in the .hpp. src/hotspot/share/runtime/thread.hpp line 680: > 678: // must ensure the thread exists and has not terminated. > 679: assert(os::is_thread_cpu_time_supported(), "os must support cpu time"); > 680: _time_diff = os::thread_cpu_time(thread); This does not look correct: `_time_diff` is not a delta from the previous value of the CPU time. It also no longer accumulates the CPU time across a set of threads. I think we should stay with the previous approach of using `PerfVariable`, accumulating CPU time with `_total += os::thread_cpu_time(thread)`, then calling `_counter->set_value(_total)` in the destructor. To @simonis's point about using `PerfCounter` instead of `PerfVariable`, I agree that ideally CPU time would use a monotonically increasing `PerfCounter`.
However, it would require computing a diff against the previously observed CPU time, which is essentially: `_counter->inc(_total - _counter->get_value())`. That seems unnecessary and is not as clean as `_counter->set_value(_total)`. ------------- Changes requested by manc (Committer). PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1618521539 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1320442281 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1320430786 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1320441955 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1320441284 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1320434483 From sspitsyn at openjdk.org Sat Sep 9 01:23:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 9 Sep 2023 01:23:41 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: On Fri, 8 Sep 2023 02:46:36 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 30: > >> 28: #include "jvmti_common.h" >> 29: >> 30: #ifdef _WIN32 > > Do we need it here? Thanks. Simplified now.
> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 44: > >> 42: >> 43: void JNICALL VirtualThreadEnd(jvmtiEnv *jvmti, JNIEnv* jni, jthread virtual_thread) { >> 44: std::lock_guard lockGuard(lock); > > the atomic would be better for counters. It guarantees that counter is always protected. Thanks. Alex suggested the same. Fixed now. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 62: > >> 60: >> 61: void >> 62: check_jvmti_err(jvmtiError err, const char* msg) { > > This function could be moved into jvmti_common.h. Good suggestion, thanks. Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1320450305 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1320450372 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1320450217 From sspitsyn at openjdk.org Sat Sep 9 01:23:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 9 Sep 2023 01:23:42 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> Message-ID: On Wed, 6 Sep 2023 20:32:03 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: >> >> - Merge >> - 8312174: missing JVMTI events from vthreads parked during JVMTI attach > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 37: > >> 35: >> 36: namespace { >> 37: std::mutex lock; > > This mutex is only to make access to counters atomic. > It would be clearer to make counters std::atomic and remove the mutex Good suggestion, thanks. Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1320450060 From fyang at openjdk.org Sat Sep 9 03:21:38 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 9 Sep 2023 03:21:38 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: <0SxTsRMH8gVoxpI7ky6nPbs1NgVFgniF83yj7y5w7UA=.1e5ee042-90cc-42eb-aadb-bc7ea5d7b571@github.com> References: <0SxTsRMH8gVoxpI7ky6nPbs1NgVFgniF83yj7y5w7UA=.1e5ee042-90cc-42eb-aadb-bc7ea5d7b571@github.com> Message-ID: On Fri, 8 Sep 2023 07:31:34 GMT, Ludovic Henry wrote: >> Good question, I don't find docs in the compilers for this either, but: >> https://github.com/riscv-non-isa/riscv-toolchain-conventions#cc-preprocessor-definitions > > We checked for it with [this simple 
snippet](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:2,lang:c%2B%2B,selection:(endColumn:1,endLineNumber:8,positionColumn:1,positionLineNumber:8,selectionStartColumn:1,selectionStartLineNumber:8,startColumn:1,startLineNumber:8),source:'int+main(void)+%7B%0A%23ifdef+__riscv_ztso%0A++++return+0%3B%0A%23else%0A++++return+1%3B%0A%23endif%0A%7D%0A'),l:'5',n:'0',o:'C%2B%2B+source+%232',t:'0')),k:51.29770473331974,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:rv64-gcctrunk,deviceViewOpen:'1',filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:c%2B%2B,libs:!(),options:'-O2+-march%3Drv64gcv_ztso0p1',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:2),l:'5',n:'0',o:'+RISC-V+(64-bits)+gcc+(trunk)+(Editor+%232)',t:'0'),(h:compiler,i:(compiler:rv64-clang,deviceViewOpen:'1',filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:3,lang:c%2B%2B,libs:!(),options:'-O2+-march%3Drv64gcv_ztso0p1+-menable-experimental-extensions',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:2),l:'5',n:'0',o:'+RISC-V+rv64gc+clang+(trunk)+(Editor+%232)',t:'0')),header:(),k:48.70229526668027,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4). You can see that in both cases, it returns `0`, which implies TSO is enabled. OK. I think I have found its definition in the Clang source code [1]. It's strange that this is not mentioned in [2].
[1] https://github.com/llvm/llvm-project/blob/523c471250a49b5603bd907ff05535f18ef61c91/clang/lib/Basic/Targets/RISCV.cpp#L160C5-L162C77 [2] https://github.com/riscv-non-isa/riscv-toolchain-conventions#cc-preprocessor-definitions ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15613#discussion_r1320468890 From lmesnik at openjdk.org Sat Sep 9 18:08:38 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 9 Sep 2023 18:08:38 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v2] In-Reply-To: <-viFMo-IWbIoWm7DBpH5yISHqfmVdSy-UFwroc6O1-w=.b961f69b-71b1-431e-a19f-a5880292ec12@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> <9NY499Z6epRjQ-ZvDrbxS6weL8QG-7djNWJN-o9SCmc=.6d58bb4a-2e26-4a96-ae1b-dbe6682ebe8f@github.com> <-viFMo-IWbIoWm7DBpH5yISHqfmVdSy-UFwroc6O1-w=.b961f69b-71b1-431e-a19f-a5880292ec12@github.com> Message-ID: On Fri, 8 Sep 2023 23:28:01 GMT, Serguei Spitsyn wrote: >> Would it also make sense to check that the thread state is TIMED_WAITING? It should be set to TIMED_WAITING for unmounted threads https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/Thread.State.html#TIMED_WAITING > > It is strange that the tested vthreads in sleep(timeout) have state WAITING, not TIMED_WAITING. > It could be a bug in the implementation. > I've decided to add a short sleep. Checking states looks a little overcomplicated. Could you please also add a comment next to the sleep that describes why it is needed, and mention that an insufficient sleep time cannot cause test failures? The test should pass anyway; it just won't test the expected state.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1320620378 From fyang at openjdk.org Mon Sep 11 06:44:47 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 11 Sep 2023 06:44:47 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 17:24:28 GMT, Ludovic Henry wrote: >> With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. >> >> [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! 8315841: RISC-V: Check for hardware TSO support LGTM. Any tests performed? ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15613#pullrequestreview-1619229259 From rcastanedalo at openjdk.org Mon Sep 11 07:08:05 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Sep 2023 07:08:05 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) Message-ID: This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers.
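To make the padding problem concrete, here is a small arithmetic sketch (hypothetical names, not C2 code). It assumes 8-byte words, a 16-byte array header and 8-byte elements, roughly an `Object[]` with uncompressed oops as under ZGC; these sizes are illustrative assumptions. It compares the words that hold the payload with the words from the end of the header to the end of the aligned object:

```java
final class ClonePayloadWords {
    private ClonePayloadWords() {}

    static final int WORD_BYTES = 8; // assumed 64-bit HeapWord size

    // Rounds value up to a multiple of alignment (alignment must be a power of two).
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & -alignment;
    }

    // Words that cover the element payload only, rounded up to whole words.
    static long payloadWords(long length, int elemBytes) {
        return alignUp(length * elemBytes, WORD_BYTES) / WORD_BYTES;
    }

    // Words from the end of the header to the end of the object after rounding
    // the object size up to objectAlignment: payload plus alignment padding.
    static long wordsToObjectEnd(long length, int elemBytes, int headerBytes, int objectAlignment) {
        long objectBytes = alignUp(headerBytes + length * elemBytes, objectAlignment);
        return (objectBytes - headerBytes) / WORD_BYTES;
    }
}
```

With the default 8-byte alignment, both counts are 1 for a one-element array. With `-XX:ObjectAlignmentInBytes=32`, `wordsToObjectEnd(1, 8, 16, 32)` is 2, so a copy sized by the aligned object would also visit one padding word, which an oop-aware copy stub would then treat as a reference.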
For further details about the specific failure, see the initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and the comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`. As a side benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling the computation of header size and alignment padding size.

#### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)

This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`. On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g.
for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL`, which prevented the constant word-length from being discovered; see the left graph below: ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario (see the right graph above). To accommodate this optimization, this changeset relaxes the assertion to check only one direction of the implication (if the element-length is constant, so is the word-length). The optimization does not affect the remaining idealization code, since it only uses element-length in the context in which the optimization is applied. The changeset includes an additional regression test (`TestCloneArrayWithDifferentLengthConstness.java`) that exercises different constant/variable combinations of element-length and word-length.

#### Testing

##### Functionality

- tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64)
- tier6-7 (linux-x64 only)
- compiler-focused stress testing
- failing tests reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029)
- tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) on top of this changeset

##### Performance

Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values):

- `openjdk.bench.java.lang.ArrayClone.byteClone`
- `openjdk.bench.java.lang.ArrayClone.intClone`
- `openjdk.bench.java.lang.ArrayFiddle.simple_clone`
- `openjdk.bench.java.lang.Clone.cloneLarge`
- `openjdk.bench.java.lang.Clone.cloneThreeDifferent`

No significant regression was observed.
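The relaxed assertion rests on a one-way implication: a constant element-length always yields a constant word-length, but a word-length can be constant while the element-length is not. A small sketch (hypothetical names; it assumes 8-byte words and word-aligned payloads, and is not C2's type system) maps an element-length range to the resulting word-length range for the `int` example above:

```java
final class WordLengthRange {
    private WordLengthRange() {}

    // Number of 8-byte words needed for `length` elements of `elemBytes` each.
    static long words(long length, int elemBytes) {
        return (length * elemBytes + 7) / 8;
    }

    // words() is monotonic in length, so the word range for an element
    // range [lo, hi] is given by the words of its two endpoints.
    static long[] wordRange(long lo, long hi, int elemBytes) {
        return new long[] { words(lo, elemBytes), words(hi, elemBytes) };
    }

    static boolean isConstant(long[] range) {
        return range[0] == range[1];
    }
}
```

Here `wordRange(3, 4, 4)` is `[2, 2]`: the element-length `3..4` is not constant, but the word-length is. Any constant element-length `[n, n]` still maps to a constant word range, which is the direction the relaxed assertion keeps checking.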
------------- Commit messages: - Add regression test that exercises the relaxed assertion - Relaxed assertion - Remove extra whitespace - Remove extra whitespace - Revert use of UseNewCode - Revert "TEMPORARY: add additional macro-assembly comments" - Revert "TEMPORARY: set UseNewCode to true by default" - Revert "TEMPORARY: print only 'oop_disjoint_arraycopy_uninit' stub code" - Require GenZGC in the test - Round up object size at the end of the computation - ... and 11 more: https://git.openjdk.org/jdk/compare/024133b0...b3109e30 Changes: https://git.openjdk.org/jdk/pull/15589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315082 Stats: 188 lines in 6 files changed: 157 ins; 10 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/15589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15589/head:pull/15589 PR: https://git.openjdk.org/jdk/pull/15589 From rcastanedalo at openjdk.org Mon Sep 11 07:14:46 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Sep 2023 07:14:46 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: <7SKZMjHZ2-60TNxW1nluwDqG-ucTkjunn0AZHqrIWRo=.fed9ff46-1f58-448f-b14d-4ad8da0b7570@github.com> On Wed, 6 Sep 2023 11:54:04 GMT, Roberto Castañeda Lozano wrote: > This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers.
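As a rough illustration of which words are payload and which are alignment padding, the sketch below assumes a 16-byte (word-aligned) array header, 8-byte words, and 4-byte elements for `int[]` (and for `Object[]` with compressed oops). Real header and element sizes depend on VM flags, and all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kWordSize = 8;      // bytes per heap word (64-bit VM)
constexpr std::int64_t kHeaderBytes = 16;  // assumed array header, word-aligned

// Round `value` up to a multiple of `alignment` (must be a power of two).
constexpr std::int64_t align_up(std::int64_t value, std::int64_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Element data rounded up to whole 8-byte words: what the clone copy
// needs to transfer after the header.
constexpr std::int64_t payload_words(std::int64_t length,
                                     std::int64_t element_size) {
  return align_up(length * element_size, kWordSize) / kWordSize;
}

// Words the whole object occupies once rounded to ObjectAlignmentInBytes.
constexpr std::int64_t object_words(std::int64_t length,
                                    std::int64_t element_size,
                                    std::int64_t object_alignment_in_bytes) {
  return align_up(kHeaderBytes + length * element_size,
                  object_alignment_in_bytes) / kWordSize;
}

// Trailing words that exist only as alignment padding. Their contents are
// undefined, so a copy stub must not treat them as object pointers.
constexpr std::int64_t padding_words(std::int64_t length,
                                     std::int64_t element_size,
                                     std::int64_t object_alignment_in_bytes) {
  return object_words(length, element_size, object_alignment_in_bytes) -
         kHeaderBytes / kWordSize - payload_words(length, element_size);
}
```

Under these assumptions the padding is always zero with the default 8-byte alignment, and only appears for larger `ObjectAlignmentInBytes` values: for example, `padding_words(3, 4, 64)` is 4 words.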
For further details about the specific failure, see initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) > > This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`. On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g.
for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL` (preventing the constant word-length from being discovered), see the left graph below: > > ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) > > With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario ... @albertnetymk @vnkozlov @TobiHartmann could you please review this revised version of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)? The core changes remain identical; the only additional changes are an assertion relaxation and a second regression test (see description for details). Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15589#issuecomment-1713297160 From dholmes at openjdk.org Mon Sep 11 07:15:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Sep 2023 07:15:45 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: <-l07QgwUIOOMsXw9SniuLk856xxeIs8v6lwe8SWO_oI=.7c375b23-9f35-4d0d-a7bb-5fe513edb1d5@github.com> On Tue, 5 Sep 2023 22:49:41 GMT, Erik Joelsson wrote: > There are a number of files in the `test` directory that have an incorrect copyright header, which includes the "classpath" exception text. This patch removes that text from all test files that I could find it in. I did this using a combination of `sed` and `grep`. Reviewing this patch is probably easier using the raw patch file or a suitable webrev format. > > It's my assumption that these headers were introduced by mistake as it's quite easy to copy the wrong template when creating new files. Looks good. I have often pointed out that the CPE was not relevant for test files but ...
Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15573#pullrequestreview-1619274864 From dholmes at openjdk.org Mon Sep 11 07:26:41 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Sep 2023 07:26:41 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v7] In-Reply-To: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> References: <4vvR0KmqJsa1PmnblvSkgmsx6gv6n5DMYLbfbsEwaq0=.5b34be95-9ecc-46f6-b9a0-6c5372bdefe2@github.com> Message-ID: <0pVWKtyXgKaz20TxIEeVjJqUfZ3n-IEDceIZKWaFniY=.7d08ee45-cc0c-4817-b732-6199d86f50e7@github.com> On Mon, 4 Sep 2023 09:44:11 GMT, Aleksey Shipilev wrote: >> As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock. >> >> There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety. >> >> More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach. >> >> Additional testing: >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Touchup whitespace > - Rewrite jvmtiManageCapabilities lock usage > - Re-instate old asserts Updates look good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
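The approach discussed in this PR (make the default locker reject null mutexes, and give the legitimately conditional call sites an explicit variant) can be sketched with standard C++. `CheckedLocker` and `ConditionalLocker` are hypothetical stand-ins built on `std::mutex`, not HotSpot's `MutexLocker`/`ConditionalMutexLocker` implementations.

```cpp
#include <cassert>
#include <mutex>

// RAII locker that refuses a null mutex instead of silently skipping the lock.
class CheckedLocker {
 public:
  explicit CheckedLocker(std::mutex* m) : m_(m) {
    assert(m_ != nullptr && "null mutex passed to CheckedLocker");
    m_->lock();
  }
  ~CheckedLocker() { m_->unlock(); }
  CheckedLocker(const CheckedLocker&) = delete;
  CheckedLocker& operator=(const CheckedLocker&) = delete;

 private:
  std::mutex* m_;
};

// Explicit opt-in for call sites that only sometimes need the lock.
class ConditionalLocker {
 public:
  ConditionalLocker(std::mutex* m, bool condition)
      : m_(condition ? m : nullptr) {
    // If the caller asks for the lock, the mutex must exist.
    assert(!condition || m != nullptr);
    if (m_ != nullptr) m_->lock();
  }
  ~ConditionalLocker() {
    if (m_ != nullptr) m_->unlock();
  }
  ConditionalLocker(const ConditionalLocker&) = delete;
  ConditionalLocker& operator=(const ConditionalLocker&) = delete;

  bool owns_lock() const { return m_ != nullptr; }

 private:
  std::mutex* m_;  // nullptr when the lock was deliberately not taken
};
```

The design point is that "no lock" becomes visible at the call site (`ConditionalLocker(&m, false)`) rather than being an accidental consequence of passing a null pointer.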
PR Review: https://git.openjdk.org/jdk/pull/15043#pullrequestreview-1619293310 From luhenry at openjdk.org Mon Sep 11 08:01:41 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 11 Sep 2023 08:01:41 GMT Subject: RFR: 8315841: RISC-V: Check for hardware TSO support [v8] In-Reply-To: References: Message-ID: On Mon, 11 Sep 2023 06:41:36 GMT, Fei Yang wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> fixup! 8315841: RISC-V: Check for hardware TSO support > > LGTM. Any tests performed? @RealFYang I've tested with jcstress on QEMU on x86 with the change and with an obvious breakage (not emitting barriers at all). It would fail when not emitting barriers at all (which makes sense, since x86 does not provide a sequentially consistent memory model) and it would succeed with this patch (which also makes sense, since x86 is TSO). It obviously still lacks testing on actual hardware that supports TSO and RVWMO, but I'm not aware of any on the market at the moment. We (Rivos) will make sure to test as soon as we have our hardware available, and we'll report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15613#issuecomment-1713364178 From haosun at openjdk.org Mon Sep 11 08:20:15 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 11 Sep 2023 08:20:15 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v7] In-Reply-To: References: Message-ID: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3.
PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the "PreserveFramePointer" option, which would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. > > 5. Virtual threads will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered using the (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g., see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access it from the JIT side. We would need to 1) resolve the address of the current thread (see [8] as an example), and 2) read the tid field the way java_lang_Thread::thread_id() does. I suppose this would introduce significant performance overhead. Could we use the "rthread" register (JavaThread object address) as the modifier instead? Unfortunately, it's not invariant for virtual threads and PAC re-sign is still needed. > > 3. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address.
In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of... Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into jdk-8287325 - Revert to the implementation with zero as the PAC modifier - Merge branch 'master' into jdk-8287325 - Update aarch64.ad and jvmci AArch64TestAssembler.java Before this patch, rscratch1 is clobbered. With this patch, we use the rscratch1 register after we save it on the stack. In this way, the code would be consistent with macroAssembler_aarch64.cpp. - Merge branch 'master' into jdk-8287325 - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp - Use relative SP as the PAC modifier - Merge branch 'master' into jdk-8287325 - Merge branch 'master' into jdk-8287325 - Rename return_pc_at and patch_pc_at Rename return_pc_at to return_address_at. Rename patch_pc_at to patch_return_address_at. - ... and 1 more: https://git.openjdk.org/jdk/compare/dab1c213...08a8815c ------------- Changes: https://git.openjdk.org/jdk/pull/13322/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=06 Stats: 193 lines in 29 files changed: 75 ins; 28 del; 90 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From haosun at openjdk.org Mon Sep 11 08:20:16 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 11 Sep 2023 08:20:16 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 12:21:01 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Revert to the implementation with zero as the PAC modifier >> - Merge branch 'master' into jdk-8287325 >> - Update aarch64.ad and jvmci AArch64TestAssembler.java >> >> Before this patch, rscratch1 is clobbered. >> With this patch, we use the rscratch1 register after we save it on the >> stack. >> >> In this way, the code would be consistent with >> macroAssembler_aarch64.cpp. >> - Merge branch 'master' into jdk-8287325 >> - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp >> - Use relative SP as the PAC modifier >> - Merge branch 'master' into jdk-8287325 >> - Merge branch 'master' into jdk-8287325 >> - Rename return_pc_at and patch_pc_at >> >> Rename return_pc_at to return_address_at. >> Rename patch_pc_at to patch_return_address_at. >> - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret >> >> * Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 >> in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], >> mainly because the continuation freeze/thaw mechanism would trigger >> stack copying to/from memory, whereas the saved and signed LR on the >> stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not >> accepted because option "PreserveFramePointer" is always turned on by >> PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview >> language features are enabled. Note that virtual thread is one preview >> feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> * Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> * Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, >> PAC-RET implementation should not rely on frame pointer FP. 
Otherwise, >> the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as >> to avoid the PAC re-sign for continuation thaw, as the fast path in >> stack copying doesn't walk the frame. >> >> Note that more details can be found in the discuss... > > src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp line 64: > >> 62: *(address*)sp = pc; >> 63: } >> 64: > > Is it possible to put the methods in the superclass, and override them only for AArch64? I doubt that. In the current design, we declare member functions in the shared `ContinuationHelper.hpp`, and define them in the architecture-specific `ContinuationHelper_xx.inline.hpp` (e.g., `ContinuationHelper_aarch64.inline.hpp` for the AArch64 backend). Following the current design, if we introduce a base class (e.g., `class ContinuationCommonHelper`) and the default `patch_return_address_at()` implementation, we still have to declare `ContinuationHelper::patch_return_address_at()` in the shared `ContinuationHelper.hpp` and inevitably define it for all architectures. Otherwise, we have to 1) only declare `ContinuationHelper::patch_return_address_at()` in `ContinuationHelper.hpp` for the AArch64 target with the help of conditional compilation directives. OR 2) move the declaration of `class ContinuationHelper` into `ContinuationHelper_xx.inline.hpp` and only override `patch_return_address_at()` for AArch64. But I think neither way is neat. WDYT? Thanks.
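Requirement R-2 (a modifier that stays the same when a frame is copied) can be illustrated with a deliberately simplified toy model, in which a "signature" is a 16-bit XOR-fold of the return address, the modifier and a constant key, stored in the pointer's upper bits. This is not how AArch64 pointer authentication actually computes codes (real PAC uses a secret hardware key and a cryptographic function); the sketch only shows why an SP-based modifier fails to authenticate after the frame moves to a different stack address, while a constant modifier such as zero (the approach named in the commit list above) survives the copy. All names and constants are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: real PAC derives the code in hardware from a secret key.
constexpr std::uint64_t kToyKey = 0x5A5AA5A5C3C33C3CULL;
constexpr std::uint64_t kAddrMask = (std::uint64_t{1} << 48) - 1;  // low 48 bits

// Fold a 64-bit value into 16 bits by XOR-ing its four 16-bit chunks.
constexpr std::uint64_t fold16(std::uint64_t x) {
  return (x ^ (x >> 16) ^ (x >> 32) ^ (x >> 48)) & 0xFFFF;
}

// Toy authentication code for `addr` under `modifier`.
constexpr std::uint64_t toy_code(std::uint64_t addr, std::uint64_t modifier) {
  return fold16(addr ^ modifier ^ kToyKey);
}

// "Sign": keep the address in the low 48 bits, put the code in the top 16.
constexpr std::uint64_t toy_sign(std::uint64_t addr, std::uint64_t modifier) {
  return (addr & kAddrMask) | (toy_code(addr & kAddrMask, modifier) << 48);
}

// "Authenticate": recompute the code with the given modifier and compare.
constexpr bool toy_auth(std::uint64_t signed_addr, std::uint64_t modifier) {
  return (signed_addr >> 48) == toy_code(signed_addr & kAddrMask, modifier);
}
```

For example, a return address signed with the SP at `0x00007ffc00001000` no longer authenticates once the frame is thawed at `0x00007ffc00042000` (the stored code no longer matches), whereas a value signed with modifier `0` authenticates regardless of where the frame lives.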
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1321161472 From sspitsyn at openjdk.org Mon Sep 11 09:03:20 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 11 Sep 2023 09:03:20 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v3] In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing JVMTI event management mechanism does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread-filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point. > > This fix introduces a new function `JvmtiExport::get_jvmti_thread_state()` which checks if the thread is virtual and there is a thread-filtered event enabled globally, and if so, forces the creation of the `JvmtiThreadState`.
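The lazy-creation policy described here (create per-thread state for a virtual thread only when it is requested and some thread-filtered event is enabled globally, instead of eagerly at thread start) can be sketched as below. `ToyThread`, `ToyThreadState` and `ToyEventController` are hypothetical stand-ins, not HotSpot's `JavaThread`, `JvmtiThreadState` or `JvmtiExport` code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for per-thread JVMTI state.
struct ToyThreadState {
  unsigned enabled_event_bits = 0;
};

// Hypothetical stand-in for a thread; its state is created lazily.
struct ToyThread {
  bool is_virtual = false;
  std::unique_ptr<ToyThreadState> state;
};

class ToyEventController {
 public:
  void set_thread_filtered_event_enabled(bool enabled) {
    any_thread_filtered_event_enabled_ = enabled;
  }

  // Returns the existing state, or creates it for a virtual thread when a
  // thread-filtered event is enabled globally. Otherwise returns nullptr, so
  // threads that never need state never pay for it. (Platform threads are
  // outside the scope of this sketch.)
  ToyThreadState* get_thread_state(ToyThread& thread) {
    if (thread.state == nullptr && thread.is_virtual &&
        any_thread_filtered_event_enabled_) {
      thread.state = std::make_unique<ToyThreadState>();
    }
    return thread.state.get();
  }

 private:
  bool any_thread_filtered_event_enabled_ = false;
};
```

Once created, the state is reused on later calls, so enabling an event after a virtual thread has started (as in the dynamic-attach scenario) still results in state being created the first time it is needed.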
Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. A new function `JvmtiEventController::recompute_thread_filtered()` was introduced to make the necessary corrections. > > Testing: > - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: 1) addressed review comments; 2) replaced is_virtual with is_vthread_mounted ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15467/files - new: https://git.openjdk.org/jdk/pull/15467/files/dd97dacc..6cf97ef9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=01-02 Stats: 191 lines in 8 files changed: 80 ins; 57 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/15467.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15467/head:pull/15467 PR: https://git.openjdk.org/jdk/pull/15467 From luhenry at openjdk.org Mon Sep 11 09:05:50 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 11 Sep 2023 09:05:50 GMT Subject: Integrated: 8315841: RISC-V: Check for hardware TSO support In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 09:00:50 GMT, Ludovic Henry wrote: > With the Ztso extension [1], some hardware will support TSO on RISC-V. That allows us to reduce the generation of memory fences, given the stronger memory model compared to RVWMO. > > [1] https://github.com/riscv/riscv-isa-manual/blob/6dcbc6da9ada01f0f57da83cda6059bdec57619f/src/ztso-st-ext.adoc#L1 This pull request has now been integrated.
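Why hardware TSO lets the port emit fewer fences can be sketched as a filter over the orderings a barrier requests. The encoding and function name below are hypothetical, not the RISC-V port's actual code. The underlying fact is that TSO already preserves load→load, load→store and store→store ordering, leaving only store→load ordering to an explicit fence, whereas RVWMO may need a fence for every requested ordering.

```cpp
#include <cassert>
#include <cstdint>

// Ordering constraints a memory barrier may request (hypothetical encoding).
enum Ordering : std::uint32_t {
  kLoadLoad   = 1u << 0,
  kLoadStore  = 1u << 1,
  kStoreLoad  = 1u << 2,
  kStoreStore = 1u << 3,
};

// Returns the subset of requested orderings that still needs an explicit
// fence instruction. With TSO (e.g. Ztso) the hardware keeps loads ordered
// with later loads and stores, and stores ordered with later stores, so
// only store->load ordering remains. Under RVWMO nothing is implied.
constexpr std::uint32_t orderings_needing_fence(std::uint32_t requested,
                                                bool has_ztso) {
  return has_ztso ? (requested & kStoreLoad) : requested;
}
```

In this model, acquire/release-style barriers (which request only load→load, load→store and store→store ordering) become no-ops on Ztso hardware, while a full barrier still needs a fence for its store→load component.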
Changeset: 35bccacb Author: Ludovic Henry URL: https://git.openjdk.org/jdk/commit/35bccacb6618e9ec686be895a9ef6ba8f3375ef0 Stats: 28 lines in 5 files changed: 27 ins; 0 del; 1 mod 8315841: RISC-V: Check for hardware TSO support Reviewed-by: vkempik, rehn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/15613 From sspitsyn at openjdk.org Mon Sep 11 09:08:26 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 11 Sep 2023 09:08:26 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v4] In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: <96hqmxZL0Uer6QWJRtpg9QYkPdt-JcUBcQ4bNzO6LVY=.cd603ff4-52c6-4614-b02f-35a64b6a3a15@github.com> > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing JVMTI event management mechanism does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread-filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads.
A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point. > > This fix introduces a new function `JvmtiExport::get_jvmti_thread_state()` which checks if the thread is virtual and there is a thread-filtered event enabled globally, and if so, forces the creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. A new function `JvmtiEventController::recompute_thread_filtered()` was introduced to make the necessary corrections. > > Testing: > - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: removed JavaThread::is_virtual() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15467/files - new: https://git.openjdk.org/jdk/pull/15467/files/6cf97ef9..f5a144bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=02-03 Stats: 8 lines in 2 files changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15467.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15467/head:pull/15467 PR: https://git.openjdk.org/jdk/pull/15467 From aph at openjdk.org Mon Sep 11 09:37:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 11 Sep 2023 09:37:45 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: On Mon, 11 Sep 2023 08:15:16 GMT, Hao Sun wrote: >> src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp line 64: >> >>> 62: *(address*)sp = pc; >>> 63: } >>> 64: >> >> Is it possible to put the methods in the superclass, and override them only for AArch64? > > I doubt that.
> > In the current design, we declare member functions in the shared `ContinuationHelper.hpp`, and define them in the architecture-specific `ContinuationHelper_xx.inline.hpp` (e.g., `ContinuationHelper_aarch64.inline.hpp` for the AArch64 backend). > > Following the current design, if we introduce a base class (e.g., `class ContinuationCommonHelper`) and the default `patch_return_address_at()` implementation, we still have to declare `ContinuationHelper::patch_return_address_at()` in the shared `ContinuationHelper.hpp` and inevitably define it for all architectures. > > Otherwise, we have to > 1) only declare `ContinuationHelper::patch_return_address_at()` in `ContinuationHelper.hpp` for the AArch64 target with the help of conditional compilation directives. OR > 2) move the declaration of `class ContinuationHelper` into `ContinuationHelper_xx.inline.hpp` and only override `patch_return_address_at()` for AArch64. > > But I think neither way is neat. WDYT? Thanks. Why not define the default `BaseContinuationHelper::patch_return_address_at()` in ContinuationHelper.hpp? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1321267980 From haosun at openjdk.org Mon Sep 11 10:00:42 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 11 Sep 2023 10:00:42 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: <2WwPU-8unu7c-__kCaG8YFgq_A97ZLX0Rh_aC_YtRtQ=.a180c529-8f86-45e2-8a6b-775ee90ff4c8@github.com> On Mon, 11 Sep 2023 09:35:19 GMT, Andrew Haley wrote: >> I doubt that. >> >> In the current design, we declare member functions in the shared `ContinuationHelper.hpp`, and define them in the architecture-specific `ContinuationHelper_xx.inline.hpp` (e.g., `ContinuationHelper_aarch64.inline.hpp` for the AArch64 backend).
>> >> Following the current design, if we introduce a base class (e.g., `class ContinuationCommonHelper`) and the default `patch_return_address_at()` implementation, we still have to declare `ContinuationHelper::patch_return_address_at()` in the shared `ContinuationHelper.hpp` and inevitably define it for all architectures. >> >> Otherwise, we have to >> 1) only declare `ContinuationHelper::patch_return_address_at()` in `ContinuationHelper.hpp` for the AArch64 target with the help of conditional compilation directives. OR >> 2) move the declaration of `class ContinuationHelper` into `ContinuationHelper_xx.inline.hpp` and only override `patch_return_address_at()` for AArch64. >> >> But I think neither way is neat. WDYT? Thanks. > > Why not define the default `BaseContinuationHelper::patch_return_address_at()` in ContinuationHelper.hpp? I guess the following pseudo-code is what you want:

/*
 * file ContinuationHelper.hpp
 */
class BaseContinuationHelper {
public:
  inline void patch_return_address_at(intptr_t* sp, address pc) {
    // the default implementation
  }
};

class ContinuationHelper : public BaseContinuationHelper {
public:
  inline void patch_return_address_at(intptr_t* sp, address pc); // declare here
};

/*
 * file ContinuationHelper_aarch64.inline.hpp
 */
inline void ContinuationHelper::patch_return_address_at(intptr_t* sp, address pc) {
  // override here for AArch64
}

/*
 * file ContinuationHelper_x86.inline.hpp
 */
// no need to define patch_return_address_at().
// use the default BaseContinuationHelper::patch_return_address_at().

However, it doesn't work because we have to define `ContinuationHelper::patch_return_address_at()` for x86 since we declare it. Please let me know if I misunderstood something.
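The C++ rule at the heart of this exchange can be checked with a small standalone sketch. It uses hypothetical names, a hypothetical macro `SKETCH_PLATFORM_OVERRIDE` in place of a real AArch64 build flag, and `std::uintptr_t` in place of HotSpot's `address` type. A derived class reuses a base-class static default as long as it does not redeclare the name; once the class definition redeclares it, every build that calls it must also provide a definition, so restricting an override to one platform needs conditional compilation in the shared declaration.

```cpp
#include <cassert>
#include <cstdint>

using address = std::uintptr_t;  // stand-in for HotSpot's `address` type

struct BaseHelperSketch {
  // Shared default: store the return address unchanged.
  static void patch_return_address_at(address* sp, address pc) { *sp = pc; }
};

struct HelperSketch : BaseHelperSketch {
#ifdef SKETCH_PLATFORM_OVERRIDE
  // Redeclaring the name hides the base version; any build that defines the
  // macro must then also supply a definition.
  static void patch_return_address_at(address* sp, address pc);
#endif
};

#ifdef SKETCH_PLATFORM_OVERRIDE
// Hypothetical platform override; ~pc is only a placeholder for
// "store a transformed (e.g. signed) return address".
void HelperSketch::patch_return_address_at(address* sp, address pc) {
  *sp = ~pc;
}
#endif
```

Without the macro, `HelperSketch::patch_return_address_at` simply resolves to the base-class default, which is the behavior one would want for every platform except the one that overrides it.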
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1321300062 From rrich at openjdk.org Mon Sep 11 10:06:02 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Sep 2023 10:06:02 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v4] In-Reply-To: References: Message-ID: > This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to inverse scaling, as demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems. > > The algorithm for sharing the scanning of large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a microbenchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation.
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 GC threads. > > Observations > > * JDK22 scales inversely. Adding GC threads prolongs young collections. With 32 threads young collections take ~15x longer than with a single thread. > > * Fixed JDK22 scales well. Adding GC threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than with a single thread. > > * With just 1 GC thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
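The basic striping idea quoted above (each worker scans only the part of a large array that falls inside its own stripe) can be sketched as range clipping. This is a simplified sketch under stated assumptions: the heap is modeled as word indices, stripes have a fixed size, and the helper names are hypothetical. It ignores card-table dirtiness and the special handling of the array's tail described in the quote, both of which the actual patch deals with.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Slice {
  std::size_t begin;  // first word index (inclusive)
  std::size_t end;    // last word index (exclusive)
};

// Portion of the array [array_begin, array_end) that lies inside stripe
// number `stripe` when stripes are `stripe_words` long. Empty if disjoint.
Slice slice_in_stripe(std::size_t array_begin, std::size_t array_end,
                      std::size_t stripe, std::size_t stripe_words) {
  const std::size_t stripe_begin = stripe * stripe_words;
  const std::size_t stripe_end = stripe_begin + stripe_words;
  const std::size_t b = std::max(array_begin, stripe_begin);
  const std::size_t e = std::min(array_end, stripe_end);
  return b < e ? Slice{b, e} : Slice{b, b};
}

// Collects the non-empty slices that each stripe's worker would scan.
// Requires array_end > array_begin.
std::vector<Slice> partition_array(std::size_t array_begin,
                                   std::size_t array_end,
                                   std::size_t stripe_words) {
  std::vector<Slice> slices;
  const std::size_t first = array_begin / stripe_words;
  const std::size_t last = (array_end - 1) / stripe_words;
  for (std::size_t s = first; s <= last; ++s) {
    const Slice sl = slice_in_stripe(array_begin, array_end, s, stripe_words);
    if (sl.begin < sl.end) slices.push_back(sl);
  }
  return slices;
}
```

For an array occupying words `[10, 100)` with 32-word stripes, this yields four slices (`[10, 32)`, `[32, 64)`, `[64, 96)`, `[96, 100)`) that together cover the array exactly once, so no worker has to steal another worker's array elements.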
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Apply Thomas' suggestions Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/5b802ed3..d7ab2b0f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=02-03 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From ihse at openjdk.org Mon Sep 11 10:37:45 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 11 Sep 2023 10:37:45 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Tue, 5 Sep 2023 22:49:41 GMT, Erik Joelsson wrote: > There are a number of files in the `test` directory that have an incorrect copyright header, which includes the "classpath" exception text. This patch removes that text from all test files that I could find it in. I did this using a combination of `sed` and `grep`. Reviewing this patch is probably easier using the raw patch file or a suitable webrev format. > > It's my assumption that these headers were introduced by mistake as it's quite easy to copy the wrong template when creating new files. Looks good to me. Thanks for doing this cleanup! ------------- Marked as reviewed by ihse (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15573#pullrequestreview-1619669362 From ayang at openjdk.org Mon Sep 11 11:32:43 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 11 Sep 2023 11:32:43 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: <3smMa3E5t8fjSWprFvl10aa23vsT9l-WsIhSBQcL3Sw=.db24cfe7-5b19-4e5b-b930-7cd1a6be5201@github.com> On Wed, 6 Sep 2023 11:54:04 GMT, Roberto Castañeda Lozano wrote: > This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) > > This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`. 
On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g. for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL` (preventing the constant word-length from being discovered), see the left graph below: > > ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) > > With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario ... Marked as reviewed by ayang (Reviewer). 
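The description above distinguishes an array's payload words from the alignment padding words that a large `ObjectAlignmentInBytes` adds. A little arithmetic makes the distinction concrete. The sketch rests on a few assumptions: a 64-bit VM with 8-byte words, a 16-byte array header (typical with compressed class pointers), and "double-word aligned" read as "rounded up to 8 bytes". `payload_words`, `object_words` and `padding_words` are hypothetical names, not C2 functions.

```cpp
#include <cstddef>

constexpr std::size_t kBytesPerWord = 8;  // assumption: 64-bit VM

// Rounds `value` up to a multiple of `alignment` (must be a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Element payload in words, rounded up to 8 bytes: the part the clone stub copies.
constexpr std::size_t payload_words(std::size_t length, std::size_t elem_bytes) {
  return align_up(length * elem_bytes, kBytesPerWord) / kBytesPerWord;
}

// Whole object in words: header plus payload, rounded up to ObjectAlignmentInBytes.
constexpr std::size_t object_words(std::size_t header_bytes, std::size_t length,
                                   std::size_t elem_bytes, std::size_t object_alignment) {
  return align_up(header_bytes + length * elem_bytes, object_alignment) / kBytesPerWord;
}

// Trailing words that are pure alignment padding (header_bytes assumed word-aligned).
constexpr std::size_t padding_words(std::size_t header_bytes, std::size_t length,
                                    std::size_t elem_bytes, std::size_t object_alignment) {
  return object_words(header_bytes, length, elem_bytes, object_alignment) -
         header_bytes / kBytesPerWord - payload_words(length, elem_bytes);
}

// The example from the description: int arrays of length 3 and 4 both need 2 words.
static_assert(payload_words(3, 4) == 2, "int[3] payload is 2 words");
static_assert(payload_words(4, 4) == 2, "int[4] payload is 2 words");
```

With the default 8-byte alignment an `int[5]` has no padding words. With `ObjectAlignmentInBytes=32` the same array occupies 8 words: 2 header words, 3 payload words and 3 padding words. The padding words are the ones that should not be treated as array elements.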
------------- PR Review: https://git.openjdk.org/jdk/pull/15589#pullrequestreview-1619762002 From rcastanedalo at openjdk.org Mon Sep 11 11:40:47 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Sep 2023 11:40:47 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 11:54:04 GMT, Roberto Castañeda Lozano wrote: > This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) > > This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`. 
On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g. for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL` (preventing the constant word-length from being discovered), see the left graph below: > > ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) > > With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario ... Thanks for reviewing, Albert! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15589#issuecomment-1713705474 From aph at openjdk.org Mon Sep 11 12:36:41 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 11 Sep 2023 12:36:41 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: <2WwPU-8unu7c-__kCaG8YFgq_A97ZLX0Rh_aC_YtRtQ=.a180c529-8f86-45e2-8a6b-775ee90ff4c8@github.com> References: <2WwPU-8unu7c-__kCaG8YFgq_A97ZLX0Rh_aC_YtRtQ=.a180c529-8f86-45e2-8a6b-775ee90ff4c8@github.com> Message-ID: On Mon, 11 Sep 2023 09:57:31 GMT, Hao Sun wrote: >> Why not define the default `BaseContinuationHelper::patch_return_address_at()` in ContinuationHelper.hpp? 
> > I guess the following pseudo-code is what you want: > > > /* > * file ContinuationHelper.hpp > */ > > class BaseContinuationHelper { > public: > inline void patch_return_address_at(intptr_t* sp, address pc) { > // the default implementation > } > } > > class ContinuationHelper : public BaseContinuationHelper { > public: > inline void patch_return_address_at(intptr_t* sp, address pc) {} // declare here > } > > /* > * file ContinuationHelper_aarch64.inline.hpp > */ > > inline void ContinuationHelper::patch_return_address_at(intptr_t* sp, address pc) { > // override here for AArch64 > } > > /* > * file ContinuationHelper_x86.inline.hpp > */ > > // no need to define patch_return_address_at(). > // use the default BaseContinuationHelper::patch_return_address_at(). > > > However, it doesn't work because we have to define `ContinuationHelper::patch_return_address_at()` for x86 since we declare it. > > Please let me know if I misunderstood something. I see the problem. I'd do this: diff --git a/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp b/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp index 25e83e7e4b9..e1bd855dddf 100644 --- a/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp +++ b/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp @@ -68,6 +68,8 @@ inline void ContinuationHelper::push_pd(const frame& f) { *(intptr_t**)(f.sp() - frame::sender_sp_offset) = f.fp(); } +#define CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS + inline address ContinuationHelper::return_address_at(intptr_t* sp) { return pauth_strip_verifiable(*(address*)sp); } diff --git a/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp b/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp index ce88dd6dbba..55794f9ac7e 100644 --- a/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp +++ b/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp @@ -68,14 +68,6 @@ inline void ContinuationHelper::push_pd(const frame& f) { *(intptr_t**)(f.sp() - 
frame::sender_sp_offset) = f.fp(); } -inline address ContinuationHelper::return_address_at(intptr_t* sp) { - return *(address*)sp; -} - -inline void ContinuationHelper::patch_return_address_at(intptr_t* sp, address pc) { - *(address*)sp = pc; -} - inline void ContinuationHelper::set_anchor_to_entry_pd(JavaFrameAnchor* anchor, ContinuationEntry* entry) { anchor->set_last_Java_fp(entry->entry_fp()); } diff --git a/src/hotspot/share/runtime/continuationHelper.inline.hpp b/src/hotspot/share/runtime/continuationHelper.inline.hpp index 7c6ab7ee76b..6d4d739f219 100644 --- a/src/hotspot/share/runtime/continuationHelper.inline.hpp +++ b/src/hotspot/share/runtime/continuationHelper.inline.hpp @@ -37,6 +37,15 @@ #include CPU_HEADER_INLINE(continuationHelper) +#ifndef CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS +inline address ContinuationHelper::return_address_at(intptr_t* sp) { + return *(address*)sp; +} +inline void ContinuationHelper::patch_return_address_at(intptr_t* sp, address pc) { + *(address*)sp = pc; +} +#endif + inline bool ContinuationHelper::NonInterpretedUnknownFrame::is_instance(const frame& f) { return !f.is_interpreted_frame(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1321480892 From tschatzl at openjdk.org Mon Sep 11 14:04:43 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Sep 2023 14:04:43 GMT Subject: RFR: 8315550: G1: Fix -Wconversion warnings in g1NUMA In-Reply-To: References: Message-ID: On Fri, 1 Sep 2023 15:48:25 GMT, Albert Mingkun Yang wrote: > Simple `int` to `uint` for NUMA node-id. > > Possibly, `numa_get_leaf_groups` should accept `uint[]`. I will attempt that in another PR, as that will be mostly runtime, not G1 specific. Seems good. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15541#pullrequestreview-1620092430 From ayang at openjdk.org Mon Sep 11 14:53:40 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 11 Sep 2023 14:53:40 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v4] In-Reply-To: References: Message-ID: <0kMimZnIkHnMHqpMmJgN5KEz23R7qh7lqPZNnA5of4s=.85614360-c5ab-46ca-99bf-ac2a3bae82a4@github.com> On Mon, 11 Sep 2023 10:06:02 GMT, Richard Reingruber wrote: >> This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a microbenchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. 
In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Apply Thomas' suggestions > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Could you merge master, just to make sure the patch plays nicely with the rest? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1714047519 From rrich at openjdk.org Mon Sep 11 15:01:30 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Sep 2023 15:01:30 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v5] In-Reply-To: References: Message-ID: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> > This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a microbenchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' - Apply Thomas' suggestions Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Limit effect of previous commit to large array handling - Make sure to skip stripes where no object starts - 8310031: Parallel: Implement better work distribution for large object arrays in old gen ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/d7ab2b0f..67edf286 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=03-04 Stats: 138920 lines in 3357 files changed: 77774 ins; 42558 del; 18588 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Mon Sep 11 15:01:31 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Sep 2023 15:01:31 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v4] In-Reply-To: <0kMimZnIkHnMHqpMmJgN5KEz23R7qh7lqPZNnA5of4s=.85614360-c5ab-46ca-99bf-ac2a3bae82a4@github.com> References: <0kMimZnIkHnMHqpMmJgN5KEz23R7qh7lqPZNnA5of4s=.85614360-c5ab-46ca-99bf-ac2a3bae82a4@github.com> Message-ID: On Mon, 11 Sep 2023 14:50:44 GMT, Albert Mingkun Yang wrote: > Could you merge master, just to make sure the patch plays nicely with the rest? Sure. I'm currently testing the merge. I'll push it just now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1714057066 From simonis at openjdk.org Mon Sep 11 15:14:45 2023 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 11 Sep 2023 15:14:45 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v8] In-Reply-To: <1VlRygD5Ma7S7PwUsJdwi-gPVo_T30nwA7Tv0BAemNA=.47e3cfc0-0f89-4a9a-bd41-31689db67bd1@github.com> References: <1VlRygD5Ma7S7PwUsJdwi-gPVo_T30nwA7Tv0BAemNA=.47e3cfc0-0f89-4a9a-bd41-31689db67bd1@github.com> Message-ID: On Fri, 8 Sep 2023 21:40:26 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Fix includes Thanks for your changes. Already much better now, but I still have some comments :) src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.cpp line 71: > 69: EXCEPTION_MARK; > 70: _concurrent_dedup_thread_cpu_time = > 71: PerfDataManager::create_counter(SUN_THREADS, "g1_conc_dedup_thread.cpu_time", I think this counter is not G1 specific so please drop `g1_` from the name. src/hotspot/share/runtime/vmThread.cpp line 299: > 297: "Must be called from VM thread"); > 298: // Update vm_thread_cpu_time after each VM operation. > 299: // _perf_vm_thread_cpu_time->set_value(os::current_thread_cpu_time()); I don't think we need this comment line anymore (and it is also wrong, now that we've changed the `PerfVariable` to a `PerfCounter`). ------------- Changes requested by simonis (Reviewer). 
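The review above notes that the CPU-time counter is now a `PerfCounter` rather than a `PerfVariable`. A monotonic counter can only be incremented, so publishing an accumulated total means incrementing it by the difference from the value it already holds. Below is a minimal self-contained sketch of that pattern. `MonotonicCounter` and `TotalCpuTimeAccumulator` are hypothetical stand-ins, not HotSpot's `PerfCounter` or `ThreadTotalCPUTimeClosure`, and per-thread CPU times are passed in directly rather than read via `os::thread_cpu_time`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a monotonic hsperf counter: its value can only grow.
class MonotonicCounter {
 public:
  void inc(std::int64_t delta) {
    assert(delta >= 0);  // a monotonic counter cannot go backwards
    _value += delta;
  }
  std::int64_t get_value() const { return _value; }
 private:
  std::int64_t _value = 0;
};

// Hypothetical closure: sums the CPU time of every visited thread and, when it
// goes out of scope, advances the counter to the new total.
class TotalCpuTimeAccumulator {
 public:
  explicit TotalCpuTimeAccumulator(MonotonicCounter* counter) : _counter(counter) {}
  TotalCpuTimeAccumulator(const TotalCpuTimeAccumulator&) = delete;
  TotalCpuTimeAccumulator& operator=(const TotalCpuTimeAccumulator&) = delete;
  ~TotalCpuTimeAccumulator() { _counter->inc(_total - _counter->get_value()); }

  void do_thread(std::int64_t thread_cpu_time_ns) { _total += thread_cpu_time_ns; }

 private:
  MonotonicCounter* _counter;
  std::int64_t _total = 0;
};
```

The delta stays non-negative only if the sampled set of threads never shrinks between updates, because each thread's CPU time only grows. If a thread disappeared from the set, the new total could drop below the published value, and a monotonic counter cannot represent that.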
PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1620155306 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1321646182 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1321673421 From simonis at openjdk.org Mon Sep 11 15:14:50 2023 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 11 Sep 2023 15:14:50 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v8] In-Reply-To: <_-ekt02un2mj-y6Z_v-4HiLjDoassXhZirdsu3ZmmUA=.fca59c04-7ddf-44fc-9fc2-575e915509dd@github.com> References: <1VlRygD5Ma7S7PwUsJdwi-gPVo_T30nwA7Tv0BAemNA=.47e3cfc0-0f89-4a9a-bd41-31689db67bd1@github.com> <_-ekt02un2mj-y6Z_v-4HiLjDoassXhZirdsu3ZmmUA=.fca59c04-7ddf-44fc-9fc2-575e915509dd@github.com> Message-ID: On Sat, 9 Sep 2023 00:39:38 GMT, Man Cao wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix includes > > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 461: > >> 459: >> 460: _g1_concurrent_mark_threads_cpu_time = >> 461: PerfDataManager::create_counter(SUN_THREADS, "g1_conc_mark_thread.cpu_time", > > I find it a bit odd to have "." in the name. "." should be the separator for namespaces, but not within the counter name. > > I think @simonis 's suggestion about `sun.gc.collector..cpu_time` or `sun.gc.cpu_time` is to have a single, aggregated counter named `cpu_time`. If we don't do such aggregation, the names should probably be `g1_conc_mark_cpu_time`, `parallel_gc_workers_cpu_time`, etc. I suggested having a single, aggregated counter for all GC threads **in addition** to the specific counters. This aggregated counter should be the same for all GCs which implement CPU times. It could easily be used by tools without knowing which GC is enabled and, even more importantly, it would be immune to implementation changes (e.g. if a GC would establish a new subset of GC worker threads). 
From what I understand this is also your primary use case for your adaptable heap sizing feature. The aggregated counter should be in a generic place for *all* GCs (e.g. `sun.gc.cpu_time` or `sun.threads.gc_cpu_time`). For the JIT we could then add the corresponding `sun.ci.cpu_time` or `sun.threads.jit_cpu_time` (in addition to the more specific counters for C1, C2, etc.). For the specific GC counters: instead of putting the `.cpu_time` in the counter name, you can create a sub-namespace instead (e.g. `sun.gc.cpu_time.*` or `sun.threads.cpu_time.*`) and add all the other counters under that namespace (e.g. `sun.gc.cpu_time.total`, `sun.gc.cpu_time.conc_mark`, `sun.gc.cpu_time.parallel_gc_workers`, etc... What I want to avoid in any case is to have the name of the GC in every single counter name because that's redundant information. > src/hotspot/share/runtime/thread.hpp line 663: > >> 661: // hsperfdata counter. >> 662: class ThreadTotalCPUTimeClosure: public ThreadClosure { >> 663: private: > > It might be preferable to move this class to share/memory/iterator.hpp (where `ThreadClosure` is defined) or runtime/perfData.hpp (where a similar class `PerfTraceTime` is defined). > > Also we probably want to move the body of `do_thread()` and `~ThreadTotalCPUTimeClosure()` to the corresponding .cpp file, to minimize new include statements in .hpp. If you move it, I'd vote for `runtime/perfData.hpp`. > src/hotspot/share/runtime/thread.hpp line 680: > >> 678: // must ensure the thread exists and has not terminated. >> 679: assert(os::is_thread_cpu_time_supported(), "os must support cpu time"); >> 680: _time_diff = os::thread_cpu_time(thread); > > This does not look correct, the `_time_diff` is not a delta with the previous value of the CPU time. It also no longer accumulates the CPU time across a set of threads. 
I think we should stay with the previous approach of using `PerfVariable`, accumulating CPU time with `_total += os::thread_cpu_time(thread)`, then call ` _counter->set_value(_total)` in the destructor. > > To @simonis's point about using `PerfCounter` instead of `PerfVariable`, I agree ideally CPU time could use monotonically increasing `PerfCounter`. However, it would require computing a diff with the previously observed CPU time, which is essentially: `_counter->inc(_total - _counter->get_value())`. It looks unnecessary and is not as clean as ` _counter->set_value(_total)`. I agree that the current version doesn't look correct, but I don't see a reason to change the `PerfCounter` back to `PerfVariable`. Just accumulate the time as suggested by @caoman (i.e. `_total += os::thread_cpu_time(thread)`), then call `_counter->inc(_total - _counter->get_value())` in the destructor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1321703912 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1321650203 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1321669415 From jvernee at openjdk.org Mon Sep 11 15:37:11 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 11 Sep 2023 15:37:11 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v20] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. 
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... 
Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: - 8315917: Passing struct by values seems under specified Reviewed-by: mcimadamore - Merge branch 'master' into JEP22 - Merge branch 'master' into JEP22 - add code snippet - Split long throws clauses in `MemorySegment` javadoc Reviewed-by: jvernee - Add support for sliced allocation - add name of SysV ABI - Fix javadoc issues in MemorySegment::copy Reviewed-by: jvernee - remove reference to allocateArray - PPC linker changes - ... and 33 more: https://git.openjdk.org/jdk/compare/35bccacb...0e702f06 ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=19 Stats: 3759 lines in 244 files changed: 1901 ins; 1000 del; 858 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From iwalulya at openjdk.org Mon Sep 11 15:48:37 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 11 Sep 2023 15:48:37 GMT Subject: RFR: 8315550: G1: Fix -Wconversion warnings in g1NUMA In-Reply-To: References: Message-ID: <20Sk8gTeGi4BZ1Wf-LOr0JpHXwLvEagY5zQoWhdv4tI=.3e1a7e06-2de7-4ddf-95e5-5a68e7d4431c@github.com> On Fri, 1 Sep 2023 15:48:25 GMT, Albert Mingkun Yang wrote: > Simple `int` to `uint` for NUMA node-id. > > Possibly, `numa_get_leaf_groups` should accept `uint[]`. I will attempt that in another PR, as that will be mostly runtime, not G1 specific. Marked as reviewed by iwalulya (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15541#pullrequestreview-1620317425 From rrich at openjdk.org Mon Sep 11 15:51:43 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Sep 2023 15:51:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v3] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 11:46:45 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Limit effect of previous commit to large array handling > > src/hotspot/share/gc/parallel/psCardTable.cpp line 257: > >> 255: space_top); >> 256: >> 257: // Process a stripe iff it contains any obj-start or large array chunk > > Suggestion: > > // Stripes without an object start may either contain a large object, or a part of a large objArray; the latter must be handled specially, the former is handled by the owner of the stripe where that large object starts. > > I think the original comment referred to the not-taken path of the next `if`; since the taken path now also potentially processes an object (looking for a part of a large objArray) I reformulated it. I overlooked that the suggested comment needs line breaks. Also I think that the long comment before `PSCardTable::scavenge_contents_parallel` fails to explain which thread has to scan which object: objects starting in a stripe are scanned completely by the thread owning the stripe even if they extend beyond it. Worker threads skip over objects not starting in their stripe. With that the comment at line 257 can be kept shorter. I'll come up with a suggestion. 
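The stripe-ownership rules stated in this thread fit in a few lines. Ordinary objects are scanned entirely by the owner of the stripe they start in. Large arrays covering at least three stripes are split, and the piece reaching into the array's last stripe is scanned by the owner of the previous stripe. The sketch below is a simplified model of those rules, not the `PSCardTable::scavenge_contents_parallel` code. Addresses are plain word indices, stripes are assumed to be aligned to multiples of `stripe_size`, and card-table dirtiness is ignored. `Range` and `scanned_by_stripe` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of heap word indices.
struct Range {
  std::size_t begin;
  std::size_t end;
  bool empty() const { return begin >= end; }
};

// Simplified model: which part of object [obj_begin, obj_end) does the worker
// owning stripe [stripe_begin, stripe_begin + stripe_size) scan?
Range scanned_by_stripe(std::size_t obj_begin, std::size_t obj_end, bool is_large_array,
                        std::size_t stripe_begin, std::size_t stripe_size) {
  assert(stripe_size > 0 && stripe_begin % stripe_size == 0 && obj_begin < obj_end);
  const std::size_t stripe_end = stripe_begin + stripe_size;

  if (!is_large_array) {
    // Ordinary objects: scanned completely by the owner of the stripe they start in.
    const bool starts_here = obj_begin >= stripe_begin && obj_begin < stripe_end;
    return starts_here ? Range{obj_begin, obj_end} : Range{0, 0};
  }

  // Large arrays must touch at least three stripes.
  assert((obj_end - 1) / stripe_size - obj_begin / stripe_size + 1 >= 3);
  const std::size_t last_stripe_begin = ((obj_end - 1) / stripe_size) * stripe_size;
  if (stripe_end <= obj_begin || stripe_begin >= last_stripe_begin) {
    // Stripe lies before the array, or is the array's last stripe (or beyond).
    return Range{0, 0};
  }
  // Scan the array's part inside this stripe; the owner of the stripe just
  // before the last one also takes the tail that reaches into the last stripe.
  const std::size_t begin = std::max(obj_begin, stripe_begin);
  const std::size_t end = (stripe_end == last_stripe_begin) ? obj_end : stripe_end;
  return Range{begin, end};
}
```

Take an array spanning word indices [50, 350) with 100-word stripes. The workers scan [50, 100), [100, 200) and [200, 350), so every word is covered exactly once, and the owner of the stripe starting at 300 skips the array.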
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1321753008 From sspitsyn at openjdk.org Mon Sep 11 18:17:40 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 11 Sep 2023 18:17:40 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v4] In-Reply-To: <96hqmxZL0Uer6QWJRtpg9QYkPdt-JcUBcQ4bNzO6LVY=.cd603ff4-52c6-4614-b02f-35a64b6a3a15@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <96hqmxZL0Uer6QWJRtpg9QYkPdt-JcUBcQ4bNzO6LVY=.cd603ff4-52c6-4614-b02f-35a64b6a3a15@github.com> Message-ID: On Mon, 11 Sep 2023 09:08:26 GMT, Serguei Spitsyn wrote: >> This update fixes two important issues: >> - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach >> - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads >> >> The issue is tricky to fix because the existing mechanism of the JVMTI event management does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` >> which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point. 
>> >> This fix introduces new function `JvmtiExport::get_jvmti_thread_state()` which checks if thread is virtual and there is a thread filtered event enabled globally, and if so, forces a creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. New function `JvmtiEventController::recompute_thread_filtered()` was introduced to make necessary corrections. >> >> Testing: >> - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` >> - ran mach5 tiers 1-6: all are passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > removed JavaThread::is_virtual() I've pushed an update. It includes: - addressed review comments on new test `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` (some details are listed below) - added comments for `state_for()` and `state_for_while_locked()` to `src/hotspot/share/prims/jvmtiThreadState.hpp` as Alex suggested - moved the call to `JvmtiEventController::recompute_thread_filtered(state)` from `state_for_while_locked()` to `state_for()` - removed newly added function `JavaThread::is_virtual()` and replaced its use in `jvmtiExport.cpp` with `is_vthread_mounted()` Some of the new test updates: - Renamed `check_jvmti_err()` to `check_jvmti_error()` and moved it to `test/lib/jdk/test/lib/jvmti/jvmti_common.h` as Leonid suggested - Removed VARIADICJNI and `namespace` in the native agent - Removed `std::mutex lock` and used atomic counter instead - Added `@requires vm.compMode != "Xcomp"` as the allotted test execution time is sometimes not enough with `-Xcomp` - I did not replace the test `Counter` class with `CountDownLatch` because it caused deadlocks while the test never deadlocks with the `Counter` class.
Instead I've renamed `Counter` to `CountDownLatch` so that it is easy to replace this custom implementation of `CountDownLatch` with the one from `java.util.concurrent`. I'm not sure yet why using the original `CountDownLatch` class causes deadlocks. - Refactored the Java part of the test by introducing methods `test1()`, `test2()` and `test3()` - Added code to wait for `test1()` to reach the execution of `Thread.sleep(big-timeout)` before the agent dynamic attach. One problem is that the thread state in `sleep()` is `WAITING`, not `TIMED_WAITING` (this looks like a bug; I will need to follow up). So there was a need to distinguish whether `test1()` is still executing the await code, which is why one more `CountDownLatch` object has been added. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15467#issuecomment-1714362970 From lmesnik at openjdk.org Mon Sep 11 19:55:40 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 11 Sep 2023 19:55:40 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v4] In-Reply-To: <96hqmxZL0Uer6QWJRtpg9QYkPdt-JcUBcQ4bNzO6LVY=.cd603ff4-52c6-4614-b02f-35a64b6a3a15@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <96hqmxZL0Uer6QWJRtpg9QYkPdt-JcUBcQ4bNzO6LVY=.cd603ff4-52c6-4614-b02f-35a64b6a3a15@github.com> Message-ID: On Mon, 11 Sep 2023 09:08:26 GMT, Serguei Spitsyn wrote: >> This update fixes two important issues: >> - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach >> - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads >> >> The issue is tricky to fix because the existing mechanism of the JVMTI event management does not support unmounted virtual threads.
The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` >> which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point. >> >> This fix introduces new function `JvmtiExport::get_jvmti_thread_state()` which checks if thread is virtual and there is a thread filtered event enabled globally, and if so, forces a creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. New function `JvmtiEventController::recompute_thread_filtered()` was introduced to make necessary corrections. >> >> Testing: >> - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` >> - ran mach5 tiers 1-6: all are passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > removed JavaThread::is_virtual() Could you please add comment why standard CountDownLatch doesn't work for this test. ------------- Marked as reviewed by lmesnik (Reviewer).
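The lazy-creation rule in the quoted description (create the per-thread state for a virtual thread only when some thread-filtered event is enabled globally) reduces to a small decision. The sketch below is a simplified Java model under stated assumptions, not the HotSpot implementation: `LazyThreadStates`, `getThreadState` and the single boolean flag are hypothetical stand-ins for `JvmtiExport::get_jvmti_thread_state()` and the global event bits, and platform threads (whose state is handled via `recompute_enabled()`) are not modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the rule described in the PR; not HotSpot code.
final class LazyThreadStates {
    static final class ThreadState { }

    private final Map<Long, ThreadState> states = new HashMap<>();
    // Stands in for "some thread-filtered event is enabled globally".
    private boolean threadFilteredEventEnabled;

    void setThreadFilteredEventEnabled(boolean enabled) {
        threadFilteredEventEnabled = enabled;
    }

    // Returns the existing state, or creates one only for a virtual thread while a
    // thread-filtered event is enabled globally. Otherwise no state is created eagerly.
    ThreadState getThreadState(long threadId, boolean isVirtual) {
        ThreadState state = states.get(threadId);
        if (state == null && isVirtual && threadFilteredEventEnabled) {
            state = new ThreadState();
            states.put(threadId, state);
        }
        return state;
    }

    int createdStates() {
        return states.size();
    }
}
```

In this model, virtual threads that run while no thread-filtered event is enabled never allocate a state, which is the scalability point the description makes about the earlier eager workaround.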
PR Review: https://git.openjdk.org/jdk/pull/15467#pullrequestreview-1620728724 From jjoo at openjdk.org Mon Sep 11 20:54:26 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Mon, 11 Sep 2023 20:54:26 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v9] In-Reply-To: References: Message-ID: <3EsRbzY1z018AwIpCwMJddEh2YtEhH73KqDnjU3WuiU=.5c34dd5b-39da-4699-b6f2-7a31e7f16e8a@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - Merge branch 'openjdk:master' into master - Fix includes - Move ThreadTotalCPUTimeClosure to thread.hpp - Properly initialize concurrent dedup thread counter - rename counters to be *.cpu_time - address partial comments from Volker and Man - address remainder of dholmes' comments - address dholmes@ comments - Add hsperf counters for CPU time of JVM internal threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/2f44b814..27c45a3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=07-08 Stats: 76507 lines in 2417 files changed: 43696 ins; 16906 del; 15905 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From sspitsyn at openjdk.org Mon Sep 11 21:05:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 11 Sep 2023 21:05:42 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v4] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> 
<96hqmxZL0Uer6QWJRtpg9QYkPdt-JcUBcQ4bNzO6LVY=.cd603ff4-52c6-4614-b02f-35a64b6a3a15@github.com> Message-ID: On Mon, 11 Sep 2023 19:53:19 GMT, Leonid Mesnik wrote: > Could you please add comment why standard CountDownLatch doesn't work for this test. Okay. Added comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15467#issuecomment-1714574938 From jjoo at openjdk.org Mon Sep 11 21:08:28 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Mon, 11 Sep 2023 21:08:28 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v10] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Resolve some simple comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/27c45a3c..43e2de15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=08-09 Stats: 8 lines in 3 files changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From duke at openjdk.org Mon Sep 11 21:18:49 2023 From: duke at openjdk.org (ExE Boss) Date: Mon, 11 Sep 2023 21:18:49 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v20] In-Reply-To: References: Message-ID: <6ekIcAU0bARYkB_E_QCPD4u9jnqtFgEXCpImDcaxVPE=.a2b116fe-409b-4867-a389-d53ad1a94873@github.com> On Mon, 11 Sep 2023 15:37:11 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). 
The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: > > - 8315917: Passing struct by values seems under specified > > Reviewed-by: mcimadamore > - Merge branch 'master' into JEP22 > - Merge branch 'master' into JEP22 > - add code snippet > - Split long throws clauses in `MemorySegment` javadoc > > Reviewed-by: jvernee > - Add support for sliced allocation > - add name of SysV ABI > - Fix javadoc issues in MemorySegment::copy > > Reviewed-by: jvernee > - remove reference to allocateArray > - PPC linker changes > - ... and 33 more: https://git.openjdk.org/jdk/compare/35bccacb...0e702f06 src/java.base/share/classes/java/lang/foreign/Linker.java line 573: > 571: * The returned method handle will throw an {@link IllegalArgumentException} if the {@link MemorySegment} > 572: * representing the target address of the foreign function is the {@link MemorySegment#NULL} address. If an argument > 573: * is a {@link MemorySegment},whose corresponding layout is an {@linkplain GroupLayout group layout}, the linker might attempt to access the contents of the segment. As such, one of the exceptions specified by the **Nit:** Suggestion: * is a {@link MemorySegment},whose corresponding layout is a {@linkplain GroupLayout group layout}, the linker might attempt to access the contents of the segment.
As such, one of the exceptions specified by the ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1322081045 From sspitsyn at openjdk.org Mon Sep 11 21:22:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 11 Sep 2023 21:22:18 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v5] In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing mechanism of the JVMTI event management does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point. > > This fix introduces new function `JvmtiExport::get_jvmti_thread_state()` which checks if thread is virtual and there is a thread filtered event enabled globally, and if so, forces a creation of the `JvmtiThreadState`.
Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. New function `JvmtiEventController::recompute_thread_filtered()` was introduced to make necessary corrections. > > Testing: > - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all are passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed second round of review comments on VThreadEventTest.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15467/files - new: https://git.openjdk.org/jdk/pull/15467/files/f5a144bc..9f9355df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=03-04 Stats: 15 lines in 1 file changed: 5 ins; 8 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15467.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15467/head:pull/15467 PR: https://git.openjdk.org/jdk/pull/15467 From amenkov at openjdk.org Mon Sep 11 22:41:42 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 11 Sep 2023 22:41:42 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v5] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: On Mon, 11 Sep 2023 21:22:18 GMT, Serguei Spitsyn wrote: >> This update fixes two important issues: >> - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach >> - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads >> >> The issue is tricky to fix because the existing mechanism of the JVMTI event management does not support unmounted virtual threads.
The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` >> which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point. >> >> This fix introduces new function `JvmtiExport::get_jvmti_thread_state()` which checks if thread is virtual and there is a thread filtered event enabled globally, and if so, forces a creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. New function `JvmtiEventController::recompute_thread_filtered()` was introduced to make necessary corrections. >> >> Testing: >> - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` >> - ran mach5 tiers 1-6: all are passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed second round of review comments on VThreadEventTest.java Marked as reviewed by amenkov (Reviewer). test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 45: > 43: * The test uses custom implementation of the CountDownLatch class. > 44: * The reason is we want the state of tested thread to be predictable. > 45: * With original CountDownLatch it is not clear what thread state is expected.
"original CountDownLatch" -> "java.util.concurrent.CountDownLatch" test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 106: > 104: ready1.countDown(); // to guaranty state is not State.WAITING after await(mready) > 105: try { > 106: Thread.sleep(20000); // big timeout to keep unmounted untill interrupted untill -> until test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 132: > 130: // keep mounted > 131: } > 132: LockSupport.parkNanos(10 * TIMEOUT_BASE); // will cause extra mount and unmount I don't see much sense in TIMEOUT_BASE constant (it's used only here and multiplied by 10) I think it would be clearer to Suggestion: // park for 10ms; causes extra unmount and mount LockSupport.parkNanos(10_000_000L); test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 24: > 22: */ > 23: > 24: #include I suppose this include was needed for abort() only and not needed anymore test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 27: > 25: #include > 26: #include > 27: #include not needed ------------- PR Review: https://git.openjdk.org/jdk/pull/15467#pullrequestreview-1620911716 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322125143 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322125314 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322134900 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322138828 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322135837 From manc at openjdk.org Mon Sep 11 23:42:40 2023 From: manc at openjdk.org (Man Cao) Date: Mon, 11 Sep 2023 23:42:40 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v8] In-Reply-To: References: <1VlRygD5Ma7S7PwUsJdwi-gPVo_T30nwA7Tv0BAemNA=.47e3cfc0-0f89-4a9a-bd41-31689db67bd1@github.com> 
<_-ekt02un2mj-y6Z_v-4HiLjDoassXhZirdsu3ZmmUA=.fca59c04-7ddf-44fc-9fc2-575e915509dd@github.com> Message-ID: On Mon, 11 Sep 2023 15:10:23 GMT, Volker Simonis wrote: >> src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 461: >> >>> 459: >>> 460: _g1_concurrent_mark_threads_cpu_time = >>> 461: PerfDataManager::create_counter(SUN_THREADS, "g1_conc_mark_thread.cpu_time", >> >> I find it a bit odd to have "." in the name. "." should be the separator for namespaces, but not within the counter name. >> >> I think @simonis's suggestion about `sun.gc.collector..cpu_time` or `sun.gc.cpu_time` is to have a single, aggregated counter named `cpu_time`. If we don't do such aggregation, the names should probably be `g1_conc_mark_cpu_time`, `parallel_gc_workers_cpu_time`, etc. > > I suggested having a single, aggregated counter for all GC threads **in addition** to the specific counters. This aggregated counter should be the same for all GCs which implement CPU times. It could easily be used by tools without knowing which GC is enabled and, even more importantly, it would be immune to implementation changes (e.g. if a GC would establish a new subset of GC worker threads). From what I understand this is also your primary use case for your adaptable heap sizing feature. > > The aggregated counter should be in a generic place for *all* GCs (e.g. `sun.gc.cpu_time` or `sun.threads.gc_cpu_time`). For the JIT we could then add the corresponding `sun.ci.cpu_time` or `sun.threads.jit_cpu_time` (in addition to the more specific counters for C1, C2, etc.). > > For the specific GC counters: instead of putting the `.cpu_time` in the counter name, you can create a sub-namespace instead (e.g. `sun.gc.cpu_time.*` or `sun.threads.cpu_time.*`) and add all the other counters under that namespace (e.g. `sun.gc.cpu_time.total`, `sun.gc.cpu_time.conc_mark`, `sun.gc.cpu_time.parallel_gc_workers`, etc...
What I want to avoid in any case is having the name of the GC in every single counter name, because that's redundant information. Thanks for the clarification. Yes, adding a new counter for aggregation sounds good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1322182754 From sspitsyn at openjdk.org Tue Sep 12 00:06:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 12 Sep 2023 00:06:41 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v5] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: <7zuVX2HiaCQeIRssPp_6FaXdYlRzFeoSTz4pAjmxlpY=.74459df8-d3ad-4dd8-987d-8867fb02807f@github.com> On Mon, 11 Sep 2023 21:22:18 GMT, Serguei Spitsyn wrote: >> This update fixes two important issues: >> - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach >> - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads >> >> The issue is tricky to fix because the existing mechanism of the JVMTI event management does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` >> which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point.
>> >> This fix introduces new function `JvmtiExport::get_jvmti_thread_state()` which checks if thread is virtual and there is a thread filtered event enabled globally, and if so, forces a creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. New function `JvmtiEventController::recompute_thread_filtered()` was introduced to make necessary corrections. >> >> Testing: >> - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` >> - ran mach5 tiers 1-6: all are passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed second round of review comments on VThreadEventTest.java Leonid and Alex, thank you a lot for review and discussions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15467#issuecomment-1714769193 From sspitsyn at openjdk.org Tue Sep 12 00:06:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 12 Sep 2023 00:06:47 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v5] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: <8wBxYU27CmY9UzBGr-LlcIEg-TgnBetf5MrNwyzPWZ4=.c2c8c069-d8f3-4740-9202-5ad12ad507d8@github.com> On Mon, 11 Sep 2023 22:18:03 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed second round of review comments on VThreadEventTest.java > > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 45: > >> 43: * The test uses custom implementation of the CountDownLatch class. >> 44: * The reason is we want the state of tested thread to be predictable. >> 45: * With original CountDownLatch it is not clear what thread state is expected. 
> > "original CountDownLatch" -> "java.util.concurrent.CountDownLatch" Thanks - fixed now. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 106: > >> 104: ready1.countDown(); // to guaranty state is not State.WAITING after await(mready) >> 105: try { >> 106: Thread.sleep(20000); // big timeout to keep unmounted untill interrupted > > untill -> until Thanks - fixed now. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java line 132: > >> 130: // keep mounted >> 131: } >> 132: LockSupport.parkNanos(10 * TIMEOUT_BASE); // will cause extra mount and unmount > > I don't see much sense in TIMEOUT_BASE constant (it's used only here and multiplied by 10) > I think it would be clearer to > Suggestion: > > // park for 10ms; causes extra unmount and mount > LockSupport.parkNanos(10_000_000L); Thanks - fixed. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 24: > >> 22: */ >> 23: >> 24: #include > > I suppose this include was needed for abort() only and not needed anymore Thanks - removed. > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 27: > >> 25: #include >> 26: #include >> 27: #include > > not needed Thanks - removed. 
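The thread-state argument behind the custom latch (the reviewers above discuss why the test does not use `java.util.concurrent.CountDownLatch`) can be illustrated with a monitor-based latch. This is a hedged sketch only: the test's actual replacement class is not shown in this thread, so `SimpleLatch` and its methods are hypothetical. Because `await()` blocks in `Object.wait()` without a timeout, a platform thread waiting in it should report `Thread.State.WAITING`.

```java
// Hypothetical monitor-based latch; a sketch, not the test's actual Counter/CountDownLatch class.
final class SimpleLatch {
    private int count;

    SimpleLatch(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count < 0");
        }
        this.count = count;
    }

    // Decrements the count and wakes all waiters when it reaches zero.
    synchronized void countDown() {
        if (count > 0 && --count == 0) {
            notifyAll();
        }
    }

    // Blocks in Object.wait() (no timeout) until the count reaches zero,
    // so a thread parked here reports Thread.State.WAITING.
    synchronized void await() throws InterruptedException {
        while (count > 0) {
            wait();
        }
    }

    synchronized int getCount() {
        return count;
    }
}
```

All blocking happens on a single monitor, which makes the waiting thread's state easy to reason about; whether this also explains the deadlocks observed with `java.util.concurrent.CountDownLatch` is not established by this thread.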
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322195487 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322198339 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322201761 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322205612 PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322204613 From sspitsyn at openjdk.org Tue Sep 12 00:06:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 12 Sep 2023 00:06:47 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v5] In-Reply-To: References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> <7cWSn9ZypedqQPPkGV8xFhFfpGYHMJcDZ5TH6Hel4mQ=.49235d1c-737c-465e-833c-a26d814cb7ac@github.com> <9NY499Z6epRjQ-ZvDrbxS6weL8QG-7djNWJN-o9SCmc=.6d58bb4a-2e26-4a96-ae1b-dbe6682ebe8f@github.com> <-viFMo-IWbIoWm7DBpH5yISHqfmVdSy-UFwroc6O1-w=.b961f69b-71b1-431e-a19f-a5880292ec12@github.com> Message-ID: On Sat, 9 Sep 2023 18:05:51 GMT, Leonid Mesnik wrote: >> It is strange that the tested vthreads in sleep(timeout) have state WAITING, not TIMED_WAITING. >> It can be a bug in the implementation. >> I've decided to add a short sleep. Checking states looks a little bit overcomplicated. > > Could you please also add a comment to the sleep describing why it is needed, and mention that an insufficient sleep time can't cause test failures? The test should pass anyway; it just won't test the expected state. Done.
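Leonid's requirement, that a too-short wait should only weaken the check rather than fail the test, can also be expressed as a bounded, best-effort poll. This is a hypothetical alternative sketch (`StateWait` and `awaitState` are invented names); per the thread, the PR itself uses a short sleep instead of checking states.

```java
// Hypothetical best-effort helper: poll a thread's state for a bounded time.
final class StateWait {
    // Returns true if 'target' reached 'expected' within timeoutMillis, false otherwise.
    // Callers are expected to continue either way, so a slow machine cannot fail the test.
    static boolean awaitState(Thread target, Thread.State expected, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (target.getState() != expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the caller simply skips the state-dependent check
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

For a platform thread blocked in `Thread.sleep(millis)` the expected state is `TIMED_WAITING`; the thread above reports that virtual threads in `sleep()` showed `WAITING` instead, which is exactly the kind of discrepancy a non-failing poll tolerates.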
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15467#discussion_r1322194160 From sspitsyn at openjdk.org Tue Sep 12 01:23:31 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 12 Sep 2023 01:23:31 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v6] In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a JVMTI agent dynamic attach > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing mechanism of the JVMTI event management does not support unmounted virtual threads. The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and for each thread in the list recomputes enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created but only when it is really needed, e.g., if any of the thread filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create `JvmtiThreadState` for each virtual thread eagerly at a thread starting point. > > This fix introduces new function `JvmtiExport::get_jvmti_thread_state()` which checks if thread is virtual and there is a thread filtered event enabled globally, and if so, forces a creation of the `JvmtiThreadState`.
Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. New function `JvmtiEventController::recompute_thread_filtered()` was introduced to make necessary corrections. > > Testing: > - new test from the bug report was adopted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all are passed Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: - removed a trailing space - addressed one more review round on new test VThreadEventTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15467/files - new: https://git.openjdk.org/jdk/pull/15467/files/9f9355df..600974e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=04-05 Stats: 8 lines in 2 files changed: 1 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15467.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15467/head:pull/15467 PR: https://git.openjdk.org/jdk/pull/15467 From duke at openjdk.org Tue Sep 12 01:39:03 2023 From: duke at openjdk.org (duke) Date: Tue, 12 Sep 2023 01:39:03 GMT Subject: Withdrawn: 8311661: Resolve duplicate symbol of StringTable::StringTable with JDK static linking In-Reply-To: References: Message-ID: On Sat, 8 Jul 2023 00:15:01 GMT, Jiangli Zhou wrote: > Move StringTable to 'hotspot_jvm' namespace. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/14808 From jjoo at openjdk.org Tue Sep 12 01:43:19 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 12 Sep 2023 01:43:19 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v11] In-Reply-To: References: Message-ID: <2DYTZnj5sVD-gGmUtOmgP9yXqThdS-WFdrEqSCZsatY=.e76add5d-4643-4a0a-b071-e6a3d6de4d99@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Address counter update correctness ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/43e2de15..18c8f9cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=09-10 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From sspitsyn at openjdk.org Tue Sep 12 01:44:23 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 12 Sep 2023 01:44:23 GMT Subject: RFR: 8312174: missing JVMTI events from vthreads parked during JVMTI attach [v7] In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a dynamic attach of a JVMTI agent > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing JVMTI event management mechanism does not support unmounted virtual threads.
The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and, for each thread in the list, recomputes the enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created only when it is really needed, e.g., if any of the thread-filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create a `JvmtiThreadState` for each virtual thread eagerly at thread start. > > This fix introduces a new function, `JvmtiExport::get_jvmti_thread_state()`, which checks whether the thread is virtual and a thread-filtered event is enabled globally and, if so, forces the creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. A new function, `JvmtiEventController::recompute_thread_filtered()`, was introduced to make the necessary corrections.
> > Testing: > - new test from the bug report was adapted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: use virtualThreadScheduler.parallelism instead of ForkJoinPool.common.parallelism ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15467/files - new: https://git.openjdk.org/jdk/pull/15467/files/600974e0..0ce4701a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15467&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15467.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15467/head:pull/15467 PR: https://git.openjdk.org/jdk/pull/15467 From sspitsyn at openjdk.org Tue Sep 12 02:49:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 12 Sep 2023 02:49:49 GMT Subject: Integrated: 8312174: missing JVMTI events from vthreads parked during JVMTI attach In-Reply-To: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> References: <5UkbsOBV6ixFV5IhduISKS7NpvPjU8s1r54KOwpBTC4=.510974ad-dfec-4ef8-8b41-1cd8d867d905@github.com> Message-ID: On Tue, 29 Aug 2023 10:09:21 GMT, Serguei Spitsyn wrote: > This update fixes two important issues: > - Issue reported by a bug submitter about missing JVMTI events on virtual threads after a dynamic attach of a JVMTI agent > - Known scalability/performance issue: a need to lazily create `JvmtiThreadState's` for virtual threads > > The issue is tricky to fix because the existing JVMTI event management mechanism does not support unmounted virtual threads.
The JVMTI `SetEventNotificationMode()` calls the function `JvmtiEventControllerPrivate::recompute_enabled()` > which inspects a `JavaThread's` list and, for each thread in the list, recomputes the enabled event bits with the function `JvmtiEventControllerPrivate::recompute_thread_enabled()`. The `JvmtiThreadState` of each thread is created only when it is really needed, e.g., if any of the thread-filtered events is enabled. There was an initial adjustment of this mechanism for virtual threads which accounted for both carrier and virtual threads when a virtual thread is mounted. However, it does not work for unmounted virtual threads. A temporary workaround was to always create a `JvmtiThreadState` for each virtual thread eagerly at thread start. > > This fix introduces a new function, `JvmtiExport::get_jvmti_thread_state()`, which checks whether the thread is virtual and a thread-filtered event is enabled globally and, if so, forces the creation of the `JvmtiThreadState`. Another adjustment was needed because the function `state_for_while_locked()` can be called directly in some contexts. A new function, `JvmtiEventController::recompute_thread_filtered()`, was introduced to make the necessary corrections. > > Testing: > - new test from the bug report was adapted: `test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest` > - ran mach5 tiers 1-6: all passed This pull request has now been integrated.
Changeset: fda142ff Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/fda142ff6cfefa12ec1ea4d4eb48b3c1b285bc04 Stats: 424 lines in 9 files changed: 376 ins; 20 del; 28 mod 8312174: missing JVMTI events from vthreads parked during JVMTI attach Reviewed-by: lmesnik, amenkov ------------- PR: https://git.openjdk.org/jdk/pull/15467 From dholmes at openjdk.org Tue Sep 12 04:54:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 12 Sep 2023 04:54:39 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 08:18:45 GMT, JoKern65 wrote: > After the push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478), the following test started to fail on AIX: > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first attempt at a fix. > A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correcting the root cause. > The root cause is that dlopen() on AIX returns a different handle every time, even if the same library is loaded twice. There is no official AIX API available to get this information in a different way. > My proposal is to use the stat64x API with the fields st_device and st_inode. After a dlopen(), the stat64x() API is additionally called to get this information, which is then stored alongside the library handle in the jvmtiAgent. On AIX, we can then compare these values instead of the library handle and get the same functionality as on Linux. test/jdk/com/sun/tools/attach/warnings/DynamicLoadWarningTest.java line 127: > 125: > 126: // test behavior on platforms that can detect if an agent library was previously loaded > 127: if (!Platform.isAix()) { You need to fix the indentation of the old block.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1322378759 From dholmes at openjdk.org Tue Sep 12 05:01:40 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 12 Sep 2023 05:01:40 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 08:18:45 GMT, JoKern65 wrote: > After the push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478), the following test started to fail on AIX: > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first attempt at a fix. > A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correcting the root cause. > The root cause is that dlopen() on AIX returns a different handle every time, even if the same library is loaded twice. There is no official AIX API available to get this information in a different way. > My proposal is to use the stat64x API with the fields st_device and st_inode. After a dlopen(), the stat64x() API is additionally called to get this information, which is then stored alongside the library handle in the jvmtiAgent. On AIX, we can then compare these values instead of the library handle and get the same functionality as on Linux. I would much rather see an AIX solution that was in the AIX version of `os::dll_load` rather than having to pollute the shared JVMTI code. I'm not sure how best to achieve that - it may not be possible to hide it completely - but we should be able to refactor things so `stat64x_LIBPATH` is defined in AIX code, and its use is via a helper so the code is only written once. Then we would only need a handful of `AIX_ONLY(...)` statements.
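The review asks for the device/inode identity check to sit behind a small platform helper instead of in shared JVMTI code, with explicit `!= 0` tests. The sketch below shows such a helper in isolation, under stated assumptions: it uses portable POSIX `stat()` as a stand-in for AIX's `stat64x` (whose exact use in the PR is not shown in the thread), and `LibraryId`, `library_id_for`, `same_library` and `already_loaded` are hypothetical names, not HotSpot code.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical identity of a library file: its device and inode numbers.
// Portable POSIX stat() stands in here for AIX's stat64x.
struct LibraryId {
  dev_t device = 0;
  ino_t inode = 0;
};

// Fills 'id' from the file at 'path'; returns false if stat() fails.
bool library_id_for(const std::string& path, LibraryId& id) {
  assert(!path.empty());
  struct stat st;
  if (stat(path.c_str(), &st) != 0) {
    return false;
  }
  id.device = st.st_dev;
  id.inode = st.st_ino;
  return true;
}

// Two paths name the same file iff device and inode both match.
bool same_library(const LibraryId& a, const LibraryId& b) {
  return a.device == b.device && a.inode == b.inode;
}

// Hypothetical stand-in for the agent-list lookup. Following the review's
// style nit, it tests device != 0 and inode != 0 explicitly instead of
// relying on implicit conversion to bool.
bool already_loaded(const std::vector<LibraryId>& loaded, const LibraryId& id) {
  if (id.device == 0 || id.inode == 0) {
    return false;  // no usable identity was recorded
  }
  for (const LibraryId& other : loaded) {
    if (same_library(other, id)) {
      return true;
    }
  }
  return false;
}
```

Keeping the `stat` call inside one helper means shared code only ever compares two small identity records, which is roughly the shape of refactoring the reviewer describes; whether the real change ends up looking like this is not established by the thread.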
src/hotspot/share/prims/jvmtiAgentList.cpp line 251: > 249: while (it.has_next()) { > 250: JvmtiAgent* const agent = it.next(); > 251: if (!agent->is_static_lib() && device && inode && Style nit: we don't use implicit booleans, so check `device != 0` and `inode != 0` explicitly please. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1714969165 PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1322387181 From haosun at openjdk.org Tue Sep 12 06:21:13 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 12 Sep 2023 06:21:13 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v8] In-Reply-To: References: Message-ID: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the option "PreserveFramePointer", which would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. > > 5. Virtual threads will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken.
> > R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid re-signing on continuation thaw, since the fast path in stack copying doesn't walk the frames. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered using the (relative) stack pointer SP, the thread ID, PACStack [7] and the value zero as candidate modifiers. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access it from the JIT side. We would need to 1) resolve the address of the current thread (see [8] as an example), and 2) get the tid field in a way similar to java_lang_Thread::thread_id(). I suppose this would introduce significant performance overhead. Could we instead use the "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not invariant for virtual threads, so PAC re-signing would still be needed. > > 5. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of...
Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13322/files - new: https://git.openjdk.org/jdk/pull/13322/files/08a8815c..14b809e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=06-07 Stats: 59 lines in 8 files changed: 12 ins; 47 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From haosun at openjdk.org Tue Sep 12 06:21:13 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 12 Sep 2023 06:21:13 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: <2WwPU-8unu7c-__kCaG8YFgq_A97ZLX0Rh_aC_YtRtQ=.a180c529-8f86-45e2-8a6b-775ee90ff4c8@github.com> Message-ID: On Mon, 11 Sep 2023 12:34:02 GMT, Andrew Haley wrote: >> I guess the following pseudo-code is what you want: >> >> >> /* >> * file ContinuationHelper.hpp >> */ >> >> class BaseContinuationHelper { >> public: >> inline void patch_return_address_at(intptr_t* sp, address pc) { >> // the default implementation >> } >> } >> >> class ContinuationHelper : public BaseContinuationHelper { >> public: >> inline void patch_return_address_at(intptr_t* sp, address pc) {} // declare here >> } >> >> /* >> * file ContinuationHelper_aarch64.inline.hpp >> */ >> >> inline void ContinuationHelper::patch_return_address_at(intptr_t* sp, address pc) { >> // override here for AArch64 >> } >> >> /* >> * file ContinuationHelper_x86.inline.hpp >> */ >> >> // no need to define patch_return_address_at(). >> // use the default BaseContinuationHelper::patch_return_address_at(). 
>> >> >> However, it doesn't work because we have to define `ContinuationHelper::patch_return_address_at()` for x86 since we declare it. >> >> Please let me know if I misunderstood something. > > I see the problem. I'd do this: > > > diff --git a/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp b/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp > index 25e83e7e4b9..e1bd855dddf 100644 > --- a/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp > +++ b/src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp > @@ -68,6 +68,8 @@ inline void ContinuationHelper::push_pd(const frame& f) { > *(intptr_t**)(f.sp() - frame::sender_sp_offset) = f.fp(); > } > > +#define CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS > + > inline address ContinuationHelper::return_address_at(intptr_t* sp) { > return pauth_strip_verifiable(*(address*)sp); > } > diff --git a/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp b/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp > index ce88dd6dbba..55794f9ac7e 100644 > --- a/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp > +++ b/src/hotspot/cpu/x86/continuationHelper_x86.inline.hpp > @@ -68,14 +68,6 @@ inline void ContinuationHelper::push_pd(const frame& f) { > *(intptr_t**)(f.sp() - frame::sender_sp_offset) = f.fp(); > } > > -inline address ContinuationHelper::return_address_at(intptr_t* sp) { > - return *(address*)sp; > -} > - > -inline void ContinuationHelper::patch_return_address_at(intptr_t* sp, address pc) { > - *(address*)sp = pc; > -} > - > inline void ContinuationHelper::set_anchor_to_entry_pd(JavaFrameAnchor* anchor, ContinuationEntry* entry) { > anchor->set_last_Java_fp(entry->entry_fp()); > } > diff --git a/src/hotspot/share/runtime/continuationHelper.inline.hpp b/src/hotspot/share/runtime/continuationHelper.inline.hpp > index 7c6ab7ee76b..6d4d739f219 100644 > --- a/src/hotspot/share/runtime/continuationHelper.inline.hpp > +++ b/src/hotspot/share/runtime/continuationHelper.inline.hpp > @@ 
-37,6 +37,15 @@ > > #include CPU_HEADER_INLINE(continuationHelper) > > +#ifndef CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS > +inline address ContinuationHelper::return_address_at(intptr_t* sp) { > + return *(address*)sp; > +} > +inline void ContinuationHelper::patch_return_address_at(intptr_t* sp, address pc) { > + *(address*)sp = pc; > +} > +#endif > + > inline bool ContinuationHelper::NonInterpretedUnknownFrame::is_instance(const frame& f) { > return !f.is_interpreted_frame(); > } Thanks for your suggestion. Updated in the latest commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1322474996 From ayang at openjdk.org Tue Sep 12 07:41:42 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 12 Sep 2023 07:41:42 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v5] In-Reply-To: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> References: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> Message-ID: On Mon, 11 Sep 2023 15:01:30 GMT, Richard Reingruber wrote: >> This PR introduces parallel scanning of large object arrays in the old generation that contain roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to the inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems. >> >> The algorithm for sharing the scanning of large arrays is supposed to be a straightforward >> extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array that reaches into a stripe: that part is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe.
This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid ... > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' > - Apply Thomas' suggestions > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Limit effect of previous commit to large array handling > - Make sure to skip stripes where no object starts > - 8310031: Parallel: Implement better work distribution for large object arrays in old gen Just fyi, using the benchmark from JDK-8062128, I observe some slowdown on my box with `-Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc DelayInducer.java`. ## baseline [0.003s][info][gc] Using Parallel [2.424s][info][gc] GC(0) Pause Young (Allocation Failure) 768M->619M(2944M) 1166.978ms [10.300s][info][gc] GC(1) Pause Young (Allocation Failure) 1704M->1670M(2944M) 3163.502ms [10.331s][info][gc] GC(2) Pause Full (Ergonomics) 1670M->4M(2385M) 30.905ms [12.121s][info][gc] GC(3) Pause Young (Allocation Failure) 654M->773M(2385M) 1446.274ms [16.185s][info][gc] GC(4) Pause Young (Allocation Failure) 1541M->1125M(2385M) 861.151ms [17.003s][info][gc] GC(5) Pause Full (Ergonomics) 1125M->164M(2699M) 818.025ms ## new [0.002s][info][gc] Using Parallel [2.410s][info][gc] GC(0) Pause Young (Allocation Failure) 768M->618M(2944M) 1152.280ms [12.303s][info][gc] GC(1) Pause Young (Allocation Failure) 1704M->1670M(2944M) 5176.716ms [12.334s][info][gc] GC(2) Pause Full (Ergonomics) 1670M->4M(2384M) 30.617ms [14.050s][info][gc] GC(3) Pause Young (Allocation Failure) 654M->773M(2384M) 1415.687ms [18.196s][info][gc] GC(4) Pause Young (Allocation Failure) 1541M->1125M(2384M) 1008.057ms [19.022s][info][gc] GC(5) Pause Full (Ergonomics) 1125M->164M(2691M) 825.451ms Note the ~2s increase in `GC(1)` young-gc pause. 
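The stripe rule from the 8310031 description (each worker scans the part of a large array inside its own stripe, the stripe holding the array's end skips it as a crossing object, and the owner of the previous stripe scans through to the end) can be modeled as a pure partitioning function. This is a hypothetical sketch of the rule as stated in the PR text, not the code in `PSCardTable::scavenge_contents_parallel`: `WordRange` and `scan_range_for_stripe` are invented names, addresses are plain word indices, and card-mark filtering and the 3-stripe minimum are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range of heap words [begin, end).
struct WordRange {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Hypothetical model of the stripe rule described for 8310031. Stripe i covers
// words [i * stripe_words, (i + 1) * stripe_words). The owner of each stripe
// scans the part of the array inside its stripe, except that the stripe
// holding the array's last word is skipped (the array crosses into it), and
// the owner of the stripe just before it scans through to the array's end.
// Dirty-card filtering from the real PSCardTable code is omitted.
WordRange scan_range_for_stripe(WordRange arr, std::size_t stripe,
                                std::size_t stripe_words) {
  assert(arr.begin < arr.end && stripe_words > 0);
  const std::size_t first = arr.begin / stripe_words;
  const std::size_t tail = (arr.end - 1) / stripe_words;
  if (stripe < first || stripe > tail) {
    return {0, 0};  // the array does not touch this stripe
  }
  if (stripe == tail && stripe != first) {
    return {0, 0};  // the array crosses into this stripe: skipped, as today
  }
  const std::size_t lo = std::max(arr.begin, stripe * stripe_words);
  const std::size_t hi = (stripe + 1 == tail)
                             ? arr.end  // also take the tail in the next stripe
                             : std::min(arr.end, (stripe + 1) * stripe_words);
  return {lo, hi};
}
```

For an array occupying words [5, 47) with 10-word stripes, the workers get [5, 10), [10, 20), [20, 30) and [30, 47), and the owner of stripe 4 gets nothing, so every word is covered exactly once. The extra bookkeeping at each stripe boundary also hints at why splitting a scan can cost something when only 1 or 2 workers share it, though this model says nothing about actual timings.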
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1715166970 From ayang at openjdk.org Tue Sep 12 07:42:55 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 12 Sep 2023 07:42:55 GMT Subject: RFR: 8315550: G1: Fix -Wconversion warnings in g1NUMA In-Reply-To: References: Message-ID: On Fri, 1 Sep 2023 15:48:25 GMT, Albert Mingkun Yang wrote: > Simple `int` to `uint` for NUMA node-id. > > Possibly, `numa_get_leaf_groups` should accept `uint[]`. I will attempt that in another PR, as that will be mostly runtime, not G1 specific. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15541#issuecomment-1715167949 From ayang at openjdk.org Tue Sep 12 07:42:56 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 12 Sep 2023 07:42:56 GMT Subject: Integrated: 8315550: G1: Fix -Wconversion warnings in g1NUMA In-Reply-To: References: Message-ID: On Fri, 1 Sep 2023 15:48:25 GMT, Albert Mingkun Yang wrote: > Simple `int` to `uint` for NUMA node-id. > > Possibly, `numa_get_leaf_groups` should accept `uint[]`. I will attempt that in another PR, as that will be mostly runtime, not G1 specific. This pull request has now been integrated. 
Changeset: 94800781 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/94800781eae192d3e82f5635d4aad165f11eabc1 Stats: 32 lines in 8 files changed: 0 ins; 1 del; 31 mod 8315550: G1: Fix -Wconversion warnings in g1NUMA Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/15541 From rrich at openjdk.org Tue Sep 12 08:08:38 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 12 Sep 2023 08:08:38 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v5] In-Reply-To: References: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> Message-ID: On Tue, 12 Sep 2023 07:38:53 GMT, Albert Mingkun Yang wrote: > Note the ~2s increase in `GC(1)` young-gc pause. My ad hoc explanation would be the same as for the regression with just 1 gc thread running BigArrayInOldGenRR.java mentioned in the PR synopsis: splitting the array scan is less efficient. The cost of doing it can be higher than the gain if there are just 1 or 2 gc threads to benefit from the sharing. I'm still surprised that it happens with 2 gc threads. I will look into it after replying to all comments by Thomas and Roman. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1715210126 From aph at openjdk.org Tue Sep 12 10:17:44 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Sep 2023 10:17:44 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v8] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 06:21:13 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2.
However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the option "PreserveFramePointer", which would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. >> >> 5. Virtual threads will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid re-signing on continuation thaw, since the fast path in stack copying doesn't walk the frames. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered using the (relative) stack pointer SP, the thread ID, PACStack [7] and the value zero as candidate modifiers. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access it from the JIT side. We would need to 1) resolve the address of the current thread (see [8] as an example), and 2) get the tid field in a way similar to java_lang_Thread::thread_id(). I suppose this would introduce significant performance overhead.
Could we instead use the "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not invariant for virtual threads, so PAC re-signing would still be needed. >> >> 5. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS src/hotspot/share/runtime/continuationFreezeThaw.cpp line 689: > 687: // patch return pc of the bottom-most frozen frame (now in the chunk) with the actual caller's return address > 688: intptr_t* chunk_bottom_sp = chunk_top + cont_size() - _cont.argsize() - frame::metadata_words_at_top; > 689: assert(_empty || ContinuationHelper::return_address_at(chunk_bottom_sp-frame::sender_sp_ret_address_offset()) == StubRoutines::cont_returnBarrier(), ""); You can't have empty assertion comments. Also, this line is way too long. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1322807958 From aph at openjdk.org Tue Sep 12 10:34:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Sep 2023 10:34:45 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v8] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 06:21:13 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3.
PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the option "PreserveFramePointer", which would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. >> >> 5. Virtual threads will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid re-signing on continuation thaw, since the fast path in stack copying doesn't walk the frames. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered using the (relative) stack pointer SP, the thread ID, PACStack [7] and the value zero as candidate modifiers. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access it from the JIT side. We would need to 1) resolve the address of the current thread (see [8] as an example), and 2) get the tid field in a way similar to java_lang_Thread::thread_id(). I suppose this would introduce significant performance overhead. Could we instead use the "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not invariant for virtual threads, so PAC re-signing would still be needed. >> >> 5.
PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS src/hotspot/share/runtime/continuationFreezeThaw.cpp line 678: > 676: > 677: intptr_t* chunk_top = chunk->start_address() + chunk_new_sp; > 678: assert(_empty || ContinuationHelper::return_address_at(_orig_chunk_sp - frame::sender_sp_ret_address_offset()) == chunk->pc(), ""); This line is way too long too. Suggestion: if (!_empty) { intptr_t* retaddr_slot = _orig_chunk_sp - frame::sender_sp_ret_address_offset(); assert(ContinuationHelper::return_address_at(retaddr_slot) == chunk->pc(), "Saved return address is bad"); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1322841143 From jvernee at openjdk.org Tue Sep 12 10:49:38 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 12 Sep 2023 10:49:38 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v21] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encodings to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2.
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable-length arrays or variable-size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray` by renaming methods that both allocate and initialize memory segments to `allocateFrom` (see the original PR for the problematic case). > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - add missing space + reflow lines - Fix typo Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/0e702f06..e68b95c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=19-20 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From shade at openjdk.org Tue Sep 12 11:46:35 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Sep 2023 11:46:35 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v8] In-Reply-To: References: Message-ID: > As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock. > > There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety. > > More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach. > > Additional testing: > - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits > - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Touchup whitespace - Rewrite jvmtiManageCapabilities lock usage - Re-instate old asserts - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - Accept one more potentially nullptr mutex - Merge branch 'master' into JDK-8313202-mutexlocker-nulls - ... and 4 more: https://git.openjdk.org/jdk/compare/ac7097a6...e3da7697 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15043/files - new: https://git.openjdk.org/jdk/pull/15043/files/3676fa71..e3da7697 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15043&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15043&range=06-07 Stats: 26710 lines in 843 files changed: 15579 ins; 7371 del; 3760 mod Patch: https://git.openjdk.org/jdk/pull/15043.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15043/head:pull/15043 PR: https://git.openjdk.org/jdk/pull/15043 From luhenry at openjdk.org Tue Sep 12 12:10:40 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 12 Sep 2023 12:10:40 GMT Subject: RFR: 8315652: RISC-V: Features string uses wrong separator for jtreg [v3] In-Reply-To: References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: On Fri, 8 Sep 2023 07:38:14 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> As described in jbs, this handles both cases with a rough solution by having two strings. >> Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. >> >> Tested tier1 on qemu rv. 
> > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > One features string Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15579#pullrequestreview-1622090606 From luhenry at openjdk.org Tue Sep 12 12:10:39 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 12 Sep 2023 12:10:39 GMT Subject: RFR: 8315743: RISC-V: Cleanup masm lr()/sc() methods In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 06:08:55 GMT, Robbin Ehn wrote: > Hi, please consider this small cleanup. > > Tested tier1 on qemu RV Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15578#pullrequestreview-1622089293 From luhenry at openjdk.org Tue Sep 12 14:35:13 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 12 Sep 2023 14:35:13 GMT Subject: RFR: 8315934: RISC-V: Disable conservative fences per vendor Message-ID: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> Conservative fences are not a requirement on some RISC-V hardware for correctness, but can bring a performance penalty. Let's make sure we disable them on a per-vendor basis, and keep them enabled for the default case. 
------------- Commit messages: - 8315934: RISC-V: Disable conservative fences per vendor Changes: https://git.openjdk.org/jdk/pull/15684/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15684&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315934 Stats: 21 lines in 3 files changed: 17 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15684/head:pull/15684 PR: https://git.openjdk.org/jdk/pull/15684 From rrich at openjdk.org Tue Sep 12 16:08:26 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 12 Sep 2023 16:08:26 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v6] In-Reply-To: References: Message-ID: > This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to inverse scaling, as demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe. > > - Except for the end of the large array reaching into a stripe, which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`). > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned.
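The stripe-ownership rule in the bullets above can be modeled with a little arithmetic. This is an illustrative sketch only, not code from `PSCardTable`: it assumes fixed-size stripes numbered from 0 and describes the array by a half-open byte range, and `Range` and `scan_range` are hypothetical names.

```cpp
#include <cstdint>

// Half-open byte range [begin, end); begin == end means "nothing to scan".
struct Range {
  std::uint64_t begin;
  std::uint64_t end;
};

// Hypothetical model of which part of a large array the owner of `stripe`
// scans, following the rule described above.
constexpr Range scan_range(std::uint64_t arr_begin, std::uint64_t arr_end,
                           std::uint64_t stripe_size, std::uint64_t stripe) {
  const std::uint64_t first = arr_begin / stripe_size;     // stripe holding the array start
  const std::uint64_t last = (arr_end - 1) / stripe_size;  // stripe holding the array end
  if (stripe < first || stripe > last) {
    return Range{0, 0};  // the array does not touch this stripe
  }
  if (stripe == last && last > first) {
    return Range{0, 0};  // owner skips objects crossing into its stripe
  }
  const std::uint64_t stripe_begin = stripe * stripe_size;
  const std::uint64_t stripe_end = stripe_begin + stripe_size;
  const std::uint64_t begin = arr_begin > stripe_begin ? arr_begin : stripe_begin;
  // The owner of the second-to-last stripe also scans the array's tail.
  std::uint64_t end = (stripe + 1 == last) ? arr_end : stripe_end;
  if (end > arr_end) {
    end = arr_end;  // arrays that fit in one stripe end early
  }
  return Range{begin, end};
}
```

For an array occupying bytes [50, 420) with 100-byte stripes, the owners of stripes 0 to 3 scan [50, 100), [100, 200), [200, 300) and [300, 420), and the owner of stripe 4 scans nothing. For an array spanning only two stripes, this model hands the whole array to the first owner; the three-stripe minimum from the description is not enforced here.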
> > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a microbenchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 GC threads. > > Observations > > * JDK22 scales inversely. Adding GC threads prolongs young collections. With 32 threads, young collections take ~15x longer than with a single thread. > > * Fixed JDK22 scales well. Adding GC threads reduces the duration of young collections. With 32 threads, young collections are 5x shorter than with a single thread. > > * With just 1 GC thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: objArrayOopDesc::oop_oop_iterate_bounded must be defined in objArrayOop.inline.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/67edf286..d535a10b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=04-05 Stats: 17 lines in 2 files changed: 9 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Tue Sep 12 16:08:30 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 12 Sep 2023 16:08:30 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v3] In-Reply-To: References: Message-ID: On Wed, 16 Aug 2023 08:28:01 GMT, Richard Reingruber wrote: >> src/hotspot/share/oops/objArrayKlass.inline.hpp line 121: >> >>> 119: } >>> 120: >>> 121: template >> >> It looks to me like this implementation is misplaced, I believe it should reside in objArrayOop.inline.hpp. > > Indeed, nice catch! I'll fix it asap when back. > And thanks for looking at the change! Fixed. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1323263929 From rrich at openjdk.org Tue Sep 12 16:24:40 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 12 Sep 2023 16:24:40 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v5] In-Reply-To: References: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> Message-ID: On Tue, 12 Sep 2023 08:05:43 GMT, Richard Reingruber wrote: > Note the ~2s increase in `GC(1)` young-gc pause. 
I've done some experimenting with DelayInducer. For all runs I used `-Xms3g -Xmx3g -XX:+UseParallelGC`. The durations given are those of GC(1).

BL: Baseline
NEW: https://github.com/openjdk/jdk/pull/14846/commits/d535a10b1ad47bef224dc15111774ed2ff904ed8
NEW*: NEW with 4x larger stripes

#### 1 GC Thread
BL: stable at 1.9s
NEW: stable at 5.6s
NEW*: stable at 2.9s

#### 2 GC Threads
BL: either 2.4s or 4.9s
NEW: stable at 3.5s
NEW*: stable at 2.3s

#### 8 GC Threads
BL: 4.9s to 10.5s
NEW: 1.4s to 1.6s
NEW*: stable at 1.4s

### Observations
* NEW scales as expected.
* Even with just 2 threads there is inverse scaling with BL.
* Some BL runs with 2 threads are faster and some are slower than NEW.
* BL scales badly with 8 threads. NEW is much better, and also better than single-threaded BL.
* The issue can be mitigated by increasing the stripe size.
* For BL, DelayInducer results are not sensitive to the stripe size (no numbers given).

### Interpretation
So it helps to split the work into fewer pieces. To me this seems to support the ad hoc explanation given above.

By default a stripe corresponds to 128 cards, and 1 card corresponds by default to 512 bytes of heap, so one stripe covers 64 KiB. So per 1G of old generation we get 16k stripes. That's a whole lot for just 2 threads. I guess even just 1k stripes would be enough. With fewer stripes we get fewer interruptions and better per-thread performance.

I think it would be worth revisiting the sizing of stripes. Maybe it would be better to have a fixed number of stripes? Maybe dependent on the number of threads?
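The stripe arithmetic in the comment above can be checked with a small sketch. It uses only the defaults quoted there (512-byte cards, 128 cards per stripe); `stripe_count` and `cards_per_stripe_for` are hypothetical helpers, and the second one models the "fixed number of stripes" idea rather than anything Parallel GC currently does.

```cpp
#include <cstdint>

// Defaults quoted in the comment: 512-byte cards, 128 cards per stripe.
constexpr std::uint64_t kCardBytes = 512;
constexpr std::uint64_t kDefaultCardsPerStripe = 128;
constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;

// Number of stripes needed to cover the old generation (rounded up).
constexpr std::uint64_t stripe_count(std::uint64_t old_gen_bytes,
                                     std::uint64_t cards_per_stripe) {
  const std::uint64_t stripe_bytes = cards_per_stripe * kCardBytes;
  return (old_gen_bytes + stripe_bytes - 1) / stripe_bytes;
}

// Hypothetical alternative policy: choose the stripe size so that the old
// generation splits into roughly `target_stripes` stripes.
constexpr std::uint64_t cards_per_stripe_for(std::uint64_t old_gen_bytes,
                                             std::uint64_t target_stripes) {
  const std::uint64_t bytes = (old_gen_bytes + target_stripes - 1) / target_stripes;
  const std::uint64_t cards = (bytes + kCardBytes - 1) / kCardBytes;
  return cards > 0 ? cards : 1;  // never return an empty stripe
}
```

With these defaults, 1 GiB of old generation yields 16384 stripes, the 4x larger stripes of NEW* yield 4096, and targeting 1024 stripes would need 2048 cards (1 MiB of heap) per stripe.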
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1716039824 From matsaave at openjdk.org Tue Sep 12 16:45:19 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 12 Sep 2023 16:45:19 GMT Subject: RFR: 8313638: Add test for dump of resolved references Message-ID: The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. ------------- Commit messages: - 8313638: Add test for dump of resolved references Changes: https://git.openjdk.org/jdk/pull/15686/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15686&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313638 Stats: 198 lines in 5 files changed: 198 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15686/head:pull/15686 PR: https://git.openjdk.org/jdk/pull/15686 From djelinski at openjdk.org Tue Sep 12 17:22:30 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Tue, 12 Sep 2023 17:22:30 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 Message-ID: Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. No new tests. Mach5 tier1-5 builds and tests clean.
------------- Commit messages: - Use AVX(1) instructions for register saving - Stop saving xmm16-31 in Windows call_stub Changes: https://git.openjdk.org/jdk/pull/15688/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15688&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316125 Stats: 26 lines in 3 files changed: 0 ins; 17 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/15688.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15688/head:pull/15688 PR: https://git.openjdk.org/jdk/pull/15688 From ccheung at openjdk.org Tue Sep 12 17:46:38 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 12 Sep 2023 17:46:38 GMT Subject: RFR: 8313638: Add test for dump of resolved references In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 16:37:04 GMT, Matias Saavedra Silva wrote: > The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. Nice test addition. I have one comment. Looks good otherwise. test/hotspot/jtreg/runtime/cds/appcds/sharedStrings/ResolvedReferencesWb.java line 32: > 30: if (args.length < 2 && args[0].equals("--isArchived")) { > 31: throw new RuntimeException("Test requires --isArchived flag"); > 32: } What happens if `args.length` is 0? Also, the message for the `RuntimeException` could be clearer, as follows: `throw new RuntimeException("Test requires two args: --isArchived [true|false]");` I think the "--isArchived" flag may not be needed; just passing in "true" or "false" should be sufficient.
------------- PR Review: https://git.openjdk.org/jdk/pull/15686#pullrequestreview-1622812395 PR Review Comment: https://git.openjdk.org/jdk/pull/15686#discussion_r1323370476 From matsaave at openjdk.org Tue Sep 12 18:50:22 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 12 Sep 2023 18:50:22 GMT Subject: RFR: 8313638: Add test for dump of resolved references [v2] In-Reply-To: References: Message-ID: > The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: - Fixed spacing - Calvin comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15686/files - new: https://git.openjdk.org/jdk/pull/15686/files/a400903d..9b649b9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15686&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15686&range=00-01 Stats: 12 lines in 2 files changed: 4 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15686/head:pull/15686 PR: https://git.openjdk.org/jdk/pull/15686 From erikj at openjdk.org Tue Sep 12 20:18:51 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 12 Sep 2023 20:18:51 GMT Subject: RFR: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: On Tue, 5 Sep 2023 22:49:41 GMT, Erik Joelsson wrote: > There are a number of files in the `test` directory that have an incorrect copyright header, which includes the "classpath" exception text. This patch removes that text from all test files that I could find it in. I did this using a combination of `sed` and `grep`. 
Reviewing this patch is probably easier using the raw patch file or a suitable webrev format. > > It's my assumption that these headers were introduced by mistake as it's quite easy to copy the wrong template when creating new files. Thanks for reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15573#issuecomment-1716362585 From erikj at openjdk.org Tue Sep 12 20:18:52 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 12 Sep 2023 20:18:52 GMT Subject: Integrated: 8267174: Many test files have the wrong Copyright header In-Reply-To: References: Message-ID: <0l5wbPPACAFDpuzCNnCNkr_RJ35CFXxPC9EpiVFt_Ao=.fbfbc877-43fd-4706-9d0d-bace4d004632@github.com> On Tue, 5 Sep 2023 22:49:41 GMT, Erik Joelsson wrote: > There are a number of files in the `test` directory that have an incorrect copyright header, which includes the "classpath" exception text. This patch removes that text from all test files that I could find it in. I did this using a combination of `sed` and `grep`. Reviewing this patch is probably easier using the raw patch file or a suitable webrev format. > > It's my assumption that these headers were introduced by mistake as it's quite easy to copy the wrong template when creating new files. This pull request has now been integrated. 
Changeset: 020255a7 Author: Erik Joelsson URL: https://git.openjdk.org/jdk/commit/020255a72dc374ba0bdd44772047f14a8bfe69a9 Stats: 1944 lines in 648 files changed: 0 ins; 1296 del; 648 mod 8267174: Many test files have the wrong Copyright header Reviewed-by: valeriep, aivanov, iris, dholmes, ihse ------------- PR: https://git.openjdk.org/jdk/pull/15573 From iklam at openjdk.org Tue Sep 12 21:03:41 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 12 Sep 2023 21:03:41 GMT Subject: RFR: 8313638: Add test for dump of resolved references [v2] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 18:50:22 GMT, Matias Saavedra Silva wrote: >> The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed spacing > - Calvin comments The test logic looks good to me. I think the comments can be improved. test/hotspot/jtreg/runtime/cds/appcds/sharedStrings/ResolvedReferencesNotNullTest.java line 47: > 45: // If ResolvedReferencesTestApp is not archived, the resolvedReferences array should only contain > 46: // the objects fooString and barString since they are static. The object > 47: // quxString should NOT be in the array since the method that returns it is not called I think this comment should be moved into ResolvedReferencesWb.java, so it's close to the actual test logic. test/hotspot/jtreg/runtime/cds/appcds/sharedStrings/ResolvedReferencesWb.java line 61: > 59: // them inside the resolvedReferences array. The strings fooString and > 60: // barString must have been found but not quxString unless ResolvedReferencesTestApp is archived.
> 61: if (isArchived) { I think the comment would be clearer if you break it down into the two cases, with something like this: if (isArchived) { // CDS eagerly resolves all the string literals in the ConstantPool. At this point, all // three strings should be in the resolvedReferences array. } else { // If the class is not archived, the string literals in the ConstantPool are resolved // on-demand. At this point, ResolvedReferencesTestApp:: has been executed // so the two strings used there should be in the resolvedReferences array. // ResolvedReferencesTestApp::qux() is not executed so "quxString" // should not yet be resolved. } ------------- PR Review: https://git.openjdk.org/jdk/pull/15686#pullrequestreview-1623109005 PR Review Comment: https://git.openjdk.org/jdk/pull/15686#discussion_r1323550590 PR Review Comment: https://git.openjdk.org/jdk/pull/15686#discussion_r1323559479 From matsaave at openjdk.org Tue Sep 12 21:42:34 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 12 Sep 2023 21:42:34 GMT Subject: RFR: 8313638: Add test for dump of resolved references [v3] In-Reply-To: References: Message-ID: > The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not.
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Ioi Comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15686/files - new: https://git.openjdk.org/jdk/pull/15686/files/9b649b9c..99c18a91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15686&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15686&range=01-02 Stats: 19 lines in 2 files changed: 12 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15686/head:pull/15686 PR: https://git.openjdk.org/jdk/pull/15686 From jvernee at openjdk.org Tue Sep 12 23:15:38 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 12 Sep 2023 23:15:38 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. We've discussed this offline. Based on the calling convention described here: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#callercallee-saved-registers these registers are 'volatile', i.e. _not_ callee saved. That means that in this case, where we have a C function calling into Java, we don't need to save these registers for our C caller. The relevant parts behind that link copied here (part bolded by me for emphasis): > The x64 ABI considers the registers RAX, RCX, RDX, R8, R9, R10, R11, and XMM0-XMM5 volatile. When present, the upper portions of YMM0-YMM15 and ZMM0-ZMM15 are also volatile. **On AVX512VL, the ZMM, YMM, and XMM registers 16-31 are also volatile.** When AMX support is present, the TMM tile registers are volatile.
Consider volatile registers destroyed on function calls unless otherwise safety-provable by analysis such as whole program optimization. > > The x64 ABI considers registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, and XMM6-XMM15 nonvolatile. They must be saved and restored by a function that uses them. ------------- Marked as reviewed by jvernee (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15688#pullrequestreview-1623346494 PR Comment: https://git.openjdk.org/jdk/pull/15688#issuecomment-1716643255 From dholmes at openjdk.org Wed Sep 13 02:02:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 13 Sep 2023 02:02:37 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. This code originally came from the Intel folk: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017579.html It would be good if someone from Intel could review the changes. I always like to try and understand why something was originally done the way it was. Perhaps it predates the Windows ABI defining these as volatile? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15688#issuecomment-1716819302 From rehn at openjdk.org Wed Sep 13 05:01:47 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 13 Sep 2023 05:01:47 GMT Subject: Integrated: 8315743: RISC-V: Cleanup masm lr()/sc() methods In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 06:08:55 GMT, Robbin Ehn wrote: > Hi, please consider this small cleanup. > > Tested tier1 on qemu RV This pull request has now been integrated.
Changeset: 1ebf510e Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/1ebf510e5a42c7b53720ed94e39e081f74821fc1 Stats: 17 lines in 2 files changed: 2 ins; 0 del; 15 mod 8315743: RISC-V: Cleanup masm lr()/sc() methods Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/15578 From rehn at openjdk.org Wed Sep 13 05:02:50 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 13 Sep 2023 05:02:50 GMT Subject: Integrated: 8315652: RISC-V: Features string uses wrong separator for jtreg In-Reply-To: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> References: <3oL6QCLCgkfDLQrUohFBdczMAVuJXdLvAkwU7xWhaI8=.595b5953-9501-4e6c-a265-a9f014634c04@github.com> Message-ID: On Wed, 6 Sep 2023 06:14:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > As described in jbs, this handles both cases with a rough solution by having two strings. > Meaning we get e.g. 'v' as a separate feature from CPUInfo, but we still get the pretty string in e.g. hs_err. > > Tested tier1 on qemu rv. This pull request has now been integrated. 
Changeset: cbbfa0dd Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/cbbfa0ddfb1485edfc6275dd7085b3096f7eccf6 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8315652: RISC-V: Features string uses wrong separator for jtreg Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/15579 From thartmann at openjdk.org Wed Sep 13 05:17:39 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Sep 2023 05:17:39 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 11:54:04 GMT, Roberto Castañeda Lozano wrote: > This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) > > This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`.
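To make the payload/padding distinction concrete, here is a small size calculation. It is a sketch under assumed layout parameters (8-byte words and a 16-byte, word-aligned array header, as on 64-bit HotSpot with compressed class pointers), not the C2 code from the changeset, and all helper names are hypothetical. With 8-byte elements (uncompressed oops, as with ZGC) and `ObjectAlignmentInBytes=32`, an `Object[]` of length 1 ends with one padding word, which the ZGC copy stubs must not process as an object pointer.

```cpp
#include <cstdint>

// Assumed layout for illustration only (not taken from the changeset).
constexpr std::int64_t kWordBytes = 8;
constexpr std::int64_t kHeaderBytes = 16;  // word-aligned array header

constexpr std::int64_t align_up(std::int64_t x, std::int64_t alignment) {
  return (x + alignment - 1) / alignment * alignment;
}

// Words the clone stub needs to copy: the payload rounded up to a whole word.
constexpr std::int64_t payload_words(std::int64_t length, std::int64_t elem_bytes) {
  return align_up(length * elem_bytes, kWordBytes) / kWordBytes;
}

// Total object size in words, rounded up to ObjectAlignmentInBytes.
constexpr std::int64_t object_words(std::int64_t length, std::int64_t elem_bytes,
                                    std::int64_t object_alignment_bytes) {
  return align_up(kHeaderBytes + length * elem_bytes, object_alignment_bytes) / kWordBytes;
}

// Trailing alignment padding words that should not be handed to the copy stub.
constexpr std::int64_t padding_words(std::int64_t length, std::int64_t elem_bytes,
                                     std::int64_t object_alignment_bytes) {
  return object_words(length, elem_bytes, object_alignment_bytes) -
         kHeaderBytes / kWordBytes - payload_words(length, elem_bytes);
}
```

Under these assumptions, `int` arrays of length 3 and 4 both need 2 payload words, and with the default 8-byte object alignment there are no padding words to exclude.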
On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g. for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL` (which prevented the constant word-length from being discovered); see the left graph below: > > ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) > > With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario ... Thanks for the detailed analysis and explanation. The fix looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15589#pullrequestreview-1623703259 From shade at openjdk.org Wed Sep 13 07:35:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Sep 2023 07:35:58 GMT Subject: RFR: 8313202: MutexLocker should disallow null Mutexes [v8] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 11:46:35 GMT, Aleksey Shipilev wrote: >> As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock.
>> >> There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety. >> >> More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach. >> >> Additional testing: >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits >> - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into JDK-8313202-mutexlocker-nulls > - Touchup whitespace > - Rewrite jvmtiManageCapabilities lock usage > - Re-instate old asserts > - Merge branch 'master' into JDK-8313202-mutexlocker-nulls > - Merge branch 'master' into JDK-8313202-mutexlocker-nulls > - Merge branch 'master' into JDK-8313202-mutexlocker-nulls > - Merge branch 'master' into JDK-8313202-mutexlocker-nulls > - Accept one more potentially nullptr mutex > - Merge branch 'master' into JDK-8313202-mutexlocker-nulls > - ... and 4 more: https://git.openjdk.org/jdk/compare/6cf662c5...e3da7697 All right, let me integrate today. We would have the rest of the week to figure out if anything else is broken. 
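The null-check idea behind 8313202 can be illustrated with a small RAII sketch. `SketchMutex`, `SketchLocker` and `SketchConditionalLocker` are hypothetical stand-ins built on `std::mutex`. They are not HotSpot's `Mutex`, `MutexLocker` or `ConditionalMutexLocker`. The unconditional locker rejects a null mutex, while the conditional locker makes "maybe lock" explicit at the call site.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a VM mutex; tracks ownership only for illustration.
class SketchMutex {
 public:
  void lock()   { _impl.lock(); _owned = true; }
  void unlock() { _owned = false; _impl.unlock(); }
  bool owned() const { return _owned; }  // only meaningful on the owning thread
 private:
  std::mutex _impl;
  bool _owned = false;
};

// Unconditional scoped locker: a null mutex is a programming error, not a no-op.
class SketchLocker {
 public:
  explicit SketchLocker(SketchMutex* m) : _m(m) {
    assert(_m != nullptr && "SketchLocker requires a non-null mutex");
    _m->lock();
  }
  ~SketchLocker() { _m->unlock(); }
  SketchLocker(const SketchLocker&) = delete;
  SketchLocker& operator=(const SketchLocker&) = delete;
 private:
  SketchMutex* _m;
};

// Conditional scoped locker: the caller states explicitly when locking is skipped.
class SketchConditionalLocker {
 public:
  SketchConditionalLocker(SketchMutex* m, bool condition)
      : _m(condition ? m : nullptr) {
    assert((!condition || m != nullptr) && "mutex required when condition is true");
    if (_m != nullptr) _m->lock();
  }
  ~SketchConditionalLocker() {
    if (_m != nullptr) _m->unlock();
  }
  SketchConditionalLocker(const SketchConditionalLocker&) = delete;
  SketchConditionalLocker& operator=(const SketchConditionalLocker&) = delete;
 private:
  SketchMutex* _m;
};
```

With the skip made explicit, a stray `nullptr` passed to the unconditional locker trips an assertion in debug builds. Without it, the critical section would silently run unlocked.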
------------- PR Comment: https://git.openjdk.org/jdk/pull/15043#issuecomment-1717095982 From shade at openjdk.org Wed Sep 13 07:35:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Sep 2023 07:35:59 GMT Subject: Integrated: 8313202: MutexLocker should disallow null Mutexes In-Reply-To: References: Message-ID: On Wed, 26 Jul 2023 17:06:02 GMT, Aleksey Shipilev wrote: > As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock. > > There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety. > > More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach. > > Additional testing: > - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, only new `ConditionalMutexLocker` hits > - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` x `Serial Parallel G1 Shenandoah` This pull request has now been integrated. 
Changeset: 2d168c57 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/2d168c573402c0fc3dfb6c1fe6f48ec46997fa67 Stats: 110 lines in 18 files changed: 46 ins; 13 del; 51 mod 8313202: MutexLocker should disallow null Mutexes Reviewed-by: dholmes, coleenp, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/15043 From rehn at openjdk.org Wed Sep 13 09:40:39 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 13 Sep 2023 09:40:39 GMT Subject: RFR: 8315934: RISC-V: Disable conservative fences per vendor In-Reply-To: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> References: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> Message-ID: On Tue, 12 Sep 2023 14:27:40 GMT, Ludovic Henry wrote: > Conservative fences are not a requirement on some RISC-V hardware for correctness, but can bring a performance penalty. Let's make sure we disable them on a per-vendor basis, and keep them enabled for the default case. Seems reasonable when looking at other platforms vm versions. You may consider turn off ztso if someone really wants a lot of extra fencing :) ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15684#pullrequestreview-1624125478 From dnsimon at openjdk.org Wed Sep 13 09:57:57 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 13 Sep 2023 09:57:57 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal Message-ID: This PR adds `ResolvedJavaMethod.getLiveObjectLocalsAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oop locals at OSR entry points. As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. 
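The following is a rough model of why a per-BCI liveness map matters at an OSR entry. It nulls out object-typed locals that are not live at the entry BCI and leaves primitive slots untouched. This is a hypothetical illustration only. `LocalSlot` and `clear_dead_object_locals` are invented names, and the `live_objects` vector stands in for a mask like the one `getLiveObjectLocalsAt` is described as returning. It is not the JVMCI or Graal implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame model: one entry per local variable slot.
struct LocalSlot {
  bool is_object;   // slot currently holds a reference
  const void* ref;  // meaningful only when is_object is true
};

// live_objects[i] is assumed to come from an oop map computed at the OSR BCI.
void clear_dead_object_locals(std::vector<LocalSlot>& locals,
                              const std::vector<bool>& live_objects) {
  assert(locals.size() == live_objects.size());
  for (std::size_t i = 0; i < locals.size(); ++i) {
    if (locals[i].is_object && !live_objects[i]) {
      locals[i].ref = nullptr;  // dead reference: drop it at the OSR entry
    }
  }
}
```

In this model, an incorrect mask either leaves a dead reference reachable or clears one that is still live. That is why a request for an invalid BCI needs to be detectable rather than silently yielding some mask.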
------------- Commit messages: - added ResolvedJavaMethod.getLiveObjectLocalsAt Changes: https://git.openjdk.org/jdk/pull/15705/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15705&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315954 Stats: 333 lines in 7 files changed: 309 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/15705.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15705/head:pull/15705 PR: https://git.openjdk.org/jdk/pull/15705 From dnsimon at openjdk.org Wed Sep 13 09:57:59 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 13 Sep 2023 09:57:59 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 09:46:01 GMT, Doug Simon wrote: > This PR adds `ResolvedJavaMethod.getLiveObjectLocalsAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oop locals at OSR entry points. > > As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. src/hotspot/share/interpreter/oopMapCache.cpp line 204: > 202: void InterpreterOopMap::initialize() { > 203: _method = nullptr; > 204: _mask_size = INT_MAX; // This value should cause a failure quickly Unless I'm mistaken, `USHRT_MAX` is a legal (but unlikely) value (i.e. `max_locals` in a class file can be 65535) so I changed this to use `INT_MAX` instead.
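The sentinel argument can be checked mechanically. `max_locals` is a 16-bit (`u2`) class-file field, so every value up to 65535 is legal. That upper bound equals `USHRT_MAX` wherever `unsigned short` is 16 bits, so `USHRT_MAX` cannot double as an "uninitialized" marker. The sketch below uses a hypothetical `OopMapSketch` class, loosely modeled on the discussion and not HotSpot's `InterpreterOopMap`. It keeps `INT_MAX` as an out-of-range sentinel that `has_valid_mask()` tests against.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// max_locals is a u2 in the class file, so 0..65535 are all legal values.
constexpr int kMaxU2 = UINT16_MAX;
static_assert(kMaxU2 == 65535, "u2 range");
static_assert(INT_MAX > kMaxU2, "sentinel must lie outside the legal range");

class OopMapSketch {  // hypothetical, for illustration only
 public:
  void initialize() { _mask_size = INT_MAX; }  // "no valid mask computed yet"
  void set_mask_size(int n) {
    assert(n >= 0 && n < INT_MAX);  // any real size must differ from the sentinel
    _mask_size = n;
  }
  bool has_valid_mask() const { return _mask_size != INT_MAX; }
 private:
  int _mask_size = INT_MAX;
};
```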
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324264486 From mli at openjdk.org Wed Sep 13 10:02:40 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 13 Sep 2023 10:02:40 GMT Subject: RFR: 8315934: RISC-V: Disable conservative fences per vendor In-Reply-To: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> References: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> Message-ID: <_9U4XFJU2eT5XftcEJj-sAd3AeNmEBrBkPmh8PduttE=.c3252521-51df-478f-aec3-27b37eee3066@github.com> On Tue, 12 Sep 2023 14:27:40 GMT, Ludovic Henry wrote: > Conservative fences are not a requirement on some RISC-V hardware for correctness, but can bring a performance penalty. Let's make sure we disable them on a per-vendor basis, and keep them enabled for the default case. Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15684#pullrequestreview-1624169110 From stefank at openjdk.org Wed Sep 13 11:19:05 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 13 Sep 2023 11:19:05 GMT Subject: RFR: 8316179: Use consistent naming for lightweight locking in MacroAssembler Message-ID: Different platforms uses different names for the `MacroAssembler` functions that implement the lightweight fast locking/unlocking code. I propose that we use consistent naming for all platforms. These are the current names for the lightweight-locking functions: * AArch64, ppc, riscv: `fast_lock` * x86: `fast_lock_impl` * arm: `fast_lock_2` Note that x86 and arm uses different names and the likely reason for that is that the `C2_MacroAssembler` subclass also implements a fast_lock function in that class, on those platforms. 
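The likely reason mentioned above is ordinary C++ name hiding. Declaring any member named `fast_lock` in a subclass hides every base-class overload of that name, so unqualified calls to the base version stop compiling. The minimal sketch below uses hypothetical `BaseAsm`/`DerivedAsm` classes, not the real `MacroAssembler` hierarchy. It shows that giving the base function a distinct name such as `lightweight_lock` sidesteps the clash.

```cpp
#include <cassert>
#include <string>

struct BaseAsm {
  // Base-class helper with a distinct name, as proposed for MacroAssembler.
  std::string lightweight_lock(int /*obj*/) { return "BaseAsm::lightweight_lock"; }
  // Same-named helper that a subclass declaration will hide.
  std::string fast_lock(int /*obj*/) { return "BaseAsm::fast_lock"; }
};

struct DerivedAsm : BaseAsm {
  // Hides *all* BaseAsm::fast_lock overloads, regardless of signature.
  std::string fast_lock(int /*obj*/, int /*box*/) { return "DerivedAsm::fast_lock"; }
};
```

Against a `DerivedAsm`, `d.fast_lock(1)` does not compile; only the qualified `d.BaseAsm::fast_lock(1)` reaches the base version. A `using BaseAsm::fast_lock;` declaration in the subclass would also unhide it. A distinct name has the advantage that each call site says which implementation it means.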
The fast_lock function in `C2_MacroAssembler` deals with the fast locking for all `LockingMode` implementations (monitor, legacy, and lightweight), while the `MacroAssembler::fast_lock*` functions only implement the lightweight locking implementation. I therefore propose that we use the name `MacroAssembler::lightweight_lock` on all platforms. *Note* that this is a small cleanup to update the names. The reason why I'm looking into this is that I want to move the C2 fast locking code out of the AArch64 (and other platforms) .ad file into C++ files to make it consistent with the x64 code structure (and to get better IDE support when the code is in plain C++ files). In that restructuring of the code I'm introducing `C2_MacroAssembler::fast_lock` functions that currently name-clash / shadow the `MacroAssembler::fast_lock` functions. ------------- Commit messages: - Use consistent naming for MacroAssembler lightweight locking Changes: https://git.openjdk.org/jdk/pull/15709/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15709&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316179 Stats: 78 lines in 29 files changed: 0 ins; 0 del; 78 mod Patch: https://git.openjdk.org/jdk/pull/15709.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15709/head:pull/15709 PR: https://git.openjdk.org/jdk/pull/15709 From rkennke at openjdk.org Wed Sep 13 11:31:48 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 13 Sep 2023 11:31:48 GMT Subject: RFR: 8316179: Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 11:00:13 GMT, Stefan Karlsson wrote: > Different platforms uses different names for the `MacroAssembler` functions that implement the lightweight fast locking/unlocking code. I propose that we use consistent naming for all platforms. 
> > These are the current names for the lightweight-locking functions: > * AArch64, ppc, riscv: `fast_lock` > * x86: `fast_lock_impl` > * arm: `fast_lock_2` > > Note that x86 and arm uses different names and the likely reason for that is that the `C2_MacroAssembler` subclass also implements a fast_lock function in that class, on those platforms. > > The fast_lock function in `C2_MacroAssembler` deals with the fast locking for all `LockingMode` implementations (monitor, legacy, and lightweight), while the `MacroAssembler::fast_lock*` functions only implement the lightweight locking implementation. > > I therefore propose that we use the name `MacroAssembler::lightweight_lock` on all platforms. > > *Note* that this is a small cleanup to update the names. The reason why I'm looking into this is that I want to move the C2 fast locking code out of the AArch64 (and other platforms) .ad file into C++ files to make it consistent with the x64 code structure (and to get better IDE support when the code is in plain C++ files). In that restructuring of the code I'm introducing `C2_MacroAssembler::fast_lock` functions that currently name-clash / shadow the `MacroAssembler::fast_lock` functions. Looks good to me, thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15709#pullrequestreview-1624316275 From rkennke at openjdk.org Wed Sep 13 12:40:03 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 13 Sep 2023 12:40:03 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v55] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 11:56:15 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. 
This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: > > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Fix comments > - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() > - Fix call to arrayOopDesc::header_size() in arm port > - Fix wrong alignment > - Move away arrayOopDesc::header_size() > - Move alignment-gap-clearing into allocate_array() (aarch64) > - Move header_size_in_bytes closer to length_offset_in_bytes > - RISCV fixes by @RealYFang > - Fix GetObjectSizeIntrinsicsTest.java to work correctly with +/-UseCCP > - ... and 73 more: https://git.openjdk.org/jdk/compare/97b94cb1...f48ad53b Ping... What is the status of this PR? Is it ready to go? AFAICT, it is currently blocked on the CSR, which needs reviewers. Could somebody have a look at this, and/or ping the relevant people (Joe Darcy suggested @PaulSandoz or @mlchung in the CSR)? 
Thanks, Roman ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1717547445 From stefank at openjdk.org Wed Sep 13 13:02:05 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 13 Sep 2023 13:02:05 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v55] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 11:56:15 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: > > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Fix comments > - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() > - Fix call to arrayOopDesc::header_size() in arm port > - Fix wrong alignment > - Move away arrayOopDesc::header_size() > - Move alignment-gap-clearing into allocate_array() (aarch64) > - Move header_size_in_bytes closer to length_offset_in_bytes > - RISCV fixes by @RealYFang > - Fix GetObjectSizeIntrinsicsTest.java to work correctly with +/-UseCCP > - ... 
and 73 more: https://git.openjdk.org/jdk/compare/97b94cb1...f48ad53b There's gtest a failure in the GHA run: [ RUN ] arrayOopDesc.double_vm /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure check_max_length_overflow(T_DOUBLE) evaluates to false, where T_DOUBLE evaluates to [ FAILED ] arrayOopDesc.double_vm (0 ms) [ RUN ] arrayOopDesc.byte_vm [ OK ] arrayOopDesc.byte_vm (0 ms) [ RUN ] arrayOopDesc.short_vm [ OK ] arrayOopDesc.short_vm (0 ms) [ RUN ] arrayOopDesc.int_vm [ OK ] arrayOopDesc.int_vm (0 ms) [ RUN ] arrayOopDesc.long_vm /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure check_max_length_overflow(T_LONG) evaluates to false, where T_LONG evaluates to [ FAILED ] arrayOopDesc.long_vm (0 ms) ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1717584433 From fyang at openjdk.org Wed Sep 13 13:32:41 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 13 Sep 2023 13:32:41 GMT Subject: RFR: 8315934: RISC-V: Disable conservative fences per vendor In-Reply-To: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> References: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> Message-ID: On Tue, 12 Sep 2023 14:27:40 GMT, Ludovic Henry wrote: > Conservative fences are not a requirement on some RISC-V hardware for correctness, but can bring a performance penalty. Let's make sure we disable them on a per-vendor basis, and keep them enabled for the default case. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 102: > 100: } > 101: > 102: // Enable vendor specific features I guess it's better to have vendor specific settings at the end of this function? 
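One reading of the suggestion to put vendor-specific settings at the end of the function is a fixed ordering. Generic defaults are applied first. Vendor overrides run last and only touch flags the user has not set explicitly, so no later code silently overwrites them. The sketch below is hypothetical: the `Vendor` enum, the `FenceFlags` struct and `initialize_flags` are invented for illustration. In HotSpot, the "left at default" check is usually written with `FLAG_IS_DEFAULT`.

```cpp
#include <cassert>

enum class Vendor { Unknown, RelaxedFenceVendor };  // hypothetical vendor ids

struct FenceFlags {
  bool conservative_fence = true;  // safe default for unknown hardware
  bool user_set = false;           // true if set explicitly on the command line
};

void initialize_flags(FenceFlags& f, Vendor v) {
  // ... generic, vendor-independent feature setup would happen here ...

  // Vendor-specific tweaks last, and only for flags the user left at default.
  if (!f.user_set && v == Vendor::RelaxedFenceVendor) {
    f.conservative_fence = false;
  }
}
```

Unknown vendors keep the conservative default, which matches the stated goal of keeping fences enabled in the default case.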
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15684#discussion_r1324518216 From fyang at openjdk.org Wed Sep 13 13:38:41 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 13 Sep 2023 13:38:41 GMT Subject: RFR: 8315934: RISC-V: Disable conservative fences per vendor In-Reply-To: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> References: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> Message-ID: On Tue, 12 Sep 2023 14:27:40 GMT, Ludovic Henry wrote: > Conservative fences are not a requirement on some RISC-V hardware for correctness, but can bring a performance penalty. Let's make sure we disable them on a per-vendor basis, and keep them enabled for the default case. LGTM. Let's keep the current shape. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15684#pullrequestreview-1624572418 From rkennke at openjdk.org Wed Sep 13 14:17:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 13 Sep 2023 14:17:24 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v55] In-Reply-To: References: Message-ID: <6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> On Wed, 13 Sep 2023 12:58:36 GMT, Stefan Karlsson wrote: > There's gtest a failure in the GHA run: > > ``` > [ RUN ] arrayOopDesc.double_vm > /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure > check_max_length_overflow(T_DOUBLE) evaluates to false, where > T_DOUBLE evaluates to ? 
> > [ FAILED ] arrayOopDesc.double_vm (0 ms) > [ RUN ] arrayOopDesc.byte_vm > [ OK ] arrayOopDesc.byte_vm (0 ms) > [ RUN ] arrayOopDesc.short_vm > [ OK ] arrayOopDesc.short_vm (0 ms) > [ RUN ] arrayOopDesc.int_vm > [ OK ] arrayOopDesc.int_vm (0 ms) > [ RUN ] arrayOopDesc.long_vm > /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure > check_max_length_overflow(T_LONG) evaluates to false, where > T_LONG evaluates to ? > > [ FAILED ] arrayOopDesc.long_vm (0 ms) > ``` Aww, this max_array_length() method and 32bit builds. :-/ We should re-write this method altogether and special-case it for !_LP64 and maybe simply make it a switch on the incoming type, with hard-coded values. This might be easier to understand than getting the logic absolutely right. Also, with this change, and even more so with upcoming Lilliput changes, this method is a little too conservative and we could offer somewhat increased array lengths. Alternatively, we could do what the comments suggests and fix up all the uses of the method to use sensible types (size_t?) and make it simple and obvious. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1717720461 From rkennke at openjdk.org Wed Sep 13 14:16:44 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 13 Sep 2023 14:16:44 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v56] In-Reply-To: References: Message-ID: <5SWNSiGQtqblydBU_lLIwfZ-kNZVxzdHJjW-9zdxEoY=.d916d610-32ec-4789-a67d-2f0033135445@github.com> > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. 
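A little arithmetic makes the alignment gap concrete. Assume a 64-bit VM with an 8-byte mark word, a 4-byte (compressed) or 8-byte klass field, and a 4-byte length field. The length field then starts at offset 12 or 16, and at 8 in the Lilliput layout mentioned in the description. The hypothetical helpers below compare two placements of the first element: at the next 8-byte (HeapWord) boundary, or aligned only to the element size. This is an assumption-based illustration, not HotSpot's `arrayOopDesc` code.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t align_up(std::size_t v, std::size_t a) {
  return (v + a - 1) / a * a;
}

constexpr std::size_t kLengthFieldSize = 4;  // 32-bit array length

// Old scheme: elements start at a HeapWord (8-byte) aligned offset.
constexpr std::size_t base_word_aligned(std::size_t length_offset) {
  return align_up(length_offset + kLengthFieldSize, 8);
}

// Assumed new scheme: elements are aligned only to their own size.
constexpr std::size_t base_elem_aligned(std::size_t length_offset, std::size_t elem_size) {
  return align_up(length_offset + kLengthFieldSize, elem_size);
}

// -XX:-UseCompressedClassPointers: mark(8) + klass(8), so length at 16.
static_assert(base_word_aligned(16) == 24, "4-byte gap before the first element");
static_assert(base_elem_aligned(16, 4) == 20, "int[] elements can follow the length directly");
static_assert(base_elem_aligned(16, 8) == 24, "long[] still needs 8-byte alignment");
// Compressed class pointers: mark(8) + narrow klass(4), so length at 12; no gap either way.
static_assert(base_word_aligned(12) == 16 && base_elem_aligned(12, 4) == 16, "");
// Lilliput-style header: length at 8; word alignment leaves the 12..16 gap.
static_assert(base_word_aligned(8) == 16 && base_elem_aligned(8, 4) == 12, "");
```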
> > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix gtest failure on x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/f48ad53b..bd5a65fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=55 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=54-55 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From luhenry at openjdk.org Wed Sep 13 14:57:51 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 13 Sep 2023 14:57:51 GMT Subject: Integrated: 8315934: RISC-V: Disable conservative fences per vendor In-Reply-To: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> References: <-T6bk7lMUI_lCdASJQiE0q1AtxvqwbfxhnQ38JgbGbY=.d9d1e1f8-3b84-41bd-a9d9-34a52d42cf5a@github.com> Message-ID: On Tue, 12 Sep 2023 14:27:40 GMT, Ludovic Henry wrote: > Conservative fences are not a requirement on some RISC-V hardware for correctness, but can bring a performance penalty. Let's make sure we disable them on a per-vendor basis, and keep them enabled for the default case. This pull request has now been integrated. 
Changeset: a731a24c Author: Ludovic Henry URL: https://git.openjdk.org/jdk/commit/a731a24c93a89df08db7e01c09eb5786889c9207 Stats: 21 lines in 3 files changed: 17 ins; 3 del; 1 mod 8315934: RISC-V: Disable conservative fences per vendor Reviewed-by: rehn, mli, fyang ------------- PR: https://git.openjdk.org/jdk/pull/15684 From iklam at openjdk.org Wed Sep 13 15:08:40 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 13 Sep 2023 15:08:40 GMT Subject: RFR: 8313638: Add test for dump of resolved references [v3] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 21:42:34 GMT, Matias Saavedra Silva wrote: >> The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Ioi Comments LGTM. As we discussed off-line, the comments on line 26-30 are a little redundant so can be removed. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15686#pullrequestreview-1624800810 From jkern at openjdk.org Wed Sep 13 15:30:22 2023 From: jkern at openjdk.org (Joachim Kern) Date: Wed, 13 Sep 2023 15:30:22 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v2] In-Reply-To: References: Message-ID: <7qbEWz9YaMJEUeVE4KRQQCDs4HPYsGOEL_peLCw44IU=.79b1f378-a648-4dd3-84c7-a6ce32537463@github.com> > After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. 
> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correction of the root cause. > The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. > My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: try to improve code following Davids suggestions and do some cosmetic changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15583/files - new: https://git.openjdk.org/jdk/pull/15583/files/e5b41fb0..46a531b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=00-01 Stats: 102 lines in 5 files changed: 32 ins; 48 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/15583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15583/head:pull/15583 PR: https://git.openjdk.org/jdk/pull/15583 From jkern at openjdk.org Wed Sep 13 15:30:26 2023 From: jkern at openjdk.org (Joachim Kern) Date: Wed, 13 Sep 2023 15:30:26 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v2] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 04:59:13 GMT, David Holmes wrote: >> Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: >> >> try to improve code following Davids suggestions and do some cosmetic changes > > 
src/hotspot/share/prims/jvmtiAgentList.cpp line 251: > >> 249: while (it.has_next()) { >> 250: JvmtiAgent* const agent = it.next(); >> 251: if (!agent->is_static_lib() && device && inode && > > Style nit: we don't use implicit booleans so check `device != 0` and `inode != 0` explicitly please. I followed your suggestion > test/jdk/com/sun/tools/attach/warnings/DynamicLoadWarningTest.java line 127: > >> 125: >> 126: // test behavior on platforms that can detect if an agent library was previously loaded >> 127: if (!Platform.isAix()) { > > You need to fix the indentation of the old block. I followed your suggestion here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1324696223 PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1324696651 From jkern at openjdk.org Wed Sep 13 15:54:40 2023 From: jkern at openjdk.org (Joachim Kern) Date: Wed, 13 Sep 2023 15:54:40 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v2] In-Reply-To: <7qbEWz9YaMJEUeVE4KRQQCDs4HPYsGOEL_peLCw44IU=.79b1f378-a648-4dd3-84c7-a6ce32537463@github.com> References: <7qbEWz9YaMJEUeVE4KRQQCDs4HPYsGOEL_peLCw44IU=.79b1f378-a648-4dd3-84c7-a6ce32537463@github.com> Message-ID: On Wed, 13 Sep 2023 15:30:22 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. 
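The proposal is to identify a loaded library by its device and inode numbers instead of its `dlopen()` handle. That idea can be sketched with portable POSIX `stat()`. The AIX patch itself uses `stat64x`, which is not reproduced here. `FileId`, `file_id` and `same_file` are hypothetical names. Integer results are compared explicitly rather than used as implicit booleans, in line with the review's style note.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical identity of a file on disk: two paths (or two loads of the same
// library) refer to the same file if both device and inode numbers match.
struct FileId {
  dev_t dev;
  ino_t ino;
};

// Returns false if the file cannot be stat'ed.
bool file_id(const char* path, FileId* out) {
  struct stat st;
  if (stat(path, &st) != 0) {
    return false;
  }
  out->dev = st.st_dev;
  out->ino = st.st_ino;
  return true;
}

bool same_file(const FileId& a, const FileId& b) {
  return a.dev == b.dev && a.ino == b.ino;
}
```

Storing such a pair next to the library handle lets the agent list answer "was this library already loaded?" even when, as described for AIX, repeated `dlopen()` calls return different handles.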
There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: > > try to improve code following Davids suggestions and do some cosmetic changes I moved `stat64x_LIBPATH` to AIX code, and tried to use some `AIX_ONLY(...)` statements. I hope this is better. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1717898238 From duke at openjdk.org Wed Sep 13 16:12:57 2023 From: duke at openjdk.org (Soumadipta Roy) Date: Wed, 13 Sep 2023 16:12:57 GMT Subject: RFR: 8315937: Enable parallelism in vmTestbase/nsk/stress/numeric tests Message-ID: 'vmTestbase/nsk/stress/numeric' is a small and quick test suite. There seems to be no reason to run these tests exclusively. The tests themselves can be run as performance tests, but they are not executed as such in current configs. We should consider enabling parallelism for them and get improved test performance. Currently it is blocked by 'TEST.properties' with 'exclusiveAccess.dirs' directives in them. 
Below are few metrics which shows around 10% improvement in fastdebug mode and around 5% improvement in release mode without any regression: * fastdebug_before : **72.78s user 20.76s system 272% cpu 34.337 total** * fastdebug_after : **73.63s user 19.73s system 303% cpu 30.711 total** * release_before : **33.42s user 19.42s system 241% cpu 21.898 total** * release_after : **33.47s user 18.60s system 255% cpu 20.364 total** ------------- Commit messages: - 8315937: Enable parallelism in vmTestbase/nsk/stress/numeric test Changes: https://git.openjdk.org/jdk/pull/15725/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15725&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315937 Stats: 24 lines in 1 file changed: 0 ins; 24 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15725/head:pull/15725 PR: https://git.openjdk.org/jdk/pull/15725 From shade at openjdk.org Wed Sep 13 16:12:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Sep 2023 16:12:58 GMT Subject: RFR: 8315937: Enable parallelism in vmTestbase/nsk/stress/numeric tests In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 16:01:07 GMT, Soumadipta Roy wrote: > 'vmTestbase/nsk/stress/numeric' is a small and quick test suite. There seems to be no reason to run these tests exclusively. The tests themselves can be run as performance tests, but they are not executed as such in current configs. We should consider enabling parallelism for them and get improved test performance. Currently it is blocked by 'TEST.properties' with 'exclusiveAccess.dirs' directives in them. 
> > Below are few metrics which shows around 10% improvement in fastdebug mode and around 5% improvement in release mode without any regression: > > * fastdebug_before : **72.78s user 20.76s system 272% cpu 34.337 total** > * fastdebug_after : **73.63s user 19.73s system 303% cpu 30.711 total** > * release_before : **33.42s user 19.42s system 241% cpu 21.898 total** > * release_after : **33.47s user 18.60s system 255% cpu 20.364 total** I think the larger improvement comes from the fact that we do not lose parallel VM workers waiting on these short tests, when larger `tier4` suite is running. This looks good to me, but @lmesnik might want to give it a spin. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15725#issuecomment-1717917346 From shade at openjdk.org Wed Sep 13 16:14:41 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Sep 2023 16:14:41 GMT Subject: RFR: 8315937: Enable parallelism in vmTestbase/nsk/stress/numeric tests In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 16:01:07 GMT, Soumadipta Roy wrote: > 'vmTestbase/nsk/stress/numeric' is a small and quick test suite. There seems to be no reason to run these tests exclusively. The tests themselves can be run as performance tests, but they are not executed as such in current configs. We should consider enabling parallelism for them and get improved test performance. Currently it is blocked by 'TEST.properties' with 'exclusiveAccess.dirs' directives in them. > > Below are few metrics which shows around 10% improvement in fastdebug mode and around 5% improvement in release mode without any regression: > > * fastdebug_before : **72.78s user 20.76s system 272% cpu 34.337 total** > * fastdebug_after : **73.63s user 19.73s system 303% cpu 30.711 total** > * release_before : **33.42s user 19.42s system 241% cpu 21.898 total** > * release_after : **33.47s user 18.60s system 255% cpu 20.364 total** Looks good. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15725#pullrequestreview-1624943763 From djelinski at openjdk.org Wed Sep 13 16:30:37 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Wed, 13 Sep 2023 16:30:37 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. Jira suggests that the original submitter of this patch might no longer be around, but any insights from folks more familiar with the problem are welcome. For what it's worth:

- VS 2015 documentation specifies that XMM16-31 are volatile [link](https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-140#callercallee-saved-registers)
- VS 2015 documentation [was released 2015-07](https://learn.microsoft.com/en-us/previous-versions/visualstudio/), so was not available at the time this patch was merged
- VS 2013 documentation [did not mention XMM16-31 at all](https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2013/9z1stfyw(v=vs.120))

Based on the above, I'd assume that saving XMM16-31 was based on a best guess. Saving of XMM16-31 was added in the second webrev; the first one did not have it. As far as I can tell, the change was not discussed.
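The ABI rule referred to above can be stated compactly. The sketch below is a hypothetical helper, not HotSpot's `call_stub` code. It encodes the Windows x64 convention as Microsoft documents it: XMM6-XMM15 are callee-saved (non-volatile), while XMM0-XMM5 and, on AVX-512 hardware, XMM16-XMM31 are volatile. A callee such as `call_stub` therefore has no obligation to preserve the upper sixteen registers.

```cpp
#include <vector>

// Hypothetical helper, not HotSpot code: models the Windows x64 calling
// convention for vector registers. Only XMM6..XMM15 are non-volatile;
// XMM0..XMM5 and XMM16..XMM31 (AVX-512) may be clobbered by a callee.
constexpr bool win64_xmm_is_callee_saved(int reg) {
    return reg >= 6 && reg <= 15;
}

// Registers a Win64 function must save/restore if it uses them, given how
// many XMM registers the CPU exposes (16 without AVX-512, 32 with it).
std::vector<int> win64_xmm_to_preserve(int num_xmm_regs) {
    std::vector<int> regs;
    for (int r = 0; r < num_xmm_regs; ++r) {
        if (win64_xmm_is_callee_saved(r)) {
            regs.push_back(r);
        }
    }
    return regs;
}

static_assert(!win64_xmm_is_callee_saved(16), "XMM16 is volatile on Win64");
static_assert(win64_xmm_is_callee_saved(15), "XMM15 is non-volatile on Win64");
```

With UseAVX>=3 the register file grows to 32 entries, but `win64_xmm_to_preserve(32)` still yields only the ten registers 6..15. Growing the file adds volatile registers only, so nothing new needs saving.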
https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017738.html ------------- PR Comment: https://git.openjdk.org/jdk/pull/15688#issuecomment-1717953351 From ccheung at openjdk.org Wed Sep 13 16:54:40 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 13 Sep 2023 16:54:40 GMT Subject: RFR: 8313638: Add test for dump of resolved references [v3] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 21:42:34 GMT, Matias Saavedra Silva wrote: >> The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Ioi Comments Looks good. Thanks. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15686#pullrequestreview-1625012937 From jjoo at openjdk.org Wed Sep 13 18:12:06 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 13 Sep 2023 18:12:06 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v12] In-Reply-To: References: Message-ID: <4mDV3L0bMDfGt59uIQvGzK7u4d6vc4QIywboAx2tzfI=.7627f230-7c7b-4e0a-ab13-cf3843520e6a@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Partial commit attempting to add total cpu tracker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/18c8f9cb..dc7ff007 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=10-11 Stats: 88 lines in 11 files changed: 53 ins; 24 del; 11 mod Patch: 
https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From matsaave at openjdk.org Wed Sep 13 18:59:28 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 13 Sep 2023 18:59:28 GMT Subject: RFR: 8313638: Add test for dump of resolved references [v4] In-Reply-To: References: Message-ID: > The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into resolved_ref_test_8313638 - Removed redundant comment - Ioi Comments - Fixed spacing - Calvin comments - 8313638: Add test for dump of resolved references ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15686/files - new: https://git.openjdk.org/jdk/pull/15686/files/99c18a91..3e0e31aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15686&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15686&range=02-03 Stats: 24114 lines in 1381 files changed: 11749 ins; 8317 del; 4048 mod Patch: https://git.openjdk.org/jdk/pull/15686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15686/head:pull/15686 PR: https://git.openjdk.org/jdk/pull/15686 From coleenp at openjdk.org Wed Sep 13 19:08:38 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 13 Sep 2023 19:08:38 GMT Subject: RFR: 8316179: Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 11:00:13 GMT, Stefan Karlsson wrote: > 
Different platforms use different names for the `MacroAssembler` functions that implement the lightweight fast locking/unlocking code. I propose that we use consistent naming for all platforms. > > These are the current names for the lightweight-locking functions: > * AArch64, ppc, riscv: `fast_lock` > * x86: `fast_lock_impl` > * arm: `fast_lock_2` > > Note that x86 and arm use different names and the likely reason for that is that the `C2_MacroAssembler` subclass also implements a fast_lock function in that class, on those platforms. > > The fast_lock function in `C2_MacroAssembler` deals with the fast locking for all `LockingMode` implementations (monitor, legacy, and lightweight), while the `MacroAssembler::fast_lock*` functions only implement the lightweight locking implementation. > > I therefore propose that we use the name `MacroAssembler::lightweight_lock` on all platforms. > > *Note* that this is a small cleanup to update the names. The reason why I'm looking into this is that I want to move the C2 fast locking code out of the AArch64 (and other platforms) .ad file into C++ files to make it consistent with the x64 code structure (and to get better IDE support when the code is in plain C++ files). In that restructuring of the code I'm introducing `C2_MacroAssembler::fast_lock` functions that currently name-clash / shadow the `MacroAssembler::fast_lock` functions. Looks really good to me. Thanks. ------------- Marked as reviewed by coleenp (Reviewer).
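The name clash mentioned above is ordinary C++ name hiding: once a derived class declares any member called `fast_lock`, unqualified lookup inside that class no longer sees the base-class `fast_lock` overloads. A minimal sketch follows. The classes are hypothetical stand-ins with invented signatures, not HotSpot's real ones.

```cpp
#include <string>

// Hypothetical stand-ins for the HotSpot classes; signatures are invented.
struct MacroAssembler {
    // Lightweight-locking helper under the proposed platform-neutral name.
    std::string lightweight_lock(int obj) {
        return "lightweight_lock(" + std::to_string(obj) + ")";
    }
    // The old name, kept here only to demonstrate hiding.
    std::string fast_lock(int obj) {
        return "MacroAssembler::fast_lock(" + std::to_string(obj) + ")";
    }
};

struct C2_MacroAssembler : MacroAssembler {
    // Declaring any fast_lock here hides every MacroAssembler::fast_lock
    // overload from unqualified lookup inside C2_MacroAssembler.
    std::string fast_lock(int obj, int box) {
        // An unqualified call fast_lock(obj) would not compile here: lookup
        // finds only this two-argument overload. The distinct name
        // lightweight_lock is found without any qualification.
        return "C2::fast_lock -> " + lightweight_lock(obj + box);
    }
};
```

The base version stays reachable only through explicit qualification (`MacroAssembler::fast_lock`). A distinct name such as `lightweight_lock` avoids needing that qualification.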
PR Review: https://git.openjdk.org/jdk/pull/15709#pullrequestreview-1625240865 From ayang at openjdk.org Wed Sep 13 19:33:40 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 13 Sep 2023 19:33:40 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v5] In-Reply-To: References: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> Message-ID: On Tue, 12 Sep 2023 16:21:44 GMT, Richard Reingruber wrote: > BL: either 2.4s or 4.9s The variance seems too large, ~100% fluctuation. Unclear why. I just attached a small program, which is essentially the same as `BigArrayInOldGenRR.java` except the larger array. (I also changed it to be fixed-work for easier comparison.) Running `java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 card_scan.java`: ## baseline [0.003s][info][gc] Using Parallel [2.397s][info][gc] GC(0) Pause Young (Allocation Failure) 1791M->1027M(2944M) 560.539ms [3.718s][info][gc] GC(1) Pause Young (Allocation Failure) 1795M->1027M(2944M) 571.168ms ## new [0.002s][info][gc] Using Parallel [15.187s][info][gc] GC(0) Pause Young (Allocation Failure) 1791M->1027M(2944M) 13441.110ms [29.356s][info][gc] GC(1) Pause Young (Allocation Failure) 1795M->1027M(2944M) 13418.047ms That is ~20x overhead in gc-pause. I also played with diff values of `ParallelGCThreads` and got the same observation as you: baseline becomes worse with more gc-workers and the proposed patch scales nicely. I agree that the scalability issue on master should be addressed. However, the regression while using fewer gc-threads is too significant, IMO. 
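For scale, the ratios of the young-pause times in the two logs above (new / baseline) work out as follows:

```latex
\frac{13441.110\ \text{ms}}{560.539\ \text{ms}} \approx 23.98,
\qquad
\frac{13418.047\ \text{ms}}{571.168\ \text{ms}} \approx 23.49
```

This puts the per-pause slowdown with two GC threads at roughly 24x.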
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1718201435 From duke at openjdk.org Wed Sep 13 19:48:46 2023 From: duke at openjdk.org (ExE Boss) Date: Wed, 13 Sep 2023 19:48:46 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v21] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 10:49:38 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. 
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedO... 
> > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - add missing space + reflow lines > - Fix typo > > Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> src/java.base/share/classes/jdk/internal/foreign/abi/fallback/FallbackLinker.java line 311: > 309: }; > 310: > 311: CANONICAL_LAYOUTS = Map.ofEntries( `LibFallback::wcharSize()` and other getters for `LibFallback.NativeConstants` fields can throw an error when `LibFallback.SUPPORTED` is `false` due to the `fallbackLinker` library not being present, so this static initializer should be made into a method instead:

Suggestion:

    static final Map<String, MemoryLayout> CANONICAL_LAYOUTS = initCanonicalLayouts();

    private static Map<String, MemoryLayout> initCanonicalLayouts() {
        if (!isSupported()) {
            return null;
        }

        int wchar_size = LibFallback.wcharSize();
        MemoryLayout wchartLayout = switch(wchar_size) {
            case 2 -> JAVA_CHAR; // prefer JAVA_CHAR
            default -> FFIType.layoutFor(wchar_size);
        };

        return Map.ofEntries(

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1324996396 From svkamath at openjdk.org Wed Sep 13 20:25:22 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 13 Sep 2023 20:25:22 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2] In-Reply-To: References: Message-ID: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> > Hi All, > I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
> > Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: > > |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup > |-------------|------------|---------------|------------------|-----------| > |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 > full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 > small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 > small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 > full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 > full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 > small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 > small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 > ? | ? | ? | ? | ? > full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 > full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 > small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 > small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 > full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 > full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 > small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 > small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 > ? | ? | ? | ? | ? 
> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 > full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 > small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 > small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 > full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 > full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 > small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 > small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 > ? | ? | ? | ? | ? > full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 > full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 > small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 > small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 > full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 |1.11 > full.AESGCMBe... Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Removed isEncrypt boolean variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15410/files - new: https://git.openjdk.org/jdk/pull/15410/files/33b1d980..2727c199 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=00-01 Stats: 43 lines in 8 files changed: 0 ins; 10 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/15410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15410/head:pull/15410 PR: https://git.openjdk.org/jdk/pull/15410 From lmesnik at openjdk.org Wed Sep 13 20:46:38 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 13 Sep 2023 20:46:38 GMT Subject: RFR: 8315937: Enable parallelism in vmTestbase/nsk/stress/numeric tests In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 16:01:07 GMT, Soumadipta Roy wrote: > 'vmTestbase/nsk/stress/numeric' is a small and quick test suite. 
There seems to be no reason to run these tests exclusively. The tests themselves can be run as performance tests, but they are not executed as such in current configs. We should consider enabling parallelism for them and get improved test performance. Currently it is blocked by 'TEST.properties' with 'exclusiveAccess.dirs' directives in them. > > Below are few metrics which shows around 10% improvement in fastdebug mode and around 5% improvement in release mode without any regression: > > * fastdebug_before : **72.78s user 20.76s system 272% cpu 34.337 total** > * fastdebug_after : **73.63s user 19.73s system 303% cpu 30.711 total** > * release_before : **33.42s user 19.42s system 241% cpu 21.898 total** > * release_after : **33.47s user 18.60s system 255% cpu 20.364 total** Testing in CI passed. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15725#pullrequestreview-1625423465 From jjoo at openjdk.org Wed Sep 13 22:12:08 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 13 Sep 2023 22:12:08 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v13] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Update total gc cpu implementation (still not finished) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/dc7ff007..f07bf70c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=11-12 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From sviswanathan at openjdk.org Wed Sep 13 23:16:40 2023 From: 
sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 13 Sep 2023 23:16:40 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2] In-Reply-To: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> References: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> Message-ID: On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Removed isEncrypt boolean variable src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 362: > 360: // AVX2 AES-GCM related functions > 361: void initial_blocks(XMMRegister ctr, Register rounds, Register key, Register len, > 362: Register in, Register out, Register ct, Register subkeyHtbl, Register pos); You could rename it to gcm_intial_blocks_avx2(). 
src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 365: > 363: void gfmul_avx2(XMMRegister GH, XMMRegister HK); > 364: void generateHtbl_8_block_avx2(Register htbl, Register rscratch); > 365: void ghash8_encrypt8_parallel(Register key, Register subkeyHtbl, XMMRegister ctr_blockx, XMMRegister aad_hashx, Rename to ghash8_encrypt8_parallel_avx2(). src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 367: > 365: void ghash8_encrypt8_parallel(Register key, Register subkeyHtbl, XMMRegister ctr_blockx, XMMRegister aad_hashx, > 366: Register in, Register out, Register ct, Register pos, bool out_order, Register rounds); > 367: void ghash_last_8(Register subkeyHtbl); Rename to ghash_last_8_avx2. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 185: > 183: if (VM_Version::supports_avx2()) { > 184: StubRoutines::_galoisCounterMode_AESCrypt = generate_avx2_galoisCounterMode_AESCrypt(); > 185: } This could be moved to line 192. src/hotspot/cpu/x86/stubRoutines_x86.hpp line 40: > 38: // AVX512 intrinsics add more code in 64-bit VM, > 39: // Windows have more code to save/restore registers > 40: _compiler_stubs_code_size = 30000 LP64_ONLY(+30000) WINDOWS_ONLY(+2000), Since the stub is for 64 bit, the LP64_ONLY part needs to increase and not the base. 
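To illustrate the sizing point: HotSpot's `LP64_ONLY(code)` and `WINDOWS_ONLY(code)` macros expand to their argument only on 64-bit and Windows builds, respectively. The sketch below models the two macros as boolean parameters, which is a simplification, since the real macros paste tokens at preprocessing time. It evaluates the patched expression `30000 LP64_ONLY(+30000) WINDOWS_ONLY(+2000)`. Raising the base enlarges the 32-bit buffer too, even though that build never holds the 64-bit-only AVX2 stub. Raising the `LP64_ONLY` term avoids this.

```cpp
// Simplified model of how the LP64_ONLY(...) / WINDOWS_ONLY(...) terms
// contribute to _compiler_stubs_code_size. The boolean parameters stand in
// for the _LP64 and _WINDOWS build conditions (an assumption for illustration).
constexpr int compiler_stubs_code_size(bool lp64, bool windows) {
    return 30000 + (lp64 ? 30000 : 0) + (windows ? 2000 : 0);
}

static_assert(compiler_stubs_code_size(false, false) == 30000, "32-bit, non-Windows");
static_assert(compiler_stubs_code_size(true,  false) == 60000, "64-bit, non-Windows");
static_assert(compiler_stubs_code_size(true,  true)  == 62000, "64-bit Windows");
```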
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1325143256 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1325143499 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1325143801 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1325145931 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1325144430 From iklam at openjdk.org Wed Sep 13 23:17:44 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 13 Sep 2023 23:17:44 GMT Subject: RFR: 8310160: Make GC APIs for handling archive heap objects agnostic of GC policy [v2] In-Reply-To: References: Message-ID: <-9tCKD-FVi6jWQJuKHjzLhfZ9BFKvqkAD7G-Ik0newg=.315e54e6-44f4-4248-adae-115c3c53376d@github.com> On Wed, 26 Jul 2023 04:28:25 GMT, Ioi Lam wrote: >> @iklam I agree this is a much better approach and makes the whole process truly collector agnostic. Great work, especially the optimization to re-order the objects. >> >> Given that this has minimal impact on performance, are we good to go ahead with this approach now? >> >> One issue I noticed while doing some testing with Shenandoah collector is probably worth pointing out here: >> When using `-XX:+NahlRawAlloc` with very small heap size like -Xmx4m or -Xmx8m the java process freezes. This happens because the allocations for archive objects cause pacing to kick in and the main thread waits on `ShenandoahPacer::_wait_monitor` [0] to be notified by ShenandoahPeriodicPacerNotify [1]. But the WatcherThread which is responsible for executing the `ShenandoahPeriodicPacerNotify` task does not run the periodic tasks until VM init is done [2][3]. So the main thread is stuck now. >> >> I guess if we do the wait in `ShenandoahPacer::pace_for_alloc` only after VM init is completed, it would resolve this issue. >> >> I haven't noticed this with `-XX:-NahlRawAlloc`, not sure why that should make any difference.
>> >> Here are the stack traces: >> >> main thread: >> >> #5 0x00007f5a1fafbafc in PlatformMonitor::wait (this=this at entry=0x7f5a180f6c78, millis=, millis at entry=10) at src/hotspot/os/posix/mutex_posix.hpp:124 >> #6 0x00007f5a1faa3f9c in Monitor::wait (this=0x7f5a180f6c70, timeout=timeout at entry=10) at src/hotspot/share/runtime/mutex.cpp:254 >> #7 0x00007f5a1fc2d3bd in ShenandoahPacer::wait (time_ms=10, this=0x7f5a180f6a20) at src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp:286 >> #8 ShenandoahPacer::pace_for_alloc (this=0x7f5a180f6a20, words=) at src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp:263 >> #9 0x00007f5a1fbfc7e1 in ShenandoahHeap::allocate_memory (this=0x7f5a180ca590, req=...) at src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:855 >> #10 0x00007f5a1fbfcb5c in ShenandoahHeap::mem_allocate (this=, size=, gc_overhead_limit_was_exceeded=) at src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:931 >> #11 0x00007f5a1f2402c2 in NewQuickLoader::mem_allocate_raw (size=6) at src/hotspot/share/cds/archiveHeapLoader.cpp:493 >> #12 NewQuickLoaderImpl::allocate (this=, __the_thread__=, size=: 6, stream=0x7f5a1d228850) at src/hotspot/share/cds/archiveHeapLoader.cpp:712 >> #13 NewQuickLoaderImpl::load_archive_heap_inner > Hi @ashu-mehra thanks for testing the patch. I think we all agree that the minor performance impact is acceptable because the code is simpler and more portable. I'll try to clean up my patch and start a PR. > > BTW, I have implemented a simpler relocation algorithm with similar performance. It uses less memory and hopefully will be easier to understand as well. 
The algorithm is described in comments inside archiveHeapLoader.cpp > > https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc.alternative-relocation?expand=1 > > As a prerequisite, I'll start a PR for [JDK-8251330: Reorder CDS archived heap to speed up relocation](https://bugs.openjdk.org/browse/JDK-8251330) > > Regarding raw allocation, it doesn't seem to be too much faster, so maybe we should disable it, at least for the initial integration. > > > $ (for i in {1..6}; do perf stat -r 100 java -XX:+NewArchiveHeapLoading -XX:-NahlRawAlloc --version > /dev/null; done) 2>&1 | grep elapsed > 0.0162332 +- 0.0000914 seconds time elapsed ( +- 0.56% ) > 0.0161316 +- 0.0000228 seconds time elapsed ( +- 0.14% ) > 0.0161171 +- 0.0000250 seconds time elapsed ( +- 0.16% ) > 0.0161311 +- 0.0000231 seconds time elapsed ( +- 0.14% ) > 0.0161433 +- 0.0000244 seconds time elapsed ( +- 0.15% ) > 0.0161121 +- 0.0000271 seconds time elapsed ( +- 0.17% ) > $ (for i in {1..6}; do perf stat -r 100 java -XX:+NewArchiveHeapLoading -XX:+NahlRawAlloc --version > /dev/null; done) 2>&1 | grep elapsed > 0.0160640 +- 0.0000973 seconds time elapsed ( +- 0.61% ) > 0.0159320 +- 0.0000221 seconds time elapsed ( +- 0.14% ) > 0.0159910 +- 0.0000272 seconds time elapsed ( +- 0.17% ) > 0.0159406 +- 0.0000230 seconds time elapsed ( +- 0.14% ) > 0.0159930 +- 0.0000252 seconds time elapsed ( +- 0.16% ) > 0.0159670 +- 0.0000296 seconds time elapsed ( +- 0.19% ) > $ (for i in {1..6}; do perf stat -r 100 java -XX:-NewArchiveHeapLoading -XX:+NahlRawAlloc --version > /dev/null; done) 2>&1 | grep elapsed > 0.0149069 +- 0.0000932 seconds time elapsed ( +- 0.63% ) > 0.0148363 +- 0.0000259 seconds time elapsed ( +- 0.17% ) > 0.0148077 +- 0.0000218 seconds time elapsed ( +- 0.15% ) > 0.0148377 +- 0.0000212 seconds time elapsed ( +- 0.14% ) > 0.0148411 +- 0.0000245 seconds time elapsed ( +- 0.17% ) > 0.0148504 +- 0.0000258 seconds time elapsed ( +- 0.17% ) > > @iklam 
I agree this is a much better approach and makes the whole process truly collector agnostic. Great work, especially the optimization to re-order the objects. > > Given that this has minimal impact on performance, are we good to go ahead with this approach now? > > One issue I noticed while doing some testing with Shenandoah collector is probably worth pointing out here: When using `-XX:+NahlRawAlloc` with very small heap size like -Xmx4m or -Xmx8m the java process freezes. This happens because the allocations for archive objects cause pacing to kick in and the main thread waits on `ShenandoahPacer::_wait_monitor` [0] to be notified by ShenandoahPeriodicPacerNotify [1]. But the WatcherThread which is responsible for executing the `ShenandoahPeriodicPacerNotify` task does not run the periodic tasks until VM init is done [2][3]. So the main thread is stuck now. > > I guess if we do the wait in `ShenandoahPacer::pace_for_alloc` only after VM init is completed, it would resolve this issue. > > Yes, Shenandoah pacing should not run before VM init is complete. This seems like a separate issue. > > Great work in this PR, though!

I have done more testing with my prototype for JDK-8310823 (Materialize CDS archived objects with regular GC allocation) [1] [2] and found a few issues:

- The minimum heap requirement has increased significantly. We allocate many objects during VM start-up. This may fill up the young generation for some collectors, causing the VM to exit as a GC is not yet possible at this stage. Since the generation sizing policy is different in each GC (and not really controllable via `-XX:NewSize`), the failure mode with small heap size becomes unpredictable. I think this is a functional regression, which is more serious than the performance regression in start-up time.
- Although it's possible to enable archive heap objects for ZGC with my JDK-8310823 patch, there's only very marginal improvement in start-up time (probably because ZGC is doing many other tasks concurrently during start-up).

As we expect more heap objects to be archived in Project Leyden, we need a solution that scales well without blowing up the minimum heap size. For example, with the mainline, an 18MB archived heap can be mapped with -Xmx20M, but with my patch, I needed to use -Xmx40M.

To reduce the minimum heap requirement, @fisk has developed a prototype that materializes the archived objects incrementally. However, it's not clear whether future Leyden developments would support such incremental loading.

As we don't see any immediate benefits for JDK-8310823, I would suggest putting that on hold for now, until we get a better understanding of the requirements from Leyden.

------ [1] https://github.com/openjdk/jdk/pull/15730 [2] https://bugs.openjdk.org/browse/JDK-8310823 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14520#issuecomment-1718431204 From sviswanathan at openjdk.org Thu Sep 14 00:21:39 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 14 Sep 2023 00:21:39 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2] In-Reply-To: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> References: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> Message-ID: On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>>
>> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
>> |-----------|-----------|----------------------|-------------------------|---------|
>> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
>> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
>> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
>> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
>> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
>> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
>> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
>> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
>> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
>> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
>> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
>> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
>> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
>> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
>> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
>> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
>> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
>> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
>> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
>> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
>> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
>> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
>> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
>> full.AES...
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>
> Removed isEncrypt boolean variable

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3399:

> 3397: __ vpshufb(xmm6, xmm6, t5, Assembler::AVX_128bit); //perform a 16Byte swap
> 3398: __ vpshufb(xmm7, xmm7, t5, Assembler::AVX_128bit); //perform a 16Byte swap
> 3399: __ vpshufb(xmm8, xmm8, t5, Assembler::AVX_128bit); //perform a 16Byte swap

This could be written in a loop:

for (int rnum = 1; rnum <= 8; rnum++) {
  __ vpshufb(as_XMMRegister(rnum), as_XMMRegister(rnum), t5, Assembler::AVX_128bit);
}

Similar technique can be used in some places below.
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4040:

> 4038: __ aesenc(xmm6, t_key);
> 4039: __ aesenc(xmm7, t_key);
> 4040: __ aesenc(xmm8, t_key);

This code is repeated multiple times so can be generated through a method like aesenc_step_avx2(t_key); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1325200254 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1325180777 From tanksherman27 at gmail.com Thu Sep 14 02:48:34 2023 From: tanksherman27 at gmail.com (Julian Waters) Date: Thu, 14 Sep 2023 10:48:34 +0800 Subject: Seemingly erroneous return value in HotSpot test Message-ID: Hi all, In the HotSpot test /test/hotspot/jtreg/vmTestbase/nsk/jvmti/IterateThroughHeap/filter-tagged/HeapFilter.cpp, the method occurance_expected, which returns a jboolean, returns JNI_ERR in an error condition. The only place this is used is on line 383, at https://github.com/openjdk/jdk/blob/11d431b2c436d6b2a0aa7a00d676a93c1b87da0e/test/hotspot/jtreg/vmTestbase/nsk/jvmti/IterateThroughHeap/filter-tagged/HeapFilter.cpp#L383. There is no special handling for the JNI_ERR condition as far as I can tell, and since jboolean is unsigned, the JNI_ERR value of -1 will end up becoming 255 when returning from this method, which is then interpreted as JNI_TRUE on line 385. I caught this after redefining jboolean to bool, which then caused the Microsoft compiler to warn about truncating the value. This seems like a bug to me, since I don't think it should return true on an error condition, is there something I'm missing? best regards, Julian
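The conversion described above can be reproduced outside the test. The following is a minimal sketch, not the HeapFilter.cpp code: it assumes the jni.h definitions (`jboolean` as `unsigned char`, `JNI_ERR` as `(-1)`) through local stand-ins, and `occurrence_expected_sketch`, `caller_sees_true` and `check_truncation` are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Local stand-ins for the jni.h definitions, assumed here so the sketch
// compiles without a JDK: jboolean is an unsigned 8-bit type, JNI_ERR is -1.
typedef unsigned char jboolean;
#define JNI_FALSE 0
#define JNI_TRUE  1
#define JNI_ERR   (-1)

// Hypothetical helper following the pattern from the message: declared to
// return jboolean, but reports an error by returning JNI_ERR.
jboolean occurrence_expected_sketch(bool error_path) {
  if (error_path) {
    return JNI_ERR;  // -1 is converted to unsigned char, i.e. 255
  }
  return JNI_FALSE;
}

// Mirrors a caller that only tests the result for truthiness,
// e.g. `if (occurrence_expected_sketch(...)) { ... }`.
bool caller_sees_true(jboolean value) {
  return value != 0;
}

// Self-check of the behaviour described in the message.
void check_truncation() {
  jboolean r = occurrence_expected_sketch(true);
  assert(r == 255);             // -1 wrapped around to 255
  assert(r != JNI_TRUE);        // not equal to JNI_TRUE ...
  assert(caller_sees_true(r));  // ... yet still treated as a true result
}
```

On the error path the helper yields 255: it never compares equal to `JNI_TRUE`, but any caller that only checks for a non-zero value handles it exactly like a true result. If `jboolean` were redefined as `bool`, the -1 would instead convert directly to `true`, which is the same outcome reached by a different route.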
From jwaters at openjdk.org Thu Sep 14 03:25:15 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 14 Sep 2023 03:25:15 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v5] In-Reply-To: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: > We should set the -permissive- flag for the Microsoft Visual C compiler, as was requested by the now backed out [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). Doing so makes the Visual C compiler much less accepting of ill formed code, which will improve code quality on Windows in the future. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into patch-10 - Document changes in awt_DnDDS.cpp - Remove negation in os_windows.cpp - Mismatched declaration in D3DGlyphCache.cpp - Fields in awt_TextComponent.cpp - reinterpret_cast needed in AccessBridgeJavaEntryPoints.cpp - Qualifiers in awt_PrintDialog.h should be removed - Likewise for awt_DnDDT.cpp - awt_ole.h include order issue in awt_DnDDS.cpp - Revert awt_ole.h - ...
and 15 more: https://git.openjdk.org/jdk/compare/11d431b2...1d3d6b5e ------------- Changes: https://git.openjdk.org/jdk/pull/15096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15096&range=04 Stats: 802 lines in 17 files changed: 171 ins; 127 del; 504 mod Patch: https://git.openjdk.org/jdk/pull/15096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15096/head:pull/15096 PR: https://git.openjdk.org/jdk/pull/15096 From jwaters at openjdk.org Thu Sep 14 03:26:04 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 14 Sep 2023 03:26:04 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v4] In-Reply-To: References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: On Thu, 17 Aug 2023 08:38:01 GMT, Julian Waters wrote: >> We should set the -permissive- flag for the Microsoft Visual C compiler, as was requested by the now backed out [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). Doing so makes the Visual C compiler much less accepting of ill formed code, which will improve code quality on Windows in the future. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Document changes in awt_DnDDS.cpp Pinging ------------- PR Comment: https://git.openjdk.org/jdk/pull/15096#issuecomment-1718706290 From dholmes at openjdk.org Thu Sep 14 05:39:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Sep 2023 05:39:38 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: <5VNInTyKr4AYIdXsgIlxZpRUIGT4Vt_6-HAO4_heLV4=.fc7e7ae5-6273-4bb4-807a-693e414b7cd5@github.com> On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here.
> > No new tests. Mach5 tier1-5 builds and tests clean. I'm guessing this part of the change description covers it: > The call save regions have been extended for both compilation models to handle their respective register banks and are working correctly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15688#issuecomment-1718792276 From haosun at openjdk.org Thu Sep 14 05:40:24 2023 From: haosun at openjdk.org (Hao Sun) Date: Thu, 14 Sep 2023 05:40:24 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v9] In-Reply-To: References: Message-ID: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. 
> > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of... 
Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Refactor long assertions in continuationFreezeThaw.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13322/files - new: https://git.openjdk.org/jdk/pull/13322/files/14b809e0..63e934d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=07-08 Stats: 14 lines in 1 file changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From haosun at openjdk.org Thu Sep 14 05:40:50 2023 From: haosun at openjdk.org (Hao Sun) Date: Thu, 14 Sep 2023 05:40:50 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v8] In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 10:31:52 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 678: > >> 676: >> 677: intptr_t* chunk_top = chunk->start_address() + chunk_new_sp; >> 678: assert(_empty || ContinuationHelper::return_address_at(_orig_chunk_sp - frame::sender_sp_ret_address_offset()) == chunk->pc(), ""); > > This line is way too long too. > Suggestion: > > if (! _empty) { > address *retaddr_slot = _orig_chunk_sp - frame::sender_sp_ret_address_offset(); > assert(ContinuationHelper::return_address_at(retaddr_slot) == chunk->pc(), > "Saved return address is bad"); > } Thanks for your suggestion. Updated in the latest commit. Note that `#ifdef ASSERT` is added in my commit, since variables `_empty` and `_orig_chunk_sp` belong to that scope. 
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 689: > >> 687: // patch return pc of the bottom-most frozen frame (now in the chunk) with the actual caller's return address >> 688: intptr_t* chunk_bottom_sp = chunk_top + cont_size() - _cont.argsize() - frame::metadata_words_at_top; >> 689: assert(_empty || ContinuationHelper::return_address_at(chunk_bottom_sp-frame::sender_sp_ret_address_offset()) == StubRoutines::cont_returnBarrier(), ""); > > You can't have empty assertion comments. Also, this line is way too long. Updated in the latest commit. Could you help take another look at it. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1325377053 PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1325377498 From dholmes at openjdk.org Thu Sep 14 05:52:47 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Sep 2023 05:52:47 GMT Subject: RFR: 8316179: Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 11:00:13 GMT, Stefan Karlsson wrote: > Different platforms uses different names for the `MacroAssembler` functions that implement the lightweight fast locking/unlocking code. I propose that we use consistent naming for all platforms. > > These are the current names for the lightweight-locking functions: > * AArch64, ppc, riscv: `fast_lock` > * x86: `fast_lock_impl` > * arm: `fast_lock_2` > > Note that x86 and arm uses different names and the likely reason for that is that the `C2_MacroAssembler` subclass also implements a fast_lock function in that class, on those platforms. > > The fast_lock function in `C2_MacroAssembler` deals with the fast locking for all `LockingMode` implementations (monitor, legacy, and lightweight), while the `MacroAssembler::fast_lock*` functions only implement the lightweight locking implementation. 
> > I therefore propose that we use the name `MacroAssembler::lightweight_lock` on all platforms. > > *Note* that this is a small cleanup to update the names. The reason why I'm looking into this is that I want to move the C2 fast locking code out of the AArch64 (and other platforms) .ad file into C++ files to make it consistent with the x64 code structure (and to get better IDE support when the code is in plain C++ files). In that restructuring of the code I'm introducing `C2_MacroAssembler::fast_lock` functions that currently name-clash / shadow the `MacroAssembler::fast_lock` functions. This makes a lot of sense! Looks good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15709#pullrequestreview-1625887358 From dholmes at openjdk.org Thu Sep 14 06:16:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Sep 2023 06:16:38 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v2] In-Reply-To: <7qbEWz9YaMJEUeVE4KRQQCDs4HPYsGOEL_peLCw44IU=.79b1f378-a648-4dd3-84c7-a6ce32537463@github.com> References: <7qbEWz9YaMJEUeVE4KRQQCDs4HPYsGOEL_peLCw44IU=.79b1f378-a648-4dd3-84c7-a6ce32537463@github.com> Message-ID: On Wed, 13 Sep 2023 15:30:22 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. 
>> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: > > try to improve code following Davids suggestions and do some cosmetic changes That is a definite improvement - thanks. Let's see what others think. ------------- PR Review: https://git.openjdk.org/jdk/pull/15583#pullrequestreview-1625958347 From david.holmes at oracle.com Thu Sep 14 06:20:55 2023 From: david.holmes at oracle.com (David Holmes) Date: Thu, 14 Sep 2023 16:20:55 +1000 Subject: Seemingly erroneous return value in HotSpot test In-Reply-To: References: Message-ID: <1a55a222-8eeb-48b3-b9a4-d09134976876@oracle.com> Hi Julian, This is a serviceability issue - cc'd. On 14/09/2023 12:48 pm, Julian Waters wrote: > Hi all, > > In the HotSpot > test /test/hotspot/jtreg/vmTestbase/nsk/jvmti/IterateThroughHeap/filter-tagged/HeapFilter.cpp, the method occurance_expected, which returns a jboolean, returns JNI_ERR in an error condition. The only place this is used is on line 383, at https://github.com/openjdk/jdk/blob/11d431b2c436d6b2a0aa7a00d676a93c1b87da0e/test/hotspot/jtreg/vmTestbase/nsk/jvmti/IterateThroughHeap/filter-tagged/HeapFilter.cpp#L383 . There is no special handling for the JNI_ERR condition as far as I can tell, and since jboolean is unsigned, the JNI_ERR value of -1 will end up becoming 255 when returning from this method, which is then interpreted as JNI_TRUE on line 385. I caught this after redefining jboolean to bool, which then caused the Microsoft compiler to warn about truncating the value.
This seems like a bug to me, since I don't think it should return true on an error condition, is there something I'm missing? That's a bug. Unclear whether they intended JNI_FALSE like verify_tag, or whether this was really intended to be an error. If it is an error then the test should abort somehow. Cheers, David > best regards, > Julian From stefank at openjdk.org Thu Sep 14 07:00:52 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 14 Sep 2023 07:00:52 GMT Subject: RFR: 8316179: Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 11:28:57 GMT, Roman Kennke wrote: >> Different platforms uses different names for the `MacroAssembler` functions that implement the lightweight fast locking/unlocking code. I propose that we use consistent naming for all platforms. >> >> These are the current names for the lightweight-locking functions: >> * AArch64, ppc, riscv: `fast_lock` >> * x86: `fast_lock_impl` >> * arm: `fast_lock_2` >> >> Note that x86 and arm uses different names and the likely reason for that is that the `C2_MacroAssembler` subclass also implements a fast_lock function in that class, on those platforms. >> >> The fast_lock function in `C2_MacroAssembler` deals with the fast locking for all `LockingMode` implementations (monitor, legacy, and lightweight), while the `MacroAssembler::fast_lock*` functions only implement the lightweight locking implementation. >> >> I therefore propose that we use the name `MacroAssembler::lightweight_lock` on all platforms. >> >> *Note* that this is a small cleanup to update the names. The reason why I'm looking into this is that I want to move the C2 fast locking code out of the AArch64 (and other platforms) .ad file into C++ files to make it consistent with the x64 code structure (and to get better IDE support when the code is in plain C++ files). 
In that restructuring of the code I'm introducing `C2_MacroAssembler::fast_lock` functions that currently name-clash / shadow the `MacroAssembler::fast_lock` functions. > > Looks good to me, thank you! Thanks @rkennke @coleenp @dholmes-ora for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15709#issuecomment-1718870361 From stefank at openjdk.org Thu Sep 14 07:05:30 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 14 Sep 2023 07:05:30 GMT Subject: Integrated: 8316179: Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 11:00:13 GMT, Stefan Karlsson wrote: > Different platforms uses different names for the `MacroAssembler` functions that implement the lightweight fast locking/unlocking code. I propose that we use consistent naming for all platforms. > > These are the current names for the lightweight-locking functions: > * AArch64, ppc, riscv: `fast_lock` > * x86: `fast_lock_impl` > * arm: `fast_lock_2` > > Note that x86 and arm uses different names and the likely reason for that is that the `C2_MacroAssembler` subclass also implements a fast_lock function in that class, on those platforms. > > The fast_lock function in `C2_MacroAssembler` deals with the fast locking for all `LockingMode` implementations (monitor, legacy, and lightweight), while the `MacroAssembler::fast_lock*` functions only implement the lightweight locking implementation. > > I therefore propose that we use the name `MacroAssembler::lightweight_lock` on all platforms. > > *Note* that this is a small cleanup to update the names. The reason why I'm looking into this is that I want to move the C2 fast locking code out of the AArch64 (and other platforms) .ad file into C++ files to make it consistent with the x64 code structure (and to get better IDE support when the code is in plain C++ files). 
In that restructuring of the code I'm introducing `C2_MacroAssembler::fast_lock` functions that currently name-clash / shadow the `MacroAssembler::fast_lock` functions. This pull request has now been integrated. Changeset: 639ba13c Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/639ba13c4b0ada1c2ae0a46e99ed707c219b3e53 Stats: 78 lines in 29 files changed: 0 ins; 0 del; 78 mod 8316179: Use consistent naming for lightweight locking in MacroAssembler Reviewed-by: rkennke, coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/15709 From jjoo at openjdk.org Thu Sep 14 08:25:19 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 14 Sep 2023 08:25:19 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v14] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Update to improve total time tracking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/f07bf70c..6ba441e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=12-13 Stats: 28 lines in 9 files changed: 14 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Sep 14 08:45:31 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 14 Sep 2023 08:45:31 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v15] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: comment out lines that cause 
segfault ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/6ba441e2..9ed97e88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=13-14 Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From aph at openjdk.org Thu Sep 14 08:55:43 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 14 Sep 2023 08:55:43 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v9] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 05:40:24 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. 
>> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. >> >> 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Refactor long assertions in continuationFreezeThaw.cpp src/hotspot/share/runtime/continuationFreezeThaw.cpp line 595: > 593: DEBUG_ONLY(_empty = false;) > 594: assert(chunk->sp() < (chunk->stack_size() - chunk->argsize()), ""); > 595: assert(ContinuationHelper::return_address_at(chunk->sp_address() - frame::sender_sp_ret_address_offset()) == chunk->pc(), ""); You missed this file. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1325609228 From aph at openjdk.org Thu Sep 14 08:55:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 14 Sep 2023 08:55:45 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v8] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 05:36:43 GMT, Hao Sun wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 678: >> >>> 676: >>> 677: intptr_t* chunk_top = chunk->start_address() + chunk_new_sp; >>> 678: assert(_empty || ContinuationHelper::return_address_at(_orig_chunk_sp - frame::sender_sp_ret_address_offset()) == chunk->pc(), ""); >> >> This line is way too long too. >> Suggestion: >> >> if (! _empty) { >> address *retaddr_slot = _orig_chunk_sp - frame::sender_sp_ret_address_offset(); >> assert(ContinuationHelper::return_address_at(retaddr_slot) == chunk->pc(), >> "Saved return address is bad"); >> } > > Thanks for your suggestion. Updated in the latest commit. > > Note that `#ifdef ASSERT` is added in my commit, since variables `_empty` and `_orig_chunk_sp` belong to that scope. Please find some other extremely long lines in this patch. One is 138 characters long! Anything much more than 80 is a candidate for attention. It's very hard to read. And please don't use empty comment strings in assertions. All of this is about code readability. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1325610819 From jkern at openjdk.org Thu Sep 14 09:20:24 2023 From: jkern at openjdk.org (Joachim Kern) Date: Thu, 14 Sep 2023 09:20:24 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v3] In-Reply-To: References: Message-ID: > After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. > A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correction of the root cause. > The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. > My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. Joachim Kern has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8315706 - try to improve code following Davids suggestions and do some cosmetic changes - JDK-8315706 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15583/files - new: https://git.openjdk.org/jdk/pull/15583/files/46a531b0..f565f9a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=01-02 Stats: 25976 lines in 1433 files changed: 13398 ins; 8377 del; 4201 mod Patch: https://git.openjdk.org/jdk/pull/15583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15583/head:pull/15583 PR: https://git.openjdk.org/jdk/pull/15583 From alanb at openjdk.org Thu Sep 14 09:46:44 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 14 Sep 2023 09:46:44 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v3] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 09:20:24 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. 
After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8315706 > - try to improve code following Davids suggestions and do some cosmetic changes > - JDK-8315706 No objection to doing this, but just to repeat again that the spec does not require that warnings are de-duplicated. This means the change is not necessary; it only matters when someone loads the same agent dynamically many times into a VM running on AIX, and then there will be one warning in the logs rather than N. src/hotspot/share/prims/jvmtiAgent.hpp line 48: > 46: #ifdef AIX > 47: long _inode; > 48: long _device; How are dev_t and ino_t defined on AIX? I'm wondering whether long is okay here. ------------- PR Review: https://git.openjdk.org/jdk/pull/15583#pullrequestreview-1626479247 PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1325687093 From jkern at openjdk.org Thu Sep 14 10:10:46 2023 From: jkern at openjdk.org (Joachim Kern) Date: Thu, 14 Sep 2023 10:10:46 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v3] In-Reply-To: References: Message-ID: <9pdhZGAelYN5jbnp_EE3pUknd0cWKnAHeA-SSNF7JyM=.765d1324-76e1-46b6-8e14-1bb61c0a48a2@github.com> On Thu, 14 Sep 2023 09:40:54 GMT, Alan Bateman wrote: >> Joachim Kern has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8315706 >> - try to improve code following Davids suggestions and do some cosmetic changes >> - JDK-8315706 > > src/hotspot/share/prims/jvmtiAgent.hpp line 48: > >> 46: #ifdef AIX >> 47: long _inode; >> 48: long _device; > > How are dev_t and ino_t defined on AIX, I'm wondering if long is okay here. They are defined as __ulong64_t which is unsigned long. So I can change it to unsigned long or even to dev_t and ino_t. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1325721375 From clanger at openjdk.org Thu Sep 14 10:37:45 2023 From: clanger at openjdk.org (Christoph Langer) Date: Thu, 14 Sep 2023 10:37:45 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v3] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 09:20:24 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. 
For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8315706 > - try to improve code following Davids suggestions and do some cosmetic changes > - JDK-8315706 Just a few indentation remarks. Could you please also update the SAP copyright year in src/hotspot/os/aix/os_aix.hpp? src/hotspot/os/aix/os_aix.cpp line 3062: > 3060: size_t libpathlen = strlen(env); > 3061: char* libpath = NEW_C_HEAP_ARRAY(char, libpathlen + 1, mtServiceability); > 3062: char* combined = NEW_C_HEAP_ARRAY(char, libpathlen + strlen(path) +1, mtServiceability); Space between +1 Suggestion: char* combined = NEW_C_HEAP_ARRAY(char, libpathlen + strlen(path) + 1, mtServiceability); src/hotspot/os/aix/os_aix.cpp line 3065: > 3063: char *saveptr, *token; > 3064: strcpy(libpath, env); > 3065: for( token = strtok_r(libpath, ":", &saveptr); token != nullptr; token = strtok_r(nullptr, ":", &saveptr) ) { spaces: Suggestion: for (token = strtok_r(libpath, ":", &saveptr); token != nullptr; token = strtok_r(nullptr, ":", &saveptr)) { src/hotspot/share/prims/jvmtiAgent.cpp line 306: > 304: agent->set_device(libstat.st_dev); > 305: } > 306: else { Fix style: Suggestion: } else { ------------- PR Review: https://git.openjdk.org/jdk/pull/15583#pullrequestreview-1626565423 PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1325746854 PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1325746078 PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1325747780 From jkern at openjdk.org Thu Sep 14 12:32:18 2023 From: jkern at openjdk.org (Joachim Kern) Date: Thu, 14 Sep 2023 12:32:18 GMT Subject: RFR: 
JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v4] In-Reply-To: References: Message-ID: > After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. > A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correction of the root cause. > The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. > My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. 
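The (device, inode) comparison proposed above can be sketched with portable POSIX `stat()` standing in for AIX's `stat64x()`: two paths that name the same file yield the same `st_dev`/`st_ino` pair, whatever handle `dlopen()` returns. This is a minimal sketch, not the patch; `LibraryIdentity` and `library_identity()` are hypothetical names.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <optional>
#include <string>

// Hypothetical identity key for a loaded library: the (device, inode)
// pair identifies the file itself, independent of the dlopen() handle.
struct LibraryIdentity {
  dev_t device;
  ino_t inode;

  bool operator==(const LibraryIdentity& other) const {
    return device == other.device && inode == other.inode;
  }
};

// Returns the identity of the file at 'path', or std::nullopt if stat() fails.
// stat() follows symbolic links, so a link and its target compare equal.
std::optional<LibraryIdentity> library_identity(const std::string& path) {
  struct stat st;
  if (stat(path.c_str(), &st) != 0) {
    return std::nullopt;
  }
  return LibraryIdentity{st.st_dev, st.st_ino};
}
```

An agent loader could record the identity next to the handle after a successful `dlopen()` and treat a later load as a duplicate when the identities compare equal. Keeping the native `dev_t`/`ino_t` types avoids the question raised earlier in this thread about whether `long` is wide enough.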
Joachim Kern has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8315706' into JDK-8315706 - Following the proposals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15583/files - new: https://git.openjdk.org/jdk/pull/15583/files/f565f9a5..a8c6e65b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=02-03 Stats: 17 lines in 6 files changed: 0 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/15583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15583/head:pull/15583 PR: https://git.openjdk.org/jdk/pull/15583 From jkern at openjdk.org Thu Sep 14 12:41:41 2023 From: jkern at openjdk.org (Joachim Kern) Date: Thu, 14 Sep 2023 12:41:41 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v4] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 12:32:18 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. 
For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8315706' into JDK-8315706 > - Following the proposals Again, I followed the proposals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1719374035 From matsaave at openjdk.org Thu Sep 14 14:32:56 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 14 Sep 2023 14:32:56 GMT Subject: RFR: 8313638: Add test for dump of resolved references [v3] In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 16:52:02 GMT, Calvin Cheung wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Ioi Comments > > Looks good. Thanks. Thank you for the reviews @calvinccheung and @iklam! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15686#issuecomment-1719567027 From matsaave at openjdk.org Thu Sep 14 14:32:59 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 14 Sep 2023 14:32:59 GMT Subject: Integrated: 8313638: Add test for dump of resolved references In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 16:37:04 GMT, Matias Saavedra Silva wrote: > The change in [JDK-8306582](https://bugs.openjdk.org/browse/JDK-8306582) revealed that the state of the resolved references array is not checked in the CDS archive. This patch adds a test to ensure that the resolved references array is correct whether the application is archived or not. This pull request has now been integrated. 
Changeset: 83dca629 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/83dca6296e3fc7b9912ef7b82e443ce1415a7bcc Stats: 202 lines in 5 files changed: 202 ins; 0 del; 0 mod 8313638: Add test for dump of resolved references Reviewed-by: ccheung, iklam ------------- PR: https://git.openjdk.org/jdk/pull/15686 From rrich at openjdk.org Thu Sep 14 15:32:29 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 14 Sep 2023 15:32:29 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: > This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straightforward > extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop.
Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf) ([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
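One reading of the work split described above, as a C++ sketch: each worker scans the slice of a large array that lies in its own stripe, and the tail that reaches into the array's last, partially covered stripe is scanned by the owner of the previous stripe. Assumptions: positions are word indices in a flat space, stripes are uniform and start at index 0, the array spans at least three stripes, and card dirtiness is ignored. `scan_range_for_stripe()` is a hypothetical helper, not the code in `PSCardTable`.

```cpp
#include <algorithm>
#include <cstddef>

// Half-open range [begin, end) of array words; empty when begin == end.
struct WordRange {
  size_t begin;
  size_t end;
  size_t size() const { return end - begin; }
};

// Slice of the large array [arr_begin, arr_end) that the worker owning
// stripe 'k' scans, assuming the array covers at least three stripes.
WordRange scan_range_for_stripe(size_t arr_begin, size_t arr_end,
                                size_t stripe_words, size_t k) {
  const size_t stripe_begin = k * stripe_words;
  const size_t stripe_end = stripe_begin + stripe_words;

  // Stripe into which the end of the array reaches (if it ends mid-stripe).
  const bool has_tail = (arr_end % stripe_words) != 0;
  const size_t tail_stripe = arr_end / stripe_words;

  // The owner of the tail stripe skips the array, as it does for any
  // object crossing into its stripe from below.
  if (has_tail && k == tail_stripe) {
    return {0, 0};
  }

  size_t lo = std::max(arr_begin, stripe_begin);
  size_t hi = std::min(arr_end, stripe_end);
  // The owner of the stripe just before the tail stripe also takes the tail.
  if (has_tail && k + 1 == tail_stripe) {
    hi = arr_end;
  }
  if (lo >= hi) {
    return {0, 0};  // this stripe does not overlap the array
  }
  return {lo, hi};
}
```

With an array at words [100, 1050) and 256-word stripes, stripes 0 to 3 return [100, 256), [256, 512), [512, 768) and [768, 1050), which tile the array exactly once; the owner of stripe 4 returns an empty slice.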
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: 100 stripes per active worker thread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/d535a10b..9a2b230d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=05-06 Stats: 25 lines in 3 files changed: 16 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Thu Sep 14 15:41:48 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 14 Sep 2023 15:41:48 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v5] In-Reply-To: References: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> Message-ID: On Wed, 13 Sep 2023 19:30:33 GMT, Albert Mingkun Yang wrote: > I agree that the scalability issue on master should be addressed. However, the regression while using fewer gc-threads is too significant, IMO. Thanks for the additional testing. You convinced me that the regression is too significant. So far I haven't found a bug that causes it. The work is dominated by scanning the elements of the large array. As I see it, it is divided into too many pieces. In the last commit I've changed the sizing of the stripes to get 100 stripes per active worker thread. These are the results I get.
## baseline

$ baseline/images/jdk/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc=trace card_scan
[0.006s][info][gc] Using Parallel
[2.269s][trace][gc] GC(0) PSYoung generation size at maximum: 1048576K
[2.269s][info ][gc] GC(0) Pause Young (Allocation Failure) 1793M->1025M(2944M) 553.722ms
[3.996s][trace][gc] GC(1) PSYoung generation size at maximum: 1048576K
[3.996s][info ][gc] GC(1) Pause Young (Allocation Failure) 1793M->1025M(2944M) 556.092ms

$ baseline/images/jdk/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xlog:gc=trace card_scan
[0.006s][info][gc] Using Parallel
[2.673s][trace][gc] GC(0) PSYoung generation size at maximum: 1048576K
[2.673s][info ][gc] GC(0) Pause Young (Allocation Failure) 1793M->1025M(2944M) 977.355ms
[4.819s][trace][gc] GC(1) PSYoung generation size at maximum: 1048576K
[4.820s][info ][gc] GC(1) Pause Young (Allocation Failure) 1793M->1025M(2944M) 986.606ms

$ baseline/images/jdk/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=8 -Xlog:gc=trace card_scan
[0.005s][info][gc] Using Parallel
[3.103s][trace][gc] GC(0) PSYoung generation size at maximum: 1048576K
[3.103s][info ][gc] GC(0) Pause Young (Allocation Failure) 1793M->1025M(2944M) 1387.618ms
[5.688s][trace][gc] GC(1) PSYoung generation size at maximum: 1048576K
[5.688s][info ][gc] GC(1) Pause Young (Allocation Failure) 1793M->1025M(2944M) 1398.658ms

## new

$ new/images/jdk/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan
[0.006s][info][gc] Using Parallel
[1.710s][trace][gc,scavenge] stripe count:200 stripe size:5248K
[2.044s][trace][gc ] GC(0) PSYoung generation size at maximum: 1048576K
[2.044s][info ][gc ] GC(0) Pause Young (Allocation Failure) 1793M->1025M(2944M) 334.022ms
[3.229s][trace][gc,scavenge] stripe count:200 stripe size:5248K
[3.562s][trace][gc ] GC(1) PSYoung generation size at maximum: 1048576K
[3.562s][info ][gc ] GC(1) Pause Young (Allocation Failure) 1793M->1025M(2944M) 333.069ms

$ new/images/jdk/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan
[0.006s][info][gc] Using Parallel
[1.689s][trace][gc,scavenge] stripe count:400 stripe size:2624K
[1.944s][trace][gc ] GC(0) PSYoung generation size at maximum: 1048576K
[1.944s][info ][gc ] GC(0) Pause Young (Allocation Failure) 1793M->1025M(2944M) 255.195ms
[3.100s][trace][gc,scavenge] stripe count:400 stripe size:2624K
[3.347s][trace][gc ] GC(1) PSYoung generation size at maximum: 1048576K
[3.348s][info ][gc ] GC(1) Pause Young (Allocation Failure) 1793M->1025M(2944M) 247.918ms

$ new/images/jdk/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=8 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan
[0.006s][info][gc] Using Parallel
[1.707s][trace][gc,scavenge] stripe count:800 stripe size:1312K
[1.920s][trace][gc ] GC(0) PSYoung generation size at maximum: 1048576K
[1.920s][info ][gc ] GC(0) Pause Young (Allocation Failure) 1793M->1025M(2944M) 213.508ms
[3.088s][trace][gc,scavenge] stripe count:800 stripe size:1312K
[3.297s][trace][gc ] GC(1) PSYoung generation size at maximum: 1048576K
[3.297s][info ][gc ] GC(1) Pause Young (Allocation Failure) 1793M->1025M(2944M) 209.769ms

I've tested on a server with Intel CPUs(*). Please let me know how it works for you. (*) lscpu shows "Genuine Intel(R) CPU 0000%@" as model name. Maybe the Linux is too old. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1719694789 From duke at openjdk.org Thu Sep 14 16:01:50 2023 From: duke at openjdk.org (Soumadipta Roy) Date: Thu, 14 Sep 2023 16:01:50 GMT Subject: Integrated: 8315937: Enable parallelism in vmTestbase/nsk/stress/numeric tests In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 16:01:07 GMT, Soumadipta Roy wrote: > 'vmTestbase/nsk/stress/numeric' is a small and quick test suite. There seems to be no reason to run these tests exclusively.
The tests themselves can be run as performance tests, but they are not executed as such in current configs. We should consider enabling parallelism for them to improve test performance. Currently it is blocked by 'TEST.properties' files with 'exclusiveAccess.dirs' directives in them. > > Below are a few metrics, which show around 10% improvement in fastdebug mode and around 7% improvement in release mode (by total wall-clock time) without any regression: > > * fastdebug_before : **72.78s user 20.76s system 272% cpu 34.337 total** > * fastdebug_after : **73.63s user 19.73s system 303% cpu 30.711 total** > * release_before : **33.42s user 19.42s system 241% cpu 21.898 total** > * release_after : **33.47s user 18.60s system 255% cpu 20.364 total** This pull request has now been integrated. Changeset: eb1f67b1 Author: Soumadipta Roy Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/eb1f67b160c4d2b8feb7330786ecd8e53ed53946 Stats: 24 lines in 1 file changed: 0 ins; 24 del; 0 mod 8315937: Enable parallelism in vmTestbase/nsk/stress/numeric tests Reviewed-by: shade, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/15725 From kvn at openjdk.org Thu Sep 14 17:18:40 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 14 Sep 2023 17:18:40 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. @jatin-bhateja or @sviswa7 can you look at this PR?
------------- PR Comment: https://git.openjdk.org/jdk/pull/15688#issuecomment-1719847529 From kvn at openjdk.org Thu Sep 14 17:51:46 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 14 Sep 2023 17:51:46 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: <7vWXolmj0hDdJ3JxuoBnaifU1jZ7TkEiDGm6dVcQamU=.628c723a-3d44-4491-ba51-379d69e06741@github.com> On Wed, 6 Sep 2023 11:54:04 GMT, Roberto Castañeda Lozano wrote: > This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) > > This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`.
On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g. for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL` (preventing the constant word-length from being discovered), see the left graph below: > > ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) > > With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario ... Few comments. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 684: > 682: // multiple of BytesPerLong for sub-long element types. > 683: payload_size = kit->gvn().transform(new AddXNode(payload_size, kit->MakeConX(BytesPerLong - 1))); > 684: } Back to this change. Why is rounding done only for arrays? Do we have a check that the object alignment is >= 8 bytes? If it is less, you may access memory beyond the array. Can the `Add` result overflow in a 32-bit VM?
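The size difference the fix relies on (double-word payload versus aligned object size) can be worked through with plain arithmetic. A minimal sketch, assuming a 16-byte array header (8-byte mark word, 4-byte compressed class pointer, 4-byte length), 8-byte references as with ZGC, and `-XX:ObjectAlignmentInBytes=16`; the helpers are defined here for illustration and are not the C2 code.

```cpp
#include <cstddef>

constexpr size_t BytesPerLong = 8;

// Round 'x' up to a multiple of 'alignment' (a power of two).
constexpr size_t align_up(size_t x, size_t alignment) {
  return (x + alignment - 1) & ~(alignment - 1);
}

// Bytes the clone copy covers: header plus elements, rounded to a double
// word (a no-op when the elements are already 8 bytes wide).
constexpr size_t payload_bytes(size_t header, size_t length, size_t elem_size) {
  return align_up(header + length * elem_size, BytesPerLong);
}

// Bytes the object occupies in the heap, including alignment padding.
constexpr size_t object_bytes(size_t header, size_t length, size_t elem_size,
                              size_t object_alignment) {
  return align_up(header + length * elem_size, object_alignment);
}

// Object[1]: 24 bytes of payload inside a 32-byte object.
static_assert(payload_bytes(16, 1, 8) == 24, "payload is 3 words");
static_assert(object_bytes(16, 1, 8, 16) == 32, "object is 4 words");
```

Under these assumptions the last 8 of the 32 bytes are alignment padding with undefined contents; copying only the 24-byte payload keeps an oop-aware copy stub from treating that word as a reference.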
src/hotspot/share/opto/graphKit.cpp line 3859: > 3857: abody = _gvn.transform(new LShiftXNode(lengthx, elem_shift)); > 3858: } > 3859: Node* non_rounded_size = _gvn.transform(new AddXNode(headerx, abody)); Here src/hotspot/share/opto/graphKit.cpp line 3869: > 3867: if (round_mask != 0) { > 3868: Node* mask1 = MakeConX(round_mask); > 3869: size = _gvn.transform(new AddXNode(size, mask1)); And here, again, the question about overflow in a 32-bit VM. Do we generate a compare against FastAllocateSizeLimit before this code is executed? ------------- PR Review: https://git.openjdk.org/jdk/pull/15589#pullrequestreview-1627447238 PR Review Comment: https://git.openjdk.org/jdk/pull/15589#discussion_r1326319766 PR Review Comment: https://git.openjdk.org/jdk/pull/15589#discussion_r1326324968 PR Review Comment: https://git.openjdk.org/jdk/pull/15589#discussion_r1326326514 From lmesnik at openjdk.org Thu Sep 14 20:06:54 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 14 Sep 2023 20:06:54 GMT Subject: RFR: 8315415: OutputAnalyzer.shouldMatchByLine() fails in some cases Message-ID: OutputAnalyzer.shouldMatchByLine(from, to, pattern) treats the from and to parameters as patterns, not lines, so it might fail to compile them or not work as expected in some cases. I grepped the usage of shouldMatchByLine and stdoutShouldMatchByLine and found that in most cases from/to are set to some regex patterns. So I just updated the names of variables and documentation to explicitly say that from/to are patterns. See bugs for details. Tested with tier1 (mostly for validation scripts, since there are no code changes).
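Because `from` and `to` are compiled as regular expressions, a literal line that contains metacharacters such as `(` or `.` can fail to compile or match unintended lines. The C++ sketch below models that behavior with `std::regex`; it is not the Java `OutputAnalyzer` code, and it assumes the checked range lies strictly between the first line fully matching `from` and the next line fully matching `to`, with every line in between required to fully match `pattern`. `match_by_line` is a hypothetical name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns true if every line strictly between the first line fully
// matching 'from' and the next line fully matching 'to' fully matches
// 'pattern'. 'from' and 'to' are regular expressions, not literal lines:
// constructing std::regex from an invalid expression throws std::regex_error.
bool match_by_line(const std::vector<std::string>& lines,
                   const std::string& from, const std::string& to,
                   const std::string& pattern) {
  const std::regex from_re(from), to_re(to), line_re(pattern);
  size_t i = 0;
  while (i < lines.size() && !std::regex_match(lines[i], from_re)) ++i;
  if (i == lines.size()) return false;  // no line matches 'from'
  for (++i; i < lines.size(); ++i) {
    if (std::regex_match(lines[i], to_re)) return true;  // reached 'to'
    if (!std::regex_match(lines[i], line_re)) return false;
  }
  return false;  // no line matches 'to'
}
```

For example, `"begin (x)"` used as `from` matches the text `begin x` but not the literal line `begin (x)`, and `"begin (x"` does not compile at all. In Java, a caller that means a literal line can pass it through `java.util.regex.Pattern.quote()` first.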
------------- Commit messages: - 8315415 Changes: https://git.openjdk.org/jdk/pull/15753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15753&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315415 Stats: 28 lines in 1 file changed: 0 ins; 0 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/15753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15753/head:pull/15753 PR: https://git.openjdk.org/jdk/pull/15753 From jjoo at openjdk.org Thu Sep 14 20:35:20 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 14 Sep 2023 20:35:20 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v16] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Fix segfaults on build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/9ed97e88..8b6c5533 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=14-15 Stats: 13 lines in 4 files changed: 2 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From cslucas at openjdk.org Thu Sep 14 22:16:39 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 14 Sep 2023 22:16:39 GMT Subject: RFR: JDK-8315279: Factor 'basic_plus_adr' out of PhaseMacroExpand and delete make_load/store [v3] In-Reply-To: References: <3_ThxcuU3e_hPvWi4lJBfXsyG4Ky_eyyifbkZ2izlKQ=.0070b59a-31ae-4ede-9625-a9e4bf3b7a16@github.com> Message-ID: On Thu, 31 Aug 2023 20:42:39 GMT, Vladimir Ivanov wrote: >> Thanks for clarifying @iwanowww . I think I see your point now. My original intent was to just move these methods out of PhaseMacroExpand and not much else. 
>> >> I'm going to do some more refactoring and patch all users of these make methods to just use this single method: `static Node* make(PhaseIterGVN& igvn, Node* base, Node* ptr, Node* offset)`. What do you think? > > Sounds good. > > But in the future I'd like to see `PhaseMacroExpand` and `PhaseIdealLoop` migrated to `GraphKit` instead. @iwanowww do you think I should just withdraw this PR and close the associated RFE? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15480#discussion_r1326567520 From jjoo at openjdk.org Thu Sep 14 23:18:09 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 14 Sep 2023 23:18:09 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v17] In-Reply-To: References: Message-ID: <1z-hvRQ7mNpWbqfMC_mteb56iFE01YAOCjqnrGRAQGI=.2f477f85-2ada-4a16-a23c-e3464fec9141@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Add unit test to check existence of GC CPU counters ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/8b6c5533..4e51426b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=15-16 Stats: 38 lines in 1 file changed: 38 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Sep 14 23:29:15 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 14 Sep 2023 23:29:15 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v18] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional 
commit since the last revision: Clean up test and improve total counter name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/4e51426b..7e1812e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=16-17 Stats: 8 lines in 2 files changed: 2 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From dholmes at openjdk.org Fri Sep 15 01:32:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Sep 2023 01:32:53 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v18] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 23:29:15 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Clean up test and improve total counter name Changes requested by dholmes (Reviewer). src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2429: > 2427: ThreadTotalCPUTimeClosure tttc(_perf_parallel_worker_threads_cpu_time, true); > 2428: // Currently parallel worker threads never terminate (JDK-8081682), so it is > 2429: // safe for VMThread to read their CPU times. If upstream fixes JDK-8087340 The reference to "upstream" is not applicable here - this is "upstream". src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 2089: > 2087: > 2088: void G1ConcurrentMark::update_concurrent_mark_threads_cpu_time() { > 2089: assert(Thread::current() == static_cast(cm_thread()), No cast is needed here. src/hotspot/share/runtime/perfData.hpp line 64: > 62: COM_THREADS, > 63: SUN_THREADS, > 64: SUN_THREADS_GCCPU, // Subsystem for Sun Threads GC CPU Really not sure about this naming ... 
src/hotspot/share/runtime/thread.hpp line 36: > 34: #include "runtime/globals.hpp" > 35: #include "runtime/os.hpp" > 36: #include "runtime/perfData.hpp" Why is this needed here? src/hotspot/share/runtime/vmThread.cpp line 296: > 294: > 295: if (UsePerfData && os::is_thread_cpu_time_supported()) { > 296: assert(Thread::current() == static_cast(this), No cast needed test/jdk/sun/tools/jcmd/TestGcCounters.java line 34: > 32: output.shouldHaveExitValue(0); > 33: output.shouldContain(SUN_THREADS + ".total_gc_cpu_time"); > 34: output.shouldContain(SUN_THREADS_GCCPU + ".g1_conc_mark"); If this test is only for G1 then you need it to `@require` that the GC is G1, else this will fail when run under other GCs. ------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1628061971 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1326673343 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1326674877 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1326678536 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1326678976 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1326679175 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1326680025 From dholmes at openjdk.org Fri Sep 15 01:42:46 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Sep 2023 01:42:46 GMT Subject: RFR: 8315415: OutputAnalyzer.shouldMatchByLine() fails in some cases In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 19:59:53 GMT, Leonid Mesnik wrote: > OutputAnalyzer.shouldMatchByLine(from, to, pattern) > treat from and to parameters as patterns and not lines. So it might fail to compile them or work not as expected in some cases. > > I grepped the usage of shouldMatchByLine and stdoutShouldMatchByLine and found that in most cases from/to are set to some regex patterns. 
So I just updated the names of variables and documentation to explicitly say that from/to are patterns. > > See bugs for details. Tested with tier1 (mostly for validation scripts since no code changes.) This seems reasonable. But it doesn't seem to fix the issue that was reported in the bug report. Was that test subsequently modified to ensure the line did not happen to contain regex pattern meta-characters? ------------- PR Review: https://git.openjdk.org/jdk/pull/15753#pullrequestreview-1628075185 From dholmes at openjdk.org Fri Sep 15 02:04:43 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Sep 2023 02:04:43 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v4] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 12:32:18 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. 
> > Joachim Kern has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8315706' into JDK-8315706 > - Following the proposals Changes requested by dholmes (Reviewer). src/hotspot/share/prims/jvmtiAgent.hpp line 48: > 46: #ifdef AIX > 47: unsigned long _inode; > 48: unsigned long _device; It is best, IMO, to use the actual types rather than something expected to be "equivalent". ------------- PR Review: https://git.openjdk.org/jdk/pull/15583#pullrequestreview-1628087511 PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1326697271 From lmesnik at openjdk.org Fri Sep 15 02:33:39 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 15 Sep 2023 02:33:39 GMT Subject: RFR: 8315415: OutputAnalyzer.shouldMatchByLine() fails in some cases In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 19:59:53 GMT, Leonid Mesnik wrote: > OutputAnalyzer.shouldMatchByLine(from, to, pattern) > treat from and to parameters as patterns and not lines. So it might fail to compile them or work not as expected in some cases. > > I grepped the usage of shouldMatchByLine and stdoutShouldMatchByLine and found that in most cases from/to are set to some regex patterns. So I just updated the names of variables and documentation to explicitly say that from/to are patterns. > > See bugs for details. Tested with tier1 (mostly for validation scripts since no code changes.) The test doesn't use this API. As I understand it, Calvin found this problem while writing tests and decided to just write it another way.
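Because `shouldMatchByLine(from, to, pattern)` compiles `from` and `to` as regular expressions, a literal output line that contains regex metacharacters can fail to compile or can silently fail to match. A minimal sketch using only `java.util.regex` (not `OutputAnalyzer` itself) shows both failure modes and the usual workaround, `Pattern.quote`. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of the jtreg test library. It only mimics how
// a "from"/"to" marker behaves when it is compiled as a regex.
class LiteralLineMarker {

    // Treats 'marker' as a regular expression (how the PR describes from/to).
    static boolean matchesAsRegex(String marker, String line) {
        return Pattern.compile(marker).matcher(line).find();
    }

    // Treats 'marker' as literal text by quoting all metacharacters first.
    static boolean matchesLiterally(String marker, String line) {
        return Pattern.compile(Pattern.quote(marker)).matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "1+1=2";
        // '+' is a quantifier, so the regex "1+1=2" looks for "11=2", "111=2", ...
        System.out.println(matchesAsRegex(line, line));   // false
        System.out.println(matchesLiterally(line, line)); // true
        try {
            // An unbalanced '(' is not a valid regex at all.
            matchesAsRegex("call foo(", "call foo(");
        } catch (PatternSyntaxException e) {
            System.out.println("cannot compile: " + e.getDescription());
        }
    }
}
```

A caller that really wants to anchor on an exact line would pass `Pattern.quote(line)` as `from`/`to`.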
------------- PR Comment: https://git.openjdk.org/jdk/pull/15753#issuecomment-1720400430 From dholmes at openjdk.org Fri Sep 15 03:21:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Sep 2023 03:21:37 GMT Subject: RFR: 8315415: OutputAnalyzer.shouldMatchByLine() fails in some cases In-Reply-To: References: Message-ID: <5e1_186WM0OK0PMHvKFO1DQCBfLOAqRFrSqS93nGgx4=.99241836-8cd2-4e7a-a82a-44ba0e50610b@github.com> On Thu, 14 Sep 2023 19:59:53 GMT, Leonid Mesnik wrote: > OutputAnalyzer.shouldMatchByLine(from, to, pattern) > treat from and to parameters as patterns and not lines. So it might fail to compile them or work not as expected in some cases. > > I grepped the usage of shouldMatchByLine and stdoutShouldMatchByLine and found that in most cases from/to are set to some regex patterns. So I just updated the names of variables and documentation to explicitly say that from/to are patterns. > > See bugs for details. Tested with tier1 (mostly for validation scripts since no code changes.) Okay - thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15753#pullrequestreview-1628135485 From haosun at openjdk.org Fri Sep 15 06:11:38 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 15 Sep 2023 06:11:38 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v10] In-Reply-To: References: Message-ID: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. 
As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of... 
Hao Sun has updated the pull request incrementally with one additional commit since the last revision: break long lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13322/files - new: https://git.openjdk.org/jdk/pull/13322/files/63e934d9..68ccba06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=08-09 Stats: 62 lines in 7 files changed: 40 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From haosun at openjdk.org Fri Sep 15 06:12:42 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 15 Sep 2023 06:12:42 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v9] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 08:52:18 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor long assertions in continuationFreezeThaw.cpp > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 595: > >> 593: DEBUG_ONLY(_empty = false;) >> 594: assert(chunk->sp() < (chunk->stack_size() - chunk->argsize()), ""); >> 595: assert(ContinuationHelper::return_address_at(chunk->sp_address() - frame::sender_sp_ret_address_offset()) == chunk->pc(), ""); > > You missed this file. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1326840542 From haosun at openjdk.org Fri Sep 15 06:12:44 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 15 Sep 2023 06:12:44 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v8] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 08:53:17 GMT, Andrew Haley wrote: >> Thanks for your suggestion. Updated in the latest commit. 
>> >> Note that `#ifdef ASSERT` is added in my commit, since variables `_empty` and `_orig_chunk_sp` belong to that scope. > > Please find some other extremely long lines in this patch. One is 138 characters long! Anything much more than 80 is a candidate for attention. It's very hard to read. And please don't use empty comment strings in assertions. > All of this is about code readability. Thanks for pointing this out. Updated in the latest commit. Hope the code readability is improved now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1326840329 From jkern at openjdk.org Fri Sep 15 06:28:43 2023 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 15 Sep 2023 06:28:43 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v4] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 02:01:26 GMT, David Holmes wrote: >> Joachim Kern has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8315706' into JDK-8315706 >> - Following the proposals > > src/hotspot/share/prims/jvmtiAgent.hpp line 48: > >> 46: #ifdef AIX >> 47: unsigned long _inode; >> 48: unsigned long _device; > > It is best, IMO, to use the actual types rather than something expected to be "equivalent". OK, this would be ino64_t and dev64_t. I will do the change. 
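The AIX fix identifies an already-loaded agent library by the (device, inode) pair obtained from `stat64x`, not by the `dlopen()` handle. The same identity idea can be sketched portably in Java. `BasicFileAttributes.fileKey()` returns an object that identifies a file (on Unix-like platforms OpenJDK builds it from the device and inode numbers), or `null` where no key is available. This is only an illustration of the comparison, not the HotSpot code, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration: decide whether two paths name the same file by
// comparing file keys rather than the path strings.
class SameLibraryCheck {

    // Platform file key (device + inode on typical Unix JDKs), or null.
    static Object fileKey(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class).fileKey();
    }

    // Equal non-null keys mean the same file; otherwise fall back to isSameFile.
    static boolean sameFile(Path a, Path b) throws IOException {
        Object ka = fileKey(a);
        Object kb = fileKey(b);
        if (ka != null && kb != null) {
            return ka.equals(kb);
        }
        return Files.isSameFile(a, b);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("agents");
        Path lib = Files.createFile(dir.resolve("libagent.so"));
        // A second, textually different path to the same file.
        Path alias = dir.resolve(".").resolve("libagent.so");
        Path other = Files.createFile(dir.resolve("libother.so"));
        System.out.println(sameFile(lib, alias)); // true
        System.out.println(sameFile(lib, other)); // false
    }
}
```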
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15583#discussion_r1326853499 From jkern at openjdk.org Fri Sep 15 07:22:32 2023 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 15 Sep 2023 07:22:32 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5] In-Reply-To: References: Message-ID: > After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. > A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correction of the root cause. > The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. > My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. 
Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: adopt types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15583/files - new: https://git.openjdk.org/jdk/pull/15583/files/a8c6e65b..c3852b38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15583&range=03-04 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/15583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15583/head:pull/15583 PR: https://git.openjdk.org/jdk/pull/15583 From azafari at openjdk.org Fri Sep 15 07:23:40 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 15 Sep 2023 07:23:40 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v4] In-Reply-To: References: <068Gqd9adw6k8nrLAJoEMDmbw2s3RMpV0KPmWDS0OdI=.1d8171a3-012a-425c-bea6-44f538a64106@github.com> <2kgLlH__lH2LiM7VdVePqCgW8Uck-AvrO-klTB8yzq4=.66ad8590-4661-47f2-b206-5979c8dba063@github.com> Message-ID: On Thu, 7 Sep 2023 07:06:46 GMT, Afshin Zafari wrote: >>> Also, why isn't this change also being applied to `find_from_end` >> >> Thank you @kimbarrett, the function is also changed accordingly. > >> We could just as well do a capturing lambda here, yes. Then we'd have: >> >> ```c++ >> template >> int find(F finder); >> ``` >> >> It'd be a template instead of function pointer since it's a capturing lambda and `std::function` is not permitted in Hotspot AFAIK. >> >> As an aside, to clarify for readers: There's a `&` missing in the capture list of your examples. > > It would be nice to have this templated Function as finder. However, I think it is better to keep the changes small and manageable for this PR. Thanks for the comment. > Also, why isn't this change also being applied to `find_from_end` Thanks Kim! It is also changed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1326901038 From aph at openjdk.org Fri Sep 15 09:06:42 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Sep 2023 09:06:42 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v10] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 06:11:38 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. 
SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. >> >> 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > break long lines OK! I think we're done. It's been a long haul. Thank you, especially for your patience during the review process. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13322#pullrequestreview-1628528210 From haosun at openjdk.org Fri Sep 15 09:51:45 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 15 Sep 2023 09:51:45 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v6] In-Reply-To: References: Message-ID: On Fri, 8 Sep 2023 12:23:56 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Revert to the implementation with zero as the PAC modifier >> - Merge branch 'master' into jdk-8287325 >> - Update aarch64.ad and jvmci AArch64TestAssembler.java >> >> Before this patch, rscratch1 is clobbered. >> With this patch, we use the rscratch1 register after we save it on the >> stack. >> >> In this way, the code would be consistent with >> macroAssembler_aarch64.cpp. >> - Merge branch 'master' into jdk-8287325 >> - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp >> - Use relative SP as the PAC modifier >> - Merge branch 'master' into jdk-8287325 >> - Merge branch 'master' into jdk-8287325 >> - Rename return_pc_at and patch_pc_at >> >> Rename return_pc_at to return_address_at. >> Rename patch_pc_at to patch_return_address_at. >> - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret >> >> * Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 >> in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], >> mainly because the continuation freeze/thaw mechanism would trigger >> stack copying to/from memory, whereas the saved and signed LR on the >> stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not >> accepted because option "PreserveFramePointer" is always turned on by >> PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview >> language features are enabled. Note that virtual thread is one preview >> feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> * Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> * Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, >> PAC-RET implementation should not rely on frame pointer FP. 
Otherwise, >> the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as >> to avoid the PAC re-sign for continuation thaw, as the fast path in >> stack copying doesn't walk the frame. >> >> Note that more details can be found in the discuss... > >> In the latest commit, I have reverted to the PAC-RET implementation using `zero` as the modifier. >> @theRealAph Could you help take another look at it when you have spare time? Thanks > > Looking good. One more nit. Thanks a lot for your insightful comments, which made this patch better. @theRealAph I believe a second reviewer is needed. I was wondering if @dean-long could help take another look at it? Thanks in advance. Here is a brief summary of our updates after your last review. Regarding the `relative SP` solution, we decided not to take it. After the discussion with aph, we thought the `relative SP` solution was a bit complex and not easy to maintain, e.g., - we have to obtain one scratch register (via saving/restoring it on the stack) during the PAC signing/authenticating process at the function prologue/epilogue. - the rthread (x28) register is needed to compute the `relative SP`. Hence, we must handle it carefully at the interface where C code calls Java code. Therefore, we finally reverted to our initial implementation, i.e. using `zero const` as the PAC modifier, and did some refactoring. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1720992957 From ayang at openjdk.org Fri Sep 15 11:48:43 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 15 Sep 2023 11:48:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v5] In-Reply-To: References: <4BxTPGct1ALNjXupIUdMtGT3ZfB0qalFlNq0Vva162Y=.17fc708f-0eda-43e0-bfe6-8447ee5b3488@github.com> Message-ID: On Thu, 14 Sep 2023 15:38:59 GMT, Richard Reingruber wrote: > These are the results I get.
I get similar results using the latest revision. `DelayInducer` also shows an improvement. Then I changed the benchmark slightly (`static final int stride = 32 * 64;`) to make dirty cards sparser, and got a huge (~300x) regression on my box.

## baseline
[0.006s][info][gc] Using Parallel
[1.430s][info][gc] GC(0) Pause Young (Allocation Failure) 1791M->1027M(2944M) 248.338ms
[1.788s][info][gc] GC(1) Pause Young (Allocation Failure) 1795M->1027M(2944M) 241.907ms

## new
[0.003s][info][gc] Using Parallel
[83.072s][info][gc] GC(0) Pause Young (Allocation Failure) 1791M->1027M(2944M) 81936.186ms
[165.091s][info][gc] GC(1) Pause Young (Allocation Failure) 1795M->1027M(2944M) 81905.387ms

------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1721141251 From rrich at openjdk.org Fri Sep 15 12:47:41 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 15 Sep 2023 12:47:41 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 15:32:29 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe.
>> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > 100 stripes per active worker thread I thought you would do that :) To prevent the huge regression we could scan from the first dirty card to the stripe end. This is less precise, but still better than the baseline in many cases (I hope).
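The first option suggested here (scan from the first dirty card to the end of the stripe, rather than visiting only the dirty regions) can be sketched on a toy card table. This is a hypothetical model (one byte per card, made-up `CLEAN`/`DIRTY` values, half-open stripe bounds), not `PSCardTable` code, and it makes no claim about why the precise variant regressed.

```java
// Hypothetical toy model of a card table: one byte per card, cards numbered from 0.
class StripeScanSketch {
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;

    // Returns {from, to} card indices to scan within the stripe
    // [stripeStart, stripeEnd), or null if the stripe has no dirty card.
    // Everything from the first dirty card to the stripe end is scanned,
    // including clean cards in between (less precise than visiting only
    // the dirty regions).
    static int[] rangeToScan(byte[] cards, int stripeStart, int stripeEnd) {
        for (int i = stripeStart; i < stripeEnd; i++) {
            if (cards[i] == DIRTY) {
                return new int[] {i, stripeEnd};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] cards = new byte[16]; // all CLEAN initially
        cards[5] = DIRTY;
        cards[9] = DIRTY;
        int[] r = rangeToScan(cards, 4, 12);
        System.out.println(r[0] + ".." + r[1]); // 5..12
    }
}
```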
Alternatively, we could stop the scan upon finding a sufficiently large clean region, but I reckon that wouldn't help much. Better to keep it simple. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1721220378 From dnsimon at openjdk.org Fri Sep 15 14:01:19 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Sep 2023 14:01:19 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: <04EdBkR5FxVRvYdChK8g3nWNu93Iht65AgYEQv_ef6U=.bc20196c-9cab-4212-bb1a-85ce46a46b72@github.com> References: <04EdBkR5FxVRvYdChK8g3nWNu93Iht65AgYEQv_ef6U=.bc20196c-9cab-4212-bb1a-85ce46a46b72@github.com> Message-ID: On Wed, 13 Sep 2023 17:06:23 GMT, Tom Rodriguez wrote: >> src/hotspot/share/interpreter/oopMapCache.cpp line 204: >> >>> 202: void InterpreterOopMap::initialize() { >>> 203: _method = nullptr; >>> 204: _mask_size = INT_MAX; // This value should cause a failure quickly >> >> Unless I'm mistaken, `USHRT_MAX` is a legal (but unlikely) value (i.e. `max_locals` in a class file can be 65535) so I changed this to use `INT_MAX` instead. > > It's actually not legal since _mask_size is always a multiple of `bits_per_entry` which is 2 so any odd value would work. I would be leery of touching this unless you really need to. Ok, I'll revert it to `USHRT_MAX`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324862209 From dnsimon at openjdk.org Fri Sep 15 14:01:23 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Sep 2023 14:01:23 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: <8aJO9BMxnzPYrMKaB73gsH2ra5Htz07Qo9c1yz-j-pk=.ca761ecd-81d8-47b7-b25b-673b2834d536@github.com> Message-ID: On Thu, 14 Sep 2023 17:46:15 GMT, Tom Rodriguez wrote: >> We only look up the mask for locals and so ignore stack indexes in the mask altogether. I'm assuming therefore that `mask.is_oop(i)` can never hit any problems.
>> Note that this API should be safe when called for *any* valid BCI, not just those for an OSR entry point. Even if called for a BCI with a non-empty stack, the current implementation simply ignores that part of the mask. > > Yes that's implied by the name of the method. It would make me happy if there was a comment pointing out that we're explicitly ignoring whether the stack is non-empty and contains oops. Instead, I generalized `getLiveObjectLocalsAt` to `getOopMapAt` since the VM computation is for both locals and operand stack anyway. When called for OSR entry points, the result will be the same since (currently) HotSpot requires the stack to be empty. >> In the common case we can avoid the overhead of allocating a long array and initializing each of it elements with a JNI upcall. > > It's premature optimization. This is called once per OSR compile and the call itself is doing a dataflow over the bytecodes which seems more expensive then a couple JNI calls. The array can be filled in using `copy_bytes_to` from HotSpot, which is a pair of JNI calls. You already have 2 JNI calls because of the unsafe allocation. I can maybe accept that returning a long from the JVM to avoid an allocation for the < 64 case isn't a terrible idea but I can't see any universe where we care about that. > > This method should be returning a BitSet not a long. Having the caller do that fixup is super ugly. Ok, I'm convinced. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1327321803 PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324933851 From never at openjdk.org Fri Sep 15 14:01:18 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 15 Sep 2023 14:01:18 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: Message-ID: <04EdBkR5FxVRvYdChK8g3nWNu93Iht65AgYEQv_ef6U=.bc20196c-9cab-4212-bb1a-85ce46a46b72@github.com> On Wed, 13 Sep 2023 09:47:48 GMT, Doug Simon wrote: >> Doug Simon has updated the pull request incrementally with three additional commits since the last revision: >> >> - generalized getLiveObjectLocalsAt to getOopMapAt >> - need to zero oop_map_buf >> - simplified getLiveObjectLocalsAt and moved it from ResolvedJavaMethod to HotSpotResolvedJavaMethod > > src/hotspot/share/interpreter/oopMapCache.cpp line 204: > >> 202: void InterpreterOopMap::initialize() { >> 203: _method = nullptr; >> 204: _mask_size = INT_MAX; // This value should cause a failure quickly > > Unless I'm mistaken, `USHRT_MAX` is a legal (but unlikely) value (i.e. `max_locals` in a class file can be 65535) so I changed this to use `INT_MAX` instead. It's actually not legal since _mask_size is always a multiple of `bits_per_entry` which is 2 so any odd value would work. I would be leery of touching this unless you really need to. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324817483 From dnsimon at openjdk.org Fri Sep 15 14:01:21 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Sep 2023 14:01:21 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 13:56:00 GMT, Doug Simon wrote: >> This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points.
>> >> As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. > > Doug Simon has updated the pull request incrementally with three additional commits since the last revision: > > - generalized getLiveObjectLocalsAt to getOopMapAt > - need to zero oop_map_buf > - simplified getLiveObjectLocalsAt and moved it from ResolvedJavaMethod to HotSpotResolvedJavaMethod src/hotspot/share/interpreter/oopMapCache.cpp line 616: > 614: tmp->fill(method, bci); > 615: if (tmp->has_valid_mask()) { > 616: entry->resource_copy(tmp); If `tmp` is invalid (e.g. oop map was requested for invalid BCI), then `resource_copy` crashes the VM in strange ways since it blindly trusts the mask size to be valid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1325828568 From never at openjdk.org Fri Sep 15 14:01:22 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 15 Sep 2023 14:01:22 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: <8aJO9BMxnzPYrMKaB73gsH2ra5Htz07Qo9c1yz-j-pk=.ca761ecd-81d8-47b7-b25b-673b2834d536@github.com> Message-ID: On Thu, 14 Sep 2023 17:40:56 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 3099: >> >>> 3097: JVMCIPrimitiveArray oop_map = JVMCIENV->wrap(oop_map_handle); >>> 3098: int oop_map_len = JVMCIENV->get_length(oop_map); >>> 3099: if (nwords > oop_map_len) { >> >> Should we sanity check against `mask.number_of_entries()`? One wrinkle here is that `compute_one_oop_map` also computes information about the stack so the mask it computes can be larger than just max_locals. For the purposes of OSR this doesn't matter as none of the JITs support OSR with a non-empty stack, so we would never call it for a bci with a non-empty stack. 
So should we disallow calling it with a non-empty stack or just properly handle it by passing in an array long enough to contain `max_locals + max_stack`? > > We only look up the mask for locals and so ignore stack indexes in the mask altogether. I'm assuming therefore that `mask.is_oop(i)` can never hit any problems. > Note that this API should be safe when called for *any* valid BCI, not just those for an OSR entry point. Even if called for a BCI with a non-empty stack, the current implementation simply ignores that part of the mask. Yes, that's implied by the name of the method. It would make me happy if there was a comment pointing out that we're explicitly ignoring whether the stack is non-empty and contains oops. >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 788: >> >>> 786: >>> 787: @Override >>> 788: public long getLiveObjectLocalsAt(int bci, BitSet bigOopMap) { >> >> This seems overly complicated to me. Why isn't it simply: >> >> public BitSet getLiveObjectLocalsAt(int bci) { >> int locals = getMaxLocals(); >> int nwords = ((locals - 1) / 64) + 1; >> long[] liveness = new long[nwords]; >> compilerToVM().getLiveObjectLocalsAt(this, bci, liveness); >> return BitSet.valueOf(liveness); >> } > > In the common case we can avoid the overhead of allocating a long array and initializing each of its elements with a JNI upcall. It's premature optimization. This is called once per OSR compile and the call itself is doing a dataflow over the bytecodes, which seems more expensive than a couple of JNI calls. The array can be filled in using `copy_bytes_to` from HotSpot, which is a pair of JNI calls. You already have 2 JNI calls because of the unsafe allocation. I can maybe accept that returning a long from the JVM to avoid an allocation for the < 64 case isn't a terrible idea, but I can't see any universe where we care about that. This method should be returning a BitSet, not a long. Having the caller do that fixup is super ugly.
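Returning a `BitSet` covers the fewer-than-64-entries case without any special handling: the same word-array sizing works for one word or many. A minimal sketch, in which a hypothetical `filler` callback stands in for the VM call that writes the raw mask words. `BitSet`'s `long[]` constructor is private, so the public factory `BitSet.valueOf(long[])` wraps the words.

```java
import java.util.BitSet;
import java.util.function.Consumer;

// Hypothetical helper; `filler` stands in for the VM call that writes mask words.
class OopMapWords {

    /** Number of 64-bit words needed to hold nbits bits (0 for 0). */
    static int wordsFor(int nbits) {
        return (nbits + 63) >>> 6;
    }

    /** Allocates a word array for nbits, lets filler populate it, and wraps it. */
    static BitSet toBitSet(int nbits, Consumer<long[]> filler) {
        long[] words = new long[wordsFor(nbits)];
        filler.accept(words);
        // Bit n is set iff words[n / 64] has bit (n % 64) set.
        return BitSet.valueOf(words);
    }

    public static void main(String[] args) {
        // Single-word case (< 64 entries): bits 0 and 2 set.
        BitSet small = toBitSet(3, w -> w[0] = 0b101L);
        System.out.println(small); // {0, 2}
        // Multi-word case: bit 6 of word 1 is bit 70 overall.
        BitSet big = toBitSet(100, w -> w[1] = 1L << 6);
        System.out.println(big);   // {70}
    }
}
```

The caller gets one uniform type regardless of method size, which is the point of Tom's objection to a `long` return plus a caller-side fixup.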
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1326324383 PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324912621 From dnsimon at openjdk.org Fri Sep 15 14:01:13 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Sep 2023 14:01:13 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: Message-ID: > This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points. > > As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. Doug Simon has updated the pull request incrementally with three additional commits since the last revision: - generalized getLiveObjectLocalsAt to getOopMapAt - need to zero oop_map_buf - simplified getLiveObjectLocalsAt and moved it from ResolvedJavaMethod to HotSpotResolvedJavaMethod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15705/files - new: https://git.openjdk.org/jdk/pull/15705/files/23b94e35..c6c6c0d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15705&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15705&range=00-01 Stats: 251 lines in 8 files changed: 65 ins; 120 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/15705.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15705/head:pull/15705 PR: https://git.openjdk.org/jdk/pull/15705 From dnsimon at openjdk.org Fri Sep 15 14:01:21 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Sep 2023 14:01:21 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: <8aJO9BMxnzPYrMKaB73gsH2ra5Htz07Qo9c1yz-j-pk=.ca761ecd-81d8-47b7-b25b-673b2834d536@github.com> References: 
<8aJO9BMxnzPYrMKaB73gsH2ra5Htz07Qo9c1yz-j-pk=.ca761ecd-81d8-47b7-b25b-673b2834d536@github.com> Message-ID: On Thu, 14 Sep 2023 17:27:35 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request incrementally with three additional commits since the last revision: >> >> - generalized getLiveObjectLocalsAt to getOopMapAt >> - need to zero oop_map_buf >> - simplified getLiveObjectLocalsAt and moved it from ResolvedJavaMethod to HotSpotResolvedJavaMethod > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 3099: > >> 3097: JVMCIPrimitiveArray oop_map = JVMCIENV->wrap(oop_map_handle); >> 3098: int oop_map_len = JVMCIENV->get_length(oop_map); >> 3099: if (nwords > oop_map_len) { > > Should we sanity check against `mask.number_of_entries()`? One wrinkle here is that `compute_one_oop_map` also computes information about the stack so the mask it computes can be larger than just max_locals. For the purposes of OSR this doesn't matter as none of the JITs support OSR with a non-empty stack, so we would never call it for a bci with a non-empty stack. So should we disallow calling it with a non-empty stack or just properly handle it by passing in an array long enough to contain `max_locals + max_stack`? We only look up the mask for locals and so ignore stack indexes in the mask altogether. I'm assuming therefore that `mask.is_oop(i)` can never hit any problems. Note that this API should be safe when called for *any* valid BCI, not just those for an OSR entry point. Even if called for a BCI with a non-empty stack, the current implementation simply ignores that part of the mask. > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 788: > >> 786: >> 787: @Override >> 788: public long getLiveObjectLocalsAt(int bci, BitSet bigOopMap) { > > This seems overly complicated to me. 
Why isn't it simply: > > public BitSet getLiveObjectLocalsAt(int bci) { > int locals = getMaxLocals(); > int nwords = ((locals - 1) / 64) + 1; > long[] liveness = new long[nwords]; > compilerToVM().getLiveObjectLocalsAt(this, bci, liveness); > return BitSet.valueOf(liveness); > } In the common case we can avoid the overhead of allocating a long array and initializing each of its elements with a JNI upcall. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1326318835 PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324860652 From never at openjdk.org Fri Sep 15 14:01:18 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 15 Sep 2023 14:01:18 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: Message-ID: <8aJO9BMxnzPYrMKaB73gsH2ra5Htz07Qo9c1yz-j-pk=.ca761ecd-81d8-47b7-b25b-673b2834d536@github.com> On Fri, 15 Sep 2023 13:56:00 GMT, Doug Simon wrote: >> This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points. >> >> As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. > > Doug Simon has updated the pull request incrementally with three additional commits since the last revision: > > - generalized getLiveObjectLocalsAt to getOopMapAt > - need to zero oop_map_buf > - simplified getLiveObjectLocalsAt and moved it from ResolvedJavaMethod to HotSpotResolvedJavaMethod This looks good now. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 3099: > 3097: JVMCIPrimitiveArray oop_map = JVMCIENV->wrap(oop_map_handle); > 3098: int oop_map_len = JVMCIENV->get_length(oop_map); > 3099: if (nwords > oop_map_len) { Should we sanity check against `mask.number_of_entries()`?
One wrinkle here is that `compute_one_oop_map` also computes information about the stack, so the mask it computes can be larger than just max_locals. For the purposes of OSR this doesn't matter as none of the JITs support OSR with a non-empty stack, so we would never call it for a bci with a non-empty stack. So should we disallow calling it with a non-empty stack or just properly handle it by passing in an array long enough to contain `max_locals + max_stack`? src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java line 136: > 134: * Computes which local variables contain live object values > 135: * at the instruction denoted by {@code bci}. This is the "oop map" used > 136: * by the garbage collector. `by the garbage collector for interpreter frames.` src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 788: > 786: > 787: @Override > 788: public long getLiveObjectLocalsAt(int bci, BitSet bigOopMap) { This seems overly complicated to me. Why isn't it simply: public BitSet getLiveObjectLocalsAt(int bci) { int locals = getMaxLocals(); int nwords = ((locals - 1) / 64) + 1; long[] liveness = new long[nwords]; compilerToVM().getLiveObjectLocalsAt(this, bci, liveness); return BitSet.valueOf(liveness); } src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaMethod.java line 494: > 492: * by the current JVMCI runtime > 493: */ > 494: default long getLiveObjectLocalsAt(int bci, BitSet bigOopMap) { I think this should be in HotSpotResolvedJavaMethod since there's no apparent need for it in SVM. ------------- Marked as reviewed by never (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15705#pullrequestreview-1627408725 PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1326305491 PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1326294532 PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324808658 PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1324809645 From shade at openjdk.org Fri Sep 15 17:01:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 15 Sep 2023 17:01:14 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 14:02:19 GMT, Aleksey Shipilev wrote: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches; I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity.
> > Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > The current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` The new benchmark shows the motivating improvement: the `contended` case improves substantially as the backoff value grows, while `uncontended` does not regress.

$ make test TEST=micro:SecondarySuperCache TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:SecondarySuperMissBackoff=..."

== AArch64 (Mac M1, 10 cores)

# SecondarySuperMissBackoff=0 (current JDK)
SecondarySuperCache.contended    avgt  15   956.819 ±  26.373  ns/op
SecondarySuperCache.uncontended  avgt  15     2.797 ±   0.111  ns/op

# SecondarySuperMissBackoff=1
SecondarySuperCache.contended    avgt  15   401.065 ± 144.172  ns/op
SecondarySuperCache.uncontended  avgt  15     2.763 ±   0.077  ns/op

# SecondarySuperMissBackoff=10
SecondarySuperCache.contended    avgt  15    45.953 ±  24.371  ns/op
SecondarySuperCache.uncontended  avgt  15     2.787 ±   0.096  ns/op

# SecondarySuperMissBackoff=100
SecondarySuperCache.contended    avgt  15    17.581 ±   2.910  ns/op
SecondarySuperCache.uncontended  avgt  15     2.771 ±   0.084  ns/op

# SecondarySuperMissBackoff=1000
SecondarySuperCache.contended    avgt  15     7.841 ±   0.413  ns/op
SecondarySuperCache.uncontended  avgt  15     2.739 ±   0.010  ns/op

# SecondarySuperMissBackoff=10000
SecondarySuperCache.contended    avgt  15     6.780 ±   0.082  ns/op
SecondarySuperCache.uncontended  avgt  15     2.781 ±   0.045  ns/op

=== x86_64 (Xeon, 18 cores)

Benchmark                        Mode Cnt     Score     Error  Units

# SecondarySuperMissBackoff=0 (current JDK)
SecondarySuperCache.contended    avgt  15  2380.915 ± 159.350  ns/op
SecondarySuperCache.uncontended  avgt  15     9.165 ±   0.017  ns/op

# SecondarySuperMissBackoff=1
SecondarySuperCache.contended    avgt  15  1523.914 ±  19.694  ns/op
SecondarySuperCache.uncontended  avgt  15     9.173 ±   0.026  ns/op

# SecondarySuperMissBackoff=10
SecondarySuperCache.contended    avgt  15   736.271 ±   6.554  ns/op
SecondarySuperCache.uncontended  avgt  15     9.169 ±   0.039  ns/op

# SecondarySuperMissBackoff=100
SecondarySuperCache.contended    avgt  15   254.527 ±   3.167  ns/op
SecondarySuperCache.uncontended  avgt  15     9.175 ±   0.053  ns/op

# SecondarySuperMissBackoff=1000
SecondarySuperCache.contended    avgt  15    91.392 ±   1.006  ns/op
SecondarySuperCache.uncontended  avgt  15     9.172 ±   0.025  ns/op

# SecondarySuperMissBackoff=10000
SecondarySuperCache.contended    avgt  15    67.798 ±   0.575  ns/op
SecondarySuperCache.uncontended  avgt  15     9.173 ±   0.047  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1721574298 From shade at openjdk.org Fri Sep 15 17:01:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 15 Sep 2023 17:01:14 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates Message-ID: Work in progress, submitting for broader attention. See more details in the bug and related issues. This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches; I'll be happy to fold them in. Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short.
I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. The current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. Additional testing: - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` ------------- Commit messages: - Revert "WIP x86_32" - WIP x86_32 - Revert ARM/PPC/RISC-V/S390 development stubs - Development stubs for other arches - Touchup flag handling; enable =1000 by default to allow testing - Wrinkle: branches destroy condition codes - WIP x86_64 - Cleanups, renames - WIP AArch64 Changes: https://git.openjdk.org/jdk/pull/15718/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316180 Stats: 115 lines in 6 files changed: 111 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From jjoo at openjdk.org Sat Sep 16 00:16:30 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Sat, 16 Sep 2023 00:16:30 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v18] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 01:29:57 GMT, David Holmes wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up test and improve total counter name > > test/jdk/sun/tools/jcmd/TestGcCounters.java line 34: > >> 32: output.shouldHaveExitValue(0); >> 33: output.shouldContain(SUN_THREADS + ".total_gc_cpu_time"); >> 34: output.shouldContain(SUN_THREADS_GCCPU + ".g1_conc_mark"); > > If this test is only for G1 then you need it to `@require` that the GC is G1, else this will fail when run
under other GCs. Is the proper way to do this via adding the @requires annotation in the test header, like done in https://github.com/openjdk/jdk/blob/8f46abc938ffe338e25d5fdbdcfa0aaa12edfa58/test/hotspot/jtreg/gc/stress/TestStressG1Humongous.java#L30? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1327872911 From jjoo at openjdk.org Sat Sep 16 00:16:26 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Sat, 16 Sep 2023 00:16:26 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v19] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Address dholmes@ comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/7e1812e3..0a2565dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=17-18 Stats: 7 lines in 5 files changed: 1 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From vlivanov at openjdk.org Sat Sep 16 00:48:49 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 16 Sep 2023 00:48:49 GMT Subject: RFR: JDK-8315279: Factor 'basic_plus_adr' out of PhaseMacroExpand and delete make_load/store [v3] In-Reply-To: References: <3_ThxcuU3e_hPvWi4lJBfXsyG4Ky_eyyifbkZ2izlKQ=.0070b59a-31ae-4ede-9625-a9e4bf3b7a16@github.com> Message-ID: On Thu, 14 Sep 2023 22:14:05 GMT, Cesar Soares Lucas wrote: >> Sounds good. >> >> But in the future I'd like to see `PhaseMacroExpand` and `PhaseIdealLoop` migrated to `GraphKit` instead. > > @iwanowww do you think I should just withdraw this PR and close the associated RFE? It's up to you, Cesar. 
I find existing code good enough, but also I don't have anything against your proposal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15480#discussion_r1327878747 From ihse at openjdk.org Sat Sep 16 08:50:53 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Sat, 16 Sep 2023 08:50:53 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic This looks good from a build perspective. ------------- Marked as reviewed by ihse (Reviewer). 
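The symptom described in the GCC bug (denormal arithmetic disabled after loading a library built with -ffast-math) is observable from Java, because Java requires gradual underflow. A minimal sketch of a hypothetical sanity check that could be run before and after loading a suspect library; it is only a diagnostic, not the proposed fix, which saves and restores the floating-point control word around the load. A JIT compiler may constant-fold the division once the method is inlined with a constant argument, so `true` is weak evidence, while `false` means subnormal results are being flushed to zero.

```java
// Hypothetical diagnostic, not part of the JDK or the proposed patch.
class SubnormalCheck {

    /**
     * Half of the smallest normal double must be a nonzero subnormal under
     * Java semantics. If native code has disabled denormal arithmetic, the
     * hardware division may return 0.0 instead. The value is a parameter so
     * that javac cannot constant-fold the expression.
     */
    static boolean subnormalsPreserved(double minNormal) {
        double half = minNormal / 2.0;
        return half != 0.0 && half < minNormal;
    }

    public static void main(String[] args) {
        // Expected to print true in an unaffected JVM.
        System.out.println(subnormalsPreserved(Double.MIN_NORMAL));
    }
}
```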
PR Review: https://git.openjdk.org/jdk/pull/10661#pullrequestreview-1629936252 From duke at openjdk.org Sun Sep 17 07:08:00 2023 From: duke at openjdk.org (ExE Boss) Date: Sun, 17 Sep 2023 07:08:00 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v12] In-Reply-To: References: Message-ID: On Thu, 7 Sep 2023 19:27:14 GMT, Mandy Chung wrote: >> 8268829: Provide an optimized way to walk the stack with Class object only >> >> `StackWalker::walk` creates one `StackFrame` per frame and the current implementation >> allocates one `StackFrameInfo` and one `MemberName` object per frame. Some frameworks >> like logging may only be interested in the Class object and not the method name or the BCI, >> for example, to filter out their own implementation classes to find the caller class. It's >> similar to `StackWalker::getCallerClass` but allows a predicate to filter out the element. >> >> This PR proposes to add an `Option::DROP_METHOD_INFO` enum constant that requests dropping the method information. If no method information is needed, a `StackWalker` with `DROP_METHOD_INFO` >> can be used instead and such a stack walker will save the overhead of extracting the method information >> and the memory used for the stack walking. >> >> New factory methods that take a parameter to specify the kind of stack walker to be created are defined. >> This provides a simple way for existing code, for example logging frameworks, to take advantage of >> this enhancement with the least change as it can keep the existing function for traversing >> `StackFrame`s.
>> >> For example, to find the first caller while filtering out a known list of implementation classes, >> existing code can create a stack walker instance with the `DROP_METHOD_INFO` option: >> >> >> StackWalker walker = StackWalker.getInstance(Option.DROP_METHOD_INFO, Option.RETAIN_CLASS_REFERENCE); >> Optional<Class<?>> callerClass = walker.walk(s -> >> s.map(StackFrame::getDeclaringClass) >> .filter(Predicate.not(implClasses::contains)) >> .findFirst()); >> >> >> If method information is accessed on the `StackFrame`s produced by this stack walker, such as >> `StackFrame::getMethodName`, then `UnsupportedOperationException` will be thrown. >> >> #### Javadoc & specdiff >> >> https://cr.openjdk.org/~mchung/api/java.base/java/lang/StackWalker.html >> https://cr.openjdk.org/~mchung/jdk22/specdiff/overview-summary.html >> >> #### Alternatives Considered >> One alternative is to provide a new API: >> `<T> T walkClass(Function<? super Stream<Class<?>>, ? extends T> function)` >> >> In this case, the caller would need to pass a function that takes a stream >> of `Class` objects instead of `StackFrame`s. Existing code would have to >> change its calls from `walk` to `walkClass` and modify the function body. >> >> ### Implementation Details >> >> A `StackWalker` configured with `DROP_METHOD_INFO` ...
> > Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: > > Fix @Param due to the rename from default to class+method src/java.base/share/classes/java/lang/ClassFrameInfo.java line 40: > 38: * @see StackWalker.Option#DROP_METHOD_INFO > 39: */ > 40: class ClassFrameInfo implements StackFrame { This class can be `sealed`: Suggestion: sealed class ClassFrameInfo implements StackFrame permits StackFrameInfo { src/java.base/share/classes/java/lang/StackFrameInfo.java line 93: > 91: synchronized (this) { > 92: if (type instanceof String sig) { > 93: type = JLIA.getMethodType(sig, declaringClass().getClassLoader()); Maybe there should be a `return` here: Suggestion: return type = JLIA.getMethodType(sig, declaringClass().getClassLoader()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1328054480 PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1328053888 From liach at openjdk.org Sun Sep 17 07:46:00 2023 From: liach at openjdk.org (Chen Liang) Date: Sun, 17 Sep 2023 07:46:00 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v12] In-Reply-To: References: Message-ID: On Sun, 17 Sep 2023 06:57:46 GMT, ExE Boss wrote: >> Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix @Param due to the rename from default to class+method > > src/java.base/share/classes/java/lang/StackFrameInfo.java line 93: > >> 91: synchronized (this) { >> 92: if (type instanceof String sig) { >> 93: type = JLIA.getMethodType(sig, declaringClass().getClassLoader()); > > Maybe there should be a `return` here: > Suggestion: > > return type = JLIA.getMethodType(sig, declaringClass().getClassLoader());
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1328057872 From dholmes at openjdk.org Mon Sep 18 01:39:59 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Sep 2023 01:39:59 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <9OmIlHnBM1uKOsfdsY7ynBAUBGb_UJddZVV1bPPvUFM=.6a61ae3f-787d-48a4-848f-d3364052faef@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Nothing seems to have changed since this was updated in October 2022 and then went through several close/open cycles. What is actually being proposed now? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1722649674 From dholmes at openjdk.org Mon Sep 18 02:11:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Sep 2023 02:11:51 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v19] In-Reply-To: References: Message-ID: On Sat, 16 Sep 2023 00:16:26 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Address dholmes@ comments Thanks for updates. ------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1630202692 From dholmes at openjdk.org Mon Sep 18 02:11:52 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Sep 2023 02:11:52 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v18] In-Reply-To: References: Message-ID: On Sat, 16 Sep 2023 00:10:22 GMT, Jonathan Joo wrote: >> test/jdk/sun/tools/jcmd/TestGcCounters.java line 34: >> >>> 32: output.shouldHaveExitValue(0); >>> 33: output.shouldContain(SUN_THREADS + ".total_gc_cpu_time"); >>> 34: output.shouldContain(SUN_THREADS_GCCPU + ".g1_conc_mark"); >> >> If this test is only for G1 then you need it to `@require` that the GC is G1, else this will fail when run under other GCs. > > Is the proper way to do this via adding the @requires annotation in the test header, like done in https://github.com/openjdk/jdk/blob/8f46abc938ffe338e25d5fdbdcfa0aaa12edfa58/test/hotspot/jtreg/gc/stress/TestStressG1Humongous.java#L30? 
Yes ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1328197163 From dholmes at openjdk.org Mon Sep 18 04:44:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Sep 2023 04:44:39 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 07:22:32 GMT, Joachim Kern wrote: >> After the push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478), the following test started to fail on AIX: >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java. >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first attempt at a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correcting the root cause. >> The root cause is that dlopen() on AIX returns a different handle every time, even if you load the same library twice. There is no official AIX API to get this information in a different way. >> My proposal is to use the stat64x API with the fields st_device and st_inode. After a dlopen(), the stat64x() API is additionally called to get this information, which is then stored alongside the library handle in the jvmtiAgent. On AIX we can then compare these values instead of the library handle and get the same functionality as on Linux. > > Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: > > adopt types Nothing further from me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer).
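The AIX fix identifies a loaded library by the device and inode of its file rather than by the `dlopen()` handle. The same identity check can be sketched in Java with `BasicFileAttributes.fileKey()`, whose Javadoc notes that on UNIX file systems the device ID and inode are commonly used to identify a file uniquely. This illustrates the idea only and is not the HotSpot code; `LoadedLibraries` is a hypothetical name. Because `fileKey()` may return null, the sketch falls back to the canonical path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry keyed by file identity instead of by path or handle.
class LoadedLibraries {
    private final Map<Object, Path> byIdentity = new HashMap<>();

    /** File identity: fileKey() (typically device + inode on Unix), else the real path. */
    static Object identityOf(Path lib) throws IOException {
        Object key = Files.readAttributes(lib, BasicFileAttributes.class).fileKey();
        return key != null ? key : lib.toRealPath();
    }

    /** Returns true the first time a given file is recorded, false for repeats. */
    boolean recordLoad(Path lib) throws IOException {
        return byIdentity.putIfAbsent(identityOf(lib), lib) == null;
    }

    public static void main(String[] args) throws IOException {
        Path lib = Files.createTempFile("libagent", ".so");
        // A hard link is a second path to the same device/inode pair.
        Path alias = lib.resolveSibling(lib.getFileName() + ".alias");
        Files.createLink(alias, lib);
        LoadedLibraries libs = new LoadedLibraries();
        System.out.println(libs.recordLoad(lib));   // true
        System.out.println(libs.recordLoad(alias)); // false: same file
        Files.delete(alias);
        Files.delete(lib);
    }
}
```

Comparing by file identity makes a second load of the same library detectable even when, as on AIX, each `dlopen()` returns a fresh handle.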
PR Review: https://git.openjdk.org/jdk/pull/15583#pullrequestreview-1630272868 From mbaesken at openjdk.org Mon Sep 18 10:27:45 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 18 Sep 2023 10:27:45 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 07:22:32 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: > > adopt types Marked as reviewed by mbaesken (Reviewer). 
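The AIX fix identifies a library by its device and inode (obtained with `stat64x`) instead of by the `dlopen()` handle. The same identity check can be sketched in Java with `BasicFileAttributes.fileKey()`, which on typical Unix JDKs is derived from the file's device and inode. This sketch only illustrates the idea under that assumption. It is not the HotSpot C++ code in the changeset, and `SameLibrary` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: do two paths name the same library file? */
final class SameLibrary {
    static boolean sameFile(Path a, Path b) throws IOException {
        // fileKey() may be null; on typical Unix JDKs it wraps the file's device and inode.
        Object keyA = Files.readAttributes(a, BasicFileAttributes.class).fileKey();
        Object keyB = Files.readAttributes(b, BasicFileAttributes.class).fileKey();
        if (keyA == null || keyB == null) {
            return Files.isSameFile(a, b); // no file key on this platform: let NIO decide
        }
        return keyA.equals(keyB);
    }
}
```

With this check, `/tmp/libx.so` and `/tmp/./libx.so` compare equal, while two separate copies of the same library do not. That is the property the AIX change relies on when it compares device and inode instead of handles.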
------------- PR Review: https://git.openjdk.org/jdk/pull/15583#pullrequestreview-1630726225 From jkern at openjdk.org Mon Sep 18 12:00:53 2023 From: jkern at openjdk.org (Joachim Kern) Date: Mon, 18 Sep 2023 12:00:53 GMT Subject: Integrated: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 08:18:45 GMT, Joachim Kern wrote: > After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : > com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; > The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. > A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. > Both fixes just disable the specific subtest on AIX, without correction of the root cause. > The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. > My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. This pull request has now been integrated. 
Changeset: 21c2dac1 Author: Joachim Kern Committer: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/21c2dac15957e6d71e8f32a55f3825671da097a9 Stats: 114 lines in 7 files changed: 101 ins; 3 del; 10 mod 8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX Reviewed-by: dholmes, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/15583 From ayang at openjdk.org Mon Sep 18 12:30:50 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 18 Sep 2023 12:30:50 GMT Subject: RFR: 8316098: Revise signature of numa_get_leaf_groups Message-ID: Simple refactoring that uses an unsigned type to better reflect that NUMA node ids are non-negative. More cleanup could be done to avoid the use of `checked_cast` in `os_windows.cpp`, but since the NUMA code on Windows is unreachable (JDK-8244065), I went for the smallest diff there to avoid adding more untested code. ------------- Commit messages: - os-node-id-api Changes: https://git.openjdk.org/jdk/pull/15786/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15786&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316098 Stats: 11 lines in 7 files changed: 1 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/15786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15786/head:pull/15786 PR: https://git.openjdk.org/jdk/pull/15786 From ayang at openjdk.org Mon Sep 18 13:09:43 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 18 Sep 2023 13:09:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 15:32:29 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC.
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. 
Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > 100 stripes per active worker thread Much of the complexity (both baseline and the proposed patch) in determining the limits of iteration/clearing is because clearing and redirtying occur in one pass. An alternative is to accumulate all to-be-redirtied cards during scavenge and redirty them at a later stage. (This is very similar to what G1 does.) One advantage of this approach is that one can verify that a card is dirty iff it contains an old-to-young pointer at the end of a young-gc pause. Recording to-be-redirtied cards requires some extra memory and redirtying them introduces another phase, so the overall effect on young-gc pause length still needs to be properly evaluated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1723373026 From rrich at openjdk.org Mon Sep 18 13:44:42 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 18 Sep 2023 13:44:42 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 15:32:29 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC.
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. 
Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > 100 stripes per active worker thread The G1 remembered set representation is quite sophisticated. I was aware of coarsening but this was enhanced even more in JDK18 ([JDK-8017163](https://bugs.openjdk.org/browse/JDK-8017163) and related). I would like to keep this as simple as possible if you don't mind because we intend to backport it to JDK17 and maybe even to JDK11 as we have bug reports for large Gerrit instances where scavenge pauses are sometimes 30s or 40s. I'll push a new commit later where all cards of a large array on the current stripe following a dirty card are considered dirty too. With this the regression you've shown goes away. Of course I don't object at all to further enhancements after we have something that can be backported.
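Albert's alternative above is to clear dirty cards while scanning and to redirty only the recorded cards in a separate, later phase. It can be modelled as the hypothetical single-threaded sketch below. `DeferredRedirty`, the `boolean[]` card table and the `stillHasOldToYoung` predicate stand in for the real card table and for what the scan discovers. This is not Parallel GC code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical two-phase model of "clear while scanning, redirty later". */
final class DeferredRedirty {
    static List<Integer> scanThenRedirty(boolean[] dirtyCards, IntPredicate stillHasOldToYoung) {
        List<Integer> toRedirty = new ArrayList<>();
        // Phase 1 (would run during the scavenge): scan each dirty card, clear it, and
        // record it if it still holds old-to-young references after its objects are processed.
        for (int card = 0; card < dirtyCards.length; card++) {
            if (dirtyCards[card]) {
                dirtyCards[card] = false;
                if (stillHasOldToYoung.test(card)) {
                    toRedirty.add(card);
                }
            }
        }
        // Phase 2 (a separate pass later in the pause): redirty exactly the recorded cards.
        for (int card : toRedirty) {
            dirtyCards[card] = true;
        }
        return toRedirty;
    }
}
```

After the call, among the cards that were scanned, a card is dirty exactly when the predicate held for it. This corresponds to the end-of-pause invariant Albert mentions being able to verify. The cost is the extra list and the second pass.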
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1723439200 From jvernee at openjdk.org Mon Sep 18 14:17:35 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 18 Sep 2023 14:17:35 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v21] In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 19:43:53 GMT, ExE Boss wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - add missing space + reflow lines >> - Fix typo >> >> Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> > > src/java.base/share/classes/jdk/internal/foreign/abi/fallback/FallbackLinker.java line 311: > >> 309: }; >> 310: >> 311: CANONICAL_LAYOUTS = Map.ofEntries( > > `LibFallback::wcharSize()` and other getters for `LibFallback.NativeConstants` fields can throw an error when `LibFallback.SUPPORTED` is `false` due to the `fallbackLinker` library not being present, so this static initializer should be made into a method instead: > Suggestion: > > static final Map CANONICAL_LAYOUTS = initCanonicalLayouts(); > > private static Map initCanonicalLayouts() { > if (!isSupported()) { > return null; > } > > int wchar_size = LibFallback.wcharSize(); > MemoryLayout wchartLayout = switch(wchar_size) { > case 2 -> JAVA_CHAR; // prefer JAVA_CHAR > default -> FFIType.layoutFor(wchar_size); > }; > > return Map.ofEntries( Good catch! I've chosen a slightly different solution though.
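ExE Boss's point is that a static initializer must not call native getters when the fallback library is absent. Jorn chose a different fix, and its details are not shown here beyond the follow-up commit title "Avoid eager use of LibFallback in FallbackLinker static block". One common Java way to defer such work is the initialization-on-demand holder idiom, sketched below with hypothetical names. `nativeLibraryPresent` stands in for `LibFallback.SUPPORTED` and `wcharSize()` for `LibFallback.wcharSize()`. The map holds byte sizes instead of `MemoryLayout`s to keep the sketch self-contained.

```java
import java.util.Map;

final class FallbackLikeLinker {
    // Hypothetical stand-in for LibFallback.SUPPORTED (was the native library loaded?).
    static volatile boolean nativeLibraryPresent = false;
    // Counts how often the deferred initialization ran (for demonstration only).
    static int initCount = 0;

    // Hypothetical stand-in for a native getter such as LibFallback.wcharSize().
    static int wcharSize() {
        if (!nativeLibraryPresent) throw new IllegalStateException("native library not loaded");
        return 4;
    }

    // Holder idiom: Holder, and therefore wcharSize(), is only initialized when
    // canonicalSizes() first reads Holder.CANONICAL_SIZES, not when this class loads.
    private static final class Holder {
        static final Map<String, Integer> CANONICAL_SIZES = init();

        private static Map<String, Integer> init() {
            initCount++;
            return Map.of("int", 4, "wchar_t", wcharSize());
        }
    }

    static Map<String, Integer> canonicalSizes() {
        // Check support first: a failed Holder initialization would leave that class unusable.
        if (!nativeLibraryPresent) throw new UnsupportedOperationException("linker not supported");
        return Holder.CANONICAL_SIZES;
    }
}
```

The outer class can be loaded and queried safely on platforms without the native library. The native getter runs at most once, on first successful use.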
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1328794993 From jvernee at openjdk.org Mon Sep 18 14:17:30 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 18 Sep 2023 14:17:30 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: Message-ID: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. 
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Avoid eager use of LibFallback in FallbackLinker static block ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/e68b95c1..1f3248df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=20-21 Stats: 62 lines in 1 file changed: 26 ins; 24 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From ayang at openjdk.org Mon Sep 18 14:53:44 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 18 Sep 2023 14:53:44 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 13:41:32 GMT, Richard Reingruber wrote: > I would like to keep this as simple as possible Totally agree; what I suggested above doesn't require remset-related changes (at least not as far as I can envision). > ... all cards of a large array on the current stripe following a dirty card are considered dirty too. Wouldn't that negate the purpose of precise-dirty-card for obj-array, i.e. skipping scanning mem corresponding to clean cards?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1723605681 From mchung at openjdk.org Mon Sep 18 16:46:00 2023 From: mchung at openjdk.org (Mandy Chung) Date: Mon, 18 Sep 2023 16:46:00 GMT Subject: RFR: 8268829: Provide an optimized way to walk the stack with Class object only [v12] In-Reply-To: References: Message-ID: On Sun, 17 Sep 2023 07:42:43 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/StackFrameInfo.java line 93: >> >>> 91: synchronized (this) { >>> 92: if (type instanceof String sig) { >>> 93: type = JLIA.getMethodType(sig, declaringClass().getClassLoader()); >> >> Maybe there should be a `return` here: >> Suggestion: >> >> return type = JLIA.getMethodType(sig, declaringClass().getClassLoader()); > > `type` is of type `Object`, don't think this compiles as the result type of `=` is the `type` variable's type. You missed line 96 where `type` is returned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15370#discussion_r1329010705 From coleenp at openjdk.org Mon Sep 18 16:46:15 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 18 Sep 2023 16:46:15 GMT Subject: RFR: 8316427: Duplicated code for {obj,type}ArrayKlass::array_klass Message-ID: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Please review this trivial change that moves the duplicated array_klass() code into one place. Tested with tier1 on Oracle supported platforms.
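On the `return type = ...` suggestion in the 8268829 review: the value of an assignment expression has the type of its left-hand variable. With `Object type`, that value therefore cannot be returned directly from a method declared to return `MethodType`. The sketch below shows the lazy-resolution shape under discussion, using hypothetical names. It substitutes the public `MethodType.fromMethodDescriptorString` for the internal `JLIA.getMethodType`, and it is not the actual `StackFrameInfo` code.

```java
import java.lang.invoke.MethodType;

final class LazyMethodType {
    // Holds either the descriptor String or, once resolved, the MethodType.
    private Object type;
    private final ClassLoader loader;

    LazyMethodType(String descriptor, ClassLoader loader) {
        this.type = descriptor;
        this.loader = loader;
    }

    MethodType get() {
        synchronized (this) {
            if (type instanceof String sig) {
                // `return type = ...;` would not compile here: the assignment expression
                // has the static type of `type` (Object), not MethodType.
                type = MethodType.fromMethodDescriptorString(sig, loader);
            }
            return (MethodType) type; // the resolved value is returned after the if
        }
    }
}
```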
------------- Commit messages: - 8316427: Duplicated code for {obj,type}ArrayKlass::array_klass Changes: https://git.openjdk.org/jdk/pull/15791/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15791&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316427 Stats: 198 lines in 6 files changed: 65 ins; 130 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15791.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15791/head:pull/15791 PR: https://git.openjdk.org/jdk/pull/15791 From rrich at openjdk.org Mon Sep 18 16:57:41 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 18 Sep 2023 16:57:41 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 14:51:02 GMT, Albert Mingkun Yang wrote: > > I would like to keep this as simple as possible > > Totally agree; what I suggested above doesn't require remset-related change (at least not as far I can envision). Likely I misunderstood you, sorry. I thought you were suggesting a solution to avoid the last regression. I agree that if the card table is only read during scavenge this would reduce complexity. Basically you need a 2nd card table to collect the dirty marks, don't you? > > ... all cards of a large array on the current stripe following a dirty card are considered dirty too. > > Wouldn't that negate the purpose of precise-dirty-card for obj-array, i.e. skipping scanning mem corresponding to clean cards? At most, only the elements on the current stripe are scanned when a dirty card is encountered, while all elements are scanned with the baseline. So the precise card marks are actually used with the proposed patch. In the worst case (only the first card of each stripe is dirty) this does not help though. This scheme is better than the baseline and avoids the regression when clean/dirty cards alternate.
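The stripe split in the PR description can be combined with the rule Richard describes here: scan a large array's part of a stripe from its first dirty card to the end of that part. The model below is our simplified reading, not the HotSpot implementation. `StripedArrayScan`, word-indexed addresses and the `boolean[]` card table are hypothetical. Details such as the three-stripe minimum (`PSCardTable::large_obj_arr_min_words`) and the handling of other objects in a stripe are left out.

```java
/**
 * Hypothetical, simplified model (not HotSpot's PSCardTable code) of how one large
 * object array could be split across card-table stripes. Addresses are word indices.
 */
final class StripedArrayScan {
    /**
     * Returns the {from, to} word range of the array [arrStart, arrEnd) that the owner of
     * stripe number {@code stripe} scans, or null if that owner scans none of it.
     */
    static long[] sliceForStripe(long arrStart, long arrEnd, long stripeWords, long stripe) {
        long stripeStart = stripe * stripeWords;
        long stripeEnd = stripeStart + stripeWords;
        if (arrEnd <= stripeStart || arrStart >= stripeEnd) {
            return null; // the array does not overlap this stripe
        }
        if (arrStart < stripeStart && arrEnd <= stripeEnd) {
            return null; // only the array's tail is here; the previous stripe's owner scans it
        }
        long from = Math.max(arrStart, stripeStart);
        long to = Math.min(arrEnd, stripeEnd);
        if (arrEnd > stripeEnd && arrEnd <= stripeEnd + stripeWords) {
            to = arrEnd; // the array ends in the next stripe: scan that tail as well
        }
        return new long[] {from, to};
    }

    /**
     * Words scanned in [from, to): start at the first dirty card and treat every
     * later card in the slice as dirty too.
     */
    static long wordsScanned(boolean[] dirtyCards, long cardWords, long from, long to) {
        for (long card = from / cardWords; card * cardWords < to; card++) {
            if (dirtyCards[(int) card]) {
                return to - Math.max(from, card * cardWords);
            }
        }
        return 0; // all cards in the slice are clean
    }
}
```

Consider 100-word stripes, 10-word cards, and an array at words [150, 520). The owners scan the slices [150, 200), [200, 300), [300, 400) and [400, 520), and the stripe starting at word 500 scans none of the array. If only the card covering words 420..429 is dirty, the owner of [400, 520) scans 100 words and the other owners scan nothing.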
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1723966585 From never at openjdk.org Mon Sep 18 18:40:41 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 18 Sep 2023 18:40:41 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: Message-ID: <-vcjV1PPar4XngNmn___A2whUTbQIK46aaq_fgtHJlI=.237223b6-d854-46e6-847f-b9e608f7ae06@github.com> On Fri, 15 Sep 2023 14:01:13 GMT, Doug Simon wrote: >> This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points. >> >> As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. > > Doug Simon has updated the pull request incrementally with three additional commits since the last revision: > > - generalized getLiveObjectLocalsAt to getOopMapAt > - need to zero oop_map_buf > - simplified getLiveObjectLocalsAt and moved it from ResolvedJavaMethod to HotSpotResolvedJavaMethod New version looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15705#pullrequestreview-1631741044 From rrich at openjdk.org Mon Sep 18 19:54:10 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 18 Sep 2023 19:54:10 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v8] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. 
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. 
Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Scan large array stripe from first dirty card to stripe end ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/9a2b230d..3e6c1b74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=06-07 Stats: 39 lines in 1 file changed: 10 ins; 17 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Mon Sep 18 19:57:43 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 18 Sep 2023 19:57:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v8] In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 19:54:10 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. 
>> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. 
I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Scan large array stripe from first dirty card to stripe end

card_scan_scarce.java (attached to the JBS item) is a variant of your last test that can run with either one very large array or a bunch of smaller ones. I created it to see whether scanning for dirty cards could remain precise for smaller arrays, but that still showed a regression. With the last version (https://github.com/openjdk/jdk/pull/14846/commits/3e6c1b74e7caf0aa44a9688e18b7c710e3d0cb42) we assume that all cards of the large array in the current stripe after the first dirty card are dirty too. With this the regression goes away.

Baseline
--------
$ ./jdk-baseline/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan_scarce 1000 1
[0.002s][warning][logging] No tag set matches selection: gc+scavenge. Did you mean any of the following?
gc* gc+exit* gc+load gc+reloc gc+unmap
[0.007s][info ][gc ] Using Parallel
### bigArrLen:1000M bigArrCount:1
### System.gc
[0.500s][trace ][gc ] GC(0) PSYoung generation size at maximum: 1048576K
[0.500s][info ][gc ] GC(0) Pause Young (System.gc()) 1047M->1001M(2944M) 208.600ms
[0.932s][info ][gc ] GC(1) Pause Full (System.gc()) 1001M->1001M(2944M) 431.232ms
[1.396s][trace ][gc ] GC(2) PSYoung generation size at maximum: 1048576K
[1.396s][info ][gc ] GC(2) Pause Young (Allocation Failure) 1769M->1001M(2944M) 209.498ms
[1.756s][trace ][gc ] GC(3) PSYoung generation size at maximum: 1048576K
[1.757s][info ][gc ] GC(3) Pause Young (Allocation Failure) 1769M->1001M(2944M) 206.165ms
[2.110s][trace ][gc ] GC(4) PSYoung generation size at maximum: 1048576K
[2.110s][info ][gc ] GC(4) Pause Young (Allocation Failure) 1769M->1001M(2944M) 199.424ms

New
---
$ ./jdk-new/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan_scarce 1000 1
[0.006s][info][gc] Using Parallel
### bigArrLen:1000M bigArrCount:1
### System.gc
[0.293s][trace][gc,scavenge] stripe count:200 stripe size:5125K
[0.386s][trace][gc ] GC(0) PSYoung generation size at maximum: 1048576K
[0.386s][info ][gc ] GC(0) Pause Young (System.gc()) 1047M->1001M(2944M) 93.863ms
[0.802s][info ][gc ] GC(1) Pause Full (System.gc()) 1001M->1001M(2944M) 415.417ms
[1.048s][trace][gc,scavenge] stripe count:200 stripe size:5126K
[1.215s][trace][gc ] GC(2) PSYoung generation size at maximum: 1048576K
[1.215s][info ][gc ] GC(2) Pause Young (Allocation Failure) 1769M->1001M(2944M) 166.850ms
[1.362s][trace][gc,scavenge] stripe count:200 stripe size:5126K
[1.516s][trace][gc ] GC(3) PSYoung generation size at maximum: 1048576K
[1.516s][info ][gc ] GC(3) Pause Young (Allocation Failure) 1769M->1001M(2944M) 154.607ms
[1.679s][trace][gc,scavenge] stripe count:200 stripe size:5126K
[1.835s][trace][gc ] GC(4) PSYoung generation size at maximum: 1048576K
[1.835s][info ][gc ] GC(4) Pause Young (Allocation Failure) 1769M->1001M(2944M) 156.783ms

------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1724277752 From jjoo at openjdk.org Mon Sep 18 22:57:43 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Mon, 18 Sep 2023 22:57:43 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v20] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Add header for failing build check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/0a2565dc..d45e54d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=18-19 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From mchung at openjdk.org Mon Sep 18 23:07:01 2023 From: mchung at openjdk.org (Mandy Chung) Date: Mon, 18 Sep 2023 23:07:01 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly Message-ID: `JVM_MoreStackWalk` has a bug: it always assumes that the Java frame stream is currently at the frame decoded in the last batch, and so it always advances to the next frame before filling in the new batch of stack frames. However, `JVM_MoreStackWalk` may return 0. The library then sets the continuation to its parent and calls `JVM_MoreStackWalk` again to continue the stack walk, but the last decoded frame has already been advanced past: the Java frame stream is already at the top frame of the parent continuation. The current implementation therefore skips "Continuation::yield0" mistakenly.
This only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` so that the VM will determine if the current frame should be skipped or not. `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks the expected result where "yield0" exists between "enter" and "run" frames. ------------- Commit messages: - 8316456: StackWalker may skip Continuation::yield0 frame mistakenly Changes: https://git.openjdk.org/jdk/pull/15804/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15804&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316456 Stats: 128 lines in 7 files changed: 36 ins; 11 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/15804.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15804/head:pull/15804 PR: https://git.openjdk.org/jdk/pull/15804 From jjoo at openjdk.org Mon Sep 18 23:19:19 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Mon, 18 Sep 2023 23:19:19 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v21] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Add more header files for broken debug build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/d45e54d5..fed27c0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=19-20 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Tue Sep 19 00:23:30 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 19 Sep 
2023 00:23:30 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v22] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Fix more broken headers for sanity checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/fed27c0c..9c6e0723 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=20-21 Stats: 3 lines in 3 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From dholmes at openjdk.org Tue Sep 19 04:44:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 19 Sep 2023 04:44:38 GMT Subject: RFR: 8316427: Duplicated code for {obj, type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. I have to wonder if it is in fact a bug that `TypeArrayKlass::array_klass` invokes `ObjArrayKlass::allocate_objArray_klass` when that creates the wrong type of array??? 
------------- PR Review: https://git.openjdk.org/jdk/pull/15791#pullrequestreview-1632410283 From dholmes at openjdk.org Tue Sep 19 06:40:10 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 19 Sep 2023 06:40:10 GMT Subject: RFR: 8316229: Enhance class initialization logging Message-ID: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> This change adds some additional debug logging to the class linking and initialization process. It was useful in diagnosing the deadlock described by: https://bugs.openjdk.org/browse/JDK-8316469 See the example output in JBS issue. The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. Testing: - manual examination of logging output - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) - tiers1-3 sanity Thanks. ------------- Commit messages: - Cache the log enabled state in a local. 
- 8316229: Enhance class initialization logging Changes: https://git.openjdk.org/jdk/pull/15809/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15809&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316229 Stats: 73 lines in 4 files changed: 69 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15809.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15809/head:pull/15809 PR: https://git.openjdk.org/jdk/pull/15809 From stefank at openjdk.org Tue Sep 19 06:57:04 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 19 Sep 2023 06:57:04 GMT Subject: RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop Message-ID: The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper: Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296 # assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1 V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296) V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89) V [libjvm.so+0xa66c83] int freeze_internal >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584) V [libjvm.so+0xa6711b] int freeze >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272) J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055] This is the scenario that triggers this bug: 1) ContinuationWrapper is created on the stack 2) We enter a JRT_BLOCK section 3) Call ContinuationWrapper::done() 4) Exit the JRT_BLOCK 5) ~ContinuationWrapper is called (3) sets ContinuationWrapper::_continuation to nullptr (4) 
hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1 (5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the broken oop. So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert:

diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp
index 40205d324a6..80b60d0b7b8 100644
--- a/src/hotspot/share/runtime/javaThread.hpp
+++ b/src/hotspot/share/runtime/javaThread.hpp
@@ -258,7 +258,7 @@ class JavaThread: public Thread {

  public:
   void inc_no_safepoint_count() { _no_safepoint_count++; }
-  void dec_no_safepoint_count() { _no_safepoint_count--; }
+  void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); }
 #endif // ASSERT
  public:
   // These functions check conditions before possibly going to a safepoint.

To catch the broken nullptr check in:

void allow_safepoint() {
#ifdef ASSERT
  // we could have already allowed safepoints in done
  if (_continuation != nullptr && _current_thread->is_Java_thread()) {
    JavaThread::cast(_current_thread)->dec_no_safepoint_count();
  }
#endif
}

The assert is triggered when I run the test with G1. I propose that we fix this by no longer setting _continuation to nullptr as a way to indicate cancellation of the ContinuationWrapper, and instead use a bool for that. I made some slight changes to remove all the code duplication in the constructors so that I only had to initialize the new _done variable in one place. I've tested this with the original reproducer + assert with both ZGC and G1. I'll run this through our internal CI pipeline as well.
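The sentinel problem in steps (1)-(5) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the actual patch: `toy_oop`, `toy_safepoint`, `WrapperWithNullSentinel` and `WrapperWithDoneFlag` are hypothetical names, an "oop" is modelled as a plain integer, and the safepoint simply overwrites the raw field with a poison value, roughly as -XX:+CheckUnhandledOops does. It shows that a nullptr check made after the safepoint no longer sees the value written by done(), whereas a separate bool does.

```cpp
#include <cstdint>

// Toy model only: an "oop" is an address-sized integer, and a "safepoint"
// overwrites a raw (unhandled) oop field with a poison value. None of these
// names are HotSpot APIs.
using toy_oop = std::uintptr_t;
constexpr toy_oop kNullOop = 0;
constexpr toy_oop kPoisonOop = 0xf1;  // stand-in for 0xfffffffffffffff1

// Before the fix: nullptr in the oop field doubles as the "done" marker.
struct WrapperWithNullSentinel {
  toy_oop continuation;
  constexpr void done() { continuation = kNullOop; }
  constexpr bool needs_cleanup() const { return continuation != kNullOop; }
};

// After the fix: a separate bool records cancellation; the oop is not consulted.
struct WrapperWithDoneFlag {
  toy_oop continuation;
  bool is_done;
  constexpr void done() { is_done = true; }
  constexpr bool needs_cleanup() const { return !is_done; }
};

// Toy safepoint that happens between done() and the destructor-time check.
template <typename Wrapper>
constexpr void toy_safepoint(Wrapper& w) { w.continuation = kPoisonOop; }

// Replays the scenario: done(), then a safepoint, then the destructor's check.
template <typename Wrapper>
constexpr bool cleanup_after_done(Wrapper w) {
  w.done();
  toy_safepoint(w);
  return w.needs_cleanup();  // true means "undo the no-safepoint count again"
}
```

Here `needs_cleanup()` returning true stands in for `allow_safepoint()` decrementing the no-safepoint count a second time, which is what the G1 assert above catches; with the bool, the poisoned field is simply never read.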
------------- Commit messages: - 8316436: ContinuationWrapper uses unhandled nullptr oop Changes: https://git.openjdk.org/jdk/pull/15810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15810&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316436 Stats: 27 lines in 2 files changed: 8 ins; 11 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/15810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15810/head:pull/15810 PR: https://git.openjdk.org/jdk/pull/15810 From shade at openjdk.org Tue Sep 19 07:53:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Sep 2023 07:53:40 GMT Subject: RFR: 8316229: Enhance class initialization logging In-Reply-To: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> Message-ID: <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> On Tue, 19 Sep 2023 06:33:02 GMT, David Holmes wrote: > This change adds some additional debug logging to the class linking and initialization process. It was useful in diagnosing the deadlock described by: > > https://bugs.openjdk.org/browse/JDK-8316469 > > See the example output in JBS issue. > > The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. > > Testing: > - manual examination of logging output > - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) > - tiers1-3 sanity > > Thanks. This looks good, I have only stylistic questions/suggestions. 
src/hotspot/share/oops/instanceKlass.cpp line 773: > 771: MonitorLocker ml(current, _init_monitor); > 772: > 773: bool debug_logging_enabled = log_is_enabled(Debug, class, init); Here and later: Do we need to peel off the `log_is_enabled` check into a separate variable? We don't do it in most (all?) of our places. src/hotspot/share/oops/instanceKlass.cpp line 779: > 777: if (debug_logging_enabled) { > 778: ResourceMark rm(current); > 779: log_debug(class, init)("Thread %s waiting for linking of %s by thread %s", Here and later: Should thread names be in quotes? This would match other places, e.g. error handler, jstack output, etc. src/hotspot/share/oops/instanceKlass.hpp line 502: > 500: // We can safely access the name as long as we hold the _init_monitor. > 501: const char* init_thread_name() { > 502: assert(_init_monitor->owned_by_self(), " Must hold _init_monitor here"); Suggestion: assert(_init_monitor->owned_by_self(), "Must hold _init_monitor here"); ------------- PR Review: https://git.openjdk.org/jdk/pull/15809#pullrequestreview-1632632886 PR Review Comment: https://git.openjdk.org/jdk/pull/15809#discussion_r1329714158 PR Review Comment: https://git.openjdk.org/jdk/pull/15809#discussion_r1329714897 PR Review Comment: https://git.openjdk.org/jdk/pull/15809#discussion_r1329716244 From ayang at openjdk.org Tue Sep 19 08:26:43 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 19 Sep 2023 08:26:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 16:54:28 GMT, Richard Reingruber wrote: > Basically you need a 2nd card table to collect the dirty marks, don't you? Yes, something along those lines; that's what I meant by "Recording to-be-redirtied cards requires some extra memory". > So the precise card marks are actually used with the proposed patch. In the worst case (only the first card of each stripe is dirty) this does not help though.
Yes, this is my understanding -- in some scenarios, it degenerates into imprecise card scanning. OTOH, parallel processing large-obj-array and precise card scanning are two orthogonal aspects anyway. Using the latest revision, I don't observe any regression in the attached bms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1725047188 From adinn at openjdk.org Tue Sep 19 08:50:40 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 19 Sep 2023 08:50:40 GMT Subject: RFR: 8316427: Duplicated code for {obj, type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. Why is it the wrong type? If the type array is, say, int[] then its array class is int[][] which is an array of int[] i.e. of objects. This patch looks fine to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15791#issuecomment-1725081600 From rcastanedalo at openjdk.org Tue Sep 19 09:24:40 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 19 Sep 2023 09:24:40 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 05:14:55 GMT, Tobias Hartmann wrote: > Thanks for the detailed analysis and explanation. The fix looks good. Thanks for reviewing, Tobias! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15589#issuecomment-1725137705 From rcastanedalo at openjdk.org Tue Sep 19 09:24:42 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 19 Sep 2023 09:24:42 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: <7vWXolmj0hDdJ3JxuoBnaifU1jZ7TkEiDGm6dVcQamU=.628c723a-3d44-4491-ba51-379d69e06741@github.com> References: <7vWXolmj0hDdJ3JxuoBnaifU1jZ7TkEiDGm6dVcQamU=.628c723a-3d44-4491-ba51-379d69e06741@github.com> Message-ID: On Thu, 14 Sep 2023 17:48:27 GMT, Vladimir Kozlov wrote: > Few comments. Thanks for reviewing, Vladimir! See my replies inlined for each comment/question. > Why rounding is only arrays? Class instance (bytewise) size is already rounded to `MinObjAlignmentInBytes` (i.e. it is a multiple of `BytesPerLong`, assuming minimum object alignment of 8 bytes), so rounding it again would be a no-op. > Do we have a check that object's alignment >= 8 bytes? If it less you may access beyond array. This code simply assumes that the minimum object alignment in HotSpot is 8 bytes, as per `src/hotspot/share/runtime/globals.hpp`: #ifdef _LP64 ... product(int, ObjectAlignmentInBytes, 8, \ "Default object alignment in bytes, 8 is minimum") \ range(8, 256) \ constraint(ObjectAlignmentInBytesConstraintFunc, AtParse) ... #else ... const int ObjectAlignmentInBytes = 8; ... #endif Is there any VM configuration where this might not hold? Or do I misunderstand your question? > Can `Add` result overflow in 32-bit VM? `payload_size + (BytesPerLong - 1)` cannot overflow because it is performed after a subtraction of a larger quantity (`base_off`). 
Would it make sense to add an assertion like this? `assert(base_off >= BytesPerLong - 1, "array payload size computation should not overflow");` > src/hotspot/share/opto/graphKit.cpp line 3859: > >> 3857: abody = _gvn.transform(new LShiftXNode(lengthx, elem_shift)); >> 3858: } >> 3859: Node* non_rounded_size = _gvn.transform(new AddXNode(headerx, abody)); > > Here `headerx + abody` cannot overflow because of the array length limit imposed by `TypeAryPtr::max_array_length()`, which is designed to fit the total bytes of an array header and payload (including external alignment) in 32 bits (on 32-bit platforms). I checked this using the length limits for all basic types. Exceeding this limit causes the array allocation to throw a `java.lang.OutOfMemoryError: Requested array size exceeds VM limit` exception. > src/hotspot/share/opto/graphKit.cpp line 3869: > >> 3867: if (round_mask != 0) { >> 3868: Node* mask1 = MakeConX(round_mask); >> 3869: size = _gvn.transform(new AddXNode(size, mask1)); > > and here again question about overflow in 32-bit VM. > Do we generate compare with FastAllocateSizeLimit before this code is executed? `size + mask1` cannot overflow because, as you suggest, it is only computed in the fast allocation path (i.e. for small values of `size` restricted by FastAllocateSizeLimit). The only way to bypass this check and indeed trigger the overflow is to set `-XX:FastAllocateSizeLimit=2147483647` (a debug-only flag); this does cause a segmentation fault, reported in [JDK-8316512](https://bugs.openjdk.org/browse/JDK-8316512). In general, the behavior of size computation within `GraphKit::new_array` w.r.t. overflow is not altered by this changeset, as far as I can see, because it essentially only changes whether `round_mask` is added to the header size (as done before) or to the total, non-rounded size (as proposed here).
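The size computation discussed above can be sketched as plain integer arithmetic. This is a hedged illustration, not C2 code: the function names are hypothetical, the 8-byte minimum object alignment mirrors the `ObjectAlignmentInBytes` default quoted earlier in the thread, and the header size is a caller-supplied assumption. As proposed in the changeset, the mask is added to the total non-rounded size (header plus payload) and the low bits are then cleared.

```cpp
#include <cstddef>

// Toy arithmetic only. Assumptions (not read from a real VM): 8-byte minimum
// object alignment, and header/element sizes supplied by the caller.
constexpr std::size_t kMinObjAlignmentInBytes = 8;
constexpr std::size_t kRoundMask = kMinObjAlignmentInBytes - 1;

// header + (length << elem_shift): the "non_rounded_size" in the quoted code.
constexpr std::size_t non_rounded_size(std::size_t header_bytes, std::size_t length,
                                       unsigned elem_shift) {
  return header_bytes + (length << elem_shift);
}

// Add the mask to the total size, then clear the low bits to align it.
constexpr std::size_t rounded_size(std::size_t header_bytes, std::size_t length,
                                   unsigned elem_shift) {
  return (non_rounded_size(header_bytes, length, elem_shift) + kRoundMask) & ~kRoundMask;
}
```

For example, assuming a 16-byte header, a 3-element byte array (shift 0) has a non-rounded size of 19 bytes and a rounded size of 24, and a 3-element int array (shift 2) gives 28 and 32. The non-rounded total remains available alongside the rounded allocation size.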
------------- PR Comment: https://git.openjdk.org/jdk/pull/15589#issuecomment-1725139074 PR Review Comment: https://git.openjdk.org/jdk/pull/15589#discussion_r1329825119 PR Review Comment: https://git.openjdk.org/jdk/pull/15589#discussion_r1329828675 PR Review Comment: https://git.openjdk.org/jdk/pull/15589#discussion_r1329829522 From coleenp at openjdk.org Tue Sep 19 11:49:38 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 19 Sep 2023 11:49:38 GMT Subject: RFR: 8316427: Duplicated code for {obj, type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. What Andrew said. An array of int[][] is an array of objects, so an ObjArrayKlass. The reason that array_klass can't return an ObjArrayKlass instead of ArrayKlass, is that a single dimension primitive type array returns TypeArrayKlass. Thanks Andrew, please approve the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15791#issuecomment-1725344290 From adinn at openjdk.org Tue Sep 19 12:07:39 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 19 Sep 2023 12:07:39 GMT Subject: RFR: 8316427: Duplicated code for {obj, type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. Marked as reviewed by adinn (Reviewer). 
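The point made in the 8316427 thread, that the array class of `int[]` is `int[][]` (an array of objects) while asking a single-dimension primitive array for dimension 1 yields the `TypeArrayKlass` itself, can be illustrated with a toy model. This sketch only mirrors the shape of an `array_klass(n)` lookup; `ToyKind`, `ToyArrayKlass` and the free function `array_klass` are hypothetical and are not HotSpot's classes. Both kinds of array klass follow the same rule, which is why one shared implementation suffices, and the result can be either kind, which is why the common return type stays the base class.

```cpp
// Toy model of the argument in this thread; not HotSpot's class hierarchy.
enum class ToyKind { TypeArray, ObjArray };

struct ToyArrayKlass {
  ToyKind kind;
  int dimension;
};

// Shape of array_klass(n): return the klass for an n-dimensional array,
// starting from a klass whose dimension is <= n.
constexpr ToyArrayKlass array_klass(ToyArrayKlass k, int n) {
  // n == dimension: the klass itself, which for int[] is a TypeArrayKlass.
  // n  > dimension: the elements are arrays, i.e. objects, so ObjArrayKlass.
  return n == k.dimension ? k : ToyArrayKlass{ToyKind::ObjArray, n};
}
```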
------------- PR Review: https://git.openjdk.org/jdk/pull/15791#pullrequestreview-1633089308 From fbredberg at openjdk.org Tue Sep 19 12:23:09 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 19 Sep 2023 12:23:09 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames Message-ID: Relativize initial_sp in interpreter frames. By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. 
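The benefit of storing the slot as an fp-relative offset can be shown with a toy model. This is a minimal sketch, assuming addresses are plain integers and the stack grows toward lower addresses; `AbsoluteFrame`, `RelativeFrame`, `relativize`, `initial_sp_of` and `relocate` are hypothetical names, not HotSpot's frame API. When a frame is copied to a different location (for example, thawed onto another carrier thread's stack), its fp moves with it but the stored slot value is copied unchanged, so an absolute `initial_sp` goes stale while an offset stays correct.

```cpp
#include <cstdint>

// Toy model only (hypothetical names, not HotSpot's frame API).
// Addresses are plain integers; the stack grows toward lower addresses,
// so initial_sp lies below fp and its fp-relative offset is negative.
using toy_addr = std::intptr_t;

struct AbsoluteFrame {  // before: the slot holds an absolute address
  toy_addr fp;
  toy_addr initial_sp;
};

struct RelativeFrame {  // after: the slot holds an offset from fp
  toy_addr fp;
  toy_addr initial_sp_offset;
};

constexpr RelativeFrame relativize(AbsoluteFrame f) {
  return RelativeFrame{f.fp, f.initial_sp - f.fp};
}

// Recover the absolute value from wherever the frame currently lives.
constexpr toy_addr initial_sp_of(RelativeFrame f) { return f.fp + f.initial_sp_offset; }

// Moving a frame by delta: fp is known at the new location, but the slot
// value itself is copied bit-for-bit.
constexpr AbsoluteFrame relocate(AbsoluteFrame f, toy_addr delta) {
  return AbsoluteFrame{f.fp + delta, f.initial_sp};  // stale: would need patching
}
constexpr RelativeFrame relocate(RelativeFrame f, toy_addr delta) {
  return RelativeFrame{f.fp + delta, f.initial_sp_offset};  // still valid
}
```

With fp = 1000 and initial_sp = 968 the offset is -32; after moving the frame by 4096, the relative form yields 5064, while the absolute form still holds 968 and would have to be fixed up during freeze/thaw.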
------------- Commit messages: - Merge branch 'master' into 8315966_relativize_initial_sp - 8315966: Relativize initial_sp in interpreter frames Changes: https://git.openjdk.org/jdk/pull/15815/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15815&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315966 Stats: 196 lines in 31 files changed: 85 ins; 36 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/15815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15815/head:pull/15815 PR: https://git.openjdk.org/jdk/pull/15815 From stefank at openjdk.org Tue Sep 19 12:35:41 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 19 Sep 2023 12:35:41 GMT Subject: RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop [v2] In-Reply-To: References: Message-ID: > The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper: > > > Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296 > # assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1 > > V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296) > V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89) > V [libjvm.so+0xa66c83] int freeze_internal >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584) > V [libjvm.so+0xa6711b] int freeze >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272) > J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055] > > > This is the scenario that triggers this bug: > 1) ContinuationWrapper is created on the stack > 2) We enter a 
JRT_BLOCK section > 3) Call ContinuationWrapper::done() > 4) Exit the JRT_BLOCK > 5) ~ContinuationWrapper is called > > (3) sets ContinuationWrapper::_continuation to nullptr > (4) hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1 > (5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the broken oop. > > So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert: > > diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp > index 40205d324a6..80b60d0b7b8 100644 > --- a/src/hotspot/share/runtime/javaThread.hpp > +++ b/src/hotspot/share/runtime/javaThread.hpp > @@ -258,7 +258,7 @@ class JavaThread: public Thread { > > public: > void inc_no_safepoint_count() { _no_safepoint_count++; } > - void dec_no_safepoint_count() { _no_safepoint_count--; } > + void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); } > #endif // ASSERT > public: > // These functions check conditions before possibly going to a safepoint. > > > To catch the broken nullptr check in: > > void allow_safepoint() { > ... 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Fix thread argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15810/files - new: https://git.openjdk.org/jdk/pull/15810/files/88663b45..9a636516 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15810&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15810&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15810/head:pull/15810 PR: https://git.openjdk.org/jdk/pull/15810 From fbredberg at openjdk.org Tue Sep 19 12:40:41 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 19 Sep 2023 12:40:41 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames In-Reply-To: References: Message-ID: <7Wnz146tpJX_Z7__bWrpzDYgVyI21jMk8JKe83K_c_M=.65aec7da-6515-4316-8fef-c68abd837d20@github.com> On Tue, 19 Sep 2023 09:00:01 GMT, Fredrik Bredberg wrote: > Relativize initial_sp in interpreter frames. > > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. Calling @TheRealMDoerr and @RealFYang for PowerPC and RISC-V reviews. 
I've done basic loom related testing on both PowerPC and RISC-V. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15815#issuecomment-1725431646 From jpbempel at openjdk.org Tue Sep 19 12:49:45 2023 From: jpbempel at openjdk.org (Jean-Philippe Bempel) Date: Tue, 19 Sep 2023 12:49:45 GMT Subject: RFR: 8308762: Metaspace leak with Instrumentation.retransform In-Reply-To: References: Message-ID: On Thu, 13 Jul 2023 14:34:38 GMT, Coleen Phillimore wrote: >> Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with `Throwable`, the class `Throwable` is pre-resolved in the constant pool, while all the other classes are in a unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and create a new entry. We now try to consider it as equal specially for Methodref/Fieldref. > > Also there is a nice test harness for class redefinition in the test/hotspot/jtreg/serviceability/jvmti/RedefineClasses tests that you might be able to use to add a test for this. 
Please @coleenp review my last changes ------------- PR Comment: https://git.openjdk.org/jdk/pull/14780#issuecomment-1725446154 From zgu at openjdk.org Tue Sep 19 13:03:39 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 19 Sep 2023 13:03:39 GMT Subject: RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop [v2] In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 12:35:41 GMT, Stefan Karlsson wrote: >> The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper: >> >> >> Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296 >> # assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1 >> >> V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296) >> V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89) >> V [libjvm.so+0xa66c83] int freeze_internal >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584) >> V [libjvm.so+0xa6711b] int freeze >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272) >> J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055] >> >> >> This is the scenario that triggers this bug: >> 1) ContinuationWrapper is created on the stack >> 2) We enter a JRT_BLOCK section >> 3) Call ContinuationWrapper::done() >> 4) Exit the JRT_BLOCK >> 5) ~ContinuationWrapper is called >> >> (3) sets ContinuationWrapper::_continuation to nullptr >> (4) hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1 >> (5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, 
which triggers ZGC's verification code that finds the broken oop. >> >> So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert: >> >> diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp >> index 40205d324a6..80b60d0b7b8 100644 >> --- a/src/hotspot/share/runtime/javaThread.hpp >> +++ b/src/hotspot/share/runtime/javaThread.hpp >> @@ -258,7 +258,7 @@ class JavaThread: public Thread { >> >> public: >> void inc_no_safepoint_count() { _no_safepoint_count++; } >> - void dec_no_safepoint_count() { _no_safepoint_count--; } >> + void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); } >> #endif // ASSERT >> public: >> // These functions check conditions before possibly going to ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix thread argument src/hotspot/share/runtime/continuationWrapper.inline.hpp line 52: > 50: oop _continuation; // jdk.internal.vm.Continuation instance > 51: stackChunkOop _tail; > 52: bool _done; Looks like `_done` can be debug-only flag. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15810#discussion_r1330090805 From mdoerr at openjdk.org Tue Sep 19 14:27:41 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 19 Sep 2023 14:27:41 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 09:00:01 GMT, Fredrik Bredberg wrote: > Relativize initial_sp in interpreter frames. > > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. 
This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. Thanks for implementing the platform parts! Note that you could have used R0 which is typically available as scratch reg. But, ok, we have enough other regs. Looks correct. I'll run more tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15815#issuecomment-1725731457 From rrich at openjdk.org Tue Sep 19 14:37:43 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 19 Sep 2023 14:37:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v7] In-Reply-To: References: Message-ID: <9v9FCKMPNJIf78JT3zhYZhy7BL4bmoMk5kPBKyqeiOQ=.a64934cf-0a4a-470f-ba9d-2f224bcdbcfb@github.com> On Tue, 19 Sep 2023 08:23:59 GMT, Albert Mingkun Yang wrote: > > So the precise card marks are actually used with the proposed patch. In the worst case (only the first card of each stripe is dirty) this does not help though. > > Yes, this is my understanding -- in some scenarios, it degenerates into imprecise card scanning. OTOH, parallel processing large-obj-array and precise card scanning are two orthogonal aspects anyway. Yes, indeed. If we can't find a simple scheme for precise card scanning then it's better to split that work off into a dedicated rfe. > Using the latest revision, I don't observe any regression in the attached bms. Thanks for testing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1725770508 From duke at openjdk.org Tue Sep 19 15:40:56 2023 From: duke at openjdk.org (duke) Date: Tue, 19 Sep 2023 15:40:56 GMT Subject: Withdrawn: 8312502: Mass migrate HotSpot attributes to the correct location In-Reply-To: References: Message-ID: On Fri, 21 Jul 2023 10:07:24 GMT, Julian Waters wrote: > Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/14969 From lmesnik at openjdk.org Tue Sep 19 16:36:52 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 19 Sep 2023 16:36:52 GMT Subject: Integrated: 8315415: OutputAnalyzer.shouldMatchByLine() fails in some cases In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 19:59:53 GMT, Leonid Mesnik wrote: > OutputAnalyzer.shouldMatchByLine(from, to, pattern) > treats from and to parameters as patterns and not lines. So it might fail to compile them or not work as expected in some cases. > > I grepped the usage of shouldMatchByLine and stdoutShouldMatchByLine and found that in most cases from/to are set to some regex patterns. So I just updated the names of variables and documentation to explicitly say that from/to are patterns. > > See bugs for details. Tested with tier1 (mostly for validation scripts since no code changes.) This pull request has now been integrated.
Changeset: 7b1e2bfe Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/7b1e2bfe0f805a69b59839b6bf8250b62ea356b8 Stats: 28 lines in 1 file changed: 0 ins; 0 del; 28 mod 8315415: OutputAnalyzer.shouldMatchByLine() fails in some cases Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/15753 From pchilanomate at openjdk.org Tue Sep 19 16:53:43 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 19 Sep 2023 16:53:43 GMT Subject: RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop [v2] In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 13:01:00 GMT, Zhengyu Gu wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix thread argument > > src/hotspot/share/runtime/continuationWrapper.inline.hpp line 52: > >> 50: oop _continuation; // jdk.internal.vm.Continuation instance >> 51: stackChunkOop _tail; >> 52: bool _done; > > Looks like `_done` can be debug-only flag. 
+1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15810#discussion_r1330431718 From pchilanomate at openjdk.org Tue Sep 19 16:53:41 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 19 Sep 2023 16:53:41 GMT Subject: RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop [v2] In-Reply-To: References: Message-ID: <6y2hLhhj2ieykUXwIwBiSDWJiCRlpbkTX6Zwp3sEDgQ=.2a61488e-d6be-4bc4-9c48-bce8f53a3436@github.com> On Tue, 19 Sep 2023 12:35:41 GMT, Stefan Karlsson wrote: >> The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper: >> >> >> Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296 >> # assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1 >> >> V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296) >> V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89) >> V [libjvm.so+0xa66c83] int freeze_internal >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584) >> V [libjvm.so+0xa6711b] int freeze >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272) >> J 216 jdk.internal.vm.Continuation.doYield()I java.base@22-internal (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055] >> >> >> This is the scenario that triggers this bug: >> 1) ContinuationWrapper is created on the stack >> 2) We enter a JRT_BLOCK section >> 3) Call ContinuationWrapper::done() >> 4) Exit the JRT_BLOCK >> 5) ~ContinuationWrapper is called >> >> (3) sets ContinuationWrapper::_continuation to nullptr >> (4) hits a safepoint and sets ContinuationWrapper::_continuation to
0xfffffffffffffff1 >> (5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the broken oop. >> >> So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert: >> >> diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp >> index 40205d324a6..80b60d0b7b8 100644 >> --- a/src/hotspot/share/runtime/javaThread.hpp >> +++ b/src/hotspot/share/runtime/javaThread.hpp >> @@ -258,7 +258,7 @@ class JavaThread: public Thread { >> >> public: >> void inc_no_safepoint_count() { _no_safepoint_count++; } >> - void dec_no_safepoint_count() { _no_safepoint_count--; } >> + void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); } >> #endif // ASSERT >> public: >> // These functions check conditions before possibly going to ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix thread argument Looks good to me. Thanks, Patricio ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15810#pullrequestreview-1633768094 From rrich at openjdk.org Tue Sep 19 20:48:44 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 19 Sep 2023 20:48:44 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v8] In-Reply-To: References: Message-ID: <4WqrcMgtvoVJ6NK7XatTp-unCkhYSatRznUwplN9UqA=.54e88b43-bb5d-45a0-b97f-6fd852641632@github.com> On Mon, 18 Sep 2023 19:54:10 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. 
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straightforward >> extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well.
Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Scan large array stripe from first dirty card to stripe end I've made some further experiments with precise scanning and found that the regression goes away when using a simplified version of `PSCardTable::find_first_clean_card` in `PSCardTable::scavenge_large_array_stripe` like in this [tmp branch](https://github.com/reinrich/jdk/commits/tmp2). Apparently using the start array causes the regression. Not sure why... I'd like to fix the regression like this, keeping the precise scanning of large object arrays. Unfortunately this will take an extra day (or two) because corporate IT semi-bricked my notebook :/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1726442916 From cslucas at openjdk.org Tue Sep 19 21:51:41 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 19 Sep 2023 21:51:41 GMT Subject: RFR: JDK-8315279: Factor 'basic_plus_adr' out of PhaseMacroExpand and delete make_load/store [v3] In-Reply-To: References: <3_ThxcuU3e_hPvWi4lJBfXsyG4Ky_eyyifbkZ2izlKQ=.0070b59a-31ae-4ede-9625-a9e4bf3b7a16@github.com> Message-ID: On Sat, 16 Sep 2023 00:45:56 GMT, Vladimir Ivanov wrote: >> @iwanowww do you think I should just withdraw this PR and close the associated RFE? > > It's up to you, Cesar. I find existing code good enough, but also I don't have anything against your proposal. Thanks, Vladimir. Since the work is already done and it may improve things slightly, I'll keep the PR open.
I created this RFE for migrating PhaseMacroExpand, PhaseIdealLoop to GraphKit: https://bugs.openjdk.org/browse/JDK-8316560 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15480#discussion_r1330741302 From jjoo at openjdk.org Tue Sep 19 22:32:45 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 19 Sep 2023 22:32:45 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v23] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Fix logic for publishing total cpu time and convert atomic jlong to long ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/9c6e0723..0fbfa006 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=21-22 Stats: 15 lines in 3 files changed: 9 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From kvn at openjdk.org Tue Sep 19 22:39:41 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 19 Sep 2023 22:39:41 GMT Subject: RFR: JDK-8315279: Factor 'basic_plus_adr' out of PhaseMacroExpand and delete make_load/store [v4] In-Reply-To: References: Message-ID: On Thu, 31 Aug 2023 21:56:07 GMT, Cesar Soares Lucas wrote: >> I believe the factory methods for AddPNode should be in the AddPNode class. The make_load / make_store methods in PhaseMacroExpand can be refactored to instead just use the "make" methods from Load/Store classes. >> >> Tested with tier1-3. 
> > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Convert uses of AddPNode::make to new AddPNode I don't like the last update. Now you may create an AddP node even if offset is 0. Yes, AddPNode::Identity() will fix that but it will happen only later during IGVN transform. The only code which checks offset in BarrierSetC2::obj_allocate is hard to read. The duplicated code was factored into `basic_plus_adr()` for that reason - to simplify code in uses. Maybe we should put these changes on hold until after you look at JDK-8316560. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15480#issuecomment-1726631114 From jjoo at openjdk.org Tue Sep 19 22:48:16 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 19 Sep 2023 22:48:16 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v24] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Fix build issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/0fbfa006..3eae6bba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=22-23 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From dholmes at openjdk.org Wed Sep 20 00:22:43 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 00:22:43 GMT Subject: RFR: 8308762: Metaspace leak with Instrumentation.retransform [v6] In-Reply-To: References: Message-ID: <8QoG0sRwT2fmo-cgtpQGIReK4uPNuJzM2LpPg7HvyV4=.f36921f7-8540-4d93-a432-736901a23b51@github.com> On Thu, 3 Aug
2023 08:43:12 GMT, Jean-Philippe Bempel wrote: >> Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with `Throwable`, the class `Throwable` is pre-resolved in the constant pool, while all the other classes are in an unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and created a new entry. We now try to consider it as equal especially for Methodref/Fieldref. > > Jean-Philippe Bempel has updated the pull request incrementally with one additional commit since the last revision: > > remove trailing whitespace Based on the previous discussion and comments this simple change seems quite reasonable to me. One query on the test though. Thanks test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineLeakThrowable.java line 28: > 26: # @bug 8308762 > 27: * @library /test/lib > 28: * @summary Test that redefinition of class containing Throwable refs does not leak constant pool Exactly how is this test tracking whether there is a leak or not? Is it simply setting metaspace size small enough that the 500 iterations would exhaust metaspace if there were a leak?
------------- PR Review: https://git.openjdk.org/jdk/pull/14780#pullrequestreview-1634362012 PR Review Comment: https://git.openjdk.org/jdk/pull/14780#discussion_r1330816904 From dholmes at openjdk.org Wed Sep 20 00:42:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 00:42:44 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v24] In-Reply-To: References: Message-ID: <_U1jBJQChDb-Y86Qd-0xMl3f3oCjEv2egqem9ZME7GY=.0737b93e-4521-4b82-b330-7f4491370907@github.com> On Tue, 19 Sep 2023 22:48:16 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues Changes requested by dholmes (Reviewer). src/hotspot/share/gc/shared/collectedHeap.cpp line 161: > 159: } > 160: > 161: void CollectedHeap::inc_total_cpu_time(long diff) { We don't use `long` in shared code as it has a different size on different platforms. ------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1634383244 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1330832166 From dholmes at openjdk.org Wed Sep 20 01:56:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 01:56:39 GMT Subject: RFR: 8316229: Enhance class initialization logging In-Reply-To: <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> Message-ID: <9ZMbuRTXPX3rBRa94mRsK1qQBnLhKGqGIINf4GrJrX4=.dac55f08-a282-4163-ae21-137d5cb74433@github.com> On Tue, 19 Sep 2023 07:49:06 GMT, Aleksey Shipilev wrote: >> This change adds some additional debug logging to the class linking and initialization process.
It was useful in diagnosing the deadlock described by: >> >> https://bugs.openjdk.org/browse/JDK-8316469 >> >> See the example output in JBS issue. >> >> The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. >> >> Testing: >> - manual examination of logging output >> - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) >> - tiers1-3 sanity >> >> Thanks. > > src/hotspot/share/oops/instanceKlass.cpp line 773: > >> 771: MonitorLocker ml(current, _init_monitor); >> 772: >> 773: bool debug_logging_enabled = log_is_enabled(Debug, class, init); > > Here and later: Do we need to peel off the `log_is_enabled` check into a separate variable? We don't do it in most (all?) of our places. ha! You'll see that was my last separate commit. It bugs me when we keep checking `log_is_enabled` over and over in the same chunk of code. Even though it is supposed to be fast it is surely faster with a local? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15809#discussion_r1330866851 From dholmes at openjdk.org Wed Sep 20 02:01:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 02:01:38 GMT Subject: RFR: 8316229: Enhance class initialization logging In-Reply-To: <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> Message-ID: <36IAh5QuVydtxma8_lD-H3LkhZybvPLdGHjavuuhuyA=.309484eb-fad1-43fb-8919-b3ca44bb08bd@github.com> On Tue, 19 Sep 2023 07:49:42 GMT, Aleksey Shipilev wrote: >> This change adds some additional debug logging to the class linking and initialization process.
It was useful in diagnosing the deadlock described by: >> >> https://bugs.openjdk.org/browse/JDK-8316469 >> >> See the example output in JBS issue. >> >> The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. >> >> Testing: >> - manual examination of logging output >> - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) >> - tiers1-3 sanity >> >> Thanks. > > src/hotspot/share/oops/instanceKlass.cpp line 779: > >> 777: if (debug_logging_enabled) { >> 778: ResourceMark rm(current); >> 779: log_debug(class, init)("Thread %s waiting for linking of %s by thread %s", > > Here and later: Should thread names be in quotes? This would match other places, e.g. error handler, jstack output, etc. I don't think it is critical but I can change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15809#discussion_r1330869106 From dholmes at openjdk.org Wed Sep 20 02:09:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 02:09:22 GMT Subject: RFR: 8316229: Enhance class initialization logging [v2] In-Reply-To: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> Message-ID: > This change adds some additional debug logging to the class linking and initialization process. It was useful in diagnosing the deadlock described by: > > https://bugs.openjdk.org/browse/JDK-8316469 > > See the example output in JBS issue. > > The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. 
> > Testing: > - manual examination of logging output > - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) > - tiers1-3 sanity > > Thanks. David Holmes has updated the pull request incrementally with two additional commits since the last revision: - Extra space - Put thread name in quotes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15809/files - new: https://git.openjdk.org/jdk/pull/15809/files/8db375eb..d7530af0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15809&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15809&range=00-01 Stats: 11 lines in 2 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/15809.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15809/head:pull/15809 PR: https://git.openjdk.org/jdk/pull/15809 From dholmes at openjdk.org Wed Sep 20 02:09:24 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 02:09:24 GMT Subject: RFR: 8316229: Enhance class initialization logging [v2] In-Reply-To: <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> Message-ID: On Tue, 19 Sep 2023 07:51:17 GMT, Aleksey Shipilev wrote: >> David Holmes has updated the pull request incrementally with two additional commits since the last revision: >> >> - Extra space >> - Put thread name in quotes. > > This looks good, I have only stylistic questions/suggestions. Thanks for looking at this @shipilev ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15809#issuecomment-1726773082 From dholmes at openjdk.org Wed Sep 20 02:12:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 02:12:38 GMT Subject: RFR: 8316427: Duplicated code for {obj, type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. It creates an `objArrayKlass` not a `typeArrayKlass`. I guess I don't understand what these two things each represent. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15791#issuecomment-1726776760 From dholmes at openjdk.org Wed Sep 20 02:19:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Sep 2023 02:19:44 GMT Subject: RFR: 8316427: Duplicated code for {obj, type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. Okay - refactoring looks good. Thanks. Okay `typeArrayKlass` represents an array of primitive type. `objArrayKlass` represents an array of reference type. For a multi-dimensional array we are dealing with an array of object references (each of which is an array of some other type). Got it. ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15791#pullrequestreview-1634445749 PR Comment: https://git.openjdk.org/jdk/pull/15791#issuecomment-1726779415 From tschatzl at openjdk.org Wed Sep 20 07:28:09 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 20 Sep 2023 07:28:09 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration Message-ID: Hi all, please review this change that modifies the code root (remembered) set to use the CHT as internal representation. This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading improving performance of various pauses that deal with code root sets. With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: During collection pauses: [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms [..] [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 [...] [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 Code root scan now reduces to ~22ms max on average in this case. Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): Clear Exception Caches 35,5ms Unregister NMethods 598,5ms <---- this is nmethod unregistering. Unregister Old NMethods 3,0ms CodeBlob flush 41,1ms CodeCache free 5730,3ms With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part.
We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 Some random comments: * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. Testing: tier1-5 Thanks, Thomas ------------- Commit messages: - initial version that seems to work Changes: https://git.openjdk.org/jdk/pull/15811/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315503 Stats: 470 lines in 23 files changed: 315 ins; 93 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/15811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811 PR: https://git.openjdk.org/jdk/pull/15811 From jpbempel at openjdk.org Wed Sep 20 07:59:43 2023 From: jpbempel at openjdk.org (Jean-Philippe Bempel) Date: Wed, 20 Sep 2023 07:59:43 GMT Subject: RFR: 8308762: Metaspace leak with Instrumentation.retransform [v6] In-Reply-To: <8QoG0sRwT2fmo-cgtpQGIReK4uPNuJzM2LpPg7HvyV4=.f36921f7-8540-4d93-a432-736901a23b51@github.com> References: <8QoG0sRwT2fmo-cgtpQGIReK4uPNuJzM2LpPg7HvyV4=.f36921f7-8540-4d93-a432-736901a23b51@github.com> Message-ID: <5H8YgYXqnRlnaFGAzzwz5XKGgUyiAgcHJM3E-Esd9Tw=.cd637efe-2baa-4c8a-b659-7cd005225a65@github.com> On Wed, 20 Sep 2023 00:05:01 GMT, David Holmes wrote: > Is it simply setting metaspace size small enough that the 500 iterations would exhaust metaspace if there were a leak? yes exactly.
with a leak you end up with `OutOfMemoryError: metaspace` running the test ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14780#discussion_r1331182331 From fyang at openjdk.org Wed Sep 20 08:54:44 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 20 Sep 2023 08:54:44 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 09:00:01 GMT, Fredrik Bredberg wrote: > Relativize initial_sp in interpreter frames. > > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. Hi, I have arranged tier1-3 tests on the linux-riscv64 platform. Thanks for adding handling for riscv. src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 704: > 702: // register for unlock_object to pass to VM directly > 703: ld(c_rarg1, monitor_block_top); // derelativize pointer > 704: shadd(c_rarg1, c_rarg1, fp, c_rarg1, LogBytesPerWord); Nit: One redundant space between the 3rd and 4th parameters for each `shadd` call added.
------------- PR Review: https://git.openjdk.org/jdk/pull/15815#pullrequestreview-1635063095 PR Review Comment: https://git.openjdk.org/jdk/pull/15815#discussion_r1331261765 From shade at openjdk.org Wed Sep 20 09:21:47 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 09:21:47 GMT Subject: RFR: 8316229: Enhance class initialization logging [v2] In-Reply-To: References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> Message-ID: <6Zf3KZAlUnN-bw-b5qqOKwjfc4L6yP7QJnU-UK-5z1o=.31fd4237-9f4a-456a-8590-e11d1ec4225a@github.com> On Wed, 20 Sep 2023 02:09:22 GMT, David Holmes wrote: >> This change adds some additional debug logging to the class linking and initialization process. It was useful in diagnosing the deadlock described by: >> >> https://bugs.openjdk.org/browse/JDK-8316469 >> >> See the example output in JBS issue. >> >> The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. >> >> Testing: >> - manual examination of logging output >> - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) >> - tiers1-3 sanity >> >> Thanks. > > David Holmes has updated the pull request incrementally with two additional commits since the last revision: > > - Extra space > - Put thread name in quotes. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15809#pullrequestreview-1635161132 From shade at openjdk.org Wed Sep 20 09:21:49 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 09:21:49 GMT Subject: RFR: 8316229: Enhance class initialization logging [v2] In-Reply-To: <9ZMbuRTXPX3rBRa94mRsK1qQBnLhKGqGIINf4GrJrX4=.dac55f08-a282-4163-ae21-137d5cb74433@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> <9vqpDFgo_SJ5gvEWftwUpljzVG0kk2QUFEyeTM9lTsY=.4841a9fe-a404-4875-bfee-65a3a6f84cbd@github.com> <9ZMbuRTXPX3rBRa94mRsK1qQBnLhKGqGIINf4GrJrX4=.dac55f08-a282-4163-ae21-137d5cb74433@github.com> Message-ID: On Wed, 20 Sep 2023 01:54:19 GMT, David Holmes wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 773: >> >>> 771: MonitorLocker ml(current, _init_monitor); >>> 772: >>> 773: bool debug_logging_enabled = log_is_enabled(Debug, class, init); >> >> Here and later: Do we need to peel off the `log_is_enabled` check into a separate variable? We don't do it in most (all?) of our places. > > ha! You'll see that was my last separate commit. It bugs me when we keep checking `log_is_enabled` over and over in the same chunk of code. Even though it is supposed to be fast it is surely faster with a local? Well, I prefer to stack all logging code in one place. But I have no strong opinion about this one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15809#discussion_r1331328613 From mdoerr at openjdk.org Wed Sep 20 09:28:41 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 20 Sep 2023 09:28:41 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 09:00:01 GMT, Fredrik Bredberg wrote: > Relativize initial_sp in interpreter frames. 
> > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. Tests have passed and the PPC64 parts look good. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15815#pullrequestreview-1635181336 From tschatzl at openjdk.org Wed Sep 20 09:41:12 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 20 Sep 2023 09:41:12 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() Message-ID: Hi all, please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). The reason seems to be the use of `outputStream::print()` without any need for formatting. This seems to decrease time spent in this logging by almost 10x. 
Testing: hs_err output seems still be the same, GHA Thanks, Thomas ------------- Commit messages: - improve symbol::print_value_on() performance Changes: https://git.openjdk.org/jdk/pull/15838/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15838&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316581 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15838.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15838/head:pull/15838 PR: https://git.openjdk.org/jdk/pull/15838 From shade at openjdk.org Wed Sep 20 09:57:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 09:57:39 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 09:34:05 GMT, Thomas Schatzl wrote: > Hi all, > > please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). > > The reason seems to be the use of `outputStream::print()` without any need for formatting. > > This seems to decrease time spent in this logging by almost 10x. > > Testing: hs_err output seems still be the same, GHA > > Thanks, > Thomas Seems to be similar to what `java_lang_Class::print_signature` already does: https://github.com/openjdk/jdk/blob/e1870d360e05c372e672b519d7de2a60c333675b/src/hotspot/share/classfile/javaClasses.cpp#L1296 -- might replace that one with the call to `Symbol::print_value_on` (maybe later). ------------- Marked as reviewed by shade (Reviewer). 
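The cost difference behind 8316581 can be illustrated with a toy model. It assumes, as the review thread suggests, that the old code issued one formatted `print` call per character, while the new code writes the bytes unformatted. `SketchStream`, `print_value_per_char` and `print_value_raw` are hypothetical stand-ins, not HotSpot's `outputStream` or the actual `Symbol::print_value_on`.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for an output stream (NOT HotSpot's outputStream):
// print() goes through format parsing, print_raw() just appends bytes.
class SketchStream {
 public:
  void print(const char* fmt, ...) {
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    int n = std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    formatted_calls_++;
    if (n <= 0) return;
    size_t len = static_cast<size_t>(n);
    if (len >= sizeof(buf)) len = sizeof(buf) - 1;  // output was truncated
    out_.append(buf, len);
  }

  void print_raw(const char* s, size_t len) { out_.append(s, len); }

  const std::string& str() const { return out_; }
  int formatted_calls() const { return formatted_calls_; }

 private:
  std::string out_;
  int formatted_calls_ = 0;
};

// Assumed shape of the old code: one formatted call per character.
void print_value_per_char(SketchStream* st, const char* bytes, size_t len) {
  st->print("'");
  for (size_t i = 0; i < len; i++) {
    st->print("%c", bytes[i]);
  }
  st->print("'");
}

// Unformatted version: three raw writes regardless of the symbol's length.
void print_value_raw(SketchStream* st, const char* bytes, size_t len) {
  st->print_raw("'", 1);
  st->print_raw(bytes, len);
  st->print_raw("'", 1);
}
```

Both variants produce the same text, but for a 16-byte name such as `java/lang/Object` the first makes 18 format-parsing calls and the second makes none; that per-character overhead is what the change removes.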
PR Review: https://git.openjdk.org/jdk/pull/15838#pullrequestreview-1635238150 From rpressler at openjdk.org Wed Sep 20 10:48:38 2023 From: rpressler at openjdk.org (Ron Pressler) Date: Wed, 20 Sep 2023 10:48:38 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 23:00:09 GMT, Mandy Chung wrote: > `JVM_MoreStackWalk` has a bug that always assumes that the Java frame > stream is currently at the frame decoded in the last patch and so always > advances to the next frame before filling in the new batch of stack frame. > However `JVM_MoreStackWalk` may return 0. The library will set > the continuation to its parent. It then call `JVM_MoreStackWalk` to continue > the stack walking but the last decoded frame has already been advanced. > The Java frame stream is already at the top frame of the parent continuation. . > The current implementation skips "Continuation::yield0" mistakenly. This > only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. > > The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` > so that the VM will determine if the current frame should be skipped or not. > > `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks > the expected result where "yield0" exists between "enter" and "run" frames. Marked as reviewed by rpressler (Committer). 
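The 8316456 failure mode can be reproduced in a toy model with three properties:

- The frame stream stays on the last frame it decoded.
- A batch of 0 tells the library to move on to the parent continuation.
- The parent's top frame is already current when the next batch starts.

Everything below (`FrameStream`, `more_stack_walk`, `walk`, the frame names) is hypothetical and only mirrors the logic described in the PR, not the actual `JVM_MoreStackWalk` code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy frame stream spanning a child continuation and its parent.
struct FrameStream {
  std::vector<std::string> frames;  // top of stack first
  size_t pos = 0;                   // frame the stream currently points at
  size_t limit = 0;                 // end of the continuation being walked
};

// Fills up to `max` frames and leaves `pos` on the last frame decoded.
// Fixed behavior (honor_last_count): step past the current frame only if the
// previous batch decoded something. Buggy behavior: always step first.
size_t more_stack_walk(FrameStream& s, size_t last_batch_count, size_t max,
                       bool honor_last_count, std::vector<std::string>& out) {
  if (!honor_last_count || last_batch_count > 0) {
    s.pos++;
  }
  size_t n = 0;
  while (n < max && s.pos < s.limit) {
    out.push_back(s.frames[s.pos]);
    n++;
    if (n == max || s.pos + 1 == s.limit) break;  // stay on last decoded frame
    s.pos++;
  }
  return n;
}

// Library-side driver. A 0 result means the current continuation is
// exhausted; the walk then continues in the parent, whose top frame is the
// one the stream already points at.
std::vector<std::string> walk(const std::vector<std::string>& frames,
                              size_t child_len, size_t batch, bool fixed) {
  FrameStream s;
  s.frames = frames;
  s.limit = child_len;
  std::vector<std::string> out;
  // First batch: filled without stepping (models the initial fill).
  size_t last = more_stack_walk(s, 0, batch, true, out);
  for (;;) {
    size_t n = more_stack_walk(s, last, batch, fixed, out);
    if (n == 0) {
      if (s.limit == s.frames.size()) break;  // no parent left to walk
      s.limit = s.frames.size();              // continue in the parent
    }
    last = n;
  }
  return out;
}
```

With `fixed == false` and a batch size of 1, the walk loses `parent_top`, the analogue of the skipped `Continuation::yield0` frame. Passing the last batch size lets the callee decide whether the current frame was already consumed.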
------------- PR Review: https://git.openjdk.org/jdk/pull/15804#pullrequestreview-1635330981 From coleenp at openjdk.org Wed Sep 20 12:46:44 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 20 Sep 2023 12:46:44 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 09:34:05 GMT, Thomas Schatzl wrote: > Hi all, > > please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). > > The reason seems to be the use of `outputStream::print()` without any need for formatting. > > This seems to decrease time spent in this logging by almost 10x. > > Testing: hs_err output seems still be the same, GHA > > Thanks, > Thomas src/hotspot/share/oops/symbol.cpp line 392: > 390: // disassembler and error reporting. > 391: void Symbol::print_value_on(outputStream* st) const { > 392: st->write("'", 1); I haven't seen code using the 'write' function of ostream, but usually print_raw. Can you use that instead? print_raw just calls write. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15838#discussion_r1331574246 From coleenp at openjdk.org Wed Sep 20 13:05:46 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 20 Sep 2023 13:05:46 GMT Subject: RFR: 8316229: Enhance class initialization logging [v2] In-Reply-To: References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> Message-ID: On Wed, 20 Sep 2023 02:09:22 GMT, David Holmes wrote: >> This change adds some additional debug logging to the class linking and initialization process. It was useful in diagnosing the deadlock described by: >> >> https://bugs.openjdk.org/browse/JDK-8316469 >> >> See the example output in JBS issue. 
>> >> The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. >> >> Testing: >> - manual examination of logging output >> - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) >> - tiers1-3 sanity >> >> Thanks. > > David Holmes has updated the pull request incrementally with two additional commits since the last revision: > > - Extra space > - Put thread name in quotes. This looks reasonable. I like the variable for log_is_enabled, keeps it from being less intrusive. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15809#pullrequestreview-1635590586 From coleenp at openjdk.org Wed Sep 20 14:28:58 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 20 Sep 2023 14:28:58 GMT Subject: RFR: 8308762: Metaspace leak with Instrumentation.retransform [v6] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 08:43:12 GMT, Jean-Philippe Bempel wrote: >> Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with `Throwable`, the class `Throwable` is pre-resolved in the constant pool, while all the other classes are in a unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and create a new entry. We now try to consider it as equal specially for Methodref/Fieldref. > > Jean-Philippe Bempel has updated the pull request incrementally with one additional commit since the last revision: > > remove trailing whitespace I'm sorry for the delay in reviewing this. This looks great! ------------- Marked as reviewed by coleenp (Reviewer). 
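The 8308762 equality rule treats a pre-resolved class entry as equal to its unresolved counterpart. A later commit in the PR describes this as "Revert resolved class to unresolved for comparison", and it can be sketched as follows. `CpEntry`, `Tag` and `find_or_append` are hypothetical simplifications. A real constant pool entry refers to its name through a UTF-8 entry index rather than holding a string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified constant-pool entry (not HotSpot's ConstantPool).
enum class Tag { Class, UnresolvedClass, Utf8 };

struct CpEntry {
  Tag tag;
  std::string name;  // class name for Class/UnresolvedClass entries
};

// A resolved Class entry names the same class as an unresolved one,
// so compare both as if they were unresolved.
Tag normalize(Tag t) { return t == Tag::Class ? Tag::UnresolvedClass : t; }

bool naive_equal(const CpEntry& a, const CpEntry& b) {
  return a.tag == b.tag && a.name == b.name;
}

bool merge_equal(const CpEntry& a, const CpEntry& b) {
  return normalize(a.tag) == normalize(b.tag) && a.name == b.name;
}

// Returns the index of a matching entry, appending `e` if none matches.
size_t find_or_append(std::vector<CpEntry>& pool, const CpEntry& e,
                      bool use_normalized) {
  for (size_t i = 0; i < pool.size(); i++) {
    bool eq = use_normalized ? merge_equal(pool[i], e) : naive_equal(pool[i], e);
    if (eq) return i;
  }
  pool.push_back(e);
  return pool.size() - 1;
}
```

With plain tag comparison, merging an unresolved `java/lang/Throwable` into a pool that holds the pre-resolved one appends a duplicate entry. This models the growth on repeated retransformation that the PR reports; with normalization the existing entry is reused.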
PR Review: https://git.openjdk.org/jdk/pull/14780#pullrequestreview-1635785374 From coleenp at openjdk.org Wed Sep 20 14:41:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 20 Sep 2023 14:41:53 GMT Subject: RFR: 8316427: Duplicated code for {obj, type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. Thanks for reviewing Andrew and David. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15791#issuecomment-1727861444 From coleenp at openjdk.org Wed Sep 20 14:41:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 20 Sep 2023 14:41:55 GMT Subject: Integrated: 8316427: Duplicated code for {obj,type}ArrayKlass::array_klass In-Reply-To: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> References: <7_e_ClM7hvCdLc2vZPR_G53swZSB84sB5ipSr2AYymY=.4d415d32-2ed7-4148-b23d-4d554fc864bc@github.com> Message-ID: <4XcVyItAhQICrB2T-gvlf40ZWLLxonQaXaBcpwtXuCs=.16a13341-f99c-437d-a387-9e53d76a7981@github.com> On Mon, 18 Sep 2023 16:40:08 GMT, Coleen Phillimore wrote: > Please review this trivial change to move duplicated array_klass(). Tested with tier1 on Oracle supported platforms. This pull request has now been integrated. 
Changeset: 9e00949a Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/9e00949a26fa881d0c6726be3ec27edd142e592c Stats: 198 lines in 6 files changed: 65 ins; 130 del; 3 mod 8316427: Duplicated code for {obj,type}ArrayKlass::array_klass Reviewed-by: dholmes, adinn ------------- PR: https://git.openjdk.org/jdk/pull/15791 From mdoerr at openjdk.org Wed Sep 20 14:44:51 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 20 Sep 2023 14:44:51 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 14:02:19 GMT, Aleksey Shipilev wrote: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. 
> > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3`

PPC64 implementation:

diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
index 8942199610e..0bef1b3760a 100644
--- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
@@ -2021,7 +2021,28 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
   b(fallthru);
 
   bind(hit);
-  std(super_klass, target_offset, sub_klass); // save result to cache
+  // Success. Try to cache the super we found and proceed in triumph.
+  uint32_t super_cache_backoff = checked_cast<uint32_t>(SecondarySuperMissBackoff);
+  if (super_cache_backoff > 0) {
+    Label L_skip;
+
+    lwz(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+    addic_(temp, temp, -1);
+    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+    bgt(CCR0, L_skip);
+
+    load_const_optimized(temp, super_cache_backoff);
+    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+
+    std(super_klass, target_offset, sub_klass); // save result to cache
+
+    bind(L_skip);
+    if (L_success == nullptr && result_reg == noreg) {
+      crorc(CCR0, Assembler::equal, CCR0, Assembler::equal); // Restore CCR0 EQ
+    }
+  } else {
+    std(super_klass, target_offset, sub_klass); // save result to cache
+  }
   if (result_reg != noreg) { li(result_reg, 0); } // load zero result (indicates a hit)
   if (L_success != nullptr) { b(*L_success); }
   else if (result_reg == noreg) { blr(); } // return with CR0.eq if neither label nor result reg provided

Power10 results (2 cores, SMT8, 3.55 GHz):

-XX:SecondarySuperMissBackoff=0

Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15  1107.019 ± 16.206  ns/op
SecondarySuperCache.uncontended  avgt   15    17.984 ±  0.164  ns/op

-XX:SecondarySuperMissBackoff=10

Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15   431.557 ±  3.690  ns/op
SecondarySuperCache.uncontended  avgt   15    17.870 ±  0.088  ns/op

-XX:SecondarySuperMissBackoff=100

Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15    90.766 ±  0.196  ns/op
SecondarySuperCache.uncontended  avgt   15    17.925 ±  0.239  ns/op

-XX:SecondarySuperMissBackoff=1000

Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15    39.803 ±  0.369  ns/op
SecondarySuperCache.uncontended  avgt   15    18.070 ±  0.337  ns/op

-XX:SecondarySuperMissBackoff=10000

Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15    34.499 ±  0.451  ns/op
SecondarySuperCache.uncontended  avgt   15    17.933 ±  0.165  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1727871239 From shade at openjdk.org Wed Sep 20 14:52:32 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 14:52:32 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v2] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in.
> > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. > > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: PPC version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/d0a4d5c5..6927a8f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=00-01 Stats: 22 lines in 1 file changed: 21 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From shade at openjdk.org Wed Sep 20 14:52:34 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 14:52:34 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 14:41:52 GMT, Martin Doerr wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. 
>> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. >> >> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3`
>
> PPC64 implementation:
>
> diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> index 8942199610e..0bef1b3760a 100644
> --- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> +++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> @@ -2021,7 +2021,28 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
>    b(fallthru);
>  
>    bind(hit);
> -  std(super_klass, target_offset, sub_klass); // save result to cache
> +  // Success. Try to cache the super we found and proceed in triumph.
> +  uint32_t super_cache_backoff = checked_cast<uint32_t>(SecondarySuperMissBackoff);
> +  if (super_cache_backoff > 0) {
> +    Label L_skip;
> +
> +    lwz(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
> +    addic_(temp, temp, -1);
> +    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
> +    bgt(CCR0, L_skip);
> +
> +    load_const_optimized(temp, super_cache_backoff);
> +    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
> +
> +    std(super_klass, target_offset, sub_klass); // save result to cache
> +
> +    bind(L_skip);
> +    if (L_success == nullptr && result_reg == noreg) {
> +      crorc(CCR0, Assembler::equal, CCR0, Assembler::equal); // Restore CCR0 EQ
> +    }
> +  } else {
> +    std(super_klass, target_offset, sub_klass); // save result to cache
> +  }
>    if (result_reg != noreg) { li(result_reg, 0); } // load zero result (indicates a hit)
>    if (L_success != nullptr) { b(*L_success); }
>    else if (result_reg == noreg) { blr(); } // return with CR0.eq if neither label nor result reg provided
>
>
> Power10 results (2 cores, SMT8, 3.55 GHz):
> -XX:SecondarySuperMissBackoff=0
>
> Benchmark                        Mode  Cnt     Score    Error  Units
> SecondarySuperCache.contended    avgt   15  1107.019 ± 16.206  ns/op
> SecondarySuperCache.uncontended  avgt   15    17.984 ±  0.164  ns/op
>
>
> -XX:SecondarySuperMissBackoff=10
>
> Benchmark                        Mode  Cnt     Score    Error  Units
> SecondarySuperCache.contended    avgt   15   431.557 ±  3.690  ns/op
> SecondarySuperCache.uncontended  avgt   15    17.870 ±  0.088  ns/op
>
>
> -XX:SecondarySuperMissBackoff=100
>
> Benchmark                        Mode  Cnt     Score    Error  Units
> SecondarySuperCache.contended    avgt   15    90.766 ±  0.196  ns/op
> SecondarySuperCache.uncontended  avgt   15    17.925 ±  0.239  ns/op
>
>
> -XX:SecondarySuperMissBackoff=1000
>
> Benchmark                        Mode  Cnt     Score    Error  Units
> SecondarySuperCache.c...

@TheRealMDoerr: Folded in, thank you!
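The decision logic of the PPC64 patch quoted above can be modeled in plain C++ to count how often the shared cache is actually written. `SketchThread`, `SketchKlass` and `on_secondary_super_hit` are hypothetical names. Only the countdown-and-rearm behavior follows the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread state; models the thread-local miss counter.
struct SketchThread {
  uint32_t backoff_secondary_super_miss = 0;
};

// Hypothetical sub-klass with a shared one-element cache.
struct SketchKlass {
  const void* secondary_super_cache = nullptr;
  int cache_writes = 0;  // number of stores to the shared (contended) slot
};

// Called when the slow path finds `super_klass` among the secondary supers.
void on_secondary_super_hit(SketchThread& t, SketchKlass& sub,
                            const void* super_klass, uint32_t backoff) {
  if (backoff == 0) {  // mitigation disabled: update the cache every time
    sub.secondary_super_cache = super_klass;
    sub.cache_writes++;
    return;
  }
  // Decrement in a wider signed type so that a zero counter goes negative
  // instead of wrapping around to a large positive value.
  int64_t remaining = static_cast<int64_t>(t.backoff_secondary_super_miss) - 1;
  t.backoff_secondary_super_miss = static_cast<uint32_t>(remaining);
  if (remaining > 0) {
    return;  // still backing off: skip the shared store
  }
  t.backoff_secondary_super_miss = backoff;  // re-arm the countdown
  sub.secondary_super_cache = super_klass;
  sub.cache_writes++;
}
```

With the backoff of 1000 used in this PR, 10,000 hits on one thread produce 10 cache stores instead of 10,000. This plausibly accounts for the contended-case improvement in the Power10 numbers. The wide signed decrement appears to match the PPC sequence, where `addic_` operates on a full register after a zero-extending `lwz`.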
------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1727876381 From mdoerr at openjdk.org Wed Sep 20 14:52:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 20 Sep 2023 14:52:36 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates In-Reply-To: References: Message-ID: <3SOlbHlxZJOpxmera6rzndYQYtT4N6kjw8rKdfnhpLE=.cf0777c6-83f1-4691-930f-468e191c2c13@github.com> On Wed, 13 Sep 2023 14:02:19 GMT, Aleksey Shipilev wrote: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. > > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. 
> > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` You're using 64 bit instructions, despite `uint32_t _backoff_secondary_super_miss;`. That should get fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1727880089 From shade at openjdk.org Wed Sep 20 14:53:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 14:53:42 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates In-Reply-To: <3SOlbHlxZJOpxmera6rzndYQYtT4N6kjw8rKdfnhpLE=.cf0777c6-83f1-4691-930f-468e191c2c13@github.com> References: <3SOlbHlxZJOpxmera6rzndYQYtT4N6kjw8rKdfnhpLE=.cf0777c6-83f1-4691-930f-468e191c2c13@github.com> Message-ID: On Wed, 20 Sep 2023 14:46:41 GMT, Martin Doerr wrote: > You're using 64 bit instructions, despite `uint32_t _backoff_secondary_super_miss;`. That should get fixed. Oh, on AArch64, right. x86_64 looks fine, uses 32-bit `subl`/`movl` properly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1727886840 From shade at openjdk.org Wed Sep 20 15:41:38 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 15:41:38 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v3] In-Reply-To: References: Message-ID: <5WhP2XBIpUpeQvKiDBRnzMy9kJmxjLPqI-8bhpxjEZs=.1f50959b-48b4-4ec8-a211-c5376d3a4610@github.com> > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. 
> > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. > > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Use proper 32-bit stores on AArch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/6927a8f7..e5e7ce74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From tschatzl at openjdk.org Wed Sep 20 15:43:18 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 20 Sep 2023 15:43:18 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review 
this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). > > The reason seems to be the use of `outputStream::print()` without any need for formatting. > > This seems to decrease time spent in this logging by almost 10x. > > Testing: hs_err output seems still be the same, GHA > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: coleen review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15838/files - new: https://git.openjdk.org/jdk/pull/15838/files/d89cf89c..1ffdce2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15838&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15838&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15838.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15838/head:pull/15838 PR: https://git.openjdk.org/jdk/pull/15838 From coleenp at openjdk.org Wed Sep 20 15:53:43 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 20 Sep 2023 15:53:43 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() [v2] In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 15:43:18 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). >> >> The reason seems to be the use of `outputStream::print()` without any need for formatting. >> >> This seems to decrease time spent in this logging by almost 10x. 
>> >> Testing: hs_err output seems still be the same, GHA >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > coleen review That looks good. I see no reason why it was written as this loop. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15838#pullrequestreview-1635989106 From shade at openjdk.org Wed Sep 20 15:53:43 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 15:53:43 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() [v2] In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 15:43:18 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). >> >> The reason seems to be the use of `outputStream::print()` without any need for formatting. >> >> This seems to decrease time spent in this logging by almost 10x. >> >> Testing: hs_err output seems still be the same, GHA >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > coleen review Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15838#pullrequestreview-1635994358 From aph at openjdk.org Wed Sep 20 16:00:44 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 20 Sep 2023 16:00:44 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v3] In-Reply-To: <5WhP2XBIpUpeQvKiDBRnzMy9kJmxjLPqI-8bhpxjEZs=.1f50959b-48b4-4ec8-a211-c5376d3a4610@github.com> References: <5WhP2XBIpUpeQvKiDBRnzMy9kJmxjLPqI-8bhpxjEZs=.1f50959b-48b4-4ec8-a211-c5376d3a4610@github.com> Message-ID: On Wed, 20 Sep 2023 15:41:38 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. >> >> Current PR deliberately sets backoff at `1000` to simplify testing. 
The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use proper 32-bit stores on AArch64 src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1545: > 1543: > 1544: cmp(rscratch1, (u1) 0); > 1545: br(Assembler::GT, L_skip); Suggestion: subsw(rscratch1, rscratch1, 1); strw(rscratch1, Address(rthread, JavaThread::backoff_secondary_super_miss_offset())); br(Assembler::GT, L_skip); src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1547: > 1545: br(Assembler::GT, L_skip); > 1546: > 1547: mov_immediate32(rscratch1, super_cache_backoff); Suggestion: movw(rscratch1, super_cache_backoff); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1331867151 PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1331865760 From jpbempel at openjdk.org Wed Sep 20 16:13:50 2023 From: jpbempel at openjdk.org (Jean-Philippe Bempel) Date: Wed, 20 Sep 2023 16:13:50 GMT Subject: RFR: 8308762: Metaspace leak with Instrumentation.retransform [v7] In-Reply-To: References: Message-ID: > Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with `Throwable`, the class `Throwable` is pre-resolved in the constant pool, while all the other classes are in an unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and created a new entry. We now try to consider it as equal, especially for Methodref/Fieldref. Jean-Philippe Bempel has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into 8308762 - remove trailing whitespace - remove now useless comment - Rewrite unit test unresolved t2 too, cleanup JVM_CONSTANT_Class useless case - Revert resolved class to unresolved for comparison remove is_unresolved_class_mismatch - add jtreg test for leak - 8308762: Metaspace leak with Instrumentation.retransform Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with Throwable, the class Throwable is pre-resolved in the constant pool, while all the other classes are in an unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and created a new entry. We now try to consider it as equal, especially for Methodref/Fieldref. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14780/files - new: https://git.openjdk.org/jdk/pull/14780/files/26cb41b7..16bc7fad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14780&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14780&range=05-06 Stats: 301167 lines in 6336 files changed: 127258 ins; 113513 del; 60396 mod Patch: https://git.openjdk.org/jdk/pull/14780.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14780/head:pull/14780 PR: https://git.openjdk.org/jdk/pull/14780 From shade at openjdk.org Wed Sep 20 16:17:00 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 16:17:00 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v4] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out.
The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. > > Setting backoff to `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > The current PR deliberately sets backoff to `1000` to simplify testing. The actual value should be chosen by broader experiments.
> > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Cleaner AArch64 code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/e5e7ce74..c752a687 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=02-03 Stats: 9 lines in 1 file changed: 2 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From shade at openjdk.org Wed Sep 20 16:17:04 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Sep 2023 16:17:04 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v3] In-Reply-To: References: <5WhP2XBIpUpeQvKiDBRnzMy9kJmxjLPqI-8bhpxjEZs=.1f50959b-48b4-4ec8-a211-c5376d3a4610@github.com> Message-ID: On Wed, 20 Sep 2023 15:57:42 GMT, Andrew Haley wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Use proper 32-bit stores on AArch64 > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1545: > >> 1543: >> 1544: cmp(rscratch1, (u1) 0); >> 1545: br(Assembler::GT, L_skip); > > Suggestion: > > subsw(rscratch1, rscratch1, 1); > strw(rscratch1, Address(rthread, JavaThread::backoff_secondary_super_miss_offset())); > br(Assembler::GT, L_skip); So, does `subsw` have the same condition-code semantics as its x86 sibling? That's nice to know! Thanks, I applied both changes. And then I realized we can actually piggyback on a single counter update, and make the code denser. See new commit!
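The counter logic encoded by the `subsw`/`strw`/`br(GT)` suggestion can be modelled in plain C++ as shown below. This is a sketch under stated assumptions: `ToyThread`, `should_update_cache`, `count_updates`, and the starting counter value of 0 are hypothetical, and the reset step assumes the reloaded `super_cache_backoff` value is stored back to the thread before the cache is updated. The model also shows why a backoff of `0` disables the mitigation: the decremented counter is then never positive, so every miss updates the cache.

```cpp
#include <cstdint>

// Hypothetical model of the thread-local backoff; names are illustrative,
// not HotSpot's.
struct ToyThread {
  int32_t backoff_secondary_super_miss = 0;  // assumed starting value
};

// Returns true when this secondary-super miss should update the shared
// secondary_super_cache, false while the thread is still backing off.
bool should_update_cache(ToyThread& thread, int32_t super_cache_backoff) {
  // subsw + strw: decrement the counter and store it; flags survive the store.
  int32_t counter = thread.backoff_secondary_super_miss - 1;
  thread.backoff_secondary_super_miss = counter;
  if (counter > 0) {
    return false;  // br(Assembler::GT, L_skip)
  }
  // movw(rscratch1, super_cache_backoff): reset the counter (assumed to be
  // stored back) and fall through to the cache update.
  thread.backoff_secondary_super_miss = super_cache_backoff;
  return true;
}

// Counts how many of `misses` consecutive misses would update the cache.
int count_updates(int misses, int32_t super_cache_backoff) {
  ToyThread thread;
  int updates = 0;
  for (int i = 0; i < misses; i++) {
    if (should_update_cache(thread, super_cache_backoff)) updates++;
  }
  return updates;
}
```

With a backoff of N (N > 1), this model updates the cache on the first miss and then on every Nth miss after that. For example, with N = 3, ten misses yield four updates.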
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1331882129 From cslucas at openjdk.org Wed Sep 20 16:53:55 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 20 Sep 2023 16:53:55 GMT Subject: Withdrawn: JDK-8315279: Factor 'basic_plus_adr' out of PhaseMacroExpand and delete make_load/store In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 03:13:08 GMT, Cesar Soares Lucas wrote: > I believe the factory methods for AddPNode should be in the AddPNode class. The make_load / make_store methods in PhaseMacroExpand can be refactored to instead just use the "make" methods from Load/Store classes. > > Tested with tier1-3. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/15480 From kvn at openjdk.org Wed Sep 20 19:11:40 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 20 Sep 2023 19:11:40 GMT Subject: RFR: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: <_s636XQwztr7WdHsKtvOIQCPil9ny5VNA32dxzD03zs=.23434116-7776-4aa8-9ca9-a8fe014d2c70@github.com> On Wed, 6 Sep 2023 11:54:04 GMT, Roberto Castañeda Lozano wrote: > This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`.
> > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) > > This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`. On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g. for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL` (which prevented the constant word-length from being discovered), see the left graph below: > > ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) > > With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario ... Thank you for answering all my questions. ------------- Marked as reviewed by kvn (Reviewer).
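To make the padding issue and the `3..4` length-range observation concrete, the sketch below works through the word counts for a few small arrays. The numbers are assumptions for illustration only: 8-byte heap words, a 16-byte array header (mark word, compressed class pointer, length), and 8-byte `Object[]` elements, as with ZGC, which does not use compressed oops. The helper names are hypothetical and are not C2's actual computation.

```cpp
#include <cstddef>

// Illustrative arithmetic only; these constants are assumptions, not values
// read from a running VM.
constexpr size_t kHeapWordSize = 8;
constexpr size_t kArrayHeaderBytes = 16;  // mark + compressed klass + length

constexpr size_t align_up(size_t x, size_t alignment) {
  return (x + alignment - 1) / alignment * alignment;
}

// Words occupied by the element payload, rounded to whole heap words.
constexpr size_t payload_words(size_t length, size_t elem_size) {
  return align_up(length * elem_size, kHeapWordSize) / kHeapWordSize;
}

// Words from the first element to the end of the object once the whole
// object is rounded up to ObjectAlignmentInBytes.
constexpr size_t words_to_object_end(size_t length, size_t elem_size,
                                     size_t object_alignment) {
  return (align_up(kArrayHeaderBytes + length * elem_size, object_alignment) -
          kArrayHeaderBytes) / kHeapWordSize;
}

// Object[1] with 8-byte refs: 1 payload word. With 16-byte alignment the
// object ends 2 words after the first element, so 1 word is padding that
// an Object[] copy stub must not treat as a pointer.
static_assert(payload_words(1, 8) == 1, "one element word");
static_assert(words_to_object_end(1, 8, 16) == 2, "one padding word");
static_assert(words_to_object_end(1, 8, 8) == 1, "no padding by default");

// int[3] and int[4] both need 2 payload words, so the word-length can be a
// constant even when the element-length is only known to be in 3..4.
static_assert(payload_words(3, 4) == 2 && payload_words(4, 4) == 2, "same");
static_assert(payload_words(5, 4) == 3, "length 5 needs a third word");
```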
PR Review: https://git.openjdk.org/jdk/pull/15589#pullrequestreview-1636327288 From rkennke at openjdk.org Wed Sep 20 19:27:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 20 Sep 2023 19:27:36 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v57] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Move gap init into allocate_header() (x86) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/bd5a65fd..ae1cb780 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=56 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=55-56 Stats: 25 lines in 1 file changed: 13 ins; 11 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From dlong at openjdk.org Wed Sep 20 21:47:45 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 20 Sep 2023 21:47:45 GMT Subject: RFR: 8287325: AArch64: fix 
virtual threads with -XX:UseBranchProtection=pac-ret [v10] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 06:11:38 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the "PreserveFramePointer" option, which would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. >> >> 5. Virtual threads will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: The "PreserveFramePointer" option should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid PAC re-signing on continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered using the (relative) stack pointer SP, the thread ID, PACStack [7] and the value zero as candidate modifiers. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2.
thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access from the JIT side. We need to 1) resolve the address of the current thread (see [8] as an example), and 2) get the tid field in a way similar to java_lang_Thread::thread_id(). I suppose this would introduce a big performance overhead. Could we use the "rthread" register (JavaThread object address) as the modifier instead? Unfortunately, it's not invariant for virtual threads, and PAC re-signing is still needed. >> >> 5. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > break long lines src/hotspot/os_cpu/linux_aarch64/pauth_linux_aarch64.inline.hpp line 57: > 55: register address r17 __asm("r17") = ret_addr; > 56: register address r16 __asm("r16") = 0; > 57: asm (PACIA1716 : "+r"(r17) : "r"(r16)); Can we use PACIZA or PACIA here so we don't force the use of r16/r17? src/hotspot/os_cpu/linux_aarch64/pauth_linux_aarch64.inline.hpp line 69: > 67: register address r17 __asm("r17") = ret_addr; > 68: register address r16 __asm("r16") = 0; > 69: asm (AUTIA1716 : "+r"(r17) : "r"(r16)); AUTIZA or AUTIA? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1332214210 PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1332215314 From dlong at openjdk.org Wed Sep 20 21:59:49 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 20 Sep 2023 21:59:49 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v10] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 06:11:38 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2.
However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because PAC-RET always turns on the "PreserveFramePointer" option, which would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual threads were a preview feature at that time. >> >> 5. Virtual threads will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: The "PreserveFramePointer" option should be turned off. That is, the PAC-RET implementation should not rely on the frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use a value that is invariant under stack copying as the modifier, so as to avoid PAC re-signing on continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered using the (relative) stack pointer SP, the thread ID, PACStack [7] and the value zero as candidate modifiers. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access from the JIT side. We need to 1) resolve the address of the current thread (see [8] as an example), and 2) get the tid field in a way similar to java_lang_Thread::thread_id(). I suppose this would introduce a big performance overhead.
Could we use the "rthread" register (JavaThread object address) as the modifier instead? Unfortunately, it's not invariant for virtual threads, and PAC re-signing is still needed. >> >> 5. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > break long lines src/hotspot/share/runtime/continuationFreezeThaw.cpp line 721: > 719: #endif > 720: ContinuationHelper::patch_return_address_at( > 721: chunk_bottom_sp - frame::sender_sp_ret_address_offset(), How about reusing retaddr_slot here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1332223594 From dholmes at openjdk.org Thu Sep 21 00:04:47 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Sep 2023 00:04:47 GMT Subject: RFR: 8308762: Metaspace leak with Instrumentation.retransform [v7] In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 16:13:50 GMT, Jean-Philippe Bempel wrote: >> Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with `Throwable`, the class `Throwable` is pre-resolved in the constant pool, while all the other classes are in an unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and created a new entry. We now try to consider it as equal, especially for Methodref/Fieldref. > > Jean-Philippe Bempel has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8308762 > - remove trailing whitespace > - remove now useless comment > - Rewrite unit test > > unresolved t2 too, cleanup JVM_CONSTANT_Class useless case > - Revert resolved class to unresolved for comparison > > remove is_unresolved_class_mismatch > - add jtreg test for leak > - 8308762: Metaspace leak with Instrumentation.retransform > > Fix a small leak in constant pool merging during retransformation of > a class. If this class has a catch block with Throwable, the class > Throwable is pre-resolved in the constant pool, while all the other > classes are in an unresolved state. So the constant pool merging > process was considering the entry with pre-resolved class as different > compared to the destination and created a new entry. We now try to > consider it as equal, especially for Methodref/Fieldref. Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14780#pullrequestreview-1636670219 From dholmes at openjdk.org Thu Sep 21 00:53:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Sep 2023 00:53:58 GMT Subject: RFR: 8316229: Enhance class initialization logging [v2] In-Reply-To: <6Zf3KZAlUnN-bw-b5qqOKwjfc4L6yP7QJnU-UK-5z1o=.31fd4237-9f4a-456a-8590-e11d1ec4225a@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> <6Zf3KZAlUnN-bw-b5qqOKwjfc4L6yP7QJnU-UK-5z1o=.31fd4237-9f4a-456a-8590-e11d1ec4225a@github.com> Message-ID: On Wed, 20 Sep 2023 09:18:31 GMT, Aleksey Shipilev wrote: >> David Holmes has updated the pull request incrementally with two additional commits since the last revision: >> >> - Extra space >> - Put thread name in quotes. > > Marked as reviewed by shade (Reviewer). Thanks for the reviews @shipilev and @coleenp!
------------- PR Comment: https://git.openjdk.org/jdk/pull/15809#issuecomment-1728603822 From dholmes at openjdk.org Thu Sep 21 00:54:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Sep 2023 00:54:00 GMT Subject: Integrated: 8316229: Enhance class initialization logging In-Reply-To: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> References: <15AsPHmB9KIv7FT238yqlj0iASAnDx79JiCD0WC81hg=.ebf94e32-df86-4232-8970-2557346e5cf1@github.com> Message-ID: On Tue, 19 Sep 2023 06:33:02 GMT, David Holmes wrote: > This change adds some additional debug logging to the class linking and initialization process. It was useful in diagnosing the deadlock described by: > > https://bugs.openjdk.org/browse/JDK-8316469 > > See the example output in the JBS issue. > > The changes are mostly uncontroversial but I needed to expose a way to access the init thread's name, which can't use the regular `name()` method as the safety check doesn't recognise the calling context as being safe. > > Testing: > - manual examination of logging output > - ran the only test that enables class+init logging: runtime/logging/ClassInitializationTest.java (no change as expected) > - tiers1-3 sanity > > Thanks. This pull request has now been integrated.
Changeset: 84124794 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/84124794c2ff70ba22cbfbf1ff01cf4d935896bd Stats: 73 lines in 4 files changed: 69 ins; 1 del; 3 mod 8316229: Enhance class initialization logging Reviewed-by: shade, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/15809 From dholmes at openjdk.org Thu Sep 21 03:02:47 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Sep 2023 03:02:47 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() [v2] In-Reply-To: References: Message-ID: <6eUyzhZ4IfNH8zsMKNv76u3aarY9LTd02uKls9ZrzTk=.bdbc8214-a096-4cea-a4b4-b5578cec8aab@github.com> On Wed, 20 Sep 2023 15:43:18 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). >> >> The reason seems to be the use of `outputStream::print()` without any need for formatting. >> >> This seems to decrease time spent in this logging by almost 10x. >> >> Testing: hs_err output still seems to be the same, GHA >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > coleen review This seems okay on the surface but raises some questions for me. Why do we have `print_value_on` and `print_symbol_on`? I get the sense that the former is somehow lower-level and potentially unsafe - which suggests that UL should not be using it in general! I could not find the original review thread for when `print_value_on` was added to answer this question, nor answer why the loop was used. Thanks.
src/hotspot/share/oops/symbol.cpp line 393: > 391: void Symbol::print_value_on(outputStream* st) const { > 392: st->print_raw("'", 1); > 393: static_assert(sizeof(u1) == sizeof(char), "must be"); Given the whole class assumes u1 and char equivalence, this assertion seems out of place here. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15838#pullrequestreview-1636799185 PR Review Comment: https://git.openjdk.org/jdk/pull/15838#discussion_r1332383153 From jpbempel at openjdk.org Thu Sep 21 05:08:45 2023 From: jpbempel at openjdk.org (Jean-Philippe Bempel) Date: Thu, 21 Sep 2023 05:08:45 GMT Subject: RFR: 8308762: Metaspace leak with Instrumentation.retransform [v7] In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 16:13:50 GMT, Jean-Philippe Bempel wrote: >> Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with `Throwable`, the class `Throwable` is pre-resolved in the constant pool, while all the other classes are in an unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and created a new entry. We now try to consider it as equal, especially for Methodref/Fieldref. > > Jean-Philippe Bempel has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8308762 > - remove trailing whitespace > - remove now useless comment > - Rewrite unit test > > unresolved t2 too, cleanup JVM_CONSTANT_Class useless case > - Revert resolved class to unresolved for comparison > > remove is_unresolved_class_mismatch > - add jtreg test for leak > - 8308762: Metaspace leak with Instrumentation.retransform > > Fix a small leak in constant pool merging during retransformation of > a class. If this class has a catch block with Throwable, the class > Throwable is pre-resolved in the constant pool, while all the other > classes are in an unresolved state. So the constant pool merging > process was considering the entry with pre-resolved class as different > compared to the destination and created a new entry. We now try to > consider it as equal, especially for Methodref/Fieldref. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14780#issuecomment-1728833538 From pchilanomate at openjdk.org Thu Sep 21 05:18:41 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 21 Sep 2023 05:18:41 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 23:00:09 GMT, Mandy Chung wrote: > `JVM_MoreStackWalk` has a bug that always assumes that the Java frame > stream is currently at the frame decoded in the last batch and so always > advances to the next frame before filling in the new batch of stack frames. > However, `JVM_MoreStackWalk` may return 0. The library will set > the continuation to its parent. It then calls `JVM_MoreStackWalk` to continue > the stack walking but the last decoded frame has already been advanced. > The Java frame stream is already at the top frame of the parent continuation. > The current implementation skips "Continuation::yield0" mistakenly.
This > only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. > > The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` > so that the VM will determine if the current frame should be skipped or not. > > `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks > the expected result where "yield0" exists between "enter" and "run" frames. src/java.base/share/classes/java/lang/StackStreamFactory.java line 443: > 441: > 442: // If the last batch didn't fetch any frames, keep the current batch size. > 443: int lastBatchFrameCount = frameBuffer.numFrames(); I ran some tests to understand the issue, and I got the same failure if I now set MIN_BATCH_SIZE to 7. This just forces the same situation where Continuation::enter is the last frame in the buffer; otherwise, since the patch also changes the batch sizes, we don't run into that issue anymore. The problem is with this numFrames() method, which still returns a number > 0 after the fetch attempt that returns with no frames. Maybe there is a reset missing for origin and fence when fetching the next batch? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15804#discussion_r1332471224 From jpbempel at openjdk.org Thu Sep 21 05:18:55 2023 From: jpbempel at openjdk.org (Jean-Philippe Bempel) Date: Thu, 21 Sep 2023 05:18:55 GMT Subject: Integrated: 8308762: Metaspace leak with Instrumentation.retransform In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 05:18:01 GMT, Jean-Philippe Bempel wrote: > Fix a small leak in constant pool merging during retransformation of a class. If this class has a catch block with `Throwable`, the class `Throwable` is pre-resolved in the constant pool, while all the other classes are in an unresolved state. So the constant pool merging process was considering the entry with pre-resolved class as different compared to the destination and created a new entry.
We now try to consider it as equal, especially for Methodref/Fieldref. This pull request has now been integrated. Changeset: df4a25b4 Author: Jean-Philippe Bempel Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/df4a25b41c7f339cd077e072aa0fd3604ed809f5 Stats: 133 lines in 5 files changed: 78 ins; 55 del; 0 mod 8308762: Metaspace leak with Instrumentation.retransform Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/14780 From rcastanedalo at openjdk.org Thu Sep 21 05:51:58 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 21 Sep 2023 05:51:58 GMT Subject: Integrated: 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: On Wed, 6 Sep 2023 11:54:04 GMT, Roberto Castañeda Lozano wrote: > This changeset (REDO of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749)) ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see initial analysis of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) by Erik Österlund and Stefan Karlsson and comments in `test/hotspot/jtreg/compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java`. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size.
> > #### Additional changes compared to [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) > > This changeset proposes the exact same solution as [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), that is, identical changes to `barrierSetC2.cpp`, `graphKit.cpp`, `library_call.cpp`, and `TestArrayCopyWithLargeObjectAlignment.java`. On top of that, it relaxes an assertion in the idealization of `ArrayCopy` nodes violated by [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749) and reported in [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) (new changes in `arraycopynode.cpp`, new regression test `TestCloneArrayWithDifferentLengthConstness.java`). The original, stricter assertion checks that, while idealizing an ArrayCopy node, the "constness" of the array copy's word-length (whether it is known by C2 to be constant or not) is equivalent to that of the array copy's element-length. For cases in which the element-length is within a small, fixed range (e.g. for an `int` array of length `3..4`) so that all element-length values lead to the same number of words (`2`), the assertion used to hold before this changeset only because of weak type propagation in `AndL` (which prevented the constant word-length from being discovered); see the left graph below: > > ![from-element-to-word-length](https://github.com/openjdk/jdk/assets/8792647/3d5535cf-4afa-46dd-bc48-30430eead12f) > > With the proposed changes, the array copy word-length is computed in a more straightforward way that enables C2 to infer the precise number of words in the same scenario ... This pull request has now been integrated.
Changeset: ceff47b4 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/ceff47b462ccbaff5cc16111dc65463a6d8d3d8d Stats: 188 lines in 6 files changed: 157 ins; 10 del; 21 mod 8315082: [REDO] Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) Co-authored-by: Stefan Karlsson Co-authored-by: Erik Österlund Reviewed-by: ayang, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15589 From tschatzl at openjdk.org Thu Sep 21 08:30:53 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 21 Sep 2023 08:30:53 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() [v3] In-Reply-To: References: Message-ID: <33ER-B1HZD1c5J_mZnH5zmWEuqtymYJv5xhbXwmacSA=.27e0a2dc-906c-4c0e-8285-17c635e52e2e@github.com> > Hi all, > > please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). > > The reason seems to be the use of `outputStream::print()` without any need for formatting. > > This seems to decrease time spent in this logging by almost 10x.
> > Testing: hs_err output seems to still be the same, GHA > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: dholmes review y ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15838/files - new: https://git.openjdk.org/jdk/pull/15838/files/1ffdce2e..473c5ecc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15838&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15838&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15838.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15838/head:pull/15838 PR: https://git.openjdk.org/jdk/pull/15838 From tschatzl at openjdk.org Thu Sep 21 08:30:56 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 21 Sep 2023 08:30:56 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() [v2] In-Reply-To: <6eUyzhZ4IfNH8zsMKNv76u3aarY9LTd02uKls9ZrzTk=.bdbc8214-a096-4cea-a4b4-b5578cec8aab@github.com> References: <6eUyzhZ4IfNH8zsMKNv76u3aarY9LTd02uKls9ZrzTk=.bdbc8214-a096-4cea-a4b4-b5578cec8aab@github.com> Message-ID: On Thu, 21 Sep 2023 02:51:11 GMT, David Holmes wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> coleen review > > src/hotspot/share/oops/symbol.cpp line 393: > >> 391: void Symbol::print_value_on(outputStream* st) const { >> 392: st->print_raw("'", 1); >> 393: static_assert(sizeof(u1) == sizeof(char), "must be"); > > Given the whole class assumes u1 and char equivalence, this assertion seems out of place here. I removed the static assert.
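A minimal sketch of why a raw write avoids the formatting overhead, using a hypothetical `MiniStream` class rather than HotSpot's real `outputStream`. One variant pushes each byte through a printf-style call; the other writes the length-delimited bytes with a single `print_raw(ptr, len)` call, the same call shape as the `st->print_raw("'", 1)` line in the quoted diff. The symbol bytes are assumed here to be length-delimited rather than NUL-terminated. Both variants produce identical text; only the per-call format parsing differs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical, simplified stand-in for HotSpot's outputStream.
class MiniStream {
 public:
  // printf-style path: the format string is parsed on every call.
  void print(const char* fmt, ...) {
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    int n = std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    if (n > 0) {
      out_.append(buf, std::min(static_cast<std::size_t>(n), sizeof(buf) - 1));
    }
  }
  // Raw path: append exactly `len` bytes, with no format parsing and
  // no NUL terminator required.
  void print_raw(const char* s, std::size_t len) { out_.append(s, len); }
  const std::string& str() const { return out_; }

 private:
  std::string out_;
};

// Variant 1: one formatted call per character.
void print_quoted_per_char(MiniStream* st, const char* bytes, std::size_t len) {
  st->print("'");
  for (std::size_t i = 0; i < len; i++) {
    st->print("%c", bytes[i]);
  }
  st->print("'");
}

// Variant 2: one bulk copy of the whole byte range.
void print_quoted_raw(MiniStream* st, const char* bytes, std::size_t len) {
  st->print_raw("'", 1);
  st->print_raw(bytes, len);
  st->print_raw("'", 1);
}
```

For a symbol of n bytes, the first variant performs n + 2 format-parsing calls while the second performs three plain copies.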
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15838#discussion_r1332674523 From tschatzl at openjdk.org Thu Sep 21 08:35:41 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 21 Sep 2023 08:35:41 GMT Subject: RFR: 8316098: Revise signature of numa_get_leaf_groups In-Reply-To: References: Message-ID: <_Da2nLzcJzgi0OlC8oZc-q9cmNTupzYMDjism9yAnbU=.32e81091-0ff5-4f3a-a95e-88b7c462a281@github.com> On Mon, 18 Sep 2023 12:22:11 GMT, Albert Mingkun Yang wrote: > Simple refactoring that uses an unsigned type to better reflect that NUMA node ids are non-negative. > > More cleanup can possibly be done to avoid the use of `checked_cast` in `os_windows.cpp`, but since the NUMA code in Windows is unreachable (JDK-8244065), I went for the smallest diff there in order to avoid adding more untested code. Lgtm. Please file an RFE for fixing "More cleanup can possibly be done to avoid the use of checked_cast in os_windows.cpp, but since the NUMA code in Windows is unreachable (JDK-8244065), I went for the smallest diff there in order to avoid adding more untested code." and link it to JDK-8244065. Thanks. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15786#pullrequestreview-1637241181 From rrich at openjdk.org Thu Sep 21 09:16:34 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 21 Sep 2023 09:16:34 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v9] In-Reply-To: References: Message-ID: > This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to inverse scaling, as demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems.
> > The algorithm for sharing the scanning of large arrays is supposed to be a straightforward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single-threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single-threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix.
I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with three additional commits since the last revision: - Avoid expensive start array queries on long arrays - find_first_clean_card: avoid expensive start array queries on long arrays - Revert back to precise scanning of large object arrays This reverts commit 3e6c1b74e7caf0aa44a9688e18b7c710e3d0cb42. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/3e6c1b74..ac6bddbb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=07-08 Stats: 45 lines in 1 file changed: 19 ins; 11 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From ihse at openjdk.org Thu Sep 21 09:20:52 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 21 Sep 2023 09:20:52 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v2] In-Reply-To: References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: On Tue, 8 Aug 2023 19:57:12 GMT, Thomas Stuefe wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 22 additional commits since the last revision: >> >> - Mismatched declaration in D3DGlyphCache.cpp >> - Fields in awt_TextComponent.cpp >> - reinterpret_cast needed in AccessBridgeJavaEntryPoints.cpp >> - Qualifiers in awt_PrintDialog.h should be removed >> - Likewise for awt_DnDDT.cpp >> - awt_ole.h include order issue in awt_DnDDS.cpp >> - Revert awt_ole.h >> - Earlier fix in awt_ole.h was not complete >> - Merge branch 'openjdk:master' into patch-10 >> - Likewise for awt_Frame.cpp >> - ... and 12 more: https://git.openjdk.org/jdk/compare/3ab6ec2a...51230f3d > > src/java.desktop/windows/native/libawt/windows/awt_Canvas.cpp line 216: > >> 214: { >> 215: PDATA pData; >> 216: JNI_CHECK_PEER_GOTO(canvas, ret); > > Here, and other places: why this scope? ~~I am curious about this, too. What aspect of the code is different from the pedantic compiler perspective?~~ edit: Found the answer further down in the comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15096#discussion_r1332748161 From ihse at openjdk.org Thu Sep 21 09:20:55 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 21 Sep 2023 09:20:55 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v5] In-Reply-To: References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: On Tue, 8 Aug 2023 19:59:52 GMT, Thomas Stuefe wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 25 commits: >> >> - Merge branch 'master' into patch-10 >> - Document changes in awt_DnDDS.cpp >> - Remove negation in os_windows.cpp >> - Mismatched declaration in D3DGlyphCache.cpp >> - Fields in awt_TextComponent.cpp >> - reinterpret_cast needed in AccessBridgeJavaEntryPoints.cpp >> - Qualifiers in awt_PrintDialog.h should be removed >> - Likewise for awt_DnDDT.cpp >> - awt_ole.h include order issue in awt_DnDDS.cpp >> - Revert awt_ole.h >> - ... and 15 more: https://git.openjdk.org/jdk/compare/11d431b2...1d3d6b5e > > src/java.desktop/windows/native/libawt/windows/awt_DnDDT.cpp line 34: > >> 32: #include "sun_awt_windows_WDropTargetContextPeer.h" >> 33: #include "awt_Container.h" >> 34: #include "awt_ole.h" > > Why? Is this related to the `#define malloc Do_Not_Use_Malloc` issue? If so, the required ordering of includes should be documented here as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15096#discussion_r1332749949 From ihse at openjdk.org Thu Sep 21 09:32:54 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 21 Sep 2023 09:32:54 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v4] In-Reply-To: References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: On Thu, 14 Sep 2023 03:23:55 GMT, Julian Waters wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Document changes in awt_DnDDS.cpp > > Pinging @TheShermanTanker In my experience, getting reviews from all areas for issues like this that cut through the entire JDK can be difficult.
Another approach, which requires more work from your side but hopefully less from the reviewers (and thus makes it easier for them to review), is to split this PR into multiple ones: one for each area (basically, area == mailing list) that just makes the code changes necessary to turn on /permissive- in the future. Then, finally, a small "finishing" PR that just touches the makefile and enables the flag once all code is fixed. As a side effect, it is also 100% clear that all parts of the code have been correctly reviewed, since reviewers then do not need to leave conditions on their reviews ("I only looked at the foo parts"). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15096#issuecomment-1729204060 From jianyesun at openjdk.org Thu Sep 21 09:41:12 2023 From: jianyesun at openjdk.org (Sun Jianye) Date: Thu, 21 Sep 2023 09:41:12 GMT Subject: RFR: 8316654: remove edundant dmb after casal instruction Message-ID: Hi all, `casal` performs a CAS operation with both load-acquire and store-release semantics. It looks like the subsequent dmb is redundant. Can we remove it?
------------- Commit messages: - remove dmb when using lse cas instructions Changes: https://git.openjdk.org/jdk/pull/15856/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15856&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316654 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15856.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15856/head:pull/15856 PR: https://git.openjdk.org/jdk/pull/15856 From rrich at openjdk.org Thu Sep 21 09:52:43 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 21 Sep 2023 09:52:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v9] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 09:16:34 GMT, Richard Reingruber wrote: >> This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to inverse scaling, as demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm for sharing the scanning of large arrays is supposed to be a straightforward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned.
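The stripe rule described above can be sketched as a pure index computation. This is one reading of the description, not `PSCardTable::scavenge_contents_parallel` itself: the word-index model, `Range`, `slice_for_stripe`, and `spans_three_stripes` are all hypothetical, and the sketch ignores card marks (per the description, the real code additionally scans only dirty regions).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of heap-word indices.
struct Range {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Hypothetical helper: true if the array [arr_begin, arr_end) touches at
// least three stripes of `stripe_words` words (stripes start at index 0).
bool spans_three_stripes(std::size_t arr_begin, std::size_t arr_end,
                         std::size_t stripe_words) {
  return (arr_end - 1) / stripe_words - arr_begin / stripe_words + 1 >= 3;
}

// The owner of each stripe scans the part of the array inside its stripe,
// except that the tail reaching into the last stripe is scanned by the owner
// of the previous stripe; the owner of the last stripe skips the array
// because it crosses into that stripe from before.
Range slice_for_stripe(std::size_t arr_begin, std::size_t arr_end,
                       std::size_t stripe, std::size_t stripe_words) {
  assert(spans_three_stripes(arr_begin, arr_end, stripe_words));
  const std::size_t first = arr_begin / stripe_words;
  const std::size_t last = (arr_end - 1) / stripe_words;  // stripe holding the final word
  if (stripe < first || stripe >= last) {
    return Range{0, 0};  // not covered, or the skipped tail stripe
  }
  const std::size_t begin = std::max(arr_begin, stripe * stripe_words);
  const std::size_t end = (stripe == last - 1) ? arr_end  // also take the tail
                                               : (stripe + 1) * stripe_words;
  return Range{begin, end};
}
```

For an array occupying words [50, 350) with 100-word stripes, the owners of stripes 0, 1 and 2 scan [50, 100), [100, 200) and [200, 350), and the owner of stripe 3 scans nothing, so every element is visited exactly once.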
>> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single-threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single-threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with three additional commits since the last revision: > > - Avoid expensive start array queries on long arrays > - find_first_clean_card: avoid expensive start array queries on long arrays > - Revert back to precise scanning of large object arrays > > This reverts commit 3e6c1b74e7caf0aa44a9688e18b7c710e3d0cb42. So I found the cause of the regression with precise scanning of large arrays: it was the redundant queries of the start array. Without them, the regression goes away.
I've reverted the last commit, bringing back precise scanning of large arrays, and changed `find_first_clean_card` to avoid redundant start array queries. This version is New-1 below (https://github.com/openjdk/jdk/pull/14846/commits/bba1d2a6b00b1e9e31ba24c979ded42fa2bc65b9). With the new test variant [card_scan_scarce_2.java](https://bugs.openjdk.org/secure/attachment/106493/card_scan_scarce_2.java) we get dirty and clean cards alternating card by card. New-1 still showed a 7x regression if there are many large arrays (with the same total length of 1000M elements). Baseline -------- $ ./jdk-baseline/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan_scarce_2 20 50 [0.002s][warning][logging] No tag set matches selection: gc+scavenge. Did you mean any of the following? gc* gc+exit* gc+load gc+reloc gc+unmap [0.007s][info ][gc ] Using Parallel ARR_ELTS_PER_CARD:128 stride:256 ### bigArrLen:20M bigArrCount:50 length sum:1000M [0.430s][trace ][gc ] GC(0) PSYoung generation size at maximum: 1048576K [0.430s][info ][gc ] GC(0) Pause Young (Allocation Failure) 767M->721M(2944M) 215.824ms ### System.gc [0.596s][trace ][gc ] GC(1) PSYoung generation size at maximum: 1048576K [0.596s][info ][gc ] GC(1) Pause Young (System.gc()) 1023M->1001M(2944M) 123.293ms [0.945s][info ][gc ] GC(2) Pause Full (System.gc()) 1001M->1001M(2944M) 348.504ms [1.565s][trace ][gc ] GC(3) PSYoung generation size at maximum: 1048576K [1.565s][info ][gc ] GC(3) Pause Young (Allocation Failure) 1769M->1001M(2944M) 193.667ms [2.117s][trace ][gc ] GC(4) PSYoung generation size at maximum: 1048576K [2.117s][info ][gc ] GC(4) Pause Young (Allocation Failure) 1769M->1001M(2944M) 193.416ms New-1 ----- $ ./jdk-new-1/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan_scarce_2 20 50 [0.006s][info][gc] Using Parallel ARR_ELTS_PER_CARD:128 stride:256 ### bigArrLen:20M bigArrCount:50
length sum:1000M [0.213s][trace][gc,scavenge] stripe count:200 stripe size:64K [0.428s][trace][gc ] GC(0) PSYoung generation size at maximum: 1048576K [0.428s][info ][gc ] GC(0) Pause Young (Allocation Failure) 767M->721M(2944M) 215.891ms ### System.gc [0.471s][trace][gc,scavenge] stripe count:200 stripe size:3077K [0.595s][trace][gc ] GC(1) PSYoung generation size at maximum: 1048576K [0.595s][info ][gc ] GC(1) Pause Young (System.gc()) 1023M->1001M(2944M) 124.276ms [0.939s][info ][gc ] GC(2) Pause Full (System.gc()) 1001M->1001M(2944M) 343.521ms [1.364s][trace][gc,scavenge] stripe count:200 stripe size:5126K [2.744s][trace][gc ] GC(3) PSYoung generation size at maximum: 1048576K [2.744s][info ][gc ] GC(3) Pause Young (Allocation Failure) 1769M->1001M(2944M) 1379.808ms [3.131s][trace][gc,scavenge] stripe count:200 stripe size:5126K [4.518s][trace][gc ] GC(4) PSYoung generation size at maximum: 1048576K [4.518s][info ][gc ] GC(4) Pause Young (Allocation Failure) 1769M->1001M(2944M) 1387.014ms This was again caused by redundant start array queries, this time in stripes containing the start of a large array. With many such stripes, these queries add up to the regression. New-2 solves this (https://github.com/openjdk/jdk/pull/14846/commits/ac6bddbb14b96dbb47609d2a5c01bfc46490365a).
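The alternating card pattern mentioned earlier follows from the two parameters the test prints (`ARR_ELTS_PER_CARD:128`, `stride:256`): a stride of two cards' worth of elements dirties every other card. The 128 elements per card is consistent with 512-byte cards holding 4-byte compressed references. Below is a small C++ model of that arithmetic. It assumes the benchmark stores one reference every `stride` elements and that the array's first element is card-aligned; both are assumptions, since card_scan_scarce_2.java is not reproduced here, and `dirty_cards` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the store pattern: a reference is stored into every
// `stride`-th element, and each card covers `elts_per_card` elements.
// Assumes element 0 starts exactly at a card boundary.
std::vector<bool> dirty_cards(std::size_t array_len, std::size_t elts_per_card,
                              std::size_t stride) {
  const std::size_t num_cards = (array_len + elts_per_card - 1) / elts_per_card;
  std::vector<bool> dirty(num_cards, false);
  for (std::size_t i = 0; i < array_len; i += stride) {
    dirty[i / elts_per_card] = true;  // the store at index i dirties its card
  }
  return dirty;
}
```

With `dirty_cards(1024, 128, 256)` the eight cards come out dirty, clean, dirty, clean, and so on, which is the worst case for any per-card overhead such as repeated start array queries.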
New-2 ----- $ ./jdk-new-2/bin/java -Xms3g -Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xlog:gc=trace -Xlog:gc+scavenge=trace card_scan_scarce_2 20 50 [0.007s][info][gc] Using Parallel ARR_ELTS_PER_CARD:128 stride:256 ### bigArrLen:20M bigArrCount:50 length sum:1000M [0.234s][trace][gc,scavenge] stripe count:200 stripe size:64K [0.445s][trace][gc ] GC(0) PSYoung generation size at maximum: 1048576K [0.445s][info ][gc ] GC(0) Pause Young (Allocation Failure) 767M->721M(2944M) 212.116ms ### System.gc [0.494s][trace][gc,scavenge] stripe count:200 stripe size:3077K [0.614s][trace][gc ] GC(1) PSYoung generation size at maximum: 1048576K [0.614s][info ][gc ] GC(1) Pause Young (System.gc()) 1023M->1001M(2944M) 120.199ms [0.946s][info ][gc ] GC(2) Pause Full (System.gc()) 1001M->1001M(2944M) 331.740ms [1.376s][trace][gc,scavenge] stripe count:200 stripe size:5126K [1.474s][trace][gc ] GC(3) PSYoung generation size at maximum: 1048576K [1.474s][info ][gc ] GC(3) Pause Young (Allocation Failure) 1769M->1001M(2944M) 97.478ms [1.872s][trace][gc,scavenge] stripe count:200 stripe size:5126K [1.971s][trace][gc ] GC(4) PSYoung generation size at maximum: 1048576K [1.971s][info ][gc ] GC(4) Pause Young (Allocation Failure) 1769M->1001M(2944M) 98.731ms ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1729234695 From smonteith at openjdk.org Thu Sep 21 10:55:39 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Thu, 21 Sep 2023 10:55:39 GMT Subject: RFR: 8316654: remove edundant dmb after casal instruction In-Reply-To: References: Message-ID: <9NJ-zCd6mDBcNklT6qFTLL0j6DeaoyvItygShH9whP0=.8094279b-1970-4eff-8490-e6ed40fa3a60@github.com> On Thu, 21 Sep 2023 09:33:59 GMT, Sun Jianye wrote: > Hi all, > `casal` performs a CAS operation with both load-acquire and store-release semantics. It looks like the subsequent dmb is redundant. Can we remove it? I don't think this is correct.
The DMB is necessary for the case where the CASAL fails: the release semantics only apply to that instruction when the write is successful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15856#issuecomment-1729329265 From jianyesun at openjdk.org Thu Sep 21 11:04:41 2023 From: jianyesun at openjdk.org (Sun Jianye) Date: Thu, 21 Sep 2023 11:04:41 GMT Subject: RFR: 8316654: remove edundant dmb after casal instruction In-Reply-To: <9NJ-zCd6mDBcNklT6qFTLL0j6DeaoyvItygShH9whP0=.8094279b-1970-4eff-8490-e6ed40fa3a60@github.com> References: <9NJ-zCd6mDBcNklT6qFTLL0j6DeaoyvItygShH9whP0=.8094279b-1970-4eff-8490-e6ed40fa3a60@github.com> Message-ID: <0mGFi95iFDoJZyXCw4QNcKNXiLxZr2LUCv85TLHEfEs=.1f51c21e-5a35-4e80-a118-3e3652e2b97b@github.com> On Thu, 21 Sep 2023 10:52:57 GMT, Stuart Monteith wrote: > I don't think this is correct. The DMB is necessary for the case where the CASAL fails: the release semantics only apply to that instruction when the write is successful. What about changing it to `casa + dmb`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15856#issuecomment-1729342227 From jianyesun at openjdk.org Thu Sep 21 11:09:41 2023 From: jianyesun at openjdk.org (Sun Jianye) Date: Thu, 21 Sep 2023 11:09:41 GMT Subject: RFR: 8316654: remove edundant dmb after casal instruction In-Reply-To: <0mGFi95iFDoJZyXCw4QNcKNXiLxZr2LUCv85TLHEfEs=.1f51c21e-5a35-4e80-a118-3e3652e2b97b@github.com> References: <9NJ-zCd6mDBcNklT6qFTLL0j6DeaoyvItygShH9whP0=.8094279b-1970-4eff-8490-e6ed40fa3a60@github.com> <0mGFi95iFDoJZyXCw4QNcKNXiLxZr2LUCv85TLHEfEs=.1f51c21e-5a35-4e80-a118-3e3652e2b97b@github.com> Message-ID: On Thu, 21 Sep 2023 11:02:11 GMT, Sun Jianye wrote: > > I don't think this is correct. The DMB is necessary for the case where the CASAL fails: the release semantics only apply to that instruction when the write is successful. > > What about changing it to `casa + dmb`? Well, please ignore it. I made a mistake. Thanks for the answer.
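The point about the failing case has a direct analogue in C++ atomics. This is an analogy only: it is not HotSpot's AArch64 code, and how a compiler lowers it is implementation-specific. A failed compare-exchange performs no store, so its failure ordering may not be `release` or `acq_rel`. The sketch therefore adds a trailing full fence, playing the role of the DMB, which orders later accesses whether or not the exchange succeeded. `cas_with_trailing_fence` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// C++ analogy (not HotSpot code) for a CAS followed by a full barrier.
// Success ordering acq_rel loosely mirrors CASAL's load-acquire plus
// store-release. On failure only the load happens, and the failure
// ordering is not allowed to be release or acq_rel.
int cas_with_trailing_fence(std::atomic<int>& cell, int expected, int desired) {
  cell.compare_exchange_strong(expected, desired,
                               std::memory_order_acq_rel,
                               std::memory_order_acquire);
  // Plays the role of the trailing DMB: it applies on both outcomes.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  // On success `expected` still holds the old value; on failure it was
  // overwritten with the value found in memory. Either way it is the
  // value the CAS observed.
  return expected;
}
```

Usage: with a cell holding 1, `cas_with_trailing_fence(c, 1, 2)` returns 1 and stores 2; a second call expecting 1 fails, returns 2, and leaves the cell unchanged, yet still executes the fence.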
------------- PR Comment: https://git.openjdk.org/jdk/pull/15856#issuecomment-1729349225 From jianyesun at openjdk.org Thu Sep 21 11:21:49 2023 From: jianyesun at openjdk.org (Sun Jianye) Date: Thu, 21 Sep 2023 11:21:49 GMT Subject: Withdrawn: 8316654: remove edundant dmb after casal instruction In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 09:33:59 GMT, Sun Jianye wrote: > Hi all, > `casal` performs a CAS operation with both load-acquire and store-release semantics. It looks like the subsequent dmb is redundant. Can we remove it? This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/15856 From azafari at openjdk.org Thu Sep 21 12:11:57 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 21 Sep 2023 12:11:57 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code Message-ID: 
1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it are also removed.
2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`.
3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can also be `ArenaBitMap` and `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`:

```C++
void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const {
  MallocArrayAllocator::free(map);
}
```

### Test
Tiers 1-4 passed on all platforms.
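A minimal sketch of why a malloc-only allocator can ignore the size on free, using hypothetical names (`MallocOnlyArrayAllocator`, `free_bitmap_words`, and the `word_t` typedef are illustrative, not HotSpot's `MallocArrayAllocator` or `bm_word_t`). `std::free` needs only the pointer. A mapping-based release, such as POSIX `munmap(addr, length)`, needs the length, which is one plausible reason a size-dependent allocator had to be told the size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical malloc-only array allocator (illustrative, not HotSpot code).
template <typename E>
struct MallocOnlyArrayAllocator {
  static E* allocate(std::size_t length) {
    void* p = std::malloc(length * sizeof(E));
    if (p == nullptr && length != 0) {
      throw std::bad_alloc();
    }
    return static_cast<E*>(p);
  }
  // std::free needs only the pointer, so no length parameter is required.
  static void free(E* ptr) { std::free(ptr); }
};

using word_t = std::uintptr_t;  // assumed bitmap word type

// Same shape as the CHeapBitMap::free quoted above: the size parameter is
// kept for callers that pass it, but a malloc-backed free does not use it.
void free_bitmap_words(word_t* map, std::size_t /*size_in_words*/) {
  MallocOnlyArrayAllocator<word_t>::free(map);
}
```

Keeping the unused parameter lets generic resize code call `free(ptr, size)` uniformly across bitmap flavors, some of which may still care about the size.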
------------- Commit messages: - 8299915: Remove ArrayAllocatorMallocLimit and associated code Changes: https://git.openjdk.org/jdk/pull/15859/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15859&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8299915 Stats: 265 lines in 8 files changed: 2 ins; 257 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15859.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15859/head:pull/15859 PR: https://git.openjdk.org/jdk/pull/15859 From djelinski at openjdk.org Thu Sep 21 12:31:41 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Thu, 21 Sep 2023 12:31:41 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. @jatin-bhateja or @sviswa7, would one of you be able to find out why the current code saves the high XMM registers? The code was provided by @mcberg2016, but he's no longer around as far as I can tell. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15688#issuecomment-1729470545 From fyang at openjdk.org Thu Sep 21 12:47:43 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 21 Sep 2023 12:47:43 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 08:52:09 GMT, Fei Yang wrote: > Hi, I have arranged a tier1-3 test run on the linux-riscv64 platform. Thanks for adding handling for riscv. Tier1-3 testing is clean. The riscv part looks good except for the nit.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15815#issuecomment-1729494980 From iwalulya at openjdk.org Thu Sep 21 12:48:42 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 21 Sep 2023 12:48:42 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 08:04:23 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that modifies the code root (remembered) set to use the CHT as internal representation. > > This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading, improving the performance of various pauses that deal with code root sets. > > With a stress test that frequently loads and unloads 6000 classes and associated methods from them, we could previously see the following issues: > > During collection pauses: > > [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms > [..] > [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 > [...] > [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case. > > Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): > > Clear Exception Caches 35,5ms > Unregister NMethods 598,5ms <---- this is nmethod unregistering. > Unregister Old NMethods 3,0ms > CodeBlob flush 41,1ms > CodeCache free 5730,3ms > > > With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part.
> > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comments: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep the previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. > > Testing: tier1-5 > > Thanks, > Thomas src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 238: > 236: // A table with the new size should be at most filled by this percentage. Otherwise > 237: // we would grow again quickly. > 238: const float WantedFillFactor = 0.5; WantedLoadFactor src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 1268: > 1266: > 1267: // Removes unlinked nmethods from all code root sets after class unloading.
> 1268: void clean_code_root_sets(); maybe rename to something similar to `remove_dead_entries` as used elsewhere ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1332995448 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1332994828 From fparain at openjdk.org Thu Sep 21 13:21:44 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 21 Sep 2023 13:21:44 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v2] In-Reply-To: References: Message-ID: On Thu, 14 Sep 2023 11:47:30 GMT, Doug Simon wrote: >> Doug Simon has updated the pull request incrementally with three additional commits since the last revision: >> >> - generalized getLiveObjectLocalsAt to getOopMapAt >> - need to zero oop_map_buf >> - simplified getLiveObjectLocalsAt and moved it from ResolvedJavaMethod to HotSpotResolvedJavaMethod > > src/hotspot/share/interpreter/oopMapCache.cpp line 616: > >> 614: tmp->fill(method, bci); >> 615: if (tmp->has_valid_mask()) { >> 616: entry->resource_copy(tmp); > > If `tmp` is invalid (e.g. oop map was requested for invalid BCI), then `resource_copy` crashes the VM in strange ways since it blindly trusts the mask size to be valid. This is not the only place where `resource_copy()` is called, could you add an assert in `resource_copy()` itself to check that it is never called with an invalid bci/mask_size. Thank you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1333043061 From dnsimon at openjdk.org Thu Sep 21 14:41:14 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 21 Sep 2023 14:41:14 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v3] In-Reply-To: References: Message-ID: > This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points. 
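The guard pattern discussed in this review (copy an oop map entry only after checking that its mask is valid, and also assert that precondition inside the copy routine itself) can be sketched in isolation. `OopMapSketch`, its `fill(bci, max_bci, num_slots)` signature, the negative-size sentinel, and `try_cache` are all hypothetical simplifications; the real `InterpreterOopMap` computes its mask from bytecode liveness, and its validity check may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, much-simplified stand-in for an interpreter oop map entry.
// A negative mask size marks a failed computation (e.g. an invalid BCI).
class OopMapSketch {
 public:
  static constexpr int kInvalidMaskSize = -1;

  void fill(int bci, int max_bci, int num_slots) {
    if (bci < 0 || bci > max_bci) {
      mask_size_ = kInvalidMaskSize;
      bits_.clear();
      return;
    }
    mask_size_ = num_slots;
    bits_.assign(static_cast<std::size_t>(num_slots), false);
  }

  bool has_valid_mask() const { return mask_size_ >= 0; }

  // Trusts the source's mask size, so an invalid source must never reach
  // here; the assert checks that precondition in debug builds.
  void resource_copy(const OopMapSketch& from) {
    assert(from.has_valid_mask() && "copying an invalid oop map");
    mask_size_ = from.mask_size_;
    bits_ = from.bits_;
  }

  int mask_size() const { return mask_size_; }

 private:
  int mask_size_ = kInvalidMaskSize;
  std::vector<bool> bits_;
};

// Caller-side guard, mirroring `if (tmp->has_valid_mask()) entry->resource_copy(tmp);`
bool try_cache(OopMapSketch& entry, const OopMapSketch& tmp) {
  if (!tmp.has_valid_mask()) {
    return false;  // leave the existing entry untouched
  }
  entry.resource_copy(tmp);
  return true;
}
```

The two checks serve different purposes: the caller-side test keeps invalid results out of the cache, while the assert inside the copy routine catches any other call site that forgets to check.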
> > As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: add assertion to InterpreterOopMap::resource_copy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15705/files - new: https://git.openjdk.org/jdk/pull/15705/files/c6c6c0d8..3c903ec0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15705&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15705&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15705.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15705/head:pull/15705 PR: https://git.openjdk.org/jdk/pull/15705 From dnsimon at openjdk.org Thu Sep 21 14:41:15 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 21 Sep 2023 14:41:15 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v3] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 13:18:31 GMT, Frederic Parain wrote: >> src/hotspot/share/interpreter/oopMapCache.cpp line 616: >> >>> 614: tmp->fill(method, bci); >>> 615: if (tmp->has_valid_mask()) { >>> 616: entry->resource_copy(tmp); >> >> If `tmp` is invalid (e.g. oop map was requested for invalid BCI), then `resource_copy` crashes the VM in strange ways since it blindly trusts the mask size to be valid. > > This is not the only place where `resource_copy()` is called, could you add an assert in `resource_copy()` itself to check that it is never called with an invalid bci/mask_size. > Thank you. Ok, I've added that assertion. 
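For readers unfamiliar with the pattern, here is a minimal toy model of the caller-side guard quoted from `oopMapCache.cpp` together with the precondition assert that was added to `resource_copy()`. `ToyOopMap`, `kInvalidMaskSize`, the `code_length` parameter of `fill()`, the range-only validity rule, and `copy_if_valid()` are all hypothetical; the real `InterpreterOopMap` computes masks from bytecode liveness analysis and its fields differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for InterpreterOopMap; field names are illustrative,
// not HotSpot's actual layout.
class ToyOopMap {
 public:
  static constexpr int kInvalidMaskSize = -1;  // assumed sentinel for "no valid mask"

  ToyOopMap() : mask_size_(kInvalidMaskSize) {}

  // Simulates computing a mask for a bci. Simplified validity rule: an
  // out-of-range bci leaves the map invalid instead of producing a mask.
  void fill(int bci, int code_length, int max_locals) {
    if (bci < 0 || bci >= code_length) {
      mask_size_ = kInvalidMaskSize;
      bits_.clear();
      return;
    }
    mask_size_ = max_locals;
    bits_.assign(static_cast<std::size_t>(max_locals), false);
  }

  bool has_valid_mask() const { return mask_size_ != kInvalidMaskSize; }
  int mask_size() const { return mask_size_; }

  // Copies another map's mask. The assert documents (and, in debug builds,
  // enforces) the precondition instead of trusting a garbage mask size.
  void resource_copy(const ToyOopMap& from) {
    assert(from.has_valid_mask() && "resource_copy called with an invalid mask");
    mask_size_ = from.mask_size_;
    bits_ = from.bits_;
  }

 private:
  int mask_size_;
  std::vector<bool> bits_;
};

// Mirrors the caller-side guard: only copy when the computed mask is valid.
bool copy_if_valid(ToyOopMap& entry, const ToyOopMap& tmp) {
  if (!tmp.has_valid_mask()) return false;
  entry.resource_copy(tmp);
  return true;
}
```

With both checks in place, a caller that skips the guard and calls `resource_copy()` on an invalid map stops at the assert in a debug build (compiled without `NDEBUG`) rather than copying a mask of undefined size.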
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15705#discussion_r1333169517 From tschatzl at openjdk.org Thu Sep 21 14:49:53 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 21 Sep 2023 14:49:53 GMT Subject: Integrated: 8316581: Improve performance of Symbol::print_value_on() In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 09:34:05 GMT, Thomas Schatzl wrote: > Hi all, > > please review this (hopefully correct) optimization of `Symbol::print_value_on()`; investigation into class unloading time distribution showed that a lot of time is spent in the `UnloadingEventLog::log()` call (25+%, see CR). > > The reason seems to be the use of `outputStream::print()` without any need for formatting. > > This seems to decrease time spent in this logging by almost 10x. > > Testing: hs_err output seems to still be the same, GHA > > Thanks, > Thomas This pull request has now been integrated. Changeset: 90bcdbd1 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/90bcdbd15fe7211377f6f6812a2b562c17995d65 Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod 8316581: Improve performance of Symbol::print_value_on() Reviewed-by: shade, coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/15838 From tschatzl at openjdk.org Thu Sep 21 14:49:52 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 21 Sep 2023 14:49:52 GMT Subject: RFR: 8316581: Improve performance of Symbol::print_value_on() [v2] In-Reply-To: <6eUyzhZ4IfNH8zsMKNv76u3aarY9LTd02uKls9ZrzTk=.bdbc8214-a096-4cea-a4b4-b5578cec8aab@github.com> References: <6eUyzhZ4IfNH8zsMKNv76u3aarY9LTd02uKls9ZrzTk=.bdbc8214-a096-4cea-a4b4-b5578cec8aab@github.com> Message-ID: <09FU4WpsIupq9WnfAwcoamU8FsPpb3fo9NZRRNTeMU0=.5eb6b847-4e98-4e31-a20a-72346ce9aafe@github.com> On Thu, 21 Sep 2023 03:00:14 GMT, David Holmes wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> coleen review > > This seems okay
on the surface but raises some questions for me. Why do we have `print_value_on` and `print_symbol_on`? I get the sense that the former is somehow lower-level and potentially unsafe - which suggests that UL should not be using it in general! I could not find the original review thread for when `print_value_on` was added to answer this question, nor answer why the loop was used. > > Thanks. Thanks @dholmes-ora @shipilev @coleenp for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/15838#issuecomment-1729734169 From fparain at openjdk.org Thu Sep 21 15:02:45 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 21 Sep 2023 15:02:45 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v3] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 14:41:14 GMT, Doug Simon wrote: >> This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points. >> >> As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > add assertion to InterpreterOopMap::resource_copy Runtime changes look good to me. Thank you for the additional assert. ------------- Marked as reviewed by fparain (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15705#pullrequestreview-1638080188 From tschatzl at openjdk.org Thu Sep 21 15:16:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 21 Sep 2023 15:16:20 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that modifies the code root (remembered) set to use the CHT as its internal representation. > > This removes lots of locking (which inhibits throughput), provides automatic balancing for the code root scan phase and (parallel) bulk unregistering of nmethods during code cache unloading, improving the performance of various pauses that deal with code root sets. > > With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: > > During collection pauses: > > [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms > [..] > [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 > [...] > [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case. > > Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): > > Clear Exception Caches 35,5ms > Unregister NMethods 598,5ms <---- this is nmethod unregistering. > Unregister Old NMethods 3,0ms > CodeBlob flush 41,1ms > CodeCache free 5730,3ms > > > With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part.
> > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comment: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15811/files - new: https://git.openjdk.org/jdk/pull/15811/files/b1984bc4..afad0655 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=00-01 Stats: 26 lines in 5 files changed: 2 ins; 3 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/15811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811 PR: https://git.openjdk.org/jdk/pull/15811 From dnsimon at openjdk.org Thu Sep 21 16:31:51 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 21 Sep 2023 16:31:51 GMT Subject: RFR: 8315954: getArgumentValues002.java fails on Graal [v3] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 14:41:14 GMT, Doug Simon wrote: >> This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points. >> >> As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. 
When an oop map computation is requested for an invalid BCI, this method returns false. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > add assertion to InterpreterOopMap::resource_copy Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15705#issuecomment-1729911323 From dnsimon at openjdk.org Thu Sep 21 16:31:52 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 21 Sep 2023 16:31:52 GMT Subject: Integrated: 8315954: getArgumentValues002.java fails on Graal In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 09:46:01 GMT, Doug Simon wrote: > This PR adds `HotSpotResolvedJavaMethod.getOopMapAt` to get the oop map for a method at a given BCI. This is required to do correct clearing of oops at OSR entry points. > > As part of this addition, I needed to be able to detect requests for oop maps at invalid BCIs. For this, I added `InterpreterOopMap::has_valid_mask()`. When an oop map computation is requested for an invalid BCI, this method returns false. This pull request has now been integrated. Changeset: 542b3000 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/542b3000f0cd1136466066cb4046257220ac2827 Stats: 278 lines in 8 files changed: 256 ins; 0 del; 22 mod 8315954: getArgumentValues002.java fails on Graal Reviewed-by: never, fparain ------------- PR: https://git.openjdk.org/jdk/pull/15705 From mchung at openjdk.org Thu Sep 21 18:16:27 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 21 Sep 2023 18:16:27 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 05:15:26 GMT, Patricio Chilano Mateo wrote: >> `JVM_MoreStackWalk` has a bug that always assumes that the Java frame >> stream is currently at the frame decoded in the last patch and so always >> advances to the next frame before filling in the new batch of stack frame. 
>> However `JVM_MoreStackWalk` may return 0. The library will set >> the continuation to its parent. It then calls `JVM_MoreStackWalk` to continue >> the stack walking but the last decoded frame has already been advanced. >> The Java frame stream is already at the top frame of the parent continuation. >> The current implementation skips "Continuation::yield0" mistakenly. This >> only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. >> >> The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` >> so that the VM will determine if the current frame should be skipped or not. >> >> `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks >> the expected result where "yield0" exists between "enter" and "run" frames. > > src/java.base/share/classes/java/lang/StackStreamFactory.java line 443: > >> 441: >> 442: // If the last batch didn't fetch any frames, keep the current batch size. >> 443: int lastBatchFrameCount = frameBuffer.numFrames(); > > I ran some tests to understand the issue, and I got the same failure if I now set MIN_BATCH_SIZE to 7. This just forces the same situation where Continuation::enter is the last frame in the buffer; otherwise, since the patch also changes the batch sizes, we no longer run into that issue. The problem is with this numFrames() method, which still returns a number > 0 after a fetch attempt that returns no frames. Maybe there is a reset missing for origin and fence when fetching the next batch? Thanks for catching this. The problem is that `fetchStackFrames(int batchSize)` is supposed to call `setBatch` to set origin and fence for the new batch, but if no frame is fetched, it skips that call. I have a fix for it.
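The stale-count problem described above can be reproduced with a tiny model of the batch bookkeeping. This is a hypothetical C++ sketch of the Java `FrameBuffer` logic, not the actual `StackStreamFactory` code: only the names `origin`, `fence`, `numFrames()`, and `setBatch` come from the review; `fetchBuggy`, `fetchFixed`, `framesFromVm`, and the start index of 0 are assumptions.

```cpp
#include <cassert>

// Hypothetical, simplified model of the frame buffer bookkeeping discussed
// in the review; the real Java class has more state.
struct FrameBufferModel {
  int origin = 0;  // index of the first valid frame in the current batch
  int fence = 0;   // one past the last valid frame in the current batch

  int numFrames() const { return fence - origin; }

  void setBatch(int startIndex, int endIndex) {
    origin = startIndex;
    fence = endIndex;
  }
};

// Buggy variant: the batch bounds are only updated when frames were fetched,
// so an empty fetch leaves the previous batch's count visible.
int fetchBuggy(FrameBufferModel& buf, int framesFromVm) {
  const int start = 0;  // assumed start index for the toy buffer
  if (framesFromVm > 0) {
    buf.setBatch(start, start + framesFromVm);
  }
  return framesFromVm;
}

// Fixed variant: always record the new bounds, even for an empty batch.
int fetchFixed(FrameBufferModel& buf, int framesFromVm) {
  const int start = 0;
  buf.setBatch(start, start + framesFromVm);
  return framesFromVm;
}
```

After a 7-frame batch followed by an empty one, the buggy variant still reports 7 frames, which is the stale value that would then be passed to `JVM_MoreStackWalk` as the last batch's frame count; the fixed variant reports 0.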
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15804#discussion_r1333433443 From mchung at openjdk.org Thu Sep 21 18:20:36 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 21 Sep 2023 18:20:36 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly [v2] In-Reply-To: References: Message-ID: > `JVM_MoreStackWalk` has a bug that always assumes that the Java frame > stream is currently at the frame decoded in the last patch and so always > advances to the next frame before filling in the new batch of stack frame. > However `JVM_MoreStackWalk` may return 0. The library will set > the continuation to its parent. It then call `JVM_MoreStackWalk` to continue > the stack walking but the last decoded frame has already been advanced. > The Java frame stream is already at the top frame of the parent continuation. . > The current implementation skips "Continuation::yield0" mistakenly. This > only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. > > The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` > so that the VM will determine if the current frame should be skipped or not. > > `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks > the expected result where "yield0" exists between "enter" and "run" frames. Mandy Chung has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8316456 - call setBatch to update origin and fence for an empty batch - 8316456: StackWalker may skip Continuation::yield0 frame mistakenly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15804/files - new: https://git.openjdk.org/jdk/pull/15804/files/f3ec0dac..ddaeaf99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15804&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15804&range=00-01 Stats: 19285 lines in 295 files changed: 11045 ins; 7282 del; 958 mod Patch: https://git.openjdk.org/jdk/pull/15804.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15804/head:pull/15804 PR: https://git.openjdk.org/jdk/pull/15804 From pchilanomate at openjdk.org Thu Sep 21 19:34:23 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 21 Sep 2023 19:34:23 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly [v2] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 18:20:36 GMT, Mandy Chung wrote: >> `JVM_MoreStackWalk` has a bug that always assumes that the Java frame >> stream is currently at the frame decoded in the last patch and so always >> advances to the next frame before filling in the new batch of stack frame. >> However `JVM_MoreStackWalk` may return 0. The library will set >> the continuation to its parent. It then call `JVM_MoreStackWalk` to continue >> the stack walking but the last decoded frame has already been advanced. >> The Java frame stream is already at the top frame of the parent continuation. . >> The current implementation skips "Continuation::yield0" mistakenly. This >> only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. 
>> >> The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` >> so that the VM will determine if the current frame should be skipped or not. >> >> `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks >> the expected result where "yield0" exists between "enter" and "run" frames. > > Mandy Chung has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8316456 > - call setBatch to update origin and fence for an empty batch > - 8316456: StackWalker may skip Continuation::yield0 frame mistakenly Looks good to me, thanks. Patricio src/hotspot/share/prims/stackwalk.cpp line 189: > 187: // skip hidden frames for default StackWalker option (i.e. SHOW_HIDDEN_FRAMES > 188: // not set) and when StackWalker::getCallerClass is called > 189: LogTarget(Debug, stackwalk) lt; Nit, leftover. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15804#pullrequestreview-1638561745 PR Review Comment: https://git.openjdk.org/jdk/pull/15804#discussion_r1333514103 From mchung at openjdk.org Thu Sep 21 20:11:12 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 21 Sep 2023 20:11:12 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly [v2] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 19:29:30 GMT, Patricio Chilano Mateo wrote: >> Mandy Chung has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8316456 >> - call setBatch to update origin and fence for an empty batch >> - 8316456: StackWalker may skip Continuation::yield0 frame mistakenly > > src/hotspot/share/prims/stackwalk.cpp line 189: > >> 187: // skip hidden frames for default StackWalker option (i.e. SHOW_HIDDEN_FRAMES >> 188: // not set) and when StackWalker::getCallerClass is called >> 189: LogTarget(Debug, stackwalk) lt; > > Nit, leftover. Thanks for the review. Will clean up before it's integrated ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15804#discussion_r1333551895 From duke at openjdk.org Thu Sep 21 21:00:49 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Thu, 21 Sep 2023 21:00:49 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap Message-ID: `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. The help message of Compiler.perfmap: Compiler.perfmap Write map file for Linux perf tool. 
Impact: Low Syntax : Compiler.perfmap [options] Options: (options must be specified using the <key> or <key>=<value> syntax) filename : [optional] Name of the map file (STRING, no default value) ------------- Commit messages: - Update parameter name - Update jcmd man page - Change to use filename and add a test - 8314029: Add file name parameter to Compiler.perfmap Changes: https://git.openjdk.org/jdk/pull/15871/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314029 Stats: 69 lines in 7 files changed: 49 ins; 5 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/15871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15871/head:pull/15871 PR: https://git.openjdk.org/jdk/pull/15871 From mchung at openjdk.org Thu Sep 21 23:16:04 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 21 Sep 2023 23:16:04 GMT Subject: RFR: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly [v3] In-Reply-To: References: Message-ID: > `JVM_MoreStackWalk` has a bug that always assumes that the Java frame > stream is currently at the frame decoded in the last batch and so always > advances to the next frame before filling in the new batch of stack frames. > However `JVM_MoreStackWalk` may return 0. The library will set > the continuation to its parent. It then calls `JVM_MoreStackWalk` to continue > the stack walking but the last decoded frame has already been advanced. > The Java frame stream is already at the top frame of the parent continuation. > The current implementation skips "Continuation::yield0" mistakenly. This > only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. > > The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` > so that the VM will determine if the current frame should be skipped or not.
> > `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks > the expected result where "yield0" exists between "enter" and "run" frames. Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: minor clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15804/files - new: https://git.openjdk.org/jdk/pull/15804/files/ddaeaf99..14f13223 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15804&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15804&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15804.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15804/head:pull/15804 PR: https://git.openjdk.org/jdk/pull/15804 From mchung at openjdk.org Thu Sep 21 23:16:05 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 21 Sep 2023 23:16:05 GMT Subject: Integrated: 8316456: StackWalker may skip Continuation::yield0 frame mistakenly In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 23:00:09 GMT, Mandy Chung wrote: > `JVM_MoreStackWalk` has a bug that always assumes that the Java frame > stream is currently at the frame decoded in the last patch and so always > advances to the next frame before filling in the new batch of stack frame. > However `JVM_MoreStackWalk` may return 0. The library will set > the continuation to its parent. It then call `JVM_MoreStackWalk` to continue > the stack walking but the last decoded frame has already been advanced. > The Java frame stream is already at the top frame of the parent continuation. . > The current implementation skips "Continuation::yield0" mistakenly. This > only happens with `-XX:+ShowHiddenFrames` or `StackWalker.Option.SHOW_HIDDEN_FRAMES`. > > The fix is to pass the number of frames decoded in the last batch to `JVM_MoreStackWalk` > so that the VM will determine if the current frame should be skipped or not. 
> > `test/jdk/jdk/internal/vm/Continuation/Scoped.java` test now correctly checks > the expected result where "yield0" exists between "enter" and "run" frames. This pull request has now been integrated. Changeset: c72f0046 Author: Mandy Chung URL: https://git.openjdk.org/jdk/commit/c72f00463fcb1c4a94126932abbc82a2582c10c2 Stats: 210 lines in 7 files changed: 47 ins; 57 del; 106 mod 8316456: StackWalker may skip Continuation::yield0 frame mistakenly Reviewed-by: rpressler, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/15804 From mchung at openjdk.org Thu Sep 21 23:41:36 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 21 Sep 2023 23:41:36 GMT Subject: RFR: 8316698: build failure caused by JDK-8316456 Message-ID: JDK-8316456 causes hotspot build to fail due to an extra argument. Trivial fix. ------------- Commit messages: - 8316698: build failure caused by JDK-8316456 Changes: https://git.openjdk.org/jdk/pull/15876/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15876&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316698 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15876.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15876/head:pull/15876 PR: https://git.openjdk.org/jdk/pull/15876 From dcubed at openjdk.org Thu Sep 21 23:59:13 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 21 Sep 2023 23:59:13 GMT Subject: RFR: 8316698: build failure caused by JDK-8316456 In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 23:34:27 GMT, Mandy Chung wrote: > JDK-8316456 causes hotspot build to fail due to an extra argument passed to log_debug. Trivial fix. Thumbs up. This is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15876#pullrequestreview-1638878479 From dholmes at openjdk.org Fri Sep 22 00:02:09 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 22 Sep 2023 00:02:09 GMT Subject: RFR: 8316698: build failure caused by JDK-8316456 In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 23:34:27 GMT, Mandy Chung wrote: > JDK-8316456 causes hotspot build to fail due to an extra argument passed to log_debug. Trivial fix. Looks good. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15876#pullrequestreview-1638884308 From mchung at openjdk.org Fri Sep 22 00:13:19 2023 From: mchung at openjdk.org (Mandy Chung) Date: Fri, 22 Sep 2023 00:13:19 GMT Subject: Integrated: 8316698: build failure caused by JDK-8316456 In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 23:34:27 GMT, Mandy Chung wrote: > JDK-8316456 causes hotspot build to fail due to an extra argument passed to log_debug. Trivial fix. This pull request has now been integrated. 
Changeset: a1e03463 Author: Mandy Chung URL: https://git.openjdk.org/jdk/commit/a1e03463accfe830eef0aa53a806d0d5ba873b24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8316698: build failure caused by JDK-8316456 Reviewed-by: dcubed, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/15876 From manc at openjdk.org Fri Sep 22 00:14:15 2023 From: manc at openjdk.org (Man Cao) Date: Fri, 22 Sep 2023 00:14:15 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v24] In-Reply-To: <_U1jBJQChDb-Y86Qd-0xMl3f3oCjEv2egqem9ZME7GY=.0737b93e-4521-4b82-b330-7f4491370907@github.com> References: <_U1jBJQChDb-Y86Qd-0xMl3f3oCjEv2egqem9ZME7GY=.0737b93e-4521-4b82-b330-7f4491370907@github.com> Message-ID: On Wed, 20 Sep 2023 00:39:44 GMT, David Holmes wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build issues > > src/hotspot/share/gc/shared/collectedHeap.cpp line 161: > >> 159: } >> 160: >> 161: void CollectedHeap::inc_total_cpu_time(long diff) { > > We don't use `long` in shared code as it has different size on different platforms. We use `long` to avoid build failures on 32-bit ARM and x86: `jlong` is `long long` on 32-bit, and the Atomic template does not support `long long` on 32-bit. Example failure: https://github.com/jjoo172/jdk/actions/runs/6229455243/job/16907994694. Is there a better way to avoid these failures on 32-bit?
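As background for the size concern above: `long` is 32 bits on ILP32 targets (32-bit ARM and x86 Linux) and on LLP64 Windows, but 64 bits on LP64 Linux and macOS, so a nanosecond CPU-time total held in a 32-bit `long` wraps after 2^31 ns, roughly 2.15 seconds. The sketch below uses standard C++ `std::atomic<std::int64_t>`, not HotSpot's `Atomic` class, so it does not settle whether HotSpot's `Atomic` can be used with `jlong` on those 32-bit ports; `CpuTimeCounter` and `kMaxInt32Nanos` are hypothetical names. On some 32-bit targets a 64-bit `std::atomic` is not lock-free and may need to be linked against libatomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Fixed-width accumulator for CPU time in nanoseconds. std::int64_t has the
// same width on ILP32, LP64 and LLP64 targets, unlike `long`.
class CpuTimeCounter {
 public:
  void add(std::int64_t diff_ns) {
    total_ns_.fetch_add(diff_ns, std::memory_order_relaxed);
  }
  std::int64_t total() const { return total_ns_.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::int64_t> total_ns_{0};
};

// Largest nanosecond count a 32-bit signed integer can hold (about 2.15 s).
constexpr std::int64_t kMaxInt32Nanos = INT32_MAX;
```

A total of 3.5 seconds, for example, already exceeds what a 32-bit `long` could represent in nanoseconds.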
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1333718984 From fyang at openjdk.org Fri Sep 22 01:47:30 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 22 Sep 2023 01:47:30 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v55] In-Reply-To: <6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> References: <6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> Message-ID: On Wed, 13 Sep 2023 14:14:43 GMT, Roman Kennke wrote: >> There's a gtest failure in the GHA run: >> >> [ RUN ] arrayOopDesc.double_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure >> check_max_length_overflow(T_DOUBLE) evaluates to false, where >> T_DOUBLE evaluates to >> >> [ FAILED ] arrayOopDesc.double_vm (0 ms) >> [ RUN ] arrayOopDesc.byte_vm >> [ OK ] arrayOopDesc.byte_vm (0 ms) >> [ RUN ] arrayOopDesc.short_vm >> [ OK ] arrayOopDesc.short_vm (0 ms) >> [ RUN ] arrayOopDesc.int_vm >> [ OK ] arrayOopDesc.int_vm (0 ms) >> [ RUN ] arrayOopDesc.long_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure >> check_max_length_overflow(T_LONG) evaluates to false, where >> T_LONG evaluates to >> >> >> [ FAILED ] arrayOopDesc.long_vm (0 ms) > >> There's a gtest failure in the GHA run: >> >> ``` >> [ RUN ] arrayOopDesc.double_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure >> check_max_length_overflow(T_DOUBLE) evaluates to false, where >> T_DOUBLE evaluates to
>> >> [ FAILED ] arrayOopDesc.double_vm (0 ms) >> [ RUN ] arrayOopDesc.byte_vm >> [ OK ] arrayOopDesc.byte_vm (0 ms) >> [ RUN ] arrayOopDesc.short_vm >> [ OK ] arrayOopDesc.short_vm (0 ms) >> [ RUN ] arrayOopDesc.int_vm >> [ OK ] arrayOopDesc.int_vm (0 ms) >> [ RUN ] arrayOopDesc.long_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure >> check_max_length_overflow(T_LONG) evaluates to false, where >> T_LONG evaluates to >> >> [ FAILED ] arrayOopDesc.long_vm (0 ms) >> ``` > > Aww, this max_array_length() method and 32bit builds. :-/ > We should re-write this method altogether and special-case it for !_LP64 and maybe simply make it a switch on the incoming type, with hard-coded values. This might be easier to understand than getting the logic absolutely right. Also, with this change, and even more so with upcoming Lilliput changes, this method is a little too conservative and we could offer somewhat increased array lengths. Alternatively, we could do what the comment suggests and fix up all the uses of the method to use sensible types (size_t?) and make it simple and obvious. @rkennke : Please add a small RISC-V-specific change to bring it into the same shape as aarch64/x86: [11044-rv-update.diff.txt](https://github.com/openjdk/jdk/files/12695800/11044-rv-update.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1730670082 From haosun at openjdk.org Fri Sep 22 02:09:46 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 22 Sep 2023 02:09:46 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v11] In-Reply-To: References: Message-ID: <-xwMAKvUAhY0pafX8kODsvTJMoSyTPVomnbpUYG0wWA=.3bd89717-7092-4650-ae82-7c743ffdd81e@github.com> > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2.
However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. 
Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of... Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Not use pacia1716 and reuse retaddr_slot - Merge branch 'master' into jdk-8287325 - break long lines - Refactor long assertions in continuationFreezeThaw.cpp - Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS - Merge branch 'master' into jdk-8287325 - Revert to the implementation with zero as the PAC modifier - Merge branch 'master' into jdk-8287325 - Update aarch64.ad and jvmci AArch64TestAssembler.java Before this patch, rscratch1 is clobbered. With this patch, we use the rscratch1 register after we save it on the stack. In this way, the code would be consistent with macroAssembler_aarch64.cpp. - Merge branch 'master' into jdk-8287325 - ... 
and 6 more: https://git.openjdk.org/jdk/compare/84124794...47e9e942 ------------- Changes: https://git.openjdk.org/jdk/pull/13322/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=10 Stats: 236 lines in 23 files changed: 94 ins; 32 del; 110 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From haosun at openjdk.org Fri Sep 22 02:19:17 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 22 Sep 2023 02:19:17 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v10] In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 21:43:08 GMT, Dean Long wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> break long lines > > src/hotspot/os_cpu/linux_aarch64/pauth_linux_aarch64.inline.hpp line 57: > >> 55: register address r17 __asm("r17") = ret_addr; >> 56: register address r16 __asm("r16") = 0; >> 57: asm (PACIA1716 : "+r"(r17) : "r"(r16)); > > Can we use PACIZA or PACIA here so we don't force the use of r16/r17? Thanks for pointing it out. `pacia1716` was used in the initial implementation of pac-ret, mainly because we used `fp` as the pac modifier then and `pacia x30, fp` doesn't belong to the NOP instruction space. Now we use "zero const" as the modifier and we'd better use `paciaz/autiaz` as it's consistent with our usages in macro-assembler helpers i.e. `protection_return_address()` and `authenticate_return_address()`. Updated in the latest commit. Please help take another look at it. Thanks. > src/hotspot/os_cpu/linux_aarch64/pauth_linux_aarch64.inline.hpp line 69: > >> 67: register address r17 __asm("r17") = ret_addr; >> 68: register address r16 __asm("r16") = 0; >> 69: asm (AUTIA1716 : "+r"(r17) : "r"(r16)); > > AUTIZA or AUTIA? Replied in the previous comment together. Thanks. 
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 721: > >> 719: #endif >> 720: ContinuationHelper::patch_return_address_at( >> 721: chunk_bottom_sp - frame::sender_sp_ret_address_offset(), > > How about reusing retaddr_slot here? Yes, we can. In the latest commit, I introduced one specific variable name `chunk_bottom_retaddr_slot` rather than using the common `retaddr_slot`. Because the common `retaddr_slot` is used in several sites in this file and I'd like to use it in the limited assertion scope. That's why I put the assertions at lines 600/621/1887 in extra braces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1333804214 PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1333804514 PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1333808752 From jwaters at openjdk.org Fri Sep 22 02:30:25 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Sep 2023 02:30:25 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v4] In-Reply-To: References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: On Thu, 14 Sep 2023 03:23:55 GMT, Julian Waters wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Document changes in awt_DnDDS.cpp > > Pinging > @TheShermanTanker In my experience, getting reviews from all areas for issues like this that cuts through the entire JDK can be difficult. Another approach, which requires more work from your side, but hopefully less from the reviewers' (and thus makes it easier for them to review) is to split this PR into multiple ones: One for each area (basically, area == mailing list) that just makes the changes to the code necessary to (in the future) turn on /permissive-. 
And then finally a small "finishing" PR which just touches the makefile and enables the flag, when all code is fixed. > > As a side effect, it is also 100% clear that all parts of the code has been correctly reviewed, since then reviewers do not need to leave conditions on their reviews ("i only looked at the foo parts"). I understand, will split this into multiple changes after I answer all queries above ------------- PR Comment: https://git.openjdk.org/jdk/pull/15096#issuecomment-1730719188 From dholmes at openjdk.org Fri Sep 22 02:46:12 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 22 Sep 2023 02:46:12 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 12:02:24 GMT, Afshin Zafari wrote: > 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it also are removed. > 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`. > 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can be also `ArenaBitMap` and `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`: > ```C++ > void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const { > MallocArrayAllocator::free(map); > } > > ### Test > tiers1-4 passed on all platforms. Functional removal of code looks good, but some of the test changes need to be changed. Thanks. test/hotspot/jtreg/serviceability/attach/AttachSetGetFlag.java line 64: > 62: testGetFlag("ArrayAllocatorMallocLimit", "128"); > 63: // testSetFlag("ArrayAllocatorMallocLimit", "64", "128"); > 64: You need to replace this with another non-manageable size_t flag so that code coverage is maintained. 
test/lib-test/jdk/test/whitebox/vm_flags/SizeTTest.java line 1: > 1: /* This test also should not be removed but changed to use a different size_t flag so that the WB functionality continues to be tested for a flag of this type. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15859#pullrequestreview-1639059839 PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1333820086 PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1333820917 From dholmes at openjdk.org Fri Sep 22 02:52:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 22 Sep 2023 02:52:20 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 20:43:56 GMT, Yi-Fan Tsai wrote: > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. > > The help message of Compiler.perfmap: > > Compiler.perfmap > Write map file for Linux perf tool. > > Impact: Low > > Syntax : Compiler.perfmap [options] > > Options: (options must be specified using the <key> or <key>=<value> syntax) > filename : [optional] Name of the map file (STRING, no default value) This will also need a CSR request created and approved. src/jdk.jcmd/share/man/jcmd.1 line 1: > 1: .\" Copyright (c) 2012, 2023, Oracle and/or its affiliates. All rights reserved. The actual markdown source for this file needs to be updated with these changes. Those sources are not open-source unfortunately. Please either coordinate to get the sources updated with an Oracle developer as part of this PR (they will integrate the internal part), or else please defer this to a subtask and let an Oracle developer update the source and output at the same time. Thanks.
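The behaviour described in this PR, an optional `filename` that otherwise falls back to the hard-coded `/tmp/perf-%d.map`, can be sketched roughly as below. This is a hypothetical illustration only: `perf_map_path` is not a HotSpot function, and the real code formats into a fixed buffer with `jio_snprintf` and `os::current_process_id()`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not HotSpot code): choose the perf map file name.
// A non-empty user-supplied filename wins; otherwise fall back to the
// default location that perf looks for, /tmp/perf-<pid>.map.
std::string perf_map_path(const char* filename, int pid) {
  if (filename != nullptr && filename[0] != '\0') {
    return std::string(filename);
  }
  char buf[64];
  std::snprintf(buf, sizeof(buf), "/tmp/perf-%d.map", pid);
  return std::string(buf);
}
```

For example, `perf_map_path(nullptr, 1234)` yields `/tmp/perf-1234.map`, while `perf_map_path("/tmp/my.map", 1234)` returns the supplied name unchanged.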
------------- PR Comment: https://git.openjdk.org/jdk/pull/15871#issuecomment-1730731221 PR Review Comment: https://git.openjdk.org/jdk/pull/15871#discussion_r1333823529 From dholmes at openjdk.org Fri Sep 22 03:31:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 22 Sep 2023 03:31:18 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v24] In-Reply-To: References: <_U1jBJQChDb-Y86Qd-0xMl3f3oCjEv2egqem9ZME7GY=.0737b93e-4521-4b82-b330-7f4491370907@github.com> Message-ID: On Fri, 22 Sep 2023 00:11:19 GMT, Man Cao wrote: >> src/hotspot/share/gc/shared/collectedHeap.cpp line 161: >> >>> 159: } >>> 160: >>> 161: void CollectedHeap::inc_total_cpu_time(long diff) { >> >> We don't use `long` in shared code as it has different size on different platforms. > > Using `long` is to avoid build failure on 32-bit ARM and x86. `jlong` is `long long` on 32-bit, and Atomic template does not support `long long` on 32-bit. Example failure: https://github.com/jjoo172/jdk/actions/runs/6229455243/job/16907994694. > > Is there a better way to avoid these failures on 32-bit? `long` is 32-bit on Windows x64 as well which means you're reducing the utility of these timers there (else you could use 32-bit everywhere). AFAICS it should be supported on x86-32 as we define `SUPPORTS_NATIVE_CX8` whilst for ARM it is restricted to ARMv7a and above. (Does anyone build ARMv6 still?) But that appears not to be handled by the atomic templates. Not sure the best way to approach this one. If the templates correctly handled SUPPORTS_NATIVE_CX8 to define the 64-bit variants then the ideal solution would be to use a typedef that is 64-bit on supported platforms and 32-bit elsewhere. 
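The typedef approach suggested above could look roughly like this. It is a sketch under stated assumptions: `cpu_time_counter_t` and `CpuTimeCounter` are hypothetical names, `SUPPORTS_NATIVE_CX8` is borrowed only as the guard macro name, and `std::atomic` is a portable stand-in for HotSpot's own `Atomic` class. Whether those templates provide 64-bit operations on 32-bit ports is the actual open question.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical counter type: 64-bit where 64-bit atomics are assumed to be
// available (SUPPORTS_NATIVE_CX8 defined, or a 64-bit pointer width),
// 32-bit elsewhere. Fixed-width types avoid the LP64/LLP64 trap: plain
// `long` is only 32 bits on Windows x64.
#if defined(SUPPORTS_NATIVE_CX8) || (UINTPTR_MAX > 0xFFFFFFFFu)
typedef int64_t cpu_time_counter_t;
#else
typedef int32_t cpu_time_counter_t;
#endif

// Accumulates CPU time deltas reported by several threads.
class CpuTimeCounter {
 public:
  void inc(cpu_time_counter_t diff) {
    _total.fetch_add(diff, std::memory_order_relaxed);
  }
  cpu_time_counter_t get() const {
    return _total.load(std::memory_order_relaxed);
  }
 private:
  std::atomic<cpu_time_counter_t> _total{0};
};
```

Relaxed ordering is enough here because the counter is only summed and sampled, not used to publish other data.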
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1333838798 From dlong at openjdk.org Fri Sep 22 07:00:17 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Sep 2023 07:00:17 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v11] In-Reply-To: <-xwMAKvUAhY0pafX8kODsvTJMoSyTPVomnbpUYG0wWA=.3bd89717-7092-4650-ae82-7c743ffdd81e@github.com> References: <-xwMAKvUAhY0pafX8kODsvTJMoSyTPVomnbpUYG0wWA=.3bd89717-7092-4650-ae82-7c743ffdd81e@github.com> Message-ID: On Fri, 22 Sep 2023 02:09:46 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. 
>> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. >> >> 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP sh... > > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Not use pacia1716 and reuse retaddr_slot > - Merge branch 'master' into jdk-8287325 > - break long lines > - Refactor long assertions in continuationFreezeThaw.cpp > - Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS > - Merge branch 'master' into jdk-8287325 > - Revert to the implementation with zero as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Update aarch64.ad and jvmci AArch64TestAssembler.java > > Before this patch, rscratch1 is clobbered. > With this patch, we use the rscratch1 register after we save it on the > stack. > > In this way, the code would be consistent with > macroAssembler_aarch64.cpp. > - Merge branch 'master' into jdk-8287325 > - ... 
and 6 more: https://git.openjdk.org/jdk/compare/84124794...47e9e942 Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13322#pullrequestreview-1639265602 From iwalulya at openjdk.org Fri Sep 22 09:08:14 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 22 Sep 2023 09:08:14 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v2] In-Reply-To: References: Message-ID: <0qto3ikValrz07kF8A6rPebHGjfJJzYNRdMLJV6Bjsk=.0aa33bea-18bf-487a-ae9c-444ceb011439@github.com> On Thu, 21 Sep 2023 15:16:20 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies the code root (remembered) set to use the CHT as internal representation. >> >> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethdos during code cache unloading improving performance of various pauses that deal with code root sets. >> >> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: >> >> During collection pauses: >> >> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms >> [..] >> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 >> [...] >> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 >> >> >> Code root scan now reduces to ~22ms max on average in this case. >> >> Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): >> >> Clear Exception Caches 35,5ms >> Unregister NMethods 598,5ms <---- this is nmethod unregistering. 
>> Unregister Old NMethods 3,0ms >> CodeBlob flush 41,1ms >> CodeCache free 5730,3ms >> >> >> With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part. >> >> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: >> >> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 >> >> >> Some random comment: >> * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review LGTM! Minor suggestions! src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 205: > 203: ++_num_retained; > 204: return false; > 205: } Suggestion: G1CodeRootSetHashTableDeleteUnlinked() {} bool operator()(G1CodeRootSetHashTableValue* value) { nmethod* unlinked_next = value->_nmethod->unlinked_next(); if (unlinked_next != nullptr) { return true; } else { return false; } src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 231: > 229: > 230: Atomic::store(&_num_entries, delete_check._num_retained); > 231: shrink_to_match(delete_check._num_retained); Called under HR Claimer, so can be simplified to: Suggestion: G1CodeRootSetHashTableDeleteUnlinked delete_check; clean(delete_check); src/hotspot/share/gc/g1/g1CodeRootSet.hpp line 44: > 42: const static size_t SmallSize = 32; > 43: const static size_t Threshold = 24; > 44: const static size_t LargeSize = 512; Above constants are not used anymore ------------- Marked as reviewed by iwalulya (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15811#pullrequestreview-1639424197 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1334100620 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1334099579 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1334065899 From pli at openjdk.org Fri Sep 22 09:53:03 2023 From: pli at openjdk.org (Pengfei Li) Date: Fri, 22 Sep 2023 09:53:03 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v3] In-Reply-To: References: Message-ID: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Merge branch 'master' (as of Sep 20) into postloop - Address part of comments from Emanuel - JDK-8308994: C2: Re-implement experimental post loop vectorization ------------- Changes: https://git.openjdk.org/jdk/pull/14581/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14581&range=02 Stats: 1999 lines in 38 files changed: 1993 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14581/head:pull/14581 PR: https://git.openjdk.org/jdk/pull/14581 From mdoerr at openjdk.org Fri Sep 22 09:54:25 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 22 Sep 2023 09:54:25 GMT Subject: RFR: 8316735: Print LockStack in hs_err files Message-ID: Example output: Objects fast locked by this thread (top to bottom): LockStack[1]: nsk.share.jdi.EventHandler {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' - ---- fields (total size 5 words): - private volatile 'wasInterrupted' 'Z' @12 false (0x00) - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) - private 'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) LockStack[0]: java.util.Collections$SynchronizedRandomAccessList {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' - ---- fields (total size 3 words): - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) - final 'mutex' 'Ljava/lang/Object;' @16 a 
'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} (0xbcb163e8) - final 'list' 'Ljava/util/List;' @20 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) ------------- Commit messages: - 8316735: Print LockStack in hs_err files Changes: https://git.openjdk.org/jdk/pull/15884/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15884&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316735 Stats: 20 lines in 3 files changed: 19 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15884.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15884/head:pull/15884 PR: https://git.openjdk.org/jdk/pull/15884 From evergizova at openjdk.org Fri Sep 22 11:39:21 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Fri, 22 Sep 2023 11:39:21 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: <3tTe7-NoMb5JwEaWklD-_TpjiFoe-qQdmesGPY6eFJU=.1e0495f1-1153-4270-9b0d-b73476a662c8@github.com> On Wed, 30 Aug 2023 19:24:00 GMT, Ekaterina Vergizova wrote: >> InlineCacheBuffer size is currently hardcoded to 10K. >> This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. >> >> Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. > > Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: > > Changed type, added range check Can someone please review this fix? Pre-submit failures are unrelated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15271#issuecomment-1731270643 From eosterlund at openjdk.org Fri Sep 22 12:03:15 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 22 Sep 2023 12:03:15 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: On Tue, 20 Jun 2023 08:26:08 GMT, Erik Österlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries.
While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Typo in comment Any takers? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1731298381 From mcimadamore at openjdk.org Fri Sep 22 13:44:20 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 22 Sep 2023 13:44:20 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Mon, 18 Sep 2023 14:17:30 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2.
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Avoid eager use of LibFallback in FallbackLinker static block src/java.base/share/classes/java/lang/foreign/Linker.java line 152: > 150: *

> 151: * The following table shows some examples of how C types are modelled in Linux/x64 according to the > 152: * "System V Application Binary Interface - AMD64 Architecture Processor Supplement" (all the examples provided I have seen some discussion on this and I agree an authoritative link does not exist (I have searched for it in the past). I guess I'm not super wild about also including "AMD64 Architecture Processor Supplement". E.g. IMHO "System V Application Binary Interface" is more than enough? src/java.base/share/classes/java/lang/foreign/Linker.java line 409: > 407: * > 408: * Variadic functions are C functions which can accept a variable number and type of arguments. They are declared with a > 409: * trailing ellipsis ({@code ...}) at the end of the formal parameter list, such as: {@code void foo(int x, ...);}. Looking at the javadoc - it seems to me that the `;` after the declaration of `foo` leads to a bit of jarring as it is immediately followed by a period (`.`). Consider dropping that - or maybe put the declaration in a snippet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334395919 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334399541 From mcimadamore at openjdk.org Fri Sep 22 14:06:20 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 22 Sep 2023 14:06:20 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Mon, 18 Sep 2023 14:17:30 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). 
The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Avoid eager use of LibFallback in FallbackLinker static block src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 310: > 308: > 309: /** > 310: * {@return a new memory segment with a {@linkplain MemorySegment#byteSize() byteSize()} of Suggestion: * {@return a new memory segment with a {@linkplain MemorySegment#byteSize() byte size} of ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334428574 From eastigeevich at openjdk.org Fri Sep 22 14:14:14 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 22 Sep 2023 14:14:14 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 20:43:56 GMT, Yi-Fan Tsai wrote: > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. > > The help message of Compiler.perfmap: > > Compiler.perfmap > Write map file for Linux perf tool. > > Impact: Low > > Syntax : Compiler.perfmap [options] > > Options: (options must be specified using the or = syntax) > filename : [optional] Name of the map file (STRING, no default value) src/hotspot/share/code/codeCache.cpp line 1805: > 1803: CodeCache::DefaultPerfMapFile::DefaultPerfMapFile() { > 1804: // Perf expects to find the map file at /tmp/perf-.map.
> 1805: jio_snprintf(_name, sizeof(_name), "/tmp/perf-%d.map", os::current_process_id()); Please change the comment to: // Perf expects to find the map file at /tmp/perf-.map. // It is used as the default file name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15871#discussion_r1334437849 From mcimadamore at openjdk.org Fri Sep 22 14:14:19 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 22 Sep 2023 14:14:19 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Fri, 22 Sep 2023 14:03:52 GMT, Maurizio Cimadamore wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid eager use of LibFallback in FallbackLinker static block > > src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 310: > >> 308: >> 309: /** >> 310: * {@return a new memory segment with a {@linkplain MemorySegment#byteSize() byteSize()} of > > Suggestion: > > * {@return a new memory segment with a {@linkplain MemorySegment#byteSize() byte size} of Also, in the panama repo I see this: Allocates a memory segment with the given layout and initializes it with the bytes in the provided source memory segment. Which seems more correct - e.g. 
more consistent with other allocation methods, and also more succinct (note that the first sentence is really what shows up in the method summary javadoc, so there is a certain interest in providing a quick description of what the method does, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334434230 From mcimadamore at openjdk.org Fri Sep 22 14:14:20 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 22 Sep 2023 14:14:20 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Fri, 22 Sep 2023 14:08:29 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 310: >> >>> 308: >>> 309: /** >>> 310: * {@return a new memory segment with a {@linkplain MemorySegment#byteSize() byteSize()} of >> >> Suggestion: >> >> * {@return a new memory segment with a {@linkplain MemorySegment#byteSize() byte size} of > > Also, in the panama repo I see this: > > Allocates a memory segment with the given layout and initializes it with the bytes in the provided source memory segment. > > Which seems more correct - e.g. more consistent with other allocation methods, and also more succinct (note that the first sentence is really what shows up in the method summary javadoc, so there is a certain interest in providing a quick description of what the method does, Panama repo change: https://github.com/openjdk/panama-foreign/commit/06e2017883c939188103c4dd53185417a00b2921 But, this code was altered in a follow up merge - maybe the merge was problematic? 
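To make the method under discussion concrete, here is a minimal sketch of allocating a segment and initializing it from a slice of another segment. It assumes JDK 22 or later, where this overload is `SegmentAllocator::allocateFrom(ValueLayout, MemorySegment, ValueLayout, long, long)`; the class and method names are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;

class AllocFromSliceSketch {
    // Copies `count` ints, starting at element index `from` of `values`,
    // into a freshly allocated segment and reads them back as a Java array.
    static int[] copyInts(int[] values, int from, int count) {
        try (Arena arena = Arena.ofConfined()) {
            // Source segment initialized from a Java int array.
            MemorySegment source = arena.allocateFrom(ValueLayout.JAVA_INT, values);
            // New segment of count * 4 bytes, initialized from the source slice.
            MemorySegment copy = arena.allocateFrom(
                    ValueLayout.JAVA_INT,                   // element layout of the new segment
                    source,                                 // source segment
                    ValueLayout.JAVA_INT,                   // element layout of the source
                    from * ValueLayout.JAVA_INT.byteSize(), // source offset, in bytes
                    count);                                 // number of elements to copy
            // toArray copies into the Java heap, so the result outlives the arena.
            return copy.toArray(ValueLayout.JAVA_INT);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(copyInts(new int[] {1, 2, 3, 4, 5}, 1, 3)));
    }
}
```

Because the source and target element layouts are passed separately, the same call can also convert byte order while copying, which is one reason the method needs a more precise one-line summary than the other `allocateFrom` overloads.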
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334437424 From igavrilin at openjdk.org Fri Sep 22 14:25:33 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Fri, 22 Sep 2023 14:25:33 GMT Subject: RFR: 8316743: RISC-V: Change UseVectorizedMismatchIntrinsic option result to warning Message-ID: Please review this small change for the UseVectorizedMismatchIntrinsic option. RISC-V does not have a VectorizedMismatch intrinsic, so `void LIRGenerator::do_vectorizedMismatch(Intrinsic* x)` produces a fatal error when this option is turned on. Other similar options (like -XX:+UseCRC32Intrinsics) produce only a warning: https://github.com/openjdk/jdk/blob/c90d63105ca774c047d5f5a4348aa657efc57953/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L150-L183 Also, on other platforms where VectorizedMismatch is unimplemented, we get a warning. ------------- Commit messages: - UseVectroizedMismatchIntrinsic option update RISC-V Changes: https://git.openjdk.org/jdk/pull/15890/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15890&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316743 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15890/head:pull/15890 PR: https://git.openjdk.org/jdk/pull/15890 From mcimadamore at openjdk.org Fri Sep 22 14:34:23 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 22 Sep 2023 14:34:23 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Mon, 18 Sep 2023 14:17:30 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22.
The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. 
https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Avoid eager use of LibFallback in FallbackLinker static block Marked as reviewed by mcimadamore (Reviewer). test/micro/org/openjdk/bench/java/lang/foreign/AllocFromSliceTest.java line 48: > 46: @State(org.openjdk.jmh.annotations.Scope.Thread) > 47: @OutputTimeUnit(TimeUnit.NANOSECONDS) > 48: @Fork(value = 3, jvmArgsAppend = { "--enable-native-access=ALL-UNNAMED" }) native access not needed? test/micro/org/openjdk/bench/java/lang/foreign/LoopOverNonConstant.java line 51: > 49: @State(org.openjdk.jmh.annotations.Scope.Thread) > 50: @OutputTimeUnit(TimeUnit.MILLISECONDS) > 51: @Fork(value = 3, jvmArgsAppend = { "--enable-native-access=ALL-UNNAMED" }) Is native access needed? test/micro/org/openjdk/bench/java/lang/foreign/LoopOverSlice.java line 52: > 50: @State(org.openjdk.jmh.annotations.Scope.Thread) > 51: @OutputTimeUnit(TimeUnit.MILLISECONDS) > 52: @Fork(value = 3, jvmArgsAppend = { "--enable-native-access=ALL-UNNAMED" }) Is --enable-native-access needed? test/micro/org/openjdk/bench/java/lang/foreign/TestLoadBytes.java line 52: > 50: @OutputTimeUnit(TimeUnit.NANOSECONDS) > 51: @Fork(value = 1, jvmArgsAppend = { > 52: "-Dforeign.restricted=permit", This seems obsolete?
Maybe check other files too ------------- PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1640071167 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334460809 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334456849 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334455707 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334452394 From jvernee at openjdk.org Fri Sep 22 14:34:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 14:34:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Fri, 22 Sep 2023 14:10:58 GMT, Maurizio Cimadamore wrote: >> Also, in the panama repo I see this: >> >> Allocates a memory segment with the given layout and initializes it with the bytes in the provided source memory segment. >> >> Which seems more correct - e.g. more consistent with other allocation methods, and also more succinct (note that the first sentence is really what shows up in the method summary javadoc, so there is a certain interest in providing a quick description of what the method does, > > Panama repo change: > https://github.com/openjdk/panama-foreign/commit/06e2017883c939188103c4dd53185417a00b2921 > > But, this code was altered in a follow up merge - maybe the merge was problematic? This was changed in the main line repo as a result of: https://github.com/openjdk/jdk/pull/14997 Since all the other methods were using `{@return ...}` I changed this new overload to that style as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334462650 From jvernee at openjdk.org Fri Sep 22 14:34:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 14:34:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Fri, 22 Sep 2023 14:31:08 GMT, Jorn Vernee wrote: >> Panama repo change: >> https://github.com/openjdk/panama-foreign/commit/06e2017883c939188103c4dd53185417a00b2921 >> >> But, this code was altered in a follow up merge - maybe the merge was problematic? > > This was changed in the main line repo as a result of: https://github.com/openjdk/jdk/pull/14997 Since all the other methods were using `{@return ...}` I changed this new overload to that style as well. I think I did the same when resolving the merge ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334463091 From rrich at openjdk.org Fri Sep 22 14:37:53 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 22 Sep 2023 14:37:53 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v10] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. 
> > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Feedback Thomas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/ac6bddbb..86747ff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=08-09 Stats: 52 lines in 4 files changed: 24 ins; 12 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From jvernee at openjdk.org Fri Sep 22 14:38:22 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 14:38:22 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: <2i6hAPjL9qnofo6nFyUGjC9MBUxyYDtgWWDsvHtoBvc=.0a6f9585-319d-44e2-842e-adb61c80811a@github.com> On Fri, 22 Sep 2023 14:31:30 GMT, Jorn Vernee wrote: >> This was changed in the main line repo as a result of: https://github.com/openjdk/jdk/pull/14997 Since all the other methods were using `{@return ...}` I changed this new overload to that style as well. > > I think I did the same when resolving the merge I don't mind changing it back to the old style, but I think the style should be consistent for all the allocateFrom overloads? So, I'd have to change all of them back. 
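For readers following the javadoc-consistency question, the sketch below exercises three members of the `allocateFrom` family: a single value, a varargs array, and a string with an explicit charset (the charset-aware variants come from panama-foreign PR 836 in the description above). It assumes JDK 22 or later; the class and method names are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

class AllocateFromOverloadsSketch {
    // Encodes `text` as a NUL-terminated UTF-16LE native string, then decodes it again.
    static String roundTripUtf16(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment str = arena.allocateFrom(text, StandardCharsets.UTF_16LE);
            return str.getString(0, StandardCharsets.UTF_16LE);
        }
    }

    // Allocates a single int initialized to `value` and reads it back.
    static int roundTripInt(int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cell = arena.allocateFrom(ValueLayout.JAVA_INT, value);
            return cell.get(ValueLayout.JAVA_INT, 0);
        }
    }

    // Allocates an int array initialized with `values` and returns its size in bytes.
    static long arrayByteSize(int... values) {
        try (Arena arena = Arena.ofConfined()) {
            return arena.allocateFrom(ValueLayout.JAVA_INT, values).byteSize();
        }
    }
}
```

Each overload both allocates and initializes, which is the property the rename from `allocateArray` to `allocateFrom` was meant to make explicit, so a shared summary style across them seems reasonable.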
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334468243 From rrich at openjdk.org Fri Sep 22 14:46:19 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 22 Sep 2023 14:46:19 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v3] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 13:59:35 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Limit effect of previous commit to large array handling > > Another option that is likely more readable is putting the distinction between iterating over a large objArray and regular object into `PSCardTable::scan_objects_in_range` instead of trying to split these two. > > I.e. So that this inner loop looks like g1/serial in `HeapRegion::oops_on_memregion_iterate()`. > > I am not sure where the requirement implemented that the last part of a large objArray must be scanned by the thread working on the second-to-last stripe comes from too. > > The performance sensitive part of this scanning code is typically not finding the dirty cards but actually scanning the corresponding objects and doing the per-reference work. @tschatzl I will also try to move processing of the start of a large array to `scavenge_large_array_stripe`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1731544567 From jvernee at openjdk.org Fri Sep 22 14:50:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 14:50:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: <9eYdE4uXg5zGj4yfFRFYNwpHFMbA_25yqPOhgq6n7vA=.0a40fc9c-86e3-4fef-96f4-d68a82c07438@github.com> On Fri, 22 Sep 2023 14:29:35 GMT, Maurizio Cimadamore wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid eager use of LibFallback in FallbackLinker static block > > test/micro/org/openjdk/bench/java/lang/foreign/AllocFromSliceTest.java line 48: > >> 46: @State(org.openjdk.jmh.annotations.Scope.Thread) >> 47: @OutputTimeUnit(TimeUnit.NANOSECONDS) >> 48: @Fork(value = 3, jvmArgsAppend = { "--enable-native-access=ALL-UNNAMED" }) > > native access not needed? This class extends `CLayouts` which also creates some native method handles, so it's needed here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334484155 From jvernee at openjdk.org Fri Sep 22 15:09:24 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 15:09:24 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Fri, 22 Sep 2023 13:39:00 GMT, Maurizio Cimadamore wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid eager use of LibFallback in FallbackLinker static block > > src/java.base/share/classes/java/lang/foreign/Linker.java line 152: > >> 150: *

>> 151: * The following table shows some examples of how C types are modelled in Linux/x64 according to the >> 152: * "System V Application Binary Interface - AMD64 Architecture Processor Supplement" (all the examples provided > > I have seen some discussion on this and I agree an authoritative link does not exist (I have searched for it in the past). I guess I'm not super wild about also including "AMD64 Architecture Processor Supplement". E.g. IMHO "System V Application Binary Interface" is more than enough? Ok, I think I agree, given that we already name x64 as a platform. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334509092 From jvernee at openjdk.org Fri Sep 22 15:20:05 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 15:20:05 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v23] In-Reply-To: References: Message-ID: <0DuiWnffjpUzBASIfbHchpCP-b75VBI4evA6bb9o_eo=.25faef35-e2ef-47c3-bf97-ffd3dd2a8fd9@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. 
https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 49 commits: - remove unneeded benchmark flags - Merge branch 'master' into JEP22 - 8310659: The jar tool should support allowing access to restricted methods from executable jars Reviewed-by: mcimadamore - Avoid eager use of LibFallback in FallbackLinker static block - add missing space + reflow lines - Fix typo Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> - 8315917: Passing struct by values seems under specified Reviewed-by: mcimadamore - Merge branch 'master' into JEP22 - Merge branch 'master' into JEP22 - add code snippet - ... and 39 more: https://git.openjdk.org/jdk/compare/c90d6310...5b64181d ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=22 Stats: 4255 lines in 253 files changed: 2191 ins; 1200 del; 864 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Fri Sep 22 15:20:08 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 15:20:08 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Mon, 18 Sep 2023 14:17:30 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. 
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Avoid eager use of LibFallback in FallbackLinker static block After some offline discussion, the changes for adding the `Enable-Native-Access` jar manifest attribute have now been added too: - commit: https://github.com/openjdk/jdk/pull/15103/commits/6b24e886588a32845249e6d684c5219c27dbf751 - Original PR: https://github.com/openjdk/panama-foreign/pull/843 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1731594343 From mcimadamore at openjdk.org Fri Sep 22 15:22:18 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 22 Sep 2023 15:22:18 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: <2i6hAPjL9qnofo6nFyUGjC9MBUxyYDtgWWDsvHtoBvc=.0a6f9585-319d-44e2-842e-adb61c80811a@github.com> References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> <2i6hAPjL9qnofo6nFyUGjC9MBUxyYDtgWWDsvHtoBvc=.0a6f9585-319d-44e2-842e-adb61c80811a@github.com> Message-ID: <6XT6RHMxjba8n0P9rx7Pyy8Ot5VbdtupaTJikgYfeD0=.197bdfbe-e863-4529-97ed-c581c4a21d7d@github.com> On Fri, 22 Sep 2023 14:35:12 GMT, Jorn Vernee wrote: >> I think I did the same when resolving the merge > > I don't mind changing it back to the old style, but I think the style should be consistent for all the allocateFrom overloads? So, I'd have to change all of them back. I forgot about the change that went into mainline. Do you have a link to the latest javadoc? I'd like to check how the method summary looks.
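Both the benchmark flags questioned above and the new `Enable-Native-Access` manifest attribute are about restricted methods. The following sketch calls C `strlen` through a downcall handle. `Linker::downcallHandle` is restricted, so in JDK 22 running this without `--enable-native-access=ALL-UNNAMED` (or, for an executable JAR, an `Enable-Native-Access: ALL-UNNAMED` manifest entry) is expected to print a warning on first use rather than fail. It assumes JDK 22+, a 64-bit platform where `size_t` maps to `JAVA_LONG`, and a C library that exports `strlen`; the class name is hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

class NativeAccessSketch {
    // Calls the C library's strlen on a NUL-terminated UTF-8 copy of `s`.
    static long strlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        MemorySegment addr = linker.defaultLookup().find("strlen").orElseThrow();
        // Restricted method: this is the call that needs native access enabled.
        MethodHandle strlen = linker.downcallHandle(
                addr, FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // invokeExact requires the exact (MemorySegment)long call-site type.
            return (long) strlen.invokeExact(arena.allocateFrom(s));
        }
    }
}
```

A benchmark that only allocates and accesses memory would not hit a restricted method; one that builds downcall handles (as the shared `CLayouts` base class does, per the reply above) would.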
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334524306 From rrich at openjdk.org Fri Sep 22 15:32:19 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 22 Sep 2023 15:32:19 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v10] In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 14:37:53 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 GC threads. >> >> Observations >> >> * JDK22 scales inversely. Adding GC threads prolongs young collections. With 32 threads, young collections take ~15x longer than single-threaded. >> >> * Fixed JDK22 scales well. Adding GC threads reduces the duration of young collections. With 32 threads, young collections are 5x shorter than single-threaded. >> >> * With just 1 GC thread, there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Thomas By the way: one reason why it is so hard to follow the code is how alignment is done. For example, `byte_for(first_obj_addr - 1) + 1` is actually `byte_for(align_up(first_obj_addr, _card_size_in_words))`. Maybe we could have an `AlignOp` parameter: `byte_for(first_obj_addr, AlignOp::align_up)`? I think this would help, but maybe it's just me...
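The equivalence pointed out above can be checked with a small Java model. It is a sketch under stated assumptions, not HotSpot code: addresses are plain word indices instead of `HeapWord*` pointers, the card-table base is dropped so `byteFor` simply returns a card index, the card size of 64 words is hypothetical (it corresponds to 512-byte cards with 8-byte words), and the `AlignOp` overload only illustrates the proposal. In integer terms, `floor((a - 1) / c) + 1` equals `ceil(a / c)` for every `a >= 1`, which is why both forms pick the first card starting at or after `a`.

```java
class CardIndexModel {
    // Hypothetical card size: 512-byte cards with 8-byte heap words.
    static final long CARD_SIZE_IN_WORDS = 64;

    // Sketch of the proposed AlignOp parameter (not existing HotSpot code).
    enum AlignOp { ALIGN_DOWN, ALIGN_UP }

    // Card index covering word address `addr`; the card-table base is omitted.
    static long byteFor(long addr) {
        return addr / CARD_SIZE_IN_WORDS;
    }

    // Rounds a non-negative `addr` up to a multiple of `alignment`.
    static long alignUp(long addr, long alignment) {
        return (addr + alignment - 1) / alignment * alignment;
    }

    // The idiom from the code under review: first card starting at or after `addr`.
    // Only valid for addr >= 1 (real heap addresses are never zero).
    static long idiom(long addr) {
        return byteFor(addr - 1) + 1;
    }

    // The more explicit form suggested in the comment.
    static long byteFor(long addr, AlignOp op) {
        long a = (op == AlignOp.ALIGN_UP) ? alignUp(addr, CARD_SIZE_IN_WORDS) : addr;
        return byteFor(a);
    }
}
```

For example, an object starting at word 65 lies in card 1, so `byteFor(65, AlignOp.ALIGN_DOWN)` is 1, while both `idiom(65)` and `byteFor(65, AlignOp.ALIGN_UP)` give 2, the first card that begins after it.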
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1731616247 From liach at openjdk.org Fri Sep 22 15:33:26 2023 From: liach at openjdk.org (Chen Liang) Date: Fri, 22 Sep 2023 15:33:26 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v23] In-Reply-To: <0DuiWnffjpUzBASIfbHchpCP-b75VBI4evA6bb9o_eo=.25faef35-e2ef-47c3-bf97-ffd3dd2a8fd9@github.com> References: <0DuiWnffjpUzBASIfbHchpCP-b75VBI4evA6bb9o_eo=.25faef35-e2ef-47c3-bf97-ffd3dd2a8fd9@github.com> Message-ID: On Fri, 22 Sep 2023 15:20:05 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. 
This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 49 commits: > > - remove unneeded benchmark flags > - Merge branch 'master' into JEP22 > - 8310659: The jar tool should support allowing access to restricted methods from executable jars > > Reviewed-by: mcimadamore > - Avoid eager use of LibFallback in FallbackLinker static block > - add missing space + reflow lines > - Fix typo > > Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> > - 8315917: Passing struct by values seems under specified > > Reviewed-by: mcimadamore > - Merge branch 'master' into JEP22 > - Merge branch 'master' into JEP22 > - add code snippet > - ...
and 39 more: https://git.openjdk.org/jdk/compare/c90d6310...5b64181d Just curious, for `Enable-Native-Access`, if it's present on an automatic module `Automatic-Module-Name` jar, can it apply to only that automatic module instead of all unnamed modules? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1731612017 From jvernee at openjdk.org Fri Sep 22 15:33:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 15:33:27 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v23] In-Reply-To: References: <0DuiWnffjpUzBASIfbHchpCP-b75VBI4evA6bb9o_eo=.25faef35-e2ef-47c3-bf97-ffd3dd2a8fd9@github.com> Message-ID: On Fri, 22 Sep 2023 15:26:45 GMT, Chen Liang wrote: > Just curious, for `Enable-Native-Access`, if it's present on an automatic module `Automatic-Module-Name` jar, can it apply to only that automatic module instead of all unnamed modules? No. It's only there for executable jars (run using `java -jar`), which are always placed on the class path. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1731617936 From jvernee at openjdk.org Fri Sep 22 16:32:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 16:32:27 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: <6XT6RHMxjba8n0P9rx7Pyy8Ot5VbdtupaTJikgYfeD0=.197bdfbe-e863-4529-97ed-c581c4a21d7d@github.com> References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> <2i6hAPjL9qnofo6nFyUGjC9MBUxyYDtgWWDsvHtoBvc=.0a6f9585-319d-44e2-842e-adb61c80811a@github.com> <6XT6RHMxjba8n0P9rx7Pyy8Ot5VbdtupaTJikgYfeD0=.197bdfbe-e863-4529-97ed-c581c4a21d7d@github.com> Message-ID: On Fri, 22 Sep 2023 15:19:35 GMT, Maurizio Cimadamore wrote: >> I don't mind changing it back to the old style, but I think the style should be consistent for all the allocateFrom overloads? So, I'd have to change all of them back. 
> > I forgot about the change that went into mainline. Do you have a link of the latest javadoc? I'd like to check how the method summary looks. Here you go: https://cr.openjdk.org/~jvernee/FFM_22_PR_v1/java.base/java/lang/foreign/SegmentAllocator.html#allocateFrom(java.lang.foreign.ValueLayout,java.lang.foreign.MemorySegment,java.lang.foreign.ValueLayout,long,long) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334600062 From jvernee at openjdk.org Fri Sep 22 16:40:08 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 16:40:08 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> Message-ID: On Fri, 22 Sep 2023 13:41:50 GMT, Maurizio Cimadamore wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid eager use of LibFallback in FallbackLinker static block > > src/java.base/share/classes/java/lang/foreign/Linker.java line 409: > >> 407: * >> 408: * Variadic functions are C functions which can accept a variable number and type of arguments. They are declared with a >> 409: * trailing ellipsis ({@code ...}) at the end of the formal parameter list, such as: {@code void foo(int x, ...);}. > > Looking at the javadoc - it seems to me that the `;` after the declaration of `foo` leads to a bit of jarring as it is immediately followed by a period (`.`). Consider dropping that - or maybe put the declaration in a snippet. 
Will drop the period ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334604329 From jvernee at openjdk.org Fri Sep 22 16:40:04 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Sep 2023 16:40:04 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v24] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. 
https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/5b64181d..1c24f33e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=22-23 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From mcimadamore at openjdk.org Fri Sep 22 17:01:22 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 22 Sep 2023 17:01:22 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> <2i6hAPjL9qnofo6nFyUGjC9MBUxyYDtgWWDsvHtoBvc=.0a6f9585-319d-44e2-842e-adb61c80811a@github.com>
<6XT6RHMxjba8n0P9rx7Pyy8Ot5VbdtupaTJikgYfeD0=.197bdfbe-e863-4529-97ed-c581c4a21d7d@github.com> Message-ID: <17Hdk-QZiKVqpNzEV_v3vhd6uqDTdIdNpIYQmbETPNc=.ee3d0448-684d-4e67-abcf-f2af95bd1ea5@github.com> On Fri, 22 Sep 2023 16:29:51 GMT, Jorn Vernee wrote: >> I forgot about the change that went into mainline. Do you have a link of the latest javadoc? I'd like to check how the method summary looks. > > Here you go: https://cr.openjdk.org/~jvernee/FFM_22_PR_v1/java.base/java/lang/foreign/SegmentAllocator.html#allocateFrom(java.lang.foreign.ValueLayout,java.lang.foreign.MemorySegment,java.lang.foreign.ValueLayout,long,long) Ok, now I'm more convinced that the method summary really does look bad (or worse, compared to 20). For instance [allocateFrom](https://cr.openjdk.org/~jvernee/FFM_22_PR_v1/java.base/java/lang/foreign/SegmentAllocator.html#allocateFrom(java.lang.foreign.ValueLayout.OfByte,byte...): Returns a new memory segment with a byteSize() initialized with the provided E byte elements as specified by the provided layout (i.e. byte ordering, alignment and size). (same is true for all the other array-accepting `allocateFrom` methods). This should be simplified to: Returns a new memory segment initialized with the elements in the provided byte array. (then, if we want to say that the initialization honors the endianness of the provided layout, we can do so in a followup para, but the method summary should be simple). So, once all the array-accepting methods are fixed, the segment-accepting `allocateFrom` needs to be simplified to: Returns a new memory segment initialized with the contents of the provided segment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1334627056 From sviswanathan at openjdk.org Fri Sep 22 20:51:18 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 22 Sep 2023 20:51:18 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2] In-Reply-To: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> References: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> Message-ID: On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >>
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES...
> > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Removed isEncrypt boolean variable src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3627: > 3625: __ cmpl(rounds, 52); > 3626: __ jcc(Assembler::greaterEqual, aes_192); > 3627: __ jmp(last_aes_rnd); Could be replaced with __ jcc(Assembler::below, last_aes_rnd); src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3649: > 3647: __ cmpl(rounds, 60); > 3648: __ jcc(Assembler::aboveEqual, aes_256); > 3649: __ jmp(last_aes_rnd); Could be replaced with __ jcc(Assembler::below, last_aes_rnd); src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4199: > 4197: //The entire message was encrypted processed in initial and now need to be hashed > 4198: __ cmpl(len, 0); > 4199: __ jcc(Assembler::equal, encrypt_done); We should check for len to be at least 128 here, as the following block processes 128 bytes: __ cmpl(len, 128); __ jcc(Assembler::less, encrypt_done); src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4241: > 4239: __ jcc(Assembler::equal, encrypt_done); > 4240: > 4241: __ bind(encrypt_done); This is a fall-through case: __ cmpl(r14, 0); __ jcc(Assembler::equal, encrypt_done); The above two instructions can be removed.
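The 128-byte guard requested for line 4199 can be illustrated with a deliberately simplified Java model. It is a sketch, not the stub's actual logic: it assumes only what the review comment states (the code after the check consumes exactly 128 bytes, i.e. eight 16-byte AES blocks), and the class and method names are hypothetical. In this model, the original `len == 0` test lets any remainder of 1 to 127 bytes enter the 128-byte path, while the suggested `len < 128` test never does.

```java
class GhashTailGuardModel {
    // Assumption taken from the review comment: the code after the check always
    // works on a full 128-byte chunk, i.e. eight 16-byte AES blocks.
    static final int CHUNK_BYTES = 128;

    // Guard as written in the patch: skip the chunk only when nothing is left.
    // Returns how many bytes the chunk code would touch.
    static int touchedWithZeroCheck(int remaining) {
        return remaining == 0 ? 0 : CHUNK_BYTES;
    }

    // Guard suggested in the review: skip unless a full chunk remains.
    static int touchedWithSizeCheck(int remaining) {
        return remaining < CHUNK_BYTES ? 0 : CHUNK_BYTES;
    }

    // True when the chunk code would touch bytes beyond what remains.
    static boolean overruns(int remaining, int touched) {
        return touched > remaining;
    }
}
```

For a 48-byte remainder, `touchedWithZeroCheck(48)` reports 128 bytes touched (an overrun), while `touchedWithSizeCheck(48)` reports 0.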
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4246: > 4244: __ bind(ghash_done); > 4245: __ movdqu(xmm15, ExternalAddress(counter_mask_linc1_addr()), rbx /*rscratch*/); > 4246: __ vpaddd(xmm9, xmm9, xmm15, Assembler::AVX_128bit); We could do the following here: __ vpaddd(xmm9, xmm9, ExternalAddress(counter_mask_linc1_addr()), Assembler::AVX_128bit, rbx); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334673738 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334674168 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334660702 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334657499 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334665625 From sviswanathan at openjdk.org Fri Sep 22 21:35:21 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 22 Sep 2023 21:35:21 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2] In-Reply-To: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> References: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> Message-ID: On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. 
>> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >>
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Removed isEncrypt boolean variable src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3929: > 3927: __ vpaddd(xmm6, xmm5, t5, Assembler::AVX_128bit); > 3928: __ vpaddd(xmm7, xmm6, t5, Assembler::AVX_128bit); > 3929: __ vpaddd(xmm8, xmm7, t5, Assembler::AVX_128bit); This could be done more efficiently as follows: __ movdqu(t5, ExternalAddress(counter_mask_linc1_addr()), rbx /*rscratch*/); __ movdqu(t6, ExternalAddress(counter_mask_linc2_addr()), rbx /*rscratch*/); __ vpaddd(xmm2, xmm1, t5, Assembler::AVX_128bit); __ vpaddd(xmm3, xmm1, t6, Assembler::AVX_128bit); __ vpaddd(xmm4, xmm2, t6, Assembler::AVX_128bit); __ vpaddd(xmm5, xmm3, t6, Assembler::AVX_128bit); __ vpaddd(xmm6, xmm4, t6, Assembler::AVX_128bit); __ vpaddd(xmm7, xmm5, t6, Assembler::AVX_128bit); __ vpaddd(xmm8, xmm6, t6, Assembler::AVX_128bit); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334831118
From sviswanathan at openjdk.org Fri Sep 22 21:57:11 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 22 Sep 2023 21:57:11 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2] In-Reply-To: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> References: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> Message-ID: <41YkWu3U3uGBxtk_WBxpe5Ph5Qc6azAji7NEeGBPvP4=.9ae8b950-ce12-4d7b-a5f0-6b7054afe382@github.com> On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >>
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >>
>> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Removed isEncrypt boolean variable src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4045: > 4043: __ cmpl(rounds, 52); > 4044: __ jcc(Assembler::greaterEqual, aes_192); > 4045: __ jmp(last_aes_rnd); This could be replaced by: __ jcc(Assembler::less, last_aes_rnd); src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4068: > 4066: __ cmpl(rounds, 60); > 4067: __ jcc(Assembler::aboveEqual, aes_256); > 4068: __ jmp(last_aes_rnd); This could be replaced by: __ jcc(Assembler::less, last_aes_rnd); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334835575 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334835717 From dlong at openjdk.org Fri Sep 22 23:52:11 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Sep 2023 23:52:11 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 19:24:00 GMT, Ekaterina Vergizova wrote: >> InlineCacheBuffer size is currently hardcoded to 10K. >> This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. >> >> Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. > > Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: > > Changed type, added range check src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 459: > 457: } > 458: > 459: if ((value % CodeEntryAlignment) != 0) { I don't understand why this is necessary.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1334885434 From dlong at openjdk.org Fri Sep 22 23:57:12 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Sep 2023 23:57:12 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 19:24:00 GMT, Ekaterina Vergizova wrote: >> InlineCacheBuffer size is currently hardcoded to 10K. >> This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. >> >> Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. > > Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: > > Changed type, added range check src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 443: > 441: > 442: JVMFlag::Error InlineCacheBufferSizeConstraintFunc(int value, bool verbose) { > 443: if (value <= 0) { Shouldn't be needed if the type is unsigned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1334886456 From dlong at openjdk.org Fri Sep 22 23:57:14 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Sep 2023 23:57:14 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: <5X84H-etGQ-RDan1RTnnZVmXujoo6aleWopu3Hl_J0k=.d82350d3-be39-4acb-accb-ddcb7a8a6fa4@github.com> On Wed, 30 Aug 2023 19:17:16 GMT, Ekaterina Vergizova wrote: >> src/hotspot/share/runtime/globals.hpp line 299: >> >>> 297: \ >>> 298: product(uintx, InlineCacheBufferSize, 10*K, EXPERIMENTAL, \ >>> 299: "InlineCacheBuffer size") \ >> >> Can you make this type an int and add a range to it? Line 143 above will get -Wconversion warnings if we ever turned them on. > > Thanks, I changed type to int and added a range check constraint. 
I'd rather have the type as size_t and change StubQueue accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1334886517 From fyang at openjdk.org Sat Sep 23 01:29:19 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 23 Sep 2023 01:29:19 GMT Subject: RFR: 8316743: RISC-V: Change UseVectorizedMismatchIntrinsic option result to warning In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 14:17:40 GMT, Ilya Gavrilin wrote: > Please review this small change for the UseVectorizedMismatchIntrinsic option. > On RISC-V we do not have a VectorizedMismatch intrinsic, so `void LIRGenerator::do_vectorizedMismatch(Intrinsic* x)` produces a fatal error when this option is turned on. > Other similar options (like -XX:+UseCRC32Intrinsics) produce only a warning: https://github.com/openjdk/jdk/blob/c90d63105ca774c047d5f5a4348aa657efc57953/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L150-L183 > Also, on other platforms where VectorizedMismatch is unimplemented, we get a warning too. Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15890#pullrequestreview-1640761477 From aph at openjdk.org Sat Sep 23 08:41:18 2023 From: aph at openjdk.org (Andrew Haley) Date: Sat, 23 Sep 2023 08:41:18 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v4] In-Reply-To: References: Message-ID: <_OOe2WCX5FFEHYrkY_Bne2b-EZtMmQa5042qdARjSw0=.04809ebd-fd1e-4143-8aa7-42095afd6c8c@github.com> On Wed, 20 Sep 2023 16:17:00 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out.
The goal for this fix is to improve performance in pathological cases while not exposing non-pathological cases to extra risk, *and* staying simple and reliable enough for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest patches for your arches; I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think that if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. >> >> The current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Cleaner AArch64 code OK for AArch64 with minor changes. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1542: > 1540: ldrw(rscratch1, Address(rthread, JavaThread::backoff_secondary_super_miss_offset())); > 1541: subsw(rscratch1, rscratch1, 1); > 1542: br(Assembler::GT, L_skip); Suggestion: subw(rscratch1, rscratch1, 1); cbzw(rscratch1, 31, L_skip); I know this is >= 0 rather than > 0, but that doesn't matter, and we should make this code small. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1552: > 1550: // The operations above destroy condition codes set by scan.
> 1551: // This is the success path, restore them ourselves. > 1552: cmp(zr, zr); // Set Z flag Suggestion: ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15718#pullrequestreview-1640804554 PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1334944173 PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1334943973 From luhenry at openjdk.org Sat Sep 23 09:56:10 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Sat, 23 Sep 2023 09:56:10 GMT Subject: RFR: 8316743: RISC-V: Change UseVectorizedMismatchIntrinsic option result to warning In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 14:17:40 GMT, Ilya Gavrilin wrote: > Please review this small change for the UseVectorizedMismatchIntrinsic option. > On RISC-V we do not have a VectorizedMismatch intrinsic, so `void LIRGenerator::do_vectorizedMismatch(Intrinsic* x)` produces a fatal error when this option is turned on. > Other similar options (like -XX:+UseCRC32Intrinsics) produce only a warning: https://github.com/openjdk/jdk/blob/c90d63105ca774c047d5f5a4348aa657efc57953/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L150-L183 > Also, on platforms where VectorizedMismatch is unimplemented, we get a warning. Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15890#pullrequestreview-1640862657 From haosun at openjdk.org Mon Sep 25 01:11:48 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 25 Sep 2023 01:11:48 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v12] In-Reply-To: References: Message-ID: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2.
However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. 
Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of... Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into jdk-8287325 - Not use pacia1716 and reuse retaddr_slot - Merge branch 'master' into jdk-8287325 - break long lines - Refactor long assertions in continuationFreezeThaw.cpp - Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS - Merge branch 'master' into jdk-8287325 - Revert to the implementation with zero as the PAC modifier - Merge branch 'master' into jdk-8287325 - Update aarch64.ad and jvmci AArch64TestAssembler.java Before this patch, rscratch1 is clobbered. With this patch, we use the rscratch1 register after we save it on the stack. In this way, the code would be consistent with macroAssembler_aarch64.cpp. - ... 
and 7 more: https://git.openjdk.org/jdk/compare/a2391a92...8bd62e5a ------------- Changes: https://git.openjdk.org/jdk/pull/13322/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=11 Stats: 236 lines in 23 files changed: 94 ins; 32 del; 110 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From dholmes at openjdk.org Mon Sep 25 01:25:11 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 25 Sep 2023 01:25:11 GMT Subject: RFR: 8316735: Print LockStack in hs_err files In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 09:47:22 GMT, Martin Doerr wrote: > Example output: > > Objects fast locked by this thread (top to bottom): > LockStack[1]: nsk.share.jdi.EventHandler > {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' > - ---- fields (total size 5 words): > - private volatile 'wasInterrupted' 'Z' @12 false (0x00) > - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) > - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) > - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) > - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) > - private 'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) > LockStack[0]: java.util.Collections$SynchronizedRandomAccessList > {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' > - ---- fields (total size 3 words): > - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) > - final 'mutex' 'Ljava/lang/Object;' @16 a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} 
(0xbcb163e8) > - final 'list' 'Ljava/util/List;' @20 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) Yes it is a good idea to have this. Potentially we may want it for thread dumps too (separate RFE fine). Thanks src/hotspot/share/runtime/lockStack.hpp line 33: > 31: #include "utilities/sizes.hpp" > 32: > 33: class JavaThread; Seems an unrelated change. src/hotspot/share/utilities/vmError.cpp line 1173: > 1171: st->print_cr("Objects fast locked by this thread (top to bottom):"); > 1172: JavaThread::cast(_thread)->lock_stack().print_on(st); > 1173: I would have expected this to be printed along with the other current thread info. "fast locked" is not terminology we are using for this any more IIUC. I would suggest just saying this is the lock stack for the thread. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15884#pullrequestreview-1641190833 PR Review Comment: https://git.openjdk.org/jdk/pull/15884#discussion_r1335294178 PR Review Comment: https://git.openjdk.org/jdk/pull/15884#discussion_r1335295956 From haosun at openjdk.org Mon Sep 25 05:44:25 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 25 Sep 2023 05:44:25 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v12] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 01:11:48 GMT, Hao Sun wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. 
As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. >> >> 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP sh... 
> > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into jdk-8287325 > - Not use pacia1716 and reuse retaddr_slot > - Merge branch 'master' into jdk-8287325 > - break long lines > - Refactor long assertions in continuationFreezeThaw.cpp > - Introduce CPU_OVERRIDES_RETURN_ADDRESS_ACCESSORS > - Merge branch 'master' into jdk-8287325 > - Revert to the implementation with zero as the PAC modifier > - Merge branch 'master' into jdk-8287325 > - Update aarch64.ad and jvmci AArch64TestAssembler.java > > Before this patch, rscratch1 is clobbered. > With this patch, we use the rscratch1 register after we save it on the > stack. > > In this way, the code would be consistent with > macroAssembler_aarch64.cpp. > - ... and 7 more: https://git.openjdk.org/jdk/compare/a2391a92...8bd62e5a Thanks a lot for your reviews. I merged with the latest code in the mainline and reran the tests locally. All the tests passed. And GHA tests are green too. Let me integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1732951216 From haosun at openjdk.org Mon Sep 25 05:44:27 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 25 Sep 2023 05:44:27 GMT Subject: Integrated: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 08:00:20 GMT, Hao Sun wrote: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. 
PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. 
In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of... This pull request has now been integrated. Changeset: 481cfc79 Author: Hao Sun URL: https://git.openjdk.org/jdk/commit/481cfc798533f5b3adae7cc4a076a98b0b3f9737 Stats: 236 lines in 23 files changed: 94 ins; 32 del; 110 mod 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret Co-authored-by: Nick Gasson Reviewed-by: aph, dlong ------------- PR: https://git.openjdk.org/jdk/pull/13322 From shade at openjdk.org Mon Sep 25 06:26:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Sep 2023 06:26:12 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v4] In-Reply-To: <_OOe2WCX5FFEHYrkY_Bne2b-EZtMmQa5042qdARjSw0=.04809ebd-fd1e-4143-8aa7-42095afd6c8c@github.com> References: <_OOe2WCX5FFEHYrkY_Bne2b-EZtMmQa5042qdARjSw0=.04809ebd-fd1e-4143-8aa7-42095afd6c8c@github.com> Message-ID: On Sat, 23 Sep 2023 08:37:46 GMT, Andrew Haley wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleaner AArch64 code > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1542: > >> 1540: ldrw(rscratch1, Address(rthread, JavaThread::backoff_secondary_super_miss_offset())); >> 1541: subsw(rscratch1, rscratch1, 1); >> 1542: br(Assembler::GT, L_skip); > > Suggestion: > > subw(rscratch1, rscratch1, 1); > cbzw(rscratch1, 31, L_skip); > > > I know this is >= 0 rather than > 0, but that doesn't matter, and we should make this code small. Did you mean `tbz(rscratch1, 31, L_skip);` here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1335426464 From shade at openjdk.org Mon Sep 25 06:59:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Sep 2023 06:59:30 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases while not exposing non-pathological cases to extra risk, *and* staying simple and reliable enough for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest patches for your arches; I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think that if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > The current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments.
> > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Denser AArch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/c752a687..81a0ddd2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=03-04 Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From shade at openjdk.org Mon Sep 25 07:06:25 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Sep 2023 07:06:25 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v4] In-Reply-To: References: <_OOe2WCX5FFEHYrkY_Bne2b-EZtMmQa5042qdARjSw0=.04809ebd-fd1e-4143-8aa7-42095afd6c8c@github.com> Message-ID: On Mon, 25 Sep 2023 06:23:47 GMT, Aleksey Shipilev wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1542: >> >>> 1540: ldrw(rscratch1, Address(rthread, JavaThread::backoff_secondary_super_miss_offset())); >>> 1541: subsw(rscratch1, rscratch1, 1); >>> 1542: br(Assembler::GT, L_skip); >> >> Suggestion: >> >> subw(rscratch1, rscratch1, 1); >> cbzw(rscratch1, 31, L_skip); >> >> >> I know this is >= 0 rather than > 0, but that doesn't matter, and we should make this code small. > > Did you mean `tbz(rscratch1, 31, L_skip);` here? Did `tbz` in the new commit. AFAICS, this indeed makes the flag restoration unnecessary, removed those too. Testing now... 
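The original `subsw` + `br(Assembler::GT, ...)` pair skips the cache update while the decremented counter is > 0. The `subw` + `tbz(rscratch1, 31, ...)` pair skips while bit 31 (the sign bit of the 32-bit value) is clear, i.e. while the counter is >= 0. Neither `subw` nor `tbz` writes the condition flags, which is consistent with the observation that the flag restoration becomes unnecessary. The C++ model below compares only the two skip conditions. It assumes that the decremented value is stored back on a skip and that the counter is reset to the backoff value after an update; the quoted hunks show neither step, and `BackoffModel` and `cache_updates` are hypothetical names, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-thread miss counter; not HotSpot code.
struct BackoffModel {
  int32_t backoff;  // assumed reset value (the PR uses 1000 for testing)
  int32_t counter;  // assumed thread-local countdown, starting at `backoff`
  bool use_tbz;     // true:  skip while bit 31 is clear (counter >= 0)
                    // false: skip while counter > 0 (subsw + b.gt)

  // Called on a secondary-super miss; returns true if the cache is updated.
  bool on_miss() {
    counter -= 1;  // models subw / subsw
    bool skip;
    if (use_tbz) {
      // tbz reg, #31: branch when bit 31 (the sign bit) is zero.
      skip = ((static_cast<uint32_t>(counter) >> 31) & 1u) == 0;
    } else {
      // b.gt after subsw: branch when the signed result is > 0.
      skip = counter > 0;
    }
    if (skip) {
      return false;
    }
    counter = backoff;  // assumed reset after an actual update
    return true;
  }
};

// Number of cache updates over `misses` consecutive misses.
int cache_updates(int32_t backoff, bool use_tbz, int misses) {
  assert(backoff >= 0 && misses >= 0);
  BackoffModel m{backoff, backoff, use_tbz};
  int updates = 0;
  for (int i = 0; i < misses; i++) {
    if (m.on_miss()) {
      updates++;
    }
  }
  return updates;
}
```

Under these assumptions, `cache_updates(4, false, 20)` gives 5 (an update on every 4th miss) and `cache_updates(4, true, 20)` gives 4 (every 5th miss). That off-by-one is the difference the review calls harmless. With a backoff of 0, both forms update on every miss, matching the description's note that `0` disables the mitigation.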
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1335465467 From azafari at openjdk.org Mon Sep 25 09:20:20 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 25 Sep 2023 09:20:20 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v2] In-Reply-To: References: Message-ID: <4i265Gn613urJfDewi3W2U4-dHgyjkp9Ejs13Cxm5Gs=.0e3a2de3-de51-4eb9-a4e9-fa5b9173b830@github.com> > 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it are also removed. > 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`. > 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can also be `ArenaBitMap` or `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`: > ```C++ > void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const { > MallocArrayAllocator::free(map); > } > ``` > > ### Test > tiers1-4 passed on all platforms. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: other size_t flags than the ArrayAllocatorMallocLimit are used in tests.
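Point 3 of the description says the shared `GrowableBitMap::resize` code calls `free(ptr, size)` on whichever bitmap type it is instantiated with. `CHeapBitMap::free` therefore keeps the size parameter, even though `MallocArrayAllocator::free(ptr)` does not need it. A simplified CRTP model illustrates this. The class names are hypothetical, the allocators are plain `calloc`/`free`, and the CRTP dispatch is an assumption about the shape of the real `GrowableBitMap`, not a copy of HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified stand-ins for HotSpot's types; not the real definitions.
using bm_word_t = unsigned long;
using idx_t = std::size_t;

// Shared growth logic calls the derived class's allocate()/free() and always
// passes the old size, so every allocator variant must accept it.
template <typename Derived>
class GrowableBitMapModel {
 protected:
  bm_word_t* _map = nullptr;
  idx_t _size_in_words = 0;

 public:
  GrowableBitMapModel() = default;
  GrowableBitMapModel(const GrowableBitMapModel&) = delete;             // owns _map
  GrowableBitMapModel& operator=(const GrowableBitMapModel&) = delete;

  void resize(idx_t new_size_in_words) {
    Derived* self = static_cast<Derived*>(this);
    bm_word_t* new_map = self->allocate(new_size_in_words);  // zero-filled
    assert(new_map != nullptr);
    idx_t keep = new_size_in_words < _size_in_words ? new_size_in_words : _size_in_words;
    if (keep > 0) {
      std::memcpy(new_map, _map, keep * sizeof(bm_word_t));
    }
    self->free(_map, _size_in_words);
    _map = new_map;
    _size_in_words = new_size_in_words;
  }

  idx_t size_in_words() const { return _size_in_words; }
};

// Malloc-backed variant: keeps the two-argument signature but ignores the size.
class CHeapBitMapModel : public GrowableBitMapModel<CHeapBitMapModel> {
 public:
  bm_word_t* allocate(idx_t n) {
    return static_cast<bm_word_t*>(std::calloc(n == 0 ? 1 : n, sizeof(bm_word_t)));
  }
  void free(bm_word_t* map, idx_t /* size_in_words */) const { std::free(map); }
  ~CHeapBitMapModel() { free(_map, _size_in_words); }
};

// Hypothetical variant that does use the size, e.g. for accounting.
class TrackingBitMapModel : public GrowableBitMapModel<TrackingBitMapModel> {
  idx_t _outstanding_words = 0;

 public:
  bm_word_t* allocate(idx_t n) {
    _outstanding_words += n;
    return static_cast<bm_word_t*>(std::calloc(n == 0 ? 1 : n, sizeof(bm_word_t)));
  }
  void free(bm_word_t* map, idx_t size_in_words) {
    _outstanding_words -= size_in_words;
    std::free(map);
  }
  idx_t outstanding_words() const { return _outstanding_words; }
  ~TrackingBitMapModel() { free(_map, _size_in_words); }
};
```

Here `CHeapBitMapModel::free` ignores its second argument, as the quoted `CHeapBitMap::free` does. `TrackingBitMapModel` shows a variant for which the size passed by the shared `resize` matters: after growing to 8 words and then shrinking to 3, its outstanding count is 3.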
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15859/files - new: https://git.openjdk.org/jdk/pull/15859/files/3371127f..623d076f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15859&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15859&range=00-01 Stats: 55 lines in 2 files changed: 55 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15859.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15859/head:pull/15859 PR: https://git.openjdk.org/jdk/pull/15859 From azafari at openjdk.org Mon Sep 25 09:20:23 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 25 Sep 2023 09:20:23 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v2] In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 02:40:39 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> other size_t flags than the ArrayAllocatorMallocLimit are used in tests. > > test/hotspot/jtreg/serviceability/attach/AttachSetGetFlag.java line 64: > >> 62: testGetFlag("ArrayAllocatorMallocLimit", "128"); >> 63: // testSetFlag("ArrayAllocatorMallocLimit", "64", "128"); >> 64: > > You need to replace this with another non-manageable size_t flag so that code coverage is maintained. Fixed. > test/lib-test/jdk/test/whitebox/vm_flags/SizeTTest.java line 1: > >> 1: /* > > This test also should not be removed but changed to use a different size_t flag so that the WB functionality continues to be tested for a flag of this type. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1335611692 PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1335611842 From aph at openjdk.org Mon Sep 25 09:30:23 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 25 Sep 2023 09:30:23 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v4] In-Reply-To: References: <_OOe2WCX5FFEHYrkY_Bne2b-EZtMmQa5042qdARjSw0=.04809ebd-fd1e-4143-8aa7-42095afd6c8c@github.com> Message-ID: On Mon, 25 Sep 2023 07:03:04 GMT, Aleksey Shipilev wrote: >> Did you mean `tbz(rscratch1, 31, L_skip);` here? > > Did `tbz` in the new commit. AFAICS, this indeed makes the flag restoration unnecessary, removed those too. Testing now... > Did you mean `tbz(rscratch1, 31, L_skip);` here? Argh, of course. Yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1335627651 From mdoerr at openjdk.org Mon Sep 25 10:54:25 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Sep 2023 10:54:25 GMT Subject: RFR: 8316735: Print LockStack in hs_err files In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 01:15:07 GMT, David Holmes wrote: >> Example output: >> >> Lock stack of current Java thread (top to bottom): >> LockStack[1]: nsk.share.jdi.EventHandler >> {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' >> - ---- fields (total size 5 words): >> - private volatile 'wasInterrupted' 'Z' @12 false (0x00) >> - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) >> - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) >> - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) >> - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) >> - private 
'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) >> LockStack[0]: java.util.Collections$SynchronizedRandomAccessList >> {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' >> - ---- fields (total size 3 words): >> - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) >> - final 'mutex' 'Ljava/lang/Object;' @16 a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} (0xbcb163e8) >> - final 'list' 'Ljava/util/List;' @20 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) > > src/hotspot/share/runtime/lockStack.hpp line 33: > >> 31: #include "utilities/sizes.hpp" >> 32: >> 33: class JavaThread; > > Seems an unrelated change. This is a minor cleanup of the prototypes. lockStack.hpp uses `JavaThread*`, not `Thread*`. Do you prefer not to touch it in this PR? > src/hotspot/share/utilities/vmError.cpp line 1173: > >> 1171: st->print_cr("Objects fast locked by this thread (top to bottom):"); >> 1172: JavaThread::cast(_thread)->lock_stack().print_on(st); >> 1173: > > I would have expected this to be printed along with the other current thread info. > > "fast locked" is not terminology we are using for this any more IIUC. I would suggest just saying this is the lock stack for the thread. I had put it at the end of the "T H R E A D" section. It's moved a bit up, now. Changed wording and added newline. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15884#discussion_r1335719014 PR Review Comment: https://git.openjdk.org/jdk/pull/15884#discussion_r1335719826 From mdoerr at openjdk.org Mon Sep 25 11:13:20 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Sep 2023 11:13:20 GMT Subject: RFR: 8316735: Print LockStack in hs_err files In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 01:22:47 GMT, David Holmes wrote: > Yes it is a good idea to have this. 
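The example output in this thread lists the most recently pushed entry (`LockStack[1]`) first and index 0 last, i.e. top to bottom. The model below reproduces that ordering. It assumes the lock stack is filled from index 0 upward. `LockStackModel` is hypothetical and stores class names where the real lock stack holds object references.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model, not HotSpot's LockStack: entries are pushed at the top
// (highest index) and popped from there.
class LockStackModel {
  std::vector<std::string> _entries;  // class names stand in for oops

 public:
  void push(const std::string& klass_name) { _entries.push_back(klass_name); }

  void pop() {
    assert(!_entries.empty() && "pop on empty lock stack");
    _entries.pop_back();
  }

  // Formats top to bottom: the most recently locked object comes first.
  std::string format() const {
    std::string out;
    for (std::size_t i = _entries.size(); i > 0; i--) {
      out += "LockStack[" + std::to_string(i - 1) + "]: " + _entries[i - 1] + "\n";
    }
    return out;
  }
};
```

Pushing `java.util.Collections$SynchronizedRandomAccessList` and then `nsk.share.jdi.EventHandler` gives the same index order as the hs_err excerpt: `LockStack[1]` for the later lock, followed by `LockStack[0]`.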
Potentially we may want it for thread dumps too (separate RFE fine). > > Thanks I think that the fast locked objects can already be found by thread dumps by inspecting the frames. But, I haven't checked. Should we file a new issue for investigating? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15884#issuecomment-1733452364 From ayang at openjdk.org Mon Sep 25 11:32:15 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 25 Sep 2023 11:32:15 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v9] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 09:49:37 GMT, Richard Reingruber wrote: > So I found the cause for the regression with precise scanning of large arrays: it was the redundant queries of the start array. Yes, `object_start` can be quite expensive when obj-start is far away. > byte_for(align_up(first_obj_addr, _card_size_in_words)) I recall this triggers an assertion failure when the address after alignup is the end-of-heap. > Basically you need a 2nd card table to collect the dirty marks, don't you? I experimented with the aforementioned read-only card table idea a bit and here is the draft: https://github.com/openjdk/jdk/compare/master...albertnetymk:jdk:pgc-precise-obj-arr?expand=1 Most are quite straightforward after creating a "shadow" card table; it also includes an optimization to avoid expensive calls to find object-start. The optimization is nicely isolated from the real work, so the code looks fairly readable, IMO. I can't observe much perf diff running bms using the latest revision of this PR and shadow-card-table, except for `card_scan.java`. Additionally, the cost of calling `object_start` multiple times starts to show up significantly when dirty cards are scarce. 
For example, `card_scan.java` + `static final int stride = 32 * 64;`: ## this PR [0.002s][info][gc] Using Parallel [1.338s][info][gc] GC(0) Pause Young (Allocation Failure) 1791M->1027M(2944M) 157.749ms [1.593s][info][gc] GC(1) Pause Young (Allocation Failure) 1795M->1027M(2944M) 139.567ms ## shadow card table [0.002s][info][gc] Using Parallel [1.240s][info][gc] GC(0) Pause Young (Allocation Failure) 1791M->1027M(2944M) 25.763ms [1.379s][info][gc] GC(1) Pause Young (Allocation Failure) 1795M->1027M(2944M) 24.372ms ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1733480484 From mdoerr at openjdk.org Mon Sep 25 11:58:47 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Sep 2023 11:58:47 GMT Subject: RFR: 8316735: Print LockStack in hs_err files [v2] In-Reply-To: References: Message-ID: <3rU128P35kRKsdJ4IqrnTAbrWAZqrep4wD53q7JK1YI=.550e838a-d88b-4d0a-81b5-cb611259f2b6@github.com> > Example output: > > Lock stack of current Java thread (top to bottom): > LockStack[1]: nsk.share.jdi.EventHandler > {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' > - ---- fields (total size 5 words): > - private volatile 'wasInterrupted' 'Z' @12 false (0x00) > - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) > - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) > - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) > - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) > - private 'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) > LockStack[0]: java.util.Collections$SynchronizedRandomAccessList > {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' > - ---- fields (total size 3 words): 
> - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) > - final 'mutex' 'Ljava/lang/Object;' @16 a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} (0xbcb163e8) > - final 'list' 'Ljava/util/List;' @20 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move up and change wording. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15884/files - new: https://git.openjdk.org/jdk/pull/15884/files/14e4d679..1373be38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15884&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15884&range=00-01 Stats: 9 lines in 1 file changed: 5 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15884.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15884/head:pull/15884 PR: https://git.openjdk.org/jdk/pull/15884 From ayang at openjdk.org Mon Sep 25 12:06:12 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 25 Sep 2023 12:06:12 GMT Subject: RFR: 8316098: Revise signature of numa_get_leaf_groups In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 12:22:11 GMT, Albert Mingkun Yang wrote: > Simple refactoring to better reflect that the NUMA node id is non-negative, by using an unsigned type. > > More cleanup can possibly be done to avoid the use of `checked_cast` in `os_windows.cpp`, but since the NUMA code in Windows is unreachable (JDK-8244065), I went for the smallest diff there in order to avoid adding more untested code.
Filed https://bugs.openjdk.org/browse/JDK-8316886 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15786#issuecomment-1733538801 From rkennke at openjdk.org Mon Sep 25 12:25:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 25 Sep 2023 12:25:28 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v58] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Various cleanups - RISC changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/ae1cb780..251fee0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=57 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=56-57 Stats: 22 lines in 3 files changed: 11 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon Sep 25 12:30:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 25 Sep 
2023 12:30:33 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v59] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 88 commits: - Merge remote-tracking branch 'upstream/master' into JDK-8139457 - Various cleanups - RISC changes - Move gap init into allocate_header() (x86) - Fix gtest failure on x86 - Merge remote-tracking branch 'upstream/master' into JDK-8139457 - Fix comments - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() - Fix call to arrayOopDesc::header_size() in arm port - Fix wrong alignment - ... 
and 78 more: https://git.openjdk.org/jdk/compare/0f0c5b2d...8617a596 ------------- Changes: https://git.openjdk.org/jdk/pull/11044/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=58 Stats: 626 lines in 33 files changed: 478 ins; 83 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From ayang at openjdk.org Mon Sep 25 13:26:44 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 25 Sep 2023 13:26:44 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v59] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 12:30:33 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 88 commits: > > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Various cleanups > - RISC changes > - Move gap init into allocate_header() (x86) > - Fix gtest failure on x86 > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Fix comments > - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() > - Fix call to arrayOopDesc::header_size() in arm port > - Fix wrong alignment > - ... and 78 more: https://git.openjdk.org/jdk/compare/0f0c5b2d...8617a596 Much of my confusion is caused by the unclear definition of 'header/body'; specifically, whether the alignment gap is included as part of the header. It's a preexisting issue for instance objects, though. Since some parts of this will be fixed in upcoming PRs, as indicated by `// TODO: This could perhaps go into initialize_body()...`, I believe it can be merged to move this forward. A comprehensive explanation of the memory representation of instances and arrays, including the mark word, klass-pointer, optional alignment gap, and possible squeezed-into fields, would be greatly appreciated in future PRs in this area. That would help clarify the boundary between the header and body. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1733705839 From tschatzl at openjdk.org Mon Sep 25 13:54:58 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 25 Sep 2023 13:54:58 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that modifies the code root (remembered) set to use the CHT as internal representation. > > This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading improving performance of various pauses that deal with code root sets.
> > With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: > > During collection pauses: > > [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms > [..] > [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 > [...] > [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case. > > Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): > > Clear Exception Caches 35,5ms > Unregister NMethods 598,5ms <---- this is nmethod unregistering. > Unregister Old NMethods 3,0ms > CodeBlob flush 41,1ms > CodeCache free 5730,3ms > > > With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part. > > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comment: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. 
> > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review - more (gtest) cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15811/files - new: https://git.openjdk.org/jdk/pull/15811/files/afad0655..97bf0892 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=01-02 Stats: 58 lines in 3 files changed: 0 ins; 47 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/15811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811 PR: https://git.openjdk.org/jdk/pull/15811 From mdoerr at openjdk.org Mon Sep 25 14:04:49 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Sep 2023 14:04:49 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object Message-ID: I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. Currently only PPC64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). 
------------- Commit messages: - 8316746: Top of lock-stack does not match the unlocked object Changes: https://git.openjdk.org/jdk/pull/15903/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15903&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316746 Stats: 36 lines in 4 files changed: 13 ins; 11 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/15903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15903/head:pull/15903 PR: https://git.openjdk.org/jdk/pull/15903 From tschatzl at openjdk.org Mon Sep 25 14:13:34 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 25 Sep 2023 14:13:34 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v4] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that modifies the code root (remembered) set to use the CHT as internal representation. > > This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading improving performance of various pauses that deal with code root sets. > > With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: > > During collection pauses: > > [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms > [..] > [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 > [...] > [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case. > > Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): > > Clear Exception Caches 35,5ms > Unregister NMethods 598,5ms <---- this is nmethod unregistering.
> Unregister Old NMethods 3,0ms > CodeBlob flush 41,1ms > CodeCache free 5730,3ms > > > With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part. > > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comment: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' into 8315503-code-root-scan-imbalance - iwalulya review - more (gtest) cleanup - iwalulya review - initial version that seems to work Contains kludge to avoid modification of currently scanned code root set. Ought to be fixed differently. Contains debug code in table scanners of CodeRootSet/CardSet to find out problems with table growing Hashcode hack for code root set, using copy&paste ZHash Shrink table after clean Bulk removal of nmethods from code root sets after class unloading. From Ivan. 
Cleanup, resize after bulk delete, hashcode verification ------------- Changes: https://git.openjdk.org/jdk/pull/15811/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=03 Stats: 458 lines in 23 files changed: 283 ins; 109 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/15811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811 PR: https://git.openjdk.org/jdk/pull/15811 From rkennke at openjdk.org Mon Sep 25 14:23:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 25 Sep 2023 14:23:13 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 13:57:26 GMT, Martin Doerr wrote: > I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. > Currently only PPC64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). No, not really. C2 should only encounter balanced locking. If the TOS doesn't match, then the issue is a discrepancy elsewhere. See my comment in the ticket for a possible explanation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1733809724 From pchilanomate at openjdk.org Mon Sep 25 14:50:14 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 25 Sep 2023 14:50:14 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 09:00:01 GMT, Fredrik Bredberg wrote: > Relativize initial_sp in interpreter frames. > > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. 
This is because we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members is handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. x86 and AArch64 parts look good to me. Thanks, Patricio ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15815#pullrequestreview-1642348863 From jvernee at openjdk.org Mon Sep 25 15:05:19 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 25 Sep 2023 15:05:19 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v25] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3.
https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Split note about byte order/alignment out of header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/1c24f33e..49bdd953 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=23-24 Stats: 83 lines in 1 file changed: 42 ins; 2 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Mon Sep 25 15:05:21 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 25 Sep 2023 15:05:21 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v22] In-Reply-To: <17Hdk-QZiKVqpNzEV_v3vhd6uqDTdIdNpIYQmbETPNc=.ee3d0448-684d-4e67-abcf-f2af95bd1ea5@github.com> References: <57W4fhF3ZFjp-PYI4CoFbNtbBfyh2c0S46bRz1D8T0c=.ff1a4e84-e440-43cf-9366-7624e68cc609@github.com> <2i6hAPjL9qnofo6nFyUGjC9MBUxyYDtgWWDsvHtoBvc=.0a6f9585-319d-44e2-842e-adb61c80811a@github.com> <6XT6RHMxjba8n0P9rx7Pyy8Ot5VbdtupaTJikgYfeD0=.197bdfbe-e863-4529-97ed-c581c4a21d7d@github.com> <17Hdk-QZiKVqpNzEV_v3vhd6uqDTdIdNpIY QmbETPNc=.ee3d0448-684d-4e67-abcf-f2af95bd1ea5@github.com> Message-ID: On Fri, 22 Sep 2023 16:58:05 GMT, Maurizio Cimadamore wrote: >> Here you go: https://cr.openjdk.org/~jvernee/FFM_22_PR_v1/java.base/java/lang/foreign/SegmentAllocator.html#allocateFrom(java.lang.foreign.ValueLayout,java.lang.foreign.MemorySegment,java.lang.foreign.ValueLayout,long,long) > > Ok, now I'm more convinced that the method summary really does look bad (or worse, compared to 20). 
> > For instance [allocateFrom](https://cr.openjdk.org/~jvernee/FFM_22_PR_v1/java.base/java/lang/foreign/SegmentAllocator.html#allocateFrom(java.lang.foreign.ValueLayout.OfByte,byte...): > > > Returns a new memory segment with a byteSize() initialized with the provided E byte elements as specified by the provided layout (i.e. byte ordering, alignment and size). > > > (same is true for all the other array-accepting `allocateFrom` methods). This should be simplified to: > > > Returns a new memory segment initialized with the elements in the provided byte array. > > (then, if we want to say that the initialization honors the endianness of the provided layout, we can do so in a followup para, but the method summary should be simple). > > So, once all the array-accepting methods are fixed, the segment-accepting `allocateFrom` needs to be simplified to: > > > Returns a new memory segment initialized with the contents of the provided segment. I've updated the 'header' of these methods, and instead added a more elaborate comment in a second paragraph. The single value `allocateFrom` methods had the same issue. I fixed those as well. See: https://github.com/openjdk/jdk/pull/15103/commits/49bdd953eaed8732b59a803516355abc8df31fa3 Here is the re-generated javadoc: https://cr.openjdk.org/~jvernee/FFM_22_PR/v2/java.base/java/lang/foreign/SegmentAllocator.html#allocateFrom(java.lang.foreign.ValueLayout.OfInt,int...) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1336017587 From jvernee at openjdk.org Mon Sep 25 15:09:09 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 25 Sep 2023 15:09:09 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v26] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign).
The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Tweak support for restricted methods Reviewed-by: jvernee, pminborg ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/49bdd953..82a91258 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=24-25 Stats: 34 lines in 10 files changed: 3 ins; 11 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From tschatzl at openjdk.org Mon Sep 25 15:49:16 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 25 Sep 2023 15:49:16 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v4] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 14:13:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies the code root (remembered) set to use the CHT as internal representation. >> >> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading improving performance of various pauses that deal with code root sets.
>> >> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: >> >> During collection pauses: >> >> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms >> [..] >> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 >> [...] >> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 >> >> >> Code root scan now reduces to ~22ms max on average in this case. >> >> Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): >> >> Clear Exception Caches 35,5ms >> Unregister NMethods 598,5ms <---- this is nmethod unregistering. >> Unregister Old NMethods 3,0ms >> CodeBlob flush 41,1ms >> CodeCache free 5730,3ms >> >> >> With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part. >> >> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: >> >> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 >> >> >> Some random comment: >> * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: > > - Merge branch 'master' into 8315503-code-root-scan-imbalance > - iwalulya review - more (gtest) cleanup > - iwalulya review > - initial version that seems to work > > Contains kludge to avoid modification of currently scanned code root set. > Ought to be fixed differently. > > Contains debug code in table scanners of CodeRootSet/CardSet to find out problems with table growing > > Hashcode hack for code root set, using copy&paste ZHash > > Shrink table after clean > > Bulk removal of nmethods from code root sets after class unloading. From Ivan. > > Cleanup, resize after bulk delete, hashcode verification There is still a problem with imbalancedness of removing dead nmethods from code root remembered sets: currently they are distributed on a per-region basis, so if there are a few exceptionally large code root remsets, the other threads need to wait. I will file a CR to improve that (balancing within code root remembered sets). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15811#issuecomment-1734009727 From mcimadamore at openjdk.org Mon Sep 25 16:47:26 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 25 Sep 2023 16:47:26 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v26] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 15:09:09 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. 
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO...
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Tweak support for restricted methods > > Reviewed-by: jvernee, pminborg src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 336: > 334: * {@return a new memory segment initialized with the contents of the provided segment.} > 335: *

> 336: * The size of the allocated memory segment is the {@code elementLayout.byteSize() * elementCount}. Suggestion: * The size of the allocated memory segment is {@code elementLayout.byteSize() * elementCount}. src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 378: > 376: * {@return a new memory segment initialized with the elements in the provided byte array.} > 377: *

> 378: * The size of the allocated memory segment is the {@code elementLayout.byteSize() * elements.length}. Suggestion: * The size of the allocated memory segment is {@code elementLayout.byteSize() * elements.length}. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1336148784 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1336149140 From mcimadamore at openjdk.org Mon Sep 25 16:47:28 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 25 Sep 2023 16:47:28 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v26] In-Reply-To: References: Message-ID: <7g-pbFrnyMYtXK4894kdPWvsXJ79YdIp538CXAAiswU=.b65efe29-615e-45c3-abab-bd44d4c91bca@github.com> On Mon, 25 Sep 2023 16:44:01 GMT, Maurizio Cimadamore wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak support for restricted methods >> >> Reviewed-by: jvernee, pminborg > > src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 378: > >> 376: * {@return a new memory segment initialized with the elements in the provided byte array.} >> 377: *

>> 378: * The size of the allocated memory segment is the {@code elementLayout.byteSize() * elements.length}. > > Suggestion: > > * The size of the allocated memory segment is {@code elementLayout.byteSize() * elements.length}. Here and also in all the other array-accepting methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1336149494 From rkennke at openjdk.org Mon Sep 25 16:47:50 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 25 Sep 2023 16:47:50 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v60] In-Reply-To: References: Message-ID: <8B41q1u51ZS-4qeVtmsDm6Y4IdKlykuMkVjld8350wg=.b064e79a-ddf9-409c-b92b-94017aabed6e@github.com> > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. 
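To make the header arithmetic in the description above concrete, here is a small compile-time model. It is a sketch, not HotSpot code: it assumes an 8-byte mark word, an 8-byte klass pointer when compressed class pointers are off (4 bytes when on), and a 4-byte length field. `align_up`, `base_offset_word_aligned` and `base_offset_element_aligned` are hypothetical helpers that contrast "first element at a HeapWord boundary" with "first element at its own natural alignment".

```cpp
#include <cstddef>

// Hypothetical model of a Java array header; not HotSpot's actual layout code.
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

constexpr std::size_t kHeapWordSize = 8;
constexpr std::size_t kLengthFieldSize = 4;

// Offset of the first byte after the 4-byte length field.
constexpr std::size_t length_end(std::size_t length_offset) {
  return length_offset + kLengthFieldSize;
}

// Old scheme: elements always start at a HeapWord-aligned offset.
constexpr std::size_t base_offset_word_aligned(std::size_t length_offset) {
  return align_up(length_end(length_offset), kHeapWordSize);
}

// Proposed scheme: elements only need their own natural alignment.
constexpr std::size_t base_offset_element_aligned(std::size_t length_offset,
                                                  std::size_t element_size) {
  return align_up(length_end(length_offset), element_size);
}

// -XX:-UseCompressedClassPointers: mark (8) + klass (8), so length sits at 16.
static_assert(base_offset_word_aligned(16) == 24, "4-byte gap before elements");
static_assert(base_offset_element_aligned(16, 4) == 20, "int[] needs no gap");
static_assert(base_offset_element_aligned(16, 8) == 24, "long[] stays 8-aligned");

// Lilliput-style header: length at offset 8, so the old scheme leaves 12..15 unused.
static_assert(base_offset_word_aligned(8) == 16, "gap between 12 and 16");
static_assert(base_offset_element_aligned(8, 1) == 12, "byte[] starts at 12");
```

With compressed class pointers (length at offset 12) both schemes give 16, which is why the gap only shows up without them.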
> > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix ARM build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/8617a596..c117e394 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=59 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=58-59 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From jvernee at openjdk.org Mon Sep 25 16:54:10 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 25 Sep 2023 16:54:10 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. 
A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: fix typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/82a91258..0244845a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=25-26 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From iwalulya at openjdk.org Mon Sep 25 17:30:14 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 25 Sep 2023 17:30:14 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v4] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 14:13:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies the code root (remembered) set to use the CHT as internal representation. >> >> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and enables (parallel) bulk unregistering of nmethods during code cache unloading, improving the performance of various pauses that deal with code root sets. >> >> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: >> >> During collection pauses: >> >> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms >> [..] >> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 >> [...] >> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 >> >> >> Code root scan now reduces to ~22ms max on average in this case.
>> >> Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): >> >> Clear Exception Caches 35,5ms >> Unregister NMethods 598,5ms <---- this is nmethod unregistering. >> Unregister Old NMethods 3,0ms >> CodeBlob flush 41,1ms >> CodeCache free 5730,3ms >> >> >> With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part. >> >> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: >> >> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 >> >> >> Some random comments: >> * the mutex for the CHT had to be decreased in priority by one so as not to conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep the previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into 8315503-code-root-scan-imbalance > - iwalulya review - more (gtest) cleanup > - iwalulya review > - initial version that seems to work > > Contains kludge to avoid modification of currently scanned code root set. > Ought to be fixed differently. > > Contains debug code in table scanners of CodeRootSet/CardSet to find out problems with table growing > > Hashcode hack for code root set, using copy&paste ZHash > > Shrink table after clean > > Bulk removal of nmethods from code root sets after class unloading. From Ivan. > > Cleanup, resize after bulk delete, hashcode verification Still LGTM!
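The per-region imbalance Thomas describes earlier in this thread (a few exceptionally large code root sets keep one worker busy while the others wait) can be illustrated with a toy simulation. This is a sketch under loud assumptions, not G1's implementation: each task goes to whichever worker is currently least loaded, which approximates dynamic claiming, and every code root entry is assumed to cost the same. `max_worker_load`, `per_region_tasks` and `chunked_tasks` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of dynamic claiming: each task is taken by the least loaded
// worker (ties go to the lowest index). Returns the busiest worker's load,
// which roughly determines how long the parallel phase takes.
std::size_t max_worker_load(const std::vector<std::size_t>& tasks,
                            std::size_t num_workers) {
  assert(num_workers > 0);
  std::vector<std::size_t> load(num_workers, 0);
  for (std::size_t task : tasks) {
    auto least = std::min_element(load.begin(), load.end());
    *least += task;
  }
  return *std::max_element(load.begin(), load.end());
}

// Per-region claiming: one task per region, sized by its code root set.
std::vector<std::size_t> per_region_tasks(const std::vector<std::size_t>& set_sizes) {
  return set_sizes;
}

// Chunked claiming: each set is split into chunks of at most chunk_size
// entries, so a single large set can be shared by several workers.
std::vector<std::size_t> chunked_tasks(const std::vector<std::size_t>& set_sizes,
                                       std::size_t chunk_size) {
  std::vector<std::size_t> tasks;
  for (std::size_t size : set_sizes) {
    for (std::size_t done = 0; done < size; done += chunk_size) {
      tasks.push_back(std::min(chunk_size, size - done));
    }
  }
  return tasks;
}
```

With set sizes {1000, 10, 10, 10} and four workers, per-region claiming leaves one worker with all 1000 entries, while 100-entry chunks bring the busiest worker down to 300. This is the kind of balancing "within code root remembered sets" the follow-up CR is about; a concurrent hash table gives similar behavior because its buckets can be claimed in chunks.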
------------- PR Comment: https://git.openjdk.org/jdk/pull/15811#issuecomment-1734175896 From mdoerr at openjdk.org Mon Sep 25 21:05:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Sep 2023 21:05:12 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object In-Reply-To: References: Message-ID: <2lELFxO9u4H6i8zd_kOLpNfL4h3zipdTcZwk0Qzju60=.ede06eab-c12a-4c49-bf99-2eb04c86227a@github.com> On Mon, 25 Sep 2023 14:20:33 GMT, Roman Kennke wrote: > No, not really. C2 should only encounter balanced locking. If the TOS doesn't match, then the issue is a discrepancy elsewhere. See my comment in the ticket for a possible explanation. Handling ANONYMOUS_OWNER in the C2 monitor unlocking code doesn't solve this issue, but this PR does. Please see my reply in the JBS issue. Also note that other tests and benchmarks are stable on PPC64. We should have seen more problems if the implementation had such a fundamental bug. This issue only shows up in jdi tests which do special things. I will need to look more closely at what exactly they do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1734457622 From mdoerr at openjdk.org Mon Sep 25 21:14:11 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Sep 2023 21:14:11 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v2] In-Reply-To: References: Message-ID: > I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. > Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add x86_64 and aarch64 implementation.
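For readers following the lock-stack discussion, here is a toy model of the fast-path unlock check. It is a sketch under stated assumptions, not HotSpot's `LockStack`: the fast path pops only when the object being unlocked is on top of the per-thread lock stack, and otherwise reports failure so that the caller can take the slow path, which is what this PR proposes. `ToyLockStack`, its capacity and its method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Toy per-thread lock stack for lightweight (fast) locking; illustrative only.
class ToyLockStack {
 public:
  static constexpr std::size_t kCapacity = 8;

  // Fast-path lock: push the object, or fail when full (caller would inflate).
  bool try_push(const void* obj) {
    if (top_ == kCapacity) return false;
    elems_[top_++] = obj;
    return true;
  }

  // Fast-path unlock: succeeds only if obj was the most recently locked
  // object. On a mismatch, nothing is popped and the caller must use the
  // slow path instead of removing the wrong entry.
  bool try_fast_unlock(const void* obj) {
    if (top_ == 0 || elems_[top_ - 1] != obj) return false;
    --top_;
    return true;
  }

  std::size_t size() const { return top_; }

 private:
  const void* elems_[kCapacity] = {};
  std::size_t top_ = 0;
};
```

With balanced locking, as Roman and Dean point out, the mismatch branch should never be taken: locking `a` then `b` and unlocking `b` then `a` always hits the fast path. Only an out-of-order unlock (as apparently seen in the jdi tests) falls back to the slow path.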
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15903/files - new: https://git.openjdk.org/jdk/pull/15903/files/afa0e86f..83da590b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15903&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15903&range=00-01 Stats: 36 lines in 2 files changed: 21 ins; 14 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15903/head:pull/15903 PR: https://git.openjdk.org/jdk/pull/15903 From mdoerr at openjdk.org Mon Sep 25 21:14:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Sep 2023 21:14:12 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 13:57:26 GMT, Martin Doerr wrote: > I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. > Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). I've added x86_64 and aarch64 support in case somebody else has seen similar problems and would like to try this patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1734465087 From sviswanathan at openjdk.org Mon Sep 25 23:16:17 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 25 Sep 2023 23:16:17 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2] In-Reply-To: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> References: <8AUhJXT3sS9-gohY9kANLReqbUXcA28xNPiI2DPYE_k=.6ca0e589-156a-4085-8977-c55a3f95ec79@github.com> Message-ID: On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit an AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>> >> Below are the performance numbers on my desktop system with the -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26
>> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Removed isEncrypt boolean variable src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 84: > 82: } > 83: > 84: ATTRIBUTE_ALIGNED(16) uint64_t COUNTER_MASK_LINC1F[] = { Please also update the copyright year of stubGenerator_x86_64_aes.cpp. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4167: > 4165: const Register pos = rax; > 4166: const Register rounds = r10; > 4167: const XMMRegister ctr_blockx = xmm9; It would be good to use ctr_blockx consistently instead of xmm9 below. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4174: > 4172: //Macro flow: > 4173: //calculate the number of 16byte blocks in the message > 4174: //process 8 16 byte blocks in initial_num_blocks.' The character ' at the end of the line seems extra. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4193: > 4191: > 4192: //Save the amount of data left to process in r14 > 4193: __ mov(r14, len); It looks to me like you could use len directly without moving it to r14.
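The "macro flow" comment quoted above (count the 16-byte blocks, then process them eight at a time) amounts to the arithmetic sketched below. This is an illustrative model only: `BlockPlan` and `plan_blocks` are hypothetical names, and how the actual stub splits work between `initial_blocks`, the 8-block parallel loop and the final partial block is not shown here and may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical breakdown of a message for an 8-blocks-per-iteration AES loop.
struct BlockPlan {
  std::size_t full_blocks;      // complete 16-byte blocks
  std::size_t groups_of_8;      // iterations of an 8-block parallel loop
  std::size_t leftover_blocks;  // complete blocks not covered by those iterations
  std::size_t tail_bytes;       // bytes in a final partial block
};

constexpr std::size_t kAesBlockSize = 16;
constexpr std::size_t kBlocksPerIteration = 8;

BlockPlan plan_blocks(std::size_t len) {
  BlockPlan p{};
  p.full_blocks = len / kAesBlockSize;
  p.groups_of_8 = p.full_blocks / kBlocksPerIteration;
  p.leftover_blocks = p.full_blocks % kBlocksPerIteration;
  p.tail_bytes = len % kAesBlockSize;
  // Sanity check: the pieces add back up to the original length.
  assert(p.groups_of_8 * kBlocksPerIteration * kAesBlockSize +
             p.leftover_blocks * kAesBlockSize + p.tail_bytes == len);
  return p;
}
```

For the 8192-byte benchmark size this gives 512 blocks, i.e. 64 iterations of eight blocks and no remainder; a 1000-byte message gives 62 full blocks (7 iterations plus 6 leftover blocks) and an 8-byte tail.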
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4195: > 4193: __ mov(r14, len); > 4194: > 4195: initial_blocks(xmm9, rounds, key, r14, in, out, ct, subkeyHtbl, pos); For each of the method definitions (initial_blocks, ghash8_encrypt8_parallel, ghash_last8, generateHtbl_8_block_avx2) it would be good to add a comment at the beginning of the method indicating the inputs, outputs, and temporary registers. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4213: > 4211: __ jcc(Assembler::greater, encrypt_by_8); > 4212: > 4213: __ addl(r15, 8); Should this be addb(r15, 8)? src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4226: > 4224: __ vpshufb(xmm9, xmm9, ExternalAddress(counter_shuffle_mask_addr()), Assembler::AVX_128bit, rbx /*rscratch*/); > 4225: > 4226: __ addl(r15, 8); Should this be addb(r15, 8)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1336467494 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1336425142 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1336377175 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1336397169 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1336401373 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1336421620 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1336421899 From dlong at openjdk.org Tue Sep 26 01:08:24 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Sep 2023 01:08:24 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v2] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 21:14:11 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add x86_64 and aarch64 implementation. As Roman said, the C2 locks should be balanced, and the lock stack looks correct: nsk.share.jdi.EventHandler.run locks listeners first, then the EventHandler. Maybe something is going wrong in deoptimization? Pushing this change without understanding the cause could make it harder to find the real problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1734677034 From duke at openjdk.org Tue Sep 26 01:41:20 2023 From: duke at openjdk.org (Logan Abernathy) Date: Tue, 26 Sep 2023 01:41:20 GMT Subject: RFR: 8316743: RISC-V: Change UseVectorizedMismatchIntrinsic option result to warning In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 14:17:40 GMT, Ilya Gavrilin wrote: > Please review this small change for UseVectorizedMismatchIntrinsic option. > On RISC-V we do not have VectorizedMismatch intrinsic, so `void LIRGenerator::do_vectorizedMismatch(Intrinsic* x)` prodeuces fatal error when this option turned on. > Other similar options (like -XX:+UseCRC32Intrinsics) produces only warning: https://github.com/openjdk/jdk/blob/c90d63105ca774c047d5f5a4348aa657efc57953/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L150-L183 > Also, on platforms, where VectorizedMismatch unimplemented to we got warning. Marked as reviewed by M4ximumPizza at github.com (no known OpenJDK username). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15890#pullrequestreview-1643202886 From dholmes at openjdk.org Tue Sep 26 02:10:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Sep 2023 02:10:22 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v2] In-Reply-To: <4i265Gn613urJfDewi3W2U4-dHgyjkp9Ejs13Cxm5Gs=.0e3a2de3-de51-4eb9-a4e9-fa5b9173b830@github.com> References: <4i265Gn613urJfDewi3W2U4-dHgyjkp9Ejs13Cxm5Gs=.0e3a2de3-de51-4eb9-a4e9-fa5b9173b830@github.com> Message-ID: <_dZ75oFfvcVHw8v0GG_UTCpfWyN8rvVYqZmE8jzVD3I=.71f479ae-2006-4642-a71d-ccfacf4f7c36@github.com> On Mon, 25 Sep 2023 09:20:20 GMT, Afshin Zafari wrote: >> 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it are also removed. >> 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`. >> 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can also be `ArenaBitMap` and `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`: >> ```C++ >> void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const { >> MallocArrayAllocator::free(map); >> } >> >> ### Test >> tiers 1-4 passed on all platforms. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > other size_t flags than the ArrayAllocatorMallocLimit are used in tests. test/hotspot/jtreg/serviceability/attach/AttachSetGetFlag.java line 62: > 60: // Test a non-manageable size_t flag. > 61: // Since it is not manageable, we can't test the setFlag functionality. > 62: testGetFlag("StringDeduplicationCleanupDeadMinimum", "128"); A non-experimental flag, like MetaspaceSize, might be better long term in case the experimental flag gets removed again.
test/lib-test/jdk/test/whitebox/vm_flags/SizeTTest.java line 38: > 36: > 37: public class SizeTTest { > 38: private static final String FLAG_NAME = "StringDeduplicationCleanupDeadMinimum"; Again a non-experimental flag, like MetaspaceSize, might be better here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1336545378 PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1336546774 From jwaters at openjdk.org Tue Sep 26 02:58:43 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 26 Sep 2023 02:58:43 GMT Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() Message-ID: throw() has been deprecated since C++11, alongside dynamic exception specifications; we should replace all instances of it with noexcept to prepare HotSpot for later versions of C++. ------------- Commit messages: - Formatting - throw () should be swapped to noexcept Changes: https://git.openjdk.org/jdk/pull/15910/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15910&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316930 Stats: 94 lines in 39 files changed: 0 ins; 0 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/15910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15910/head:pull/15910 PR: https://git.openjdk.org/jdk/pull/15910 From amitkumar at openjdk.org Tue Sep 26 03:44:22 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 26 Sep 2023 03:44:22 GMT Subject: RFR: 8308479: [s390x] Implement alternative fast-locking scheme [v8] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 05:44:04 GMT, Amit Kumar wrote: >> This PR implements a new fast-locking scheme for s390x. Additionally, a few parameters have been renamed to be in sync with PPC.
>> >> Testing done (for release, fastdebug and slowdebug builds): >> All `test/jdk/java/util/concurrent` tests with parameters: >> * LockingMode=2 >> * LockingMode=2 with -Xint >> * LockingMode=2 with -XX:TieredStopAtLevel=1 >> * LockingMode=2 with -XX:-TieredCompilation >> >> Results are consistent with AArch64 (macOS) and PPC: all 124 tests pass except `MapLoops.java`, because in the second part of this test case the JVM starts with `HeavyMonitors`, which conflicts with `LockingMode=2`. >> >> Benchmark results for Renaissance-jmh: >> >> | Benchmark | Without fastLock (ms/op) | With fastLock (ms/op) | Improvement | >> |------------------------------------------|-------------------------|----------------------|-------------| >> | o.r.actors.JmhAkkaUct.runOperation | 1565.080 | 1365.877 | 12.70% | >> | o.r.actors.JmhReactors.runOperation | 9316.972 | 10592.982 | -13.70% | >> | o.r.jdk.concurrent.JmhFjKmeans.runOperation | 1257.183 | 1235.530 | 1.73% | >> | o.r.jdk.concurrent.JmhFutureGenetic.runOperation | 1925.158 | 2073.066 | -7.69% | >> | o.r.jdk.streams.JmhParMnemonics.runOperation | 2746.664 | 2836.085 | -3.24% | >> | o.r.jdk.streams.JmhScrabble.runOperation | 76.774 | 74.239 | 3.31% | >> | o.r.rx.JmhRxScrabble.runOperation | 162.270 | 167.061 | -2.96% | >> | o.r.scala.sat.JmhScalaDoku.runOperation | 3333.711 | 3271.078 | 1.88% | >> | o.r.scala.stdlib.JmhScalaKmeans.runOperation | 182.746 | 182.153 | 0.33% | >> | o.r.scala.stm.JmhPhilosophers.runOperation | 15003.329 | 13396.921 | 10.57% | >> | o.r.scala.stm.JmhScalaStmBench7.runOperation | 1669.090 | 1579.900 | 5.34% | >> | o.r.twitter.finagle.JmhFinagleChirper.runOperation | 9601.963 | 10034.404 | -4.52% | >> | o.r.twitter.finagle.JmhFinagleHttp.runOperation | 4403.725 | 4746.707 | -7.79% | >> >> >> DaCapo benchmark results: >> >> | Benchmark | Without fast lock (msec) | With fast lock (msec) | Improvement | >> |--...
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestions from Martin Lutz, Martin, thanks for the reviews and help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14414#issuecomment-1734776014 From amitkumar at openjdk.org Tue Sep 26 03:44:24 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 26 Sep 2023 03:44:24 GMT Subject: Integrated: 8308479: [s390x] Implement alternative fast-locking scheme In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 11:04:53 GMT, Amit Kumar wrote: > This PR implements a new fast-locking scheme for s390x. Additionally, a few parameters have been renamed to be in sync with PPC. > > Testing done (for release, fastdebug and slowdebug builds): > All `test/jdk/java/util/concurrent` tests with parameters: > * LockingMode=2 > * LockingMode=2 with -Xint > * LockingMode=2 with -XX:TieredStopAtLevel=1 > * LockingMode=2 with -XX:-TieredCompilation > > Results are consistent with AArch64 (macOS) and PPC: all 124 tests pass except `MapLoops.java`, because in the second part of this test case the JVM starts with `HeavyMonitors`, which conflicts with `LockingMode=2`. > > Benchmark results for Renaissance-jmh: > > | Benchmark | Without fastLock (ms/op) | With fastLock (ms/op) | Improvement | > |------------------------------------------|-------------------------|----------------------|-------------| > | o.r.actors.JmhAkkaUct.runOperation | 1565.080 | 1365.877 | 12.70% | > | o.r.actors.JmhReactors.runOperation | 9316.972 | 10592.982 | -13.70% | > | o.r.jdk.concurrent.JmhFjKmeans.runOperation | 1257.183 | 1235.530 | 1.73% | > | o.r.jdk.concurrent.JmhFutureGenetic.runOperation | 1925.158 | 2073.066 | -7.69% | > | o.r.jdk.streams.JmhParMnemonics.runOperation | 2746.664 | 2836.085 | -3.24% | > | o.r.jdk.streams.JmhScrabble.runOperation | 76.774 | 74.239 | 3.31% | > | o.r.rx.JmhRxScrabble.runOperation | 162.270 | 167.061 | -2.96% | > |
o.r.scala.sat.JmhScalaDoku.runOperation | 3333.711 | 3271.078 | 1.88% | > | o.r.scala.stdlib.JmhScalaKmeans.runOperation | 182.746 | 182.153 | 0.33% | > | o.r.scala.stm.JmhPhilosophers.runOperation | 15003.329 | 13396.921 | 10.57% | > | o.r.scala.stm.JmhScalaStmBench7.runOperation | 1669.090 | 1579.900 | 5.34% | > | o.r.twitter.finagle.JmhFinagleChirper.runOperation | 9601.963 | 10034.404 | -4.52% | > | o.r.twitter.finagle.JmhFinagleHttp.runOperation | 4403.725 | 4746.707 | -7.79% | > > > DaCapo benchmark results: > > | Benchmark | Without fast lock (msec) | With fast lock (msec) | Improvement | > |--------------------------|-------------------------|-----------------... This pull request has now been integrated. Changeset: 3fe6e0fa Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/3fe6e0faca78e8106e33a3a53de78f8864be92b7 Stats: 340 lines in 7 files changed: 206 ins; 21 del; 113 mod 8308479: [s390x] Implement alternative fast-locking scheme Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/14414 From jwaters at openjdk.org Tue Sep 26 03:50:14 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 26 Sep 2023 03:50:14 GMT Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v2] In-Reply-To: References: Message-ID: <5ctCpwKXcy9ywwvThRNzl6s_Bn7rHWMFtXdmqWbjq50=.eedf46de-165a-4e7e-b2d2-dcf5ce5d153a@github.com> > throw() has been deprecated since C++11, alongside dynamic exception specifications; we should replace all instances of it with noexcept to prepare HotSpot for later versions of C++. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Nevermind, this looks better
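For context on the throw() to noexcept change, the snippet below shows the shape of the replacement on a small pair of allocation helpers. The helper names are hypothetical, not HotSpot functions. `throw()` was deprecated in C++11 together with dynamic exception specifications and was removed in C++20; `noexcept` states the same "never throws" guarantee and, unlike `throw()`, can also be queried at compile time with the `noexcept` operator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Before (dynamic exception specification, no longer valid in C++20):
//   void* allocate_or_null(std::size_t size) throw();
// After:
void* allocate_or_null(std::size_t size) noexcept {
  // The nothrow form of operator new returns nullptr instead of throwing.
  return ::operator new(size, std::nothrow);
}

void release(void* p) noexcept {
  ::operator delete(p);
}

// The guarantee is checkable at compile time.
static_assert(noexcept(allocate_or_null(16)), "declared non-throwing");
static_assert(noexcept(release(nullptr)), "declared non-throwing");
```

Because the change only swaps one non-throwing specification for its modern equivalent, it should not alter behavior on compilers that already treated `throw()` as `noexcept(true)`.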
Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15910/head:pull/15910 PR: https://git.openjdk.org/jdk/pull/15910 From dholmes at openjdk.org Tue Sep 26 06:30:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Sep 2023 06:30:13 GMT Subject: RFR: 8316735: Print LockStack in hs_err files [v2] In-Reply-To: <3rU128P35kRKsdJ4IqrnTAbrWAZqrep4wD53q7JK1YI=.550e838a-d88b-4d0a-81b5-cb611259f2b6@github.com> References: <3rU128P35kRKsdJ4IqrnTAbrWAZqrep4wD53q7JK1YI=.550e838a-d88b-4d0a-81b5-cb611259f2b6@github.com> Message-ID: <8NXEMcD_AgIFZVFYkwP1fXaSu1cVnDq7iF0CAWttCBk=.7a5af773-1ce1-4871-b716-fd7c5001e2f5@github.com> On Mon, 25 Sep 2023 11:58:47 GMT, Martin Doerr wrote: >> Example output: >> >> Lock stack of current Java thread (top to bottom): >> LockStack[1]: nsk.share.jdi.EventHandler >> {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' >> - ---- fields (total size 5 words): >> - private volatile 'wasInterrupted' 'Z' @12 false (0x00) >> - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) >> - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) >> - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) >> - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) >> - private 'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) >> LockStack[0]: java.util.Collections$SynchronizedRandomAccessList >> {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' >> - ---- fields (total size 3 words): >> - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) >> - final 'mutex' 
'Ljava/lang/Object;' @16 a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} (0xbcb163e8) >> - final 'list' 'Ljava/util/List;' @20 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move up and change wording. Updates looks good - thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15884#pullrequestreview-1643433658 From dholmes at openjdk.org Tue Sep 26 06:30:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Sep 2023 06:30:14 GMT Subject: RFR: 8316735: Print LockStack in hs_err files [v2] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 10:50:45 GMT, Martin Doerr wrote: >> src/hotspot/share/runtime/lockStack.hpp line 33: >> >>> 31: #include "utilities/sizes.hpp" >>> 32: >>> 33: class JavaThread; >> >> Seems an unrelated change. > > This is a minor cleanup of the prototypes. lockStack.hpp uses `JavaThread*`, not `Thread*`. Do you prefer not to touch it in this PR? If it uses `JavaThread` but only had `Thread` then it must be including it some other way, so the prototype seems unnecessary. But okay to leave it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15884#discussion_r1336681164 From dholmes at openjdk.org Tue Sep 26 06:34:10 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Sep 2023 06:34:10 GMT Subject: RFR: 8316735: Print LockStack in hs_err files [v2] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 11:10:10 GMT, Martin Doerr wrote: > I think that the fast locked objects can already be found by thread dumps by inspecting the frames. But, I haven't checked. Should we file a new issue for investigating? I'm not sure how we could present the information in the context of the thread stack dump, even though it might be useful to validate it against the locking information found in the frames themselves. 
It is probably too disruptive to just jam it into the existing output. We can revisit this later if we find we are missing the information. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15884#issuecomment-1734909378 From mdoerr at openjdk.org Tue Sep 26 08:12:13 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 08:12:13 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v2] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 21:14:11 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add x86_64 and aarch64 implementation. I'm not planning to push it without having understood what the problem really is. This PR may be interesting for other people, too. That's why I've created it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1735041840 From azafari at openjdk.org Tue Sep 26 08:17:17 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 26 Sep 2023 08:17:17 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v3] In-Reply-To: References: Message-ID: > 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it also are removed. > 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`. > 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can be also `ArenaBitMap` and `ResourceBitMap`. 
However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`: > ```C++ > void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const { > MallocArrayAllocator::free(map); > } > > ### Test > tiers1-4 passed on all platforms. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: The size_t LargePageSizeInBytes option is used. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15859/files - new: https://git.openjdk.org/jdk/pull/15859/files/623d076f..4a739243 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15859&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15859&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15859.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15859/head:pull/15859 PR: https://git.openjdk.org/jdk/pull/15859 From azafari at openjdk.org Tue Sep 26 08:19:17 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 26 Sep 2023 08:19:17 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v2] In-Reply-To: <_dZ75oFfvcVHw8v0GG_UTCpfWyN8rvVYqZmE8jzVD3I=.71f479ae-2006-4642-a71d-ccfacf4f7c36@github.com> References: <4i265Gn613urJfDewi3W2U4-dHgyjkp9Ejs13Cxm5Gs=.0e3a2de3-de51-4eb9-a4e9-fa5b9173b830@github.com> <_dZ75oFfvcVHw8v0GG_UTCpfWyN8rvVYqZmE8jzVD3I=.71f479ae-2006-4642-a71d-ccfacf4f7c36@github.com> Message-ID: On Tue, 26 Sep 2023 02:04:14 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> other size_t flags than the ArrayAllocatorMallocLimit are used in tests. > > test/hotspot/jtreg/serviceability/attach/AttachSetGetFlag.java line 62: > >> 60: // Test a non-manageable size_t flag. >> 61: // Since it is not manageable, we can't test the setFlag functionality. 
>> 62: testGetFlag("StringDeduplicationCleanupDeadMinimum", "128");
>
> A non-experimental flag, like MetaspaceSize, might be better long term in case the experimental flag gets removed again.

When `MetaspaceSize` is set to 0, the following assertion is raised at metaspace.cpp:316:

```C++
size_t MetaspaceGC::capacity_until_GC() {
  size_t value = Atomic::load_acquire(&_capacity_until_GC);
  assert(value >= MetaspaceSize, "Not initialized properly?"); // <-----
  return value;
}
```

`LargePageSizeInBytes` is used instead.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1336812986

From azafari at openjdk.org Tue Sep 26 08:35:10 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 26 Sep 2023 08:35:10 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v4] In-Reply-To: References: Message-ID:

> 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it are also removed.
> 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`.
> 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can be also `ArenaBitMap` and `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`:
> ```C++
> void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const {
>   MallocArrayAllocator::free(map);
> }
> ```
>
> ### Test
> tiers1-4 passed on all platforms.

Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:

  MetaspaceSize and its lower bound is used.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15859/files - new: https://git.openjdk.org/jdk/pull/15859/files/4a739243..e2acfcb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15859&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15859&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15859.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15859/head:pull/15859 PR: https://git.openjdk.org/jdk/pull/15859 From azafari at openjdk.org Tue Sep 26 08:35:14 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 26 Sep 2023 08:35:14 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v2] In-Reply-To: <_dZ75oFfvcVHw8v0GG_UTCpfWyN8rvVYqZmE8jzVD3I=.71f479ae-2006-4642-a71d-ccfacf4f7c36@github.com> References: <4i265Gn613urJfDewi3W2U4-dHgyjkp9Ejs13Cxm5Gs=.0e3a2de3-de51-4eb9-a4e9-fa5b9173b830@github.com> <_dZ75oFfvcVHw8v0GG_UTCpfWyN8rvVYqZmE8jzVD3I=.71f479ae-2006-4642-a71d-ccfacf4f7c36@github.com> Message-ID: On Tue, 26 Sep 2023 02:07:41 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> other size_t flags than the ArrayAllocatorMallocLimit are used in tests. > > test/lib-test/jdk/test/whitebox/vm_flags/SizeTTest.java line 38: > >> 36: >> 37: public class SizeTTest { >> 38: private static final String FLAG_NAME = "StringDeduplicationCleanupDeadMinimum"; > > Again a non-experimental flag, like MetaspaceSize, might be better here. `MetaspaceSize` is used with its lower bound 65536. Otherwise, the test fails. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15859#discussion_r1336830220 From tschatzl at openjdk.org Tue Sep 26 08:42:09 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 26 Sep 2023 08:42:09 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v5] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that modifies the code root (remembered) set to use the CHT as internal representation. > > This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethdos during code cache unloading improving performance of various pauses that deal with code root sets. > > With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: > > During collection pauses: > > [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms > [..] > [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 > [...] > [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case. > > Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): > > Clear Exception Caches 35,5ms > Unregister NMethods 598,5ms <---- this is nmethod unregistering. > Unregister Old NMethods 3,0ms > CodeBlob flush 41,1ms > CodeCache free 5730,3ms > > > With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part. 
> > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comment: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: unregister-nmethod still called during nmethod::flush ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15811/files - new: https://git.openjdk.org/jdk/pull/15811/files/3ac3b0c4..e0588160 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=03-04 Stats: 18 lines in 10 files changed: 1 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/15811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811 PR: https://git.openjdk.org/jdk/pull/15811 From tschatzl at openjdk.org Tue Sep 26 08:46:16 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 26 Sep 2023 08:46:16 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v5] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:42:09 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies the code root (remembered) set to use the CHT as internal representation. 
>> >> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethdos during code cache unloading improving performance of various pauses that deal with code root sets. >> >> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: >> >> During collection pauses: >> >> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms >> [..] >> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 >> [...] >> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 >> >> >> Code root scan now reduces to ~22ms max on average in this case. >> >> Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): >> >> Clear Exception Caches 35,5ms >> Unregister NMethods 598,5ms <---- this is nmethod unregistering. >> Unregister Old NMethods 3,0ms >> CodeBlob flush 41,1ms >> CodeCache free 5730,3ms >> >> >> With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part. >> >> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: >> >> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 >> >> >> Some random comment: >> * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. 
>> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > unregister-nmethod still called during nmethod::flush @albertnetymk noticed that the change did not skip unregistering nmethods during the serial part of the nmethod flushing in G1 due to missing to propagate the correct flag. That lengthened the Remark pause unnecessarily. Fixed in the [e058816](https://github.com/openjdk/jdk/pull/15811/commits/e05881602f6873bb152f99ce3cb63f940ab0fe96) commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15811#issuecomment-1735093852 From azafari at openjdk.org Tue Sep 26 09:02:48 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 26 Sep 2023 09:02:48 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v5] In-Reply-To: References: Message-ID: > The `find` method now is > ```C++ > template > int find(T* token, bool f(T*, E)) const { > ... > > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into _8314502 - changed the `E` param of find methods to `const E&`. - find_from_end and its caller are also updated. 
- 8314502: Change the comparator taking version of GrowableArray::find to be a template method - 8314502: GrowableArray: Make find with comparator take template ------------- Changes: https://git.openjdk.org/jdk/pull/15418/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=04 Stats: 19 lines in 9 files changed: 2 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From amitkumar at openjdk.org Tue Sep 26 09:06:39 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 26 Sep 2023 09:06:39 GMT Subject: RFR: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler Message-ID: We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. ------------- Commit messages: - s390 patch Changes: https://git.openjdk.org/jdk/pull/15915/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15915&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316935 Stats: 16 lines in 4 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/15915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15915/head:pull/15915 PR: https://git.openjdk.org/jdk/pull/15915 From mdoerr at openjdk.org Tue Sep 26 09:36:14 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 09:36:14 GMT Subject: RFR: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:59:44 GMT, Amit Kumar wrote: > We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. Looks good and trivial. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15915#pullrequestreview-1643811581 From azafari at openjdk.org Tue Sep 26 10:21:19 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 26 Sep 2023 10:21:19 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v5] In-Reply-To: References: Message-ID: <5W3gx7mXdw08xMPNFI4au4mxt9o8Tzev34rsM-yzKeY=.58f3f10e-c83b-437b-a230-6bb7bde12fcc@github.com> On Tue, 26 Sep 2023 09:02:48 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into _8314502 > - changed the `E` param of find methods to `const E&`. > - find_from_end and its caller are also updated. > - 8314502: Change the comparator taking version of GrowableArray::find to be a template method > - 8314502: GrowableArray: Make find with comparator take template Dear reviewers @kimbarrett, @dholmes-ora, @stefank, @sspitsyn and @merykitty, Would you please check if you have any more comments on this? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1735248312 From fbredberg at openjdk.org Tue Sep 26 11:41:52 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 26 Sep 2023 11:41:52 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v2] In-Reply-To: References: Message-ID: <2MlDQiIr7FVTQzH2KyaDyepa6y8Abh1jM_8iuzxDlnA=.ab3b9155-b867-4a6a-aab4-96c2e3915737@github.com> > Relativize initial_sp in interpreter frames. 
> > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Updated after review. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15815/files - new: https://git.openjdk.org/jdk/pull/15815/files/dad25b66..29b576fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15815&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15815&range=00-01 Stats: 29 lines in 6 files changed: 0 ins; 13 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/15815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15815/head:pull/15815 PR: https://git.openjdk.org/jdk/pull/15815 From fbredberg at openjdk.org Tue Sep 26 11:46:16 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 26 Sep 2023 11:46:16 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v2] In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 08:46:39 GMT, Fei Yang wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated after review. 
> > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 704: > >> 702: // register for unlock_object to pass to VM directly >> 703: ld(c_rarg1, monitor_block_top); // derelativize pointer >> 704: shadd(c_rarg1, c_rarg1, fp, c_rarg1, LogBytesPerWord); > > Nit: One redundant space between the 3rd and 4th parameters for each `shadd` call added. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15815#discussion_r1337078087 From azafari at openjdk.org Tue Sep 26 11:53:41 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 26 Sep 2023 11:53:41 GMT Subject: RFR: 8198918: jio_snprintf and friends are not checked by -Wformat Message-ID: - The `ATTRIBUTE_PRINTF` usage in cpp files is useless. They are removed. - There are cases where `jio_xxprintf` functions use `char *` arguments for format string, rather than a literal like `"%s..."`. These cases are not compiled when `ATTRIBUTE_PRINTF` is used for them. So, I use the attribute and got the corresponding compile errors. Then I fixed the issues and remove the attribute when all fixed. - ### Test The changes are tested on all platforms tiers 1-4. ------------- Commit messages: - 8198918: jio_snprintf and friends are not checked by -Wformat Changes: https://git.openjdk.org/jdk/pull/15918/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15918&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8198918 Stats: 13 lines in 5 files changed: 0 ins; 5 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/15918.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15918/head:pull/15918 PR: https://git.openjdk.org/jdk/pull/15918 From fbredberg at openjdk.org Tue Sep 26 12:04:15 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 26 Sep 2023 12:04:15 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v2] In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 12:45:07 GMT, Fei Yang wrote: >> Hi, I have arranged tier1-3 test on linux-riscv64 platform. 
Thanks for adding handling for riscv. > > Tier1-3 test is clean. The riscv part looks good except for the nit. Thanks for the reviews, @RealFYang, @pchilano and @TheRealMDoerr. The "Updated after review" version contains the following changes: 1. The extra space in `interp_masm_riscv.cpp` was removed. 2. I'm now using `R0` as a scratch register in `InterpreterMacroAssembler::save_interpreter_state` on PowerPC. 3. I removed some local variables in `FreezeBase::relativize_interpreted_frame_metadata` and `ThawBase::derelativize_interpreted_frame_metadata` that are no longer needed (x86, AArch64 and RISC-V). Tier1-4 runs OK on supported platforms. PowerPC and RISC-V were sanity tested using Qemu. Please check whether it still looks OK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15815#issuecomment-1735406747 From amitkumar at openjdk.org Tue Sep 26 12:10:11 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 26 Sep 2023 12:10:11 GMT Subject: RFR: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: <5lJCGw9v2YvI60oU2a_3aoCPdwrnOoJyspHSaSaUJiw=.dfefcded-b1f2-4590-9995-cf9ceb5380cc@github.com> On Tue, 26 Sep 2023 09:32:56 GMT, Martin Doerr wrote: >> We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. > > Looks good and trivial. Thanks @TheRealMDoerr for reviewing it. It's a trivial change. Should I integrate it or wait for another review?
------------- PR Comment: https://git.openjdk.org/jdk/pull/15915#issuecomment-1735415391 From coleenp at openjdk.org Tue Sep 26 12:28:13 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 26 Sep 2023 12:28:13 GMT Subject: RFR: 8316098: Revise signature of numa_get_leaf_groups In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 12:22:11 GMT, Albert Mingkun Yang wrote: > Simple refactoring to better reflect NUMA node id is non-negative using unsigned type. > > More cleanup can possibly be done to avoid the use of `checked_cast` in `os_windows.cpp`, but since the NUMA code in Windows is unreachable (JDK-8244065), I went to for the smallest diff there in order to avoid adding more untested code. Looks good to me. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15786#pullrequestreview-1644136515 From mbaesken at openjdk.org Tue Sep 26 12:39:14 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 26 Sep 2023 12:39:14 GMT Subject: RFR: 8316735: Print LockStack in hs_err files [v2] In-Reply-To: <3rU128P35kRKsdJ4IqrnTAbrWAZqrep4wD53q7JK1YI=.550e838a-d88b-4d0a-81b5-cb611259f2b6@github.com> References: <3rU128P35kRKsdJ4IqrnTAbrWAZqrep4wD53q7JK1YI=.550e838a-d88b-4d0a-81b5-cb611259f2b6@github.com> Message-ID: <9rHTSTdis0RotKMcSyJwUToIQIr-SxXxiV21S9LL2VM=.dd002c2a-3c83-45b0-ae08-ef301a175208@github.com> On Mon, 25 Sep 2023 11:58:47 GMT, Martin Doerr wrote: >> Example output: >> >> Lock stack of current Java thread (top to bottom): >> LockStack[1]: nsk.share.jdi.EventHandler >> {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' >> - ---- fields (total size 5 words): >> - private volatile 'wasInterrupted' 'Z' @12 false (0x00) >> - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) >> - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) >> - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 
a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) >> - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) >> - private 'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) >> LockStack[0]: java.util.Collections$SynchronizedRandomAccessList >> {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' >> - ---- fields (total size 3 words): >> - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) >> - final 'mutex' 'Ljava/lang/Object;' @16 a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} (0xbcb163e8) >> - final 'list' 'Ljava/util/List;' @20 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move up and change wording. LGTM ------------- Marked as reviewed by mbaesken (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15884#pullrequestreview-1644158818 From coleenp at openjdk.org Tue Sep 26 12:54:28 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 26 Sep 2023 12:54:28 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer Message-ID: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> This change makes WeakHandle and OopHandle release null out the obj pointer, at the cost of making the release function non-const and some changes that propagated from that. This enables ObjectMonitor code to test for null to see if the obj was already released, and seems like the right thing to do. See comments from related PR in the bug report. Tested with tier1-4. 
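A minimal standalone sketch of the behavior described above. `ToyHandle` and `Slot` are hypothetical types, not HotSpot's `WeakHandle`/`OopHandle` (which are backed by `OopStorage`, not modeled here). The sketch only illustrates why `release()` can no longer be `const` once it clears the pointer, and how a caller can then test for null to see whether the handle was already released.

```cpp
#include <cassert>

// Hypothetical stand-in for a storage slot (HotSpot uses OopStorage instead).
struct Slot {
  void* value;
};

// Hypothetical handle illustrating "release() clears the obj pointer".
class ToyHandle {
  Slot* _obj;  // nullptr before creation and after release

 public:
  ToyHandle() : _obj(nullptr) {}
  explicit ToyHandle(void* v) : _obj(new Slot{v}) {}

  // Copying is disabled so the toy cannot double-free a shared slot.
  ToyHandle(const ToyHandle&) = delete;
  ToyHandle& operator=(const ToyHandle&) = delete;

  bool is_null() const { return _obj == nullptr; }

  void* value() const {
    assert(!is_null() && "handle used after release");
    return _obj->value;
  }

  // Non-const because it now writes _obj. Clearing the field lets callers
  // test is_null() to detect an already-released handle, and makes a
  // second release() a harmless no-op.
  void release() {
    if (_obj != nullptr) {
      delete _obj;     // give the slot back to its "storage"
      _obj = nullptr;  // the behavior under discussion: clear the pointer
    }
  }
};
```

With the pointer cleared, code that previously needed a separate "released" flag can simply check `is_null()`.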
------------- Commit messages: - 8309599: WeakHandle and OopHandle release should clear obj pointer Changes: https://git.openjdk.org/jdk/pull/15920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309599 Stats: 13 lines in 8 files changed: 2 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/15920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15920/head:pull/15920 PR: https://git.openjdk.org/jdk/pull/15920 From ayang at openjdk.org Tue Sep 26 12:57:15 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 26 Sep 2023 12:57:15 GMT Subject: RFR: 8316098: Revise signature of numa_get_leaf_groups In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 12:22:11 GMT, Albert Mingkun Yang wrote: > Simple refactoring to better reflect NUMA node id is non-negative using unsigned type. > > More cleanup can possibly be done to avoid the use of `checked_cast` in `os_windows.cpp`, but since the NUMA code in Windows is unreachable (JDK-8244065), I went to for the smallest diff there in order to avoid adding more untested code. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15786#issuecomment-1735487614 From ayang at openjdk.org Tue Sep 26 13:00:25 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 26 Sep 2023 13:00:25 GMT Subject: Integrated: 8316098: Revise signature of numa_get_leaf_groups In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 12:22:11 GMT, Albert Mingkun Yang wrote: > Simple refactoring to better reflect NUMA node id is non-negative using unsigned type. > > More cleanup can possibly be done to avoid the use of `checked_cast` in `os_windows.cpp`, but since the NUMA code in Windows is unreachable (JDK-8244065), I went to for the smallest diff there in order to avoid adding more untested code. This pull request has now been integrated. 
Changeset: e510dee1 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/e510dee162612d9a706ba54d0ab79a49139e77d8 Stats: 11 lines in 7 files changed: 1 ins; 0 del; 10 mod 8316098: Revise signature of numa_get_leaf_groups Reviewed-by: tschatzl, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/15786 From shade at openjdk.org Tue Sep 26 13:00:19 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Sep 2023 13:00:19 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v24] In-Reply-To: References: <_U1jBJQChDb-Y86Qd-0xMl3f3oCjEv2egqem9ZME7GY=.0737b93e-4521-4b82-b330-7f4491370907@github.com> Message-ID: On Fri, 22 Sep 2023 03:28:18 GMT, David Holmes wrote: >> Using `long` is to avoid build failure on 32-bit ARM and x86. `jlong` is `long long` on 32-bit, and Atomic template does not support `long long` on 32-bit. Example failure: https://github.com/jjoo172/jdk/actions/runs/6229455243/job/16907994694. >> >> Is there a better way to avoid these failures on 32-bit? > > `long` is 32-bit on Windows x64 as well which means you're reducing the utility of these timers there (else you could use 32-bit everywhere). > > AFAICS it should be supported on x86-32 as we define `SUPPORTS_NATIVE_CX8` whilst for ARM it is restricted to ARMv7a and above. (Does anyone build ARMv6 still?) But that appears not to be handled by the atomic templates. > > Not sure the best way to approach this one. If the templates correctly handled SUPPORTS_NATIVE_CX8 to define the 64-bit variants then the ideal solution would be to use a typedef that is 64-bit on supported platforms and 32-bit elsewhere. Looks to me that x86_32 actually implements PlatformCmpxchg<8> (via linux_x86.S assembly), but not PlatformAdd<8>. So maybe the workaround would be to use Atomic::cmpxchg for this update, at least on _LP64 path? I see CollectedHeap::publish_total_cpu_time already does the CAS too. A cleaner solution would be to implement PlatformAdd<8> for x86_32, which could be expressed via PlatformCmpxchg<8>, I think.
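The suggestion above, layering a 64-bit add on top of a 64-bit compare-and-swap, can be sketched as a retry loop. This is a standalone illustration that uses `std::atomic<int64_t>` in place of HotSpot's `Atomic::cmpxchg`. The helper names `add_via_cmpxchg` and `example_usage` are hypothetical, and this is not how `PlatformAdd<8>` is actually written in HotSpot.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: a 64-bit atomic add built only from a 64-bit
// compare-and-swap. std::atomic stands in for HotSpot's Atomic class.
inline int64_t add_via_cmpxchg(std::atomic<int64_t>* dest, int64_t add_value) {
  int64_t old_value = dest->load(std::memory_order_relaxed);
  // If *dest changed since we read it, the CAS fails and stores the current
  // value into old_value, so the loop retries with fresh data. A weak CAS
  // may also fail spuriously, which the loop handles the same way.
  while (!dest->compare_exchange_weak(old_value, old_value + add_value,
                                      std::memory_order_seq_cst,
                                      std::memory_order_relaxed)) {
    // retry
  }
  return old_value + add_value;  // add-then-fetch: return the updated value
}

// Example: a CPU-time-like counter whose total exceeds 32 bits.
inline void example_usage() {
  std::atomic<int64_t> total_ns{0};
  add_via_cmpxchg(&total_ns, 5);
  int64_t result = add_via_cmpxchg(&total_ns, int64_t{1} << 40);
  assert(result == 5 + (int64_t{1} << 40));  // would not fit in a 32-bit long
  assert(total_ns.load() == result);
}
```

The second update in `example_usage()` pushes the total past the 32-bit range. A counter typed as `long` would lose exactly that range on platforms where `long` is 32 bits, such as Windows x64.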
A cleaner solution would be to implement PlatformAdd<8> for x86_32, which could be expressed via PlatformCmpxchg<8>, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1337163806 From simonis at openjdk.org Tue Sep 26 13:19:17 2023 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 26 Sep 2023 13:19:17 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v24] In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 22:48:16 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues Thanks for your continued work on this PR. I think it looks pretty good now once you fix the `long`/`jlong` issue mentioned by @dholmes-ora. And the solution proposed by @shipilev should work equally well on arm32. src/hotspot/share/gc/shared/stringdedup/stringDedupThread.cpp line 31: > 29: #include "runtime/handles.hpp" > 30: #include "runtime/os.hpp" > 31: #include "runtime/perfData.hpp" This include doesn't seem to be needed?
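Shipilev's suggestion, building a 64-bit add out of a 64-bit compare-and-swap, follows the usual CAS-retry pattern. The sketch below is only an illustration of that pattern: `cas_add_64` is a hypothetical helper written against standard `std::atomic<int64_t>`, not HotSpot's `Atomic`, `PlatformAdd<8>`, or `PlatformCmpxchg<8>` templates, and it is assumed to return the updated value (add-and-fetch semantics).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot's Atomic API): a 64-bit add built only
// from a 64-bit compare-and-swap, the shape a PlatformAdd<8> layered on
// PlatformCmpxchg<8> could take. Returns the value after the addition.
inline int64_t cas_add_64(std::atomic<int64_t>& dest, int64_t add_value) {
  int64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the current value of `dest`
  // into old_value, so the next attempt retries against fresh data.
  // Spurious failures of the weak form are absorbed by the same loop.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return old_value + add_value;
}
```

A counter of accumulated GC thread CPU time would then be bumped with something like `cas_add_64(total, delta)`. Whether `std::atomic<int64_t>` is actually lock-free on a given 32-bit target can be checked with its `is_lock_free()` member.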
------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1644233704 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1337186614 From rkennke at openjdk.org Tue Sep 26 13:25:11 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 26 Sep 2023 13:25:11 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: On Tue, 26 Sep 2023 12:47:42 GMT, Coleen Phillimore wrote: > This change makes WeakHandle and OopHandle release null out the obj pointer, at the cost of making the release function non-const and some changes that propagated from that. This enables ObjectMonitor code to test for null to see if the obj was already released, and seems like the right thing to do. See comments from related PR in the bug report. > Tested with tier1-4. Not a full review yet, but it looks like it achieves the same as https://github.com/openjdk/jdk/pull/13721, only differently. I opened that PR back then for other reasons, and it was rejected on the grounds that those reasons were not valid (with which I agree). Since then I have found at least one other use of the same cleaning mechanism, which is ZGC support for Lilliput, but we decided to follow another, better route (full obj->OM mapping). Maybe you'll find inspiration in that old PR ;-) It seems to get away with fewer intrusions into unrelated code.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15920#issuecomment-1735534464 From mdoerr at openjdk.org Tue Sep 26 13:28:29 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 13:28:29 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v2] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 21:14:11 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add x86_64 and aarch64 implementation. Closing after Roman's design explanation: "C2 (and C1) enforces strictly nested balanced locking." We will hopefully find a better fix to make this really happen. (Otherwise, we could still reopen.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1735538703 From mdoerr at openjdk.org Tue Sep 26 13:28:30 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 13:28:30 GMT Subject: Withdrawn: 8316746: Top of lock-stack does not match the unlocked object In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 13:57:26 GMT, Martin Doerr wrote: > I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. > Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). This pull request has been closed without being integrated. 
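Roman's point that C1 and C2 enforce strictly nested, balanced locking is what makes a top-of-stack check sufficient on the fast unlock path. The toy model below illustrates that reasoning under simplifying assumptions: `ToyLockStack`, its fixed capacity of 8, and the `try_push`/`try_pop` names are hypothetical and do not reflect HotSpot's `LockStack` layout or the generated fast-path code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model of a per-thread lock-stack for lightweight
// locking; not HotSpot's LockStack. The capacity is an arbitrary choice.
class ToyLockStack {
  static constexpr size_t kCapacity = 8;
  const void* _elems[kCapacity] = {};
  size_t _top = 0;

 public:
  // Fast-path lock: record obj as the most recently locked object if
  // there is room; false means the caller must take a slow path.
  bool try_push(const void* obj) {
    if (_top == kCapacity) return false;
    _elems[_top++] = obj;
    return true;
  }

  // Fast-path unlock: succeeds only if obj is on top of the stack. With
  // strictly nested, balanced locking the most recently acquired lock is
  // always released first, so a recorded object is on top when unlocked;
  // any other order is left to a slow path.
  bool try_pop(const void* obj) {
    if (_top == 0 || _elems[_top - 1] != obj) return false;
    --_top;
    return true;
  }

  size_t size() const { return _top; }
};
```

In this model, unlocking `a` while `b` is still on top is the "top of lock-stack does not match the unlocked object" case: `try_pop` fails and a slow path would have to take over.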
------------- PR: https://git.openjdk.org/jdk/pull/15903 From ayang at openjdk.org Tue Sep 26 13:32:15 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 26 Sep 2023 13:32:15 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v5] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:42:09 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies the code root (remembered) set to use the CHT as its internal representation. >> >> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and adds (parallel) bulk unregistering of nmethods during code cache unloading, improving the performance of various pauses that deal with code root sets. >> >> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: >> >> During collection pauses: >> >> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms >> [..] >> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 >> [...] >> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 >> >> >> Code root scan now reduces to ~22ms max on average in this case. >> >> Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): >> >> Clear Exception Caches 35,5ms >> Unregister NMethods 598,5ms <---- this is nmethod unregistering. >> Unregister Old NMethods 3,0ms >> CodeBlob flush 41,1ms >> CodeCache free 5730,3ms >> >> >> With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part.
>> >> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: >> >> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 >> >> >> Some random comments: >> * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep the previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > unregister-nmethod still called during nmethod::flush If I understand it correctly, this PR consists of two optimizations: 1. Replacing a mutex with a concurrent hashtable in the g1 code-root set. 2. Implementing bulk notification of unregistered nmethods. The first optimization is specific to g1, while the second one touches the `CollectedHeap::unregister_nmethod` API and could be beneficial for other collectors as well. I wonder if it's possible to split this into two separate PRs for easier reviewing and to gain a better understanding of the performance impact of each optimization. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3041: > 3039: p2i(_nm), HR_FORMAT_PARAMS(hr), HR_FORMAT_PARAMS(hr->humongous_start_region())); > 3040: > 3041: // HeapRegion::add_code_root() avoids adding duplicate entries. Obsolete comment.
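The "automatic balancing" in the PR description refers to workers claiming small pieces of the table dynamically, instead of one worker owning a whole, possibly very large, per-region set (compare the Code Root Scan Max of 775 ms against an Avg of 59 ms above). The following is a generic sketch of that claiming technique, not G1's or `ConcurrentHashTable`'s actual code; `ChunkClaimer`, `process_buckets`, and the chunk size are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical claimer: hands out [start, end) index ranges of at most
// `chunk` buckets. A worker that finishes early just claims another range,
// so one large set no longer pins a single worker. Generic technique, not
// G1's or ConcurrentHashTable's implementation.
class ChunkClaimer {
  std::atomic<size_t> _next{0};
  const size_t _limit;
  const size_t _chunk;

 public:
  ChunkClaimer(size_t limit, size_t chunk) : _limit(limit), _chunk(chunk) {
    assert(chunk > 0);
  }

  // Returns false once every bucket has been handed out.
  bool claim(size_t* start, size_t* end) {
    size_t s = _next.fetch_add(_chunk, std::memory_order_relaxed);
    if (s >= _limit) return false;
    *start = s;
    *end = std::min(s + _chunk, _limit);
    return true;
  }
};

// Body run by each worker thread; across all workers sharing the same
// claimer, every bucket index is visited exactly once.
template <typename Fn>
void process_buckets(ChunkClaimer& claimer, Fn fn) {
  size_t start, end;
  while (claimer.claim(&start, &end)) {
    for (size_t i = start; i < end; i++) {
      fn(i);
    }
  }
}
```

Smaller chunks balance better at the cost of more atomic operations. The right granularity is a tuning question this sketch does not try to answer.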
------------- PR Review: https://git.openjdk.org/jdk/pull/15811#pullrequestreview-1644078426 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1337092226 From mdoerr at openjdk.org Tue Sep 26 13:36:26 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 13:36:26 GMT Subject: RFR: 8316735: Print LockStack in hs_err files [v2] In-Reply-To: <3rU128P35kRKsdJ4IqrnTAbrWAZqrep4wD53q7JK1YI=.550e838a-d88b-4d0a-81b5-cb611259f2b6@github.com> References: <3rU128P35kRKsdJ4IqrnTAbrWAZqrep4wD53q7JK1YI=.550e838a-d88b-4d0a-81b5-cb611259f2b6@github.com> Message-ID: On Mon, 25 Sep 2023 11:58:47 GMT, Martin Doerr wrote: >> Example output: >> >> Lock stack of current Java thread (top to bottom): >> LockStack[1]: nsk.share.jdi.EventHandler >> {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' >> - ---- fields (total size 5 words): >> - private volatile 'wasInterrupted' 'Z' @12 false (0x00) >> - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) >> - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) >> - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) >> - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) >> - private 'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) >> LockStack[0]: java.util.Collections$SynchronizedRandomAccessList >> {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' >> - ---- fields (total size 3 words): >> - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) >> - final 'mutex' 'Ljava/lang/Object;' @16 a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} (0xbcb163e8) >> - final 'list' 'Ljava/util/List;' @20 
a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move up and change wording. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15884#issuecomment-1735552220 From mdoerr at openjdk.org Tue Sep 26 13:36:28 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 13:36:28 GMT Subject: Integrated: 8316735: Print LockStack in hs_err files In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 09:47:22 GMT, Martin Doerr wrote: > Example output: > > Lock stack of current Java thread (top to bottom): > LockStack[1]: nsk.share.jdi.EventHandler > {0x00000000bcc28198} - klass: 'nsk/share/jdi/EventHandler' > - ---- fields (total size 5 words): > - private volatile 'wasInterrupted' 'Z' @12 false (0x00) > - private 'debuggee' 'Lnsk/share/jdi/Debugee;' @16 a 'nsk/share/jdi/LocalLaunchedDebugee'{0x00000000bcc08c18} (0xbcc08c18) > - private 'log' 'Lnsk/share/Log;' @20 a 'nsk/share/Log'{0x00000000bcc08cb0} (0xbcc08cb0) > - private 'vm' 'Lcom/sun/jdi/VirtualMachine;' @24 a 'com/sun/tools/jdi/VirtualMachineImpl'{0x00000000bccb3d60} (0xbccb3d60) > - private 'requestManager' 'Lcom/sun/jdi/request/EventRequestManager;' @28 a 'com/sun/tools/jdi/EventRequestManagerImpl'{0x00000000bccb56f8} (0xbccb56f8) > - private 'listenThread' 'Ljava/lang/Thread;' @32 a 'java/lang/Thread'{0x00000000bcc280e8} (0xbcc280e8) > LockStack[0]: java.util.Collections$SynchronizedRandomAccessList > {0x00000000bcb163e8} - klass: 'java/util/Collections$SynchronizedRandomAccessList' > - ---- fields (total size 3 words): > - final 'c' 'Ljava/util/Collection;' @12 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) > - final 'mutex' 'Ljava/lang/Object;' @16 a 'java/util/Collections$SynchronizedRandomAccessList'{0x00000000bcb163e8} (0xbcb163e8) > - final 'list' 'Ljava/util/List;' @20 a 'java/util/Vector'{0x00000000bcb16400} (0xbcb16400) This pull 
request has now been integrated. Changeset: 20ff6031 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/20ff603108a52468dd41020cbf6c0bf669e23861 Stats: 21 lines in 3 files changed: 20 ins; 0 del; 1 mod 8316735: Print LockStack in hs_err files Reviewed-by: dholmes, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/15884 From lucy at openjdk.org Tue Sep 26 13:41:15 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 26 Sep 2023 13:41:15 GMT Subject: RFR: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:59:44 GMT, Amit Kumar wrote: > We (s390) also need to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. Looks good to me. Changes which are classified "trivial" by a Reviewer can be integrated with just one positive review. Now you've got both: two reviews and a trivial classification. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15915#pullrequestreview-1644295073 From mdoerr at openjdk.org Tue Sep 26 13:41:16 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 13:41:16 GMT Subject: RFR: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:59:44 GMT, Amit Kumar wrote: > We (s390) also need to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. You can integrate it after you get a 2nd review or after 24h (whatever happens first).
------------- PR Comment: https://git.openjdk.org/jdk/pull/15915#issuecomment-1735560616 From coleenp at openjdk.org Tue Sep 26 13:55:13 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 26 Sep 2023 13:55:13 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: On Tue, 26 Sep 2023 12:47:42 GMT, Coleen Phillimore wrote: > This change makes WeakHandle and OopHandle release null out the obj pointer, at the cost of making the release function non-const and some changes that propagated from that. This enables ObjectMonitor code to test for null to see if the obj was already released, and seems like the right thing to do. See comments from related PR in the bug report. > Tested with tier1-4. This change doesn't fix the problem that you had observed with PR #13721. (I agree with the comments in that PR). It's a general cleanup where after calling release() the ObjectMonitor code wants some way to know that it has done so, without adding a specific call to WeakHandle::set_null() to do what the callers expect release() to do. I don't really like making the callers non-const, but I like having a special set_null() API less. Even with object -> OM mapping, the OM will still need to weakly point to the object, unless the design is changed a lot in the last couple of weeks. 
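The trade-off Coleen describes, a `release()` that also nulls the stored pointer and therefore can no longer be `const`, can be sketched with a toy handle. `ToyHandle` is hypothetical: it does not model `OopStorage`, GC barriers, or the real `WeakHandle`/`OopHandle` interfaces, only the "null after release" contract that lets callers such as ObjectMonitor code check whether a release already happened.

```cpp
#include <cassert>

// Hypothetical toy handle, not HotSpot's WeakHandle/OopHandle. It only
// models the contract that release() leaves the handle null.
template <typename T>
class ToyHandle {
  T* _obj = nullptr;

 public:
  ToyHandle() = default;
  explicit ToyHandle(T* obj) : _obj(obj) {}

  T* resolve() const { return _obj; }
  bool is_null() const { return _obj == nullptr; }

  // A real handle would free its storage slot here. Clearing _obj is what
  // lets callers detect an earlier release, and it is also why release()
  // has to be a non-const member function.
  void release() {
    assert(_obj != nullptr && "handle released twice");
    _obj = nullptr;
  }
};
```

The design cost shows up at the call sites: any code that held the handle through a `const` reference or `const` member function must now hold it mutably in order to call `release()`.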
------------- PR Comment: https://git.openjdk.org/jdk/pull/15920#issuecomment-1735587388 From tschatzl at openjdk.org Tue Sep 26 14:14:19 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 26 Sep 2023 14:14:19 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v5] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 11:55:13 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> unregister-nmethod still called during nmethod::flush > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3041: > >> 3039: p2i(_nm), HR_FORMAT_PARAMS(hr), HR_FORMAT_PARAMS(hr->humongous_start_region())); >> 3040: >> 3041: // HeapRegion::add_code_root() avoids adding duplicate entries. > > Obsolete comment. The comment is still current. https://bugs.openjdk.org/browse/JDK-8316212 is going to remove the extra check in `add()` and instead require the caller to make sure that nothing is added while iterating. I will remove the comment anyway because it does not add anything (and the important thing really is that no entries are added during iteration, not the duplicate thing). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1337279894 From fyang at openjdk.org Tue Sep 26 14:22:14 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 26 Sep 2023 14:22:14 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v2] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 12:01:34 GMT, Fredrik Bredberg wrote: > The "Updated after review" version contains the following changes. > > 1. The extra space in `interp_masm_riscv.cpp` was removed. It seems the `shadd` calls added in other files are not covered? They have similar issues. Looks good otherwise. Thanks.
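The relativization described in the 8315966 PR, storing a frame slot as an offset from the frame pointer instead of as an absolute address, can be illustrated with a toy frame that is copied to a different position on another stack, much as a virtual thread's frames may be frozen and thawed on a different carrier thread. The sketch is hypothetical: `relativize`/`derelativize`, the word-sized offset unit, and the array-based "stacks" are assumptions for illustration, not HotSpot's frame layout or accessors.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not HotSpot's frame layout or accessors.
using word = intptr_t;

// Store `addr` in `slot` as a signed word offset from the frame pointer.
inline void relativize(word* slot, const word* fp, const word* addr) {
  *slot = static_cast<word>(addr - fp);
}

// Recover an absolute address from a relativized slot, using whatever
// frame pointer the frame has at its current location.
inline word* derelativize(const word* slot, word* fp) {
  return fp + *slot;
}

// Example: a frame with fp at stack_a[12] is copied so that fp lands at
// stack_b[20]; the slot at fp[-1] still resolves correctly afterwards.
inline bool frame_copy_keeps_slot_valid() {
  word stack_a[16] = {0};
  word stack_b[32] = {0};
  word* fp_a = &stack_a[12];
  relativize(&fp_a[-1], fp_a, fp_a - 4);                // slot holds -4
  std::memcpy(&stack_b[8], stack_a, sizeof(stack_a));   // "thaw" elsewhere
  word* fp_b = &stack_b[20];
  return derelativize(&fp_b[-1], fp_b) == fp_b - 4;
}
```

An absolute address stored in the same slot would still point into `stack_a` after the copy, which is why it would have to be rewritten on every freeze and thaw; the relative offset needs no fix-up.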
------------- PR Comment: https://git.openjdk.org/jdk/pull/15815#issuecomment-1735642911 From mdoerr at openjdk.org Tue Sep 26 14:26:17 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 26 Sep 2023 14:26:17 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v2] In-Reply-To: <2MlDQiIr7FVTQzH2KyaDyepa6y8Abh1jM_8iuzxDlnA=.ab3b9155-b867-4a6a-aab4-96c2e3915737@github.com> References: <2MlDQiIr7FVTQzH2KyaDyepa6y8Abh1jM_8iuzxDlnA=.ab3b9155-b867-4a6a-aab4-96c2e3915737@github.com> Message-ID: On Tue, 26 Sep 2023 11:41:52 GMT, Fredrik Bredberg wrote: >> Relativize initial_sp in interpreter frames. >> >> By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. >> >> This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. >> >> Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated after review. PPC64 part looks great! Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15815#pullrequestreview-1644420114 From amitkumar at openjdk.org Tue Sep 26 15:06:27 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 26 Sep 2023 15:06:27 GMT Subject: RFR: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: <36JaF_Vyl2qSSamBEPmkJz4-y0YbUUlyKcuuvrmF9is=.08303792-ac90-4337-8a7e-26033bb1795d@github.com> On Tue, 26 Sep 2023 13:37:54 GMT, Lutz Schmidt wrote: > Now you've got both: two reviews and a trivial classification. Thank you, Lutz and Martin, for approving it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15915#issuecomment-1735725343 From amitkumar at openjdk.org Tue Sep 26 15:06:28 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 26 Sep 2023 15:06:28 GMT Subject: Integrated: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:59:44 GMT, Amit Kumar wrote: > We (s390) also need to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. This pull request has now been integrated. Changeset: efb7e85e Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/efb7e85ecfc9c6edb2820e1bf72d48958d4c9780 Stats: 16 lines in 4 files changed: 0 ins; 0 del; 16 mod 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/15915 From rrich at openjdk.org Tue Sep 26 16:40:51 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 26 Sep 2023 16:40:51 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v11] In-Reply-To: References: Message-ID: > This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC.
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to inverse scaling as demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a microbenchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well.
Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with five additional commits since the last revision: - Eliminate special case for scanning the large array end - First card of large array should be cleared if dirty - Do all large array scanning in separate method - Limit stripe size to 1m with at least 8 threads - Small clean-ups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/86747ff7..d75bd60a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=09-10 Stats: 153 lines in 2 files changed: 60 ins; 42 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Tue Sep 26 17:39:15 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 26 Sep 2023 17:39:15 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v11] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 16:40:51 GMT, Richard Reingruber wrote: >> This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC.
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to inverse scaling as demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a microbenchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well.
Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with five additional commits since the last revision: > > - Eliminate special case for scanning the large array end > - First card of large array should be cleared if dirty > - Do all large array scanning in separate method > - Limit stripe size to 1m with at least 8 threads > - Small clean-ups The following is new in the most recent version: * All of the large array scanning is done by one method: `scavenge_large_array_contents`. * The special case for the last stripe of a large array was eliminated. Large array elements are only scanned by the stripe owner. * Reduced minimum size to `stripe_size_in_words` * Max. stripe size of 1M if there are at least 8 active threads Testing: hotspot:tier1 langtools:tier1 TEST_VM_OPTS="-XX:+UseParallelGC" Tests from JBS-item ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1735990753 From psandoz at openjdk.org Tue Sep 26 18:47:22 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 26 Sep 2023 18:47:22 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: <2AguFuRwAmTfpUeFVU2toX7gEtL5VoWKOfkeiHLtpow=.084205ec-e2eb-4dd7-ada2-732b316f6a86@github.com> On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). 
The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > fix typos src/java.base/share/classes/java/lang/Module.java line 328: > 326: System.err.printf(""" > 327: WARNING: A restricted method in %s has been called > 328: WARNING: %s has been called%s in %s Suggestion: WARNING: %s has been called by %s in %s ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337640870 From psandoz at openjdk.org Tue Sep 26 19:06:24 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 26 Sep 2023 19:06:24 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. 
https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > fix typos src/java.base/share/classes/java/lang/foreign/Linker.java line 735: > 733: * > 734: * @apiNote This linker option can not be combined with {@link #critical}.
> 735: * That seems more specification (that can be asserted on) than an informative note. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337661700 From dholmes at openjdk.org Tue Sep 26 20:57:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Sep 2023 20:57:15 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v4] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:35:10 GMT, Afshin Zafari wrote: >> 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it also are removed. >> 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`. >> 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can be also `ArenaBitMap` and `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`: >> ```C++ >> void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const { >> MallocArrayAllocator::free(map); >> } >> >> ### Test >> tiers1-4 passed on all platforms. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > MetaspaceSize and its lower bound is used. Thanks for test changes. Seems fine. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15859#pullrequestreview-1645194241 From dholmes at openjdk.org Tue Sep 26 21:08:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Sep 2023 21:08:19 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v5] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 09:02:48 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed.
Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into _8314502 > - changed the `E` param of find methods to `const E&`. > - find_from_end and its caller are also updated. > - 8314502: Change the comparator taking version of GrowableArray::find to be a template method > - 8314502: GrowableArray: Make find with comparator take template src/hotspot/share/gc/parallel/mutableNUMASpace.hpp line 1: > 1: /* This seems an unrelated change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1337785093 From coleenp at openjdk.org Tue Sep 26 21:11:16 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 26 Sep 2023 21:11:16 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v4] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:35:10 GMT, Afshin Zafari wrote: >> 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it also are removed. >> 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`. >> 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can be also `ArenaBitMap` and `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`: >> ```C++ >> void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const { >> MallocArrayAllocator::free(map); >> } >> >> ### Test >> tiers1-4 passed on all platforms. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > MetaspaceSize and its lower bound is used. This looks good to me. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15859#pullrequestreview-1645212245 From psandoz at openjdk.org Tue Sep 26 21:20:23 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 26 Sep 2023 21:20:23 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. 
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > fix typos src/java.base/share/classes/jdk/internal/foreign/NativeMemorySegmentImpl.java line 152: > 150: private static long allocateMemoryWrapper(long size) { > 151: try { > 152: return UNSAFE.allocateMemory(size); Since we now zero memory only when needed we should test very carefully. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337794211 From psandoz at openjdk.org Tue Sep 26 21:31:23 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 26 Sep 2023 21:31:23 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22.
The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. 
https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > fix typos src/java.base/share/classes/java/lang/foreign/Linker.java line 35: > 33: > 34: import java.lang.invoke.MethodHandle; > 35: import java.nio.ByteOrder; Unused? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337802958 From psandoz at openjdk.org Tue Sep 26 22:02:20 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 26 Sep 2023 22:02:20 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size.
>> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > fix typos This looks good. In the implementation, the functional interfaces `BindingInterpreter.StoreFunc/LoadFunc` are package-private, but are used in internal public signatures.
This was already the case before, and it's not a big deal, but I recommend making those interfaces public. We can also remove `@enablePreview` from `IndirectVarHandleTest` ------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1645266436 From cslucas at openjdk.org Tue Sep 26 22:55:39 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 26 Sep 2023 22:55:39 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges Message-ID: ### Description Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves the Reduce Allocation Merges (RAM) implementation so that it can reduce at least some of these allocation merges. Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. ### Benchmarking **Note:** In some of these tests no reduction happens. I left them in to validate that no performance regression happens in that case. **Note 2:** The margin of error was negligible.
| Benchmark | RAM disabled (ms/op) | RAM enabled (ms/op) |
|--------------------------------------|------------------|-------------------|
| TestTrapAfterMerge | 19.515 | 13.386 |
| TestArgEscape | 33.165 | 33.254 |
| TestCallTwoSide | 70.547 | 69.427 |
| TestCmpAfterMerge | 16.400 | 2.984 |
| TestCmpMergeWithNull_Second | 27.204 | 27.293 |
| TestCmpMergeWithNull | 8.248 | 4.920 |
| TestCondAfterMergeWithAllocate | 12.890 | 5.252 |
| TestCondAfterMergeWithNull | 6.265 | 5.078 |
| TestCondLoadAfterMerge | 12.713 | 5.163 |
| TestConsecutiveSimpleMerge | 30.863 | 4.068 |
| TestDoubleIfElseMerge | 16.069 | 2.444 |
| TestEscapeInCallAfterMerge | 23.111 | 22.924 |
| TestGlobalEscape | 14.459 | 14.425 |
| TestIfElseInLoop | 246.061 | 42.786 |
| TestLoadAfterLoopAlias | 45.808 | 45.812 |
| TestLoadAfterTrap | 28.370 | 28.514 |
| TestLoadInCondAfterMerge | 12.538 | 4.720 |
| TestLoadInLoop | 25.534 | 17.079 |
| TestMergedAccessAfterCallNoWrite | 169.837 | 169.881 |
| TestMergedAccessAfterCallWithWrite | 149.669 | 152.105 |
| TestMergedLoadAfterDirectStore | 16.496 | 16.473 |
| TestMergesAndMixedEscape | 28.821 | 19.701 |
| TestNestedObjectsArray | 31.207 | 27.832 |
| TestNestedObjectsNoEscapeObject | 16.162 | 12.544 |
| TestNestedObjectsObject | 16.117 | 12.204 |
| TestNoEscapeWithLoadInLoop | 253.903 | 247.400 |
| TestNoEscapeWithWriteInLoop | 113.710 | 113.714 |
| TestObjectIdentity | 2.442 | 2.442 |
| TestPartialPhis | 4.340 | 4.340 |
| TestPollutedNoWrite | 7.817 | 1.991 |
| TestPollutedPolymorphic | 11.017 | 1.991 |
| TestPollutedWithWrite | 8.596 | 8.593 |
| TestSRAndNSR_NoTrap_caller | 14.865 | 8.536 |
| TestSRAndNSR_Trap_caller | 45.689 | 40.930 |
| TestSimpleAliasedAlloc | 16.297 | 2.447 |
| TestSimpleDoubleMerge | 23.786 | 2.997 |
| TestString_one_caller | 15.484 | 15.271 |
| TestString_two_caller | 15.456 | 14.996 |
| TestSubclassesTrapping | 26.820 | 26.143 |
| TestSubclasses | 6.521 | 3.834 |
| TestThreeWayAliasedAlloc | 16.307 | 2.308 |
| TestTrappingAfterMerge | 13.683 | 6.804 |

### Tests
- Linux x86_64: Tier1-4, DaCapo, Renaissance, SpecJBB
- MacOS Aarch64: Tier1-4
- Windows x86_64: Tier1-4

------------- Commit messages: - Fix build after merge. - Fix merge - Support for reducing nullable allocation merges. Changes: https://git.openjdk.org/jdk/pull/15825/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316991 Stats: 2284 lines in 13 files changed: 2046 ins; 93 del; 145 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From jvernee at openjdk.org Wed Sep 27 00:26:21 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 00:26:21 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 21:28:43 GMT, Paul Sandoz wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typos > > src/java.base/share/classes/java/lang/foreign/Linker.java line 35: > >> 33: >> 34: import java.lang.invoke.MethodHandle; >> 35: import java.nio.ByteOrder; > > Unused? Yes, I'll remove it. > src/java.base/share/classes/java/lang/foreign/Linker.java line 735: > >> 733: * >> 734: * @apiNote This linker option can not be combined with {@link #critical}. >> 735: * > > That seems more specification (that can be asserted on) than an informative note. True. I'll fold it into the main spec body. > src/java.base/share/classes/jdk/internal/foreign/NativeMemorySegmentImpl.java line 152: > >> 150: private static long allocateMemoryWrapper(long size) { >> 151: try { >> 152: return UNSAFE.allocateMemory(size); > > Since we now zero memory only when needed we should test very carefully. Yes. The `makeNativeSegment` is currently only called from ArenaImpl, which is also responsible for zeroing the memory.
I'll rename `makeNativeSegment` to `makeNativeSegmentNoZeroing` to make it extra clear for callers that memory will not be zeroed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337898179 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337898017 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337899265 From jvernee at openjdk.org Wed Sep 27 00:33:37 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 00:33:37 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: <2AguFuRwAmTfpUeFVU2toX7gEtL5VoWKOfkeiHLtpow=.084205ec-e2eb-4dd7-ada2-732b316f6a86@github.com> References: <2AguFuRwAmTfpUeFVU2toX7gEtL5VoWKOfkeiHLtpow=.084205ec-e2eb-4dd7-ada2-732b316f6a86@github.com> Message-ID: On Tue, 26 Sep 2023 18:44:01 GMT, Paul Sandoz wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typos > > src/java.base/share/classes/java/lang/Module.java line 328: > >> 326: System.err.printf(""" >> 327: WARNING: A restricted method in %s has been called >> 328: WARNING: %s has been called%s in %s > > Suggestion: > > WARNING: %s has been called by %s in %s > > ? The current code does the right thing, since in some cases the caller is `null` and the second `%s` should expand to an empty string. So in the `caller == null` case, the message becomes: `Class::method has been called in an unnamed module`. There was also an offline suggestion to change it to `Class::method has been called by code in an unnamed module`, which I think is a good idea.
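A minimal Java sketch of the format-string behaviour described above, assuming the caller is available as a `String` that may be `null`: the middle `%s` receives either an empty string or `" by " + caller`. The class `RestrictedWarningDemo` and the method `warning` are hypothetical names used only for illustration; this is not the actual `Module.java` code.

```java
// Hypothetical illustration only; not the code in java/lang/Module.java.
class RestrictedWarningDemo {

    // Builds the warning line. The second %s is filled with "" when the
    // caller is unknown (null), or with " by <caller>" otherwise, so one
    // format string covers both cases.
    static String warning(String method, String caller, String module) {
        String byCaller = (caller == null) ? "" : " by " + caller;
        return String.format("WARNING: %s has been called%s in %s", method, byCaller, module);
    }

    public static void main(String[] args) {
        // caller == null: prints "WARNING: Class::method has been called in an unnamed module"
        System.out.println(warning("Class::method", null, "an unnamed module"));
        // known caller: prints "WARNING: Class::method has been called by com.example.Foo in an unnamed module"
        System.out.println(warning("Class::method", "com.example.Foo", "an unnamed module"));
    }
}
```

With the suggested `by %s` form, a `null` caller would instead be rendered as the literal text `by null`, because `String.format` formats a `null` argument to `%s` as `"null"`.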
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337902449 From jvernee at openjdk.org Wed Sep 27 00:36:22 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 00:36:22 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v27] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 21:59:35 GMT, Paul Sandoz wrote: > In the implementation the functional interfaces `BindingInterpreter.StoreFunc/LoadFunc` are package private, but are used in internal public signatures. This was previously like this and it's not a big deal but i recommend making those interfaces public. This was fixed in the panama-foreign repo: https://github.com/openjdk/panama-foreign/pull/891 I'll add that commit to this patch as well. > We can also remove `@enablePreview` from `IndirectVarHandleTest` Good catch! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1736481584 From jvernee at openjdk.org Wed Sep 27 00:53:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 00:53:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v28] In-Reply-To: References: Message-ID: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. 
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - Fix visibility issues Reviewed-by: mcimadamore - Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/0244845a..f6ab4dc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=26-27 Stats: 11 lines in 6 files changed: 0 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From dholmes at openjdk.org Wed Sep 27 07:09:11 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 27 Sep 2023 07:09:11 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: <6G2AMWuBunkw5zZ0M39qBjGg523vfEUm1NPzafAAVMc=.3830900f-cbbc-4989-86d4-7fc77d860291@github.com> On Tue, 26 Sep 2023 12:47:42 GMT, Coleen Phillimore wrote: > This change makes WeakHandle and OopHandle release null out the obj pointer, at the cost of making the release function non-const and some changes that propagated from that. This enables ObjectMonitor code to test for null to see if the obj was already released, and seems like the right thing to do. See comments from related PR in the bug report. > Tested with tier1-4. Nulling out `_obj` seems quite reasonable. But I'm struggling to understand why we support `release` as a public API instead of having it handled by the destructor? Once you have released a handle it is dangerous to try and use it, so why keep it around instead of deleting it (and thus running the destructor)?
Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15920#pullrequestreview-1645748643 From tschatzl at openjdk.org Wed Sep 27 07:23:39 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 27 Sep 2023 07:23:39 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v6] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that modifies the code root (remembered) set to use the CHT as internal representation. > > This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading, improving the performance of various pauses that deal with code root sets. > > With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: > > During collection pauses: > > [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms > [..] > [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 > [...] > [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case. > > Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): > > Clear Exception Caches 35,5ms > Unregister NMethods 598,5ms <---- this is nmethod unregistering. > Unregister Old NMethods 3,0ms > CodeBlob flush 41,1ms > CodeCache free 5730,3ms > > > With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part.
> > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comment: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - Split off bulk removal of dead nmethods from code root sets - Initial comments from albert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15811/files - new: https://git.openjdk.org/jdk/pull/15811/files/e0588160..6e7bacd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=04-05 Stats: 88 lines in 21 files changed: 1 ins; 65 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/15811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811 PR: https://git.openjdk.org/jdk/pull/15811 From azafari at openjdk.org Wed Sep 27 08:27:20 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 27 Sep 2023 08:27:20 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v4] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 21:08:36 GMT, Coleen Phillimore wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> MetaspaceSize and its lower bound is used. > > This looks good to me. Thank you @coleenp and @dholmes-ora for your reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15859#issuecomment-1736934979 From azafari at openjdk.org Wed Sep 27 08:30:23 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 27 Sep 2023 08:30:23 GMT Subject: Integrated: 8299915: Remove ArrayAllocatorMallocLimit and associated code In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 12:02:24 GMT, Afshin Zafari wrote: > 1. `ArrayAllocatorMallocLimit` is removed. The test cases that tested it are also removed. > 2. `AllocArrayAllocator` instances are replaced with `MallocArrayAllocator`. > 3. The signature of `CHeapBitMap::free(ptr, size)` is kept as it is, since it is called in this way from `GrowableBitMap::resize`, where `T` can also be `ArenaBitMap` or `ResourceBitMap`. However, it uses `MallocArrayAllocator::free(ptr)` and ignores the `size`: > ```C++ > void CHeapBitMap::free(bm_word_t* map, idx_t size_in_words) const { > MallocArrayAllocator::free(map); > } > ``` > > ### Test > tiers1-4 passed on all platforms. This pull request has now been integrated. Changeset: 45a145e5 Author: Afshin Zafari URL: https://git.openjdk.org/jdk/commit/45a145e5bc3d3216bb03379896f66a3b719a06dc Stats: 213 lines in 8 files changed: 2 ins; 202 del; 9 mod 8299915: Remove ArrayAllocatorMallocLimit and associated code Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/15859 From pli at openjdk.org Wed Sep 27 08:36:43 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 08:36:43 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: References: Message-ID: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops.
The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document posted as a reply in this pull request. Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Fix code style issues and add loop head dump ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14581/files - new: https://git.openjdk.org/jdk/pull/14581/files/93ccda10..bd1b939b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14581&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14581&range=02-03 Stats: 90 lines in 2 files changed: 27 ins; 14 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/14581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14581/head:pull/14581 PR: https://git.openjdk.org/jdk/pull/14581 From pli at openjdk.org Wed Sep 27 08:42:30 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 08:42:30 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: References: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> Message-ID: On Mon, 3 Jul 2023 14:42:03 GMT, Emanuel Peter wrote: >> I will try to do this in another JBS and come back here later. > > That would be fantastic! `SWPointer` has already been moved out of `SuperWord` in [JDK-8312332](https://bugs.openjdk.org/browse/JDK-8312332). We will move `_vector_loop_debug` out in a later refactoring patch.
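As background on what a "post loop" is: after C2 vectorizes a counted loop, the leftover iterations that do not fill a whole vector are normally executed by a scalar post loop. The following is a hypothetical scalar model in C++, not C2 code, of the general idea behind vectorizing that post loop on hardware with lane masks. The tail is handled by one masked vector step instead of a scalar loop. `kLanes`, `masked_add` and `add_arrays` are illustrative names, and the lane count is an assumed value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vector width in 32-bit lanes; real widths depend on the CPU.
constexpr std::size_t kLanes = 4;

// Scalar model of one masked vector add: lanes at index >= active are
// inactive, so they are neither read nor written.
void masked_add(const int* a, const int* b, int* out, std::size_t active) {
  assert(active <= kLanes);
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (lane < active) {
      out[lane] = a[lane] + b[lane];
    }
  }
}

// Main loop: full vectors only. Post loop: instead of a scalar loop over the
// remaining n % kLanes elements, a single masked step covers all of them.
void add_arrays(const int* a, const int* b, int* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    masked_add(a + i, b + i, out + i, kLanes);  // full mask
  }
  if (i < n) {
    masked_add(a + i, b + i, out + i, n - i);   // partial mask for the tail
  }
}
```

For n = 10 and 4 lanes, the main loop runs twice and the masked post-loop step covers the last 2 elements. Memory past `out[n - 1]` is never written.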
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338266329 From azafari at openjdk.org Wed Sep 27 08:51:18 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 27 Sep 2023 08:51:18 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v5] In-Reply-To: References: Message-ID: <73sTfC_S_JkdiAYBv-CCB58NfWO_Q1RSgG43wJNutI8=.05080f32-0bd1-4a1e-a252-9bd4142f1fed@github.com> On Tue, 26 Sep 2023 21:05:22 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into _8314502 >> - changed the `E` param of find methods to `const E&`. >> - find_from_end and its caller are also updated. >> - 8314502: Change the comparator taking version of GrowableArray::find to be a template method >> - 8314502: GrowableArray: Make find with comparator take template > > src/hotspot/share/gc/parallel/mutableNUMASpace.hpp line 1: > >> 1: /* > > This seems an unrelated change. This change came after fixing a merge conflict. In `mutableNUMASpace.cpp`, at lines 163, 182, 202 and 586 the `find` function is called in this way: int i = lgrp_spaces()->find(&lgrp_id, LGRPSpace::equals); where `lgrp_id` is `int`. Therefore, the `LGRPSpace::equals` has to take an `int*` in its first argument. The definition of `find` is: int find(T* token, bool f(T*, const E&)) const { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1338278635 From fbredberg at openjdk.org Wed Sep 27 09:07:23 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 27 Sep 2023 09:07:23 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v3] In-Reply-To: References: Message-ID: <2MH8OqpvowA48pPPZst0J_smowCkPPyawen6o4qpuIw=.a433a9bd-6d7e-4f7d-b601-60c4c151164b@github.com> > Relativize initial_sp in interpreter frames. 
> > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Removed whitespace (RISC-V only). ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15815/files - new: https://git.openjdk.org/jdk/pull/15815/files/29b576fd..68d94c03 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15815&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15815&range=01-02 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15815/head:pull/15815 PR: https://git.openjdk.org/jdk/pull/15815 From fyang at openjdk.org Wed Sep 27 09:14:14 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 27 Sep 2023 09:14:14 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v3] In-Reply-To: <2MH8OqpvowA48pPPZst0J_smowCkPPyawen6o4qpuIw=.a433a9bd-6d7e-4f7d-b601-60c4c151164b@github.com> References: <2MH8OqpvowA48pPPZst0J_smowCkPPyawen6o4qpuIw=.a433a9bd-6d7e-4f7d-b601-60c4c151164b@github.com> Message-ID: On Wed, 27 Sep 2023 09:07:23 GMT, Fredrik Bredberg wrote: >> Relativize initial_sp in interpreter 
frames. >> >> By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. >> >> This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. >> >> Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Removed whitespace (RISC-V only). Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15815#pullrequestreview-1645996807 From rrich at openjdk.org Wed Sep 27 10:05:14 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 27 Sep 2023 10:05:14 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v9] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 11:29:14 GMT, Albert Mingkun Yang wrote: > I experimented with the aforementioned read-only card table idea a bit and here is the draft: > https://github.com/openjdk/jdk/compare/master...albertnetymk:jdk:pgc-precise-obj-arr?expand=1 This looks very nice! The code is a lot easier to follow than the baseline and this PR. With your draft I found out too that the regressions with just 2 threads come from the remaining `object_start` calls. Larger stripes mean fewer of them. The caching used in your draft is surely better.
So by default 1 card table byte per 512-byte card is needed, which means the shadow card table will require 2 MB per gigabyte of used old generation (1 GiB / 512 B = 2 MiB of card bytes). I guess that's affordable. Do you think that your solution can be backported? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1737093271 From mli at openjdk.org Wed Sep 27 10:09:41 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:41 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic Message-ID: Only vector version is included in this patch. ### Test The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` ------------- Commit messages: - change from li to mv to follow code convention - fix according round 2 reviewing - use mu/tu rather than ma/ta when calling vsetvli - fix according round 1 reviewing - Initial commit: chacha20 intrinsic implementation in vector instructions on riscv Changes: https://git.openjdk.org/jdk/pull/15899/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15899&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315716 Stats: 142 lines in 4 files changed: 142 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15899/head:pull/15899 PR: https://git.openjdk.org/jdk/pull/15899 From luhenry at openjdk.org Wed Sep 27 10:09:42 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 27 Sep 2023 10:09:42 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: <9ZuKINjIQo3SAMBRc5QMaEq_viZvevVqb16AZB4BzxY=.b99cbb11-85bb-4ed7-b610-737024f9a35c@github.com> On Mon, 25 Sep 2023 11:47:40 GMT, Hamlin Li wrote: > Only vector version is included in this patch. > > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` Changes requested by luhenry (Committer). Only need to do the `li` -> `mv` change and it's LGTM.
Also, please change the PR title to `8315716: RISC-V: implement ChaCha20 intrinsic` to link it back to JBS. Thanks! src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2021: > 2019: // rotate vector register left with shift bits, 32-bit version > 2020: void MacroAssembler::vrol_vwi(VectorRegister vd, uint32_t shift, VectorRegister tmp_vr) { > 2021: vsrl_vi(tmp_vr, vd, 32 - shift); Nit: You could even inline that in `macroAssembler_riscv.hpp` to match the other "vector pseudo instructions". src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4281: > 4279: > 4280: // rotate vector register left with shift bits, 32-bit version > 4281: void rotate_left_imm(VectorRegister rv, uint32_t shift, VectorRegister tmp_vr) { This should be in `macroAssembler_riscv.cpp` instead. I would also call it something like `vrolwi`. That'll more closely match the [`vrol`](https://github.com/riscv/riscv-crypto/blob/c8ddeb7e64a3444dda0438316af1238aeed72041/doc/vector/insns/vrol.adoc#L5) src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4287: > 4285: } > 4286: > 4287: void quarter_round(VectorRegister aVec, VectorRegister bVec, Please rename that to `chacha20_quarter_round` to make it clear it belongs to the chacha20 algorithm src/hotspot/cpu/riscv/vm_version_riscv.cpp line 255: > 253: FLAG_SET_DEFAULT(UseChaCha20Intrinsics, true); > 254: } > 255: } You can have something like that: Suggestion: if (UseRVV) { if (FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { FLAG_SET_DEFAULT(UseChaCha20Intrinsics, true); } } else if (UseChaCha20Intrinsics) { if (!FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { warning("Chacha20 Intrinsics requires RVV instructions (not available on this CPU)"); } FLAG_SET_DEFAULT(UseChaCha20Intrinsics, false); } ------------- PR Review: https://git.openjdk.org/jdk/pull/15899#pullrequestreview-1642116248 Marked as reviewed by luhenry (Committer). 
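For reference, here is the scalar ChaCha20 quarter round as defined in RFC 8439, section 2.1. A vector stub applies these same add/xor/rotate steps to whole vector registers. The 32-bit rotate-left is built from a left shift, a right shift by `32 - shift`, and an OR. That is the same decomposition the `vrol_vwi` helper discussed above uses with `vsrl_vi`/`vsll`/`vor`, as far as the quoted snippet shows. This is a plain C++ sketch for checking the arithmetic, not the stub's code.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit rotate left from two shifts and an OR. Only valid for 0 < n < 32,
// because shifting a 32-bit value by 32 is undefined in C++.
inline uint32_t rotl32(uint32_t x, unsigned n) {
  assert(n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

// ChaCha20 quarter round (RFC 8439, section 2.1); additions wrap mod 2^32.
inline void chacha20_quarter_round(uint32_t& a, uint32_t& b,
                                   uint32_t& c, uint32_t& d) {
  a += b; d ^= a; d = rotl32(d, 16);
  c += d; b ^= c; b = rotl32(b, 12);
  a += b; d ^= a; d = rotl32(d, 8);
  c += d; b ^= c; b = rotl32(b, 7);
}
```

The test vector from RFC 8439, section 2.1.1 maps (0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567) to (0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb).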
PR Review: https://git.openjdk.org/jdk/pull/15899#pullrequestreview-1646047413 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336316527 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335859588 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335860186 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336315818 From rehn at openjdk.org Wed Sep 27 10:09:42 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 27 Sep 2023 10:09:42 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> On Mon, 25 Sep 2023 11:47:40 GMT, Hamlin Li wrote: > Only vector version is included in this patch. > > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` Thanks, can you include what testing you have done in the PR description? (testing as in validating the code is correct) src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4319: > 4317: * c_rarg1 - key_stream, the array that will hold the result of the ChaCha20 block function > 4318: */ > 4319: address generate_chacha20Block() { I see the other ones are not static, but they all should be static. Ignore, I missed these were in scope of "class StubGenerator". src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4343: > 4341: // in java level. > 4342: __ li(avl, 16); > 4343: __ vsetvli(length, avl, Assembler::e32, Assembler::m1, Assembler::ma, Assembler::ta); Is this really correct. We have no uses of ma/ta before this since we need to make sure we never touch memory outside of the arrays. I don't think ma/ta will ever be correct when working on Java heap. I would drop the last two argument and use the default of mu/tu as we do everywhere else. 
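The `vsetvli(length, avl, e32, m1, ...)` call above requests AVL = 16 elements. By RVV semantics, the vl actually granted is written to `length`. My reading of the code comment is that 16 comes from KS_MAX_LEN = 1024 bytes divided by the 64-byte ChaCha20 block size, but that is an interpretation. The sketch below computes VLMAX = LMUL * VLEN / SEW and the granted vl for a few example VLEN values. It assumes the implementation picks vl = min(AVL, VLMAX), which is one result the spec permits. Only integer LMUL is handled, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// VLMAX = LMUL * VLEN / SEW; this sketch only handles integer LMUL (m1, m2, ...).
constexpr uint32_t vlmax(uint32_t vlen_bits, uint32_t sew_bits, uint32_t lmul) {
  return lmul * vlen_bits / sew_bits;
}

// vl = min(AVL, VLMAX) is one result the RVV spec permits for vsetvli.
// When VLMAX < AVL < 2 * VLMAX, an implementation may also return any value
// in [ceil(AVL / 2), VLMAX], so real hardware can grant fewer lanes.
constexpr uint32_t granted_vl(uint32_t avl, uint32_t vlen_bits,
                              uint32_t sew_bits, uint32_t lmul) {
  return std::min(avl, vlmax(vlen_bits, sew_bits, lmul));
}

// e32, m1 with AVL = 16, as in the stub, on two example VLEN values.
static_assert(granted_vl(16, 128, 32, 1) == 4, "VLEN=128 -> 4 lanes");
static_assert(granted_vl(16, 512, 32, 1) == 16, "VLEN=512 -> 16 lanes");
```

So on a VLEN=128 core, each 32-bit element group in an m1 register holds only 4 lanes, and the granted vl (in `length`) is smaller than the requested 16.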
src/hotspot/cpu/riscv/vm_version_riscv.cpp line 191: > 189: if (UseRVV && FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { > 190: FLAG_SET_DEFAULT(UseChaCha20Intrinsics, true); > 191: } Just below this we may set RVV to false. I would put this just above "#ifdef COMPILER2" or so. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15899#issuecomment-1733594250 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335871950 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336175787 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335802863 From mli at openjdk.org Wed Sep 27 10:09:43 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:43 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> Message-ID: On Mon, 25 Sep 2023 12:19:22 GMT, Robbin Ehn wrote: > Thanks, can you include what testing you have done in the PR description? (testing as in validating the code is correct) added. > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4343: > >> 4341: // in java level. >> 4342: __ li(avl, 16); >> 4343: __ vsetvli(length, avl, Assembler::e32, Assembler::m1, Assembler::ma, Assembler::ta); > > Is this really correct. > We have no uses of ma/ta before this since we need to make sure we never touch memory outside of the arrays. > I don't think ma/ta will ever be correct when working on Java heap. > > I would drop the last two argument and use the default of mu/tu as we do everywhere else. I'm not quite sure, but modified as you suggested. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15899#issuecomment-1733756083 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336204681 From mli at openjdk.org Wed Sep 27 10:09:43 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:43 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: <9ZuKINjIQo3SAMBRc5QMaEq_viZvevVqb16AZB4BzxY=.b99cbb11-85bb-4ed7-b610-737024f9a35c@github.com> References: <9ZuKINjIQo3SAMBRc5QMaEq_viZvevVqb16AZB4BzxY=.b99cbb11-85bb-4ed7-b610-737024f9a35c@github.com> Message-ID: On Wed, 27 Sep 2023 09:37:10 GMT, Ludovic Henry wrote: > Only need to do the `li` -> `mv` change and it's LGTM. Also, please change the PR title to `8315716: RISC-V: implement ChaCha20 intrinsic` to link it back to JBS. Thanks! Thanks for reviewing! > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4281: > >> 4279: >> 4280: // rotate vector register left with shift bits, 32-bit version >> 4281: void rotate_left_imm(VectorRegister rv, uint32_t shift, VectorRegister tmp_vr) { > > This should be in `macroAssembler_riscv.cpp` instead. I would also call it something like `vrolwi`. That'll more closely match the [`vrol`](https://github.com/riscv/riscv-crypto/blob/c8ddeb7e64a3444dda0438316af1238aeed72041/doc/vector/insns/vrol.adoc#L5) Thanks Ludovic, Fixed. > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 255: > >> 253: FLAG_SET_DEFAULT(UseChaCha20Intrinsics, true); >> 254: } >> 255: } > > You can have something like that: > > Suggestion: > > if (UseRVV) { > if (FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { > FLAG_SET_DEFAULT(UseChaCha20Intrinsics, true); > } > } else if (UseChaCha20Intrinsics) { > if (!FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { > warning("Chacha20 Intrinsics requires RVV instructions (not available on this CPU)"); > } > FLAG_SET_DEFAULT(UseChaCha20Intrinsics, false); > } Thanks Ludovic! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15899#issuecomment-1737096689 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335923012 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1337586626 From rehn at openjdk.org Wed Sep 27 10:09:43 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 27 Sep 2023 10:09:43 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: <9ZuKINjIQo3SAMBRc5QMaEq_viZvevVqb16AZB4BzxY=.b99cbb11-85bb-4ed7-b610-737024f9a35c@github.com> References: <9ZuKINjIQo3SAMBRc5QMaEq_viZvevVqb16AZB4BzxY=.b99cbb11-85bb-4ed7-b610-737024f9a35c@github.com> Message-ID: On Mon, 25 Sep 2023 13:03:11 GMT, Ludovic Henry wrote: >> Only vector version is included in this patch. >> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4287: > >> 4285: } >> 4286: >> 4287: void quarter_round(VectorRegister aVec, VectorRegister bVec, > > Please rename that to `chacha20_quarter_round` to make it clear it belongs to the chacha20 algorithm And should be static. Ignore. I missed they are in scope of class StubGenerator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335871092 From tonyp at openjdk.org Wed Sep 27 10:09:43 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Wed, 27 Sep 2023 10:09:43 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 11:47:40 GMT, Hamlin Li wrote: > Only vector version is included in this patch. 
> > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4326: > 4324: const Register tmp_addr = t1; > 4325: const Register length = t2; > 4326: const Register avl = x28; Can you add the rest of the tmp registers to: // temporary register(caller-save registers) constexpr Register t0 = x5; constexpr Register t1 = x6; constexpr Register t2 = x7; so you can use `t3` / `t4` here instead of `x28` / `x29`? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4338: > 4336: > 4337: RegSet saved_regs; > 4338: __ push_reg(saved_regs, sp); You're only using tmp registers it looks like? Is this needed, as the `saved_regs` set is empty? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4342: > 4340: // Put 16 here, as com.sun.crypto.providerChaCha20Cipher.KS_MAX_LEN is 1024 > 4341: // in java level. > 4342: __ li(avl, 16); It's recommended to use `__ mv(avl, 16);` to copy a constant to a register. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4355: > 4353: > 4354: // Perform 10 iterations of the 8 quarter round set > 4355: __ li(loop, 10); `__ mv(loop, 10);` src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4381: > 4379: > 4380: // Store result to key stream > 4381: __ li(stride, 64); `__ mv(stride, 64);` src/hotspot/cpu/riscv/vm_version_riscv.cpp line 251: > 249: FLAG_SET_DEFAULT(UseBlockZeroing, false); > 250: } > 251: if (UseRVV) { What happens if someone enables `+UseChaCha20Intrinsics` without `+UseRVV`? Maybe check if `-UseRVV` and `+UseChaCha20Intrinsics` and disable it? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336263149 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336260087 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336247004 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336248783 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336249320 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336241132 From mli at openjdk.org Wed Sep 27 10:09:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 18:42:15 GMT, Antonios Printezis wrote: >> Only vector version is included in this patch. >> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4326: > >> 4324: const Register tmp_addr = t1; >> 4325: const Register length = t2; >> 4326: const Register avl = x28; > > Can you add the rest of the tmp registers to: > > > // temporary register(caller-save registers) > constexpr Register t0 = x5; > constexpr Register t1 = x6; > constexpr Register t2 = x7; > > > so you can use `t3` / `t4` here instead of `x28` / `x29`? sure > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4342: > >> 4340: // Put 16 here, as com.sun.crypto.providerChaCha20Cipher.KS_MAX_LEN is 1024 >> 4341: // in java level. >> 4342: __ li(avl, 16); > > It's recommended to use `__ mv(avl, 16);` to copy a constant to a register. Is there a difference between mv and li? Seems that mv(...) is calling li(...)? template::value)> inline void mv(Register Rd, T o) { li(Rd, (int64_t)o); } Or the recommendation is a code convention? 
> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 251: > >> 249: FLAG_SET_DEFAULT(UseBlockZeroing, false); >> 250: } >> 251: if (UseRVV) { > > What happens if someone enables `+UseChaCha20Intrinsics` without `+UseRVV`? Maybe check if `-UseRVV` and `+UseChaCha20Intrinsics` and disable it? Nice catch! Thanks Tony! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1337584807 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1337578231 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1337573555 From luhenry at openjdk.org Wed Sep 27 10:09:44 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 17:43:15 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4342: >> >>> 4340: // Put 16 here, as com.sun.crypto.providerChaCha20Cipher.KS_MAX_LEN is 1024 >>> 4341: // in java level. >>> 4342: __ li(avl, 16); >> >> It's recommended to use `__ mv(avl, 16);` to copy a constant to a register. > > Is there a difference between mv and li? Seems that mv(...) is calling li(...)? > > template::value)> > inline void mv(Register Rd, T o) { li(Rd, (int64_t)o); } > > Or the recommendation is a code convention? It's only code convention in Hotspot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1338340087 From mli at openjdk.org Wed Sep 27 10:09:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: <3W75sRp2lXw-nd21_8yX7T2ZAF5WA63QfuWp-GOsqag=.9ba45dd3-5838-4aaf-b81a-2248bfc3fa66@github.com> On Wed, 27 Sep 2023 09:36:15 GMT, Ludovic Henry wrote: >> Is there a difference between mv and li? Seems that mv(...) is calling li(...)? 
>> >> template::value)> >> inline void mv(Register Rd, T o) { li(Rd, (int64_t)o); } >> >> Or the recommendation is a code convention? > > It's only code convention in Hotspot. Thanks for the confirmation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1338374627 From mli at openjdk.org Wed Sep 27 10:09:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> Message-ID: On Mon, 25 Sep 2023 17:40:23 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4343: >> >>> 4341: // in java level. >>> 4342: __ li(avl, 16); >>> 4343: __ vsetvli(length, avl, Assembler::e32, Assembler::m1, Assembler::ma, Assembler::ta); >> >> Is this really correct. >> We have no uses of ma/ta before this since we need to make sure we never touch memory outside of the arrays. >> I don't think ma/ta will ever be correct when working on Java heap. >> >> I would drop the last two argument and use the default of mu/tu as we do everywhere else. > > I'm not quite sure, but modified as you suggested. There is a statement at https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#343-vector-tail-agnostic-and-vector-mask-agnostic-vta-and-vma: The agnostic policy was added to accommodate machines with vector register renaming. With an undisturbed policy, all elements would have to be read from the old physical destination vector register to be copied into the new physical destination vector register. This causes an inefficiency when these inactive or tail values are not required for subsequent calculations. It seems to be more efficient in some situations, but I'm not sure which case is meant above.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336206026 From mli at openjdk.org Wed Sep 27 10:09:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> Message-ID: On Mon, 25 Sep 2023 17:41:49 GMT, Hamlin Li wrote: >> I'm not quite sure, but modified as you suggested. > > There is a statement at https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#343-vector-tail-agnostic-and-vector-mask-agnostic-vta-and-vma: > > The agnostic policy was added to accommodate machines with vector register renaming. With an undisturbed policy, all elements would have to be read from the old physical destination vector register to be copied into the new physical destination vector register. This causes an inefficiency when these inactive or tail values are not required for subsequent calculations. > > It seems to be more efficient in some situations, but I'm not sure which case is meant above. At https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#7-vector-loads-and-stores, there is a statement: ```Vector loads and stores can be masked, and they only access memory or raise exceptions for active elements.``` So it seems a store will not "touch memory outside of the arrays" when storing back to the key stream array (which is in the Java heap).
At https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#7-vector-loads-and-stores, there is a statement, ```Vector loads and stores can be masked, and they only access memory or raise exceptions for active elements.``` Seems it will not "touch memory outside of the arrays" when store back to key stream array(which in java heap) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336216191 From mli at openjdk.org Wed Sep 27 10:09:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> Message-ID: <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> On Mon, 25 Sep 2023 17:52:46 GMT, Hamlin Li wrote: >> There is some statement at https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#343-vector-tail-agnostic-and-vector-mask-agnostic-vta-and-vma, >> >> The agnostic policy was added to accommodate machines with vector register renaming. With an undisturbed policy, all elements would have to be read from the old physical destination vector register to be copied into the new physical destination vector register. This causes an inefficiency when these inactive or tail values are not required for subsequent calculations. >> >> Seems it's more effiecient at some situation, but I'm not sure what's that case mentioned above. > > At https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#7-vector-loads-and-stores, there is a statement, ```Vector loads and stores can be masked, and they only access memory or raise exceptions for active elements.``` > Seems it will not "touch memory outside of the arrays" when store back to key stream array(which in java heap) Anyway, I think your suggestion make more sense, I've modified as you suggested. Thanks Robbin for the detailed reviewing! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336217013 From rehn at openjdk.org Wed Sep 27 10:09:44 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> Message-ID: On Mon, 25 Sep 2023 17:53:37 GMT, Hamlin Li wrote: >> At https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#7-vector-loads-and-stores, there is a statement: ```Vector loads and stores can be masked, and they only access memory or raise exceptions for active elements.``` >> So it seems a store will not "touch memory outside of the arrays" when storing back to the key stream array (which is in the Java heap). > > Anyway, I think your suggestion makes more sense; I've modified it as you suggested. > Thanks Robbin for the detailed review! It's the paragraph above: The tail elements during a vector instruction's execution are the elements past the current vector length setting specified in vl. This means they are outside of your working set (body). When a set is marked agnostic, the corresponding set of destination elements in any vector destination operand can either retain the value they previously held, or are overwritten with 1s. Within a single vector instruction, each destination element can be either left undisturbed or overwritten with 1s, in any combination, and the pattern of undisturbed or overwritten with 1s is not required to be deterministic when the instruction is executed with the same inputs. Maybe this can't happen here since you work with such a nice size. But the arrays result and startState will be initialized to zero when allocated.
We don't want anything changing bytes which the VM didn't write to, and we must be sure never to write outside of these. So the tail is what comes after your data; in our case that will be the Java heap, so we must always be sure only to write to the body. Did that make sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1336226148 From mli at openjdk.org Wed Sep 27 10:09:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> Message-ID: On Mon, 25 Sep 2023 18:02:56 GMT, Robbin Ehn wrote: >> Anyway, I think your suggestion makes more sense; I've modified it as you suggested. >> Thanks Robbin for the detailed review! > > It's the paragraph above: > > The tail elements during a vector instruction's execution are the elements past the current vector length setting specified in vl. > > > This means they are outside of your working set (body). > > > When a set is marked agnostic, the corresponding set of destination elements in any vector destination operand > can either retain the value they previously held, or are overwritten with 1s. Within a single vector instruction, each > destination element can be either left undisturbed or overwritten with 1s, in any combination, and the pattern of > undisturbed or overwritten with 1s is not required to be deterministic when the instruction is executed with the same > inputs. > > > Maybe this can't happen here since you work with such a nice size. > But the arrays result and startState will be initialized to zero when allocated. > We don't want anything changing bytes which the VM didn't write to, and we must be sure never to write outside of these.
> > So the tail is what comes after your data; in our case that will be the Java heap, so we must always be sure only to write to the body. > > Did that make sense? Thanks for the discussion. Yes, I see your point. What I mean above is that the Java heap can only be touched when storing back to memory, but a store will `only access memory or raise exceptions for active elements.` That seems to mean that a vector operation might touch tail elements in the vector register group, but a vector store will not touch the memory corresponding to the tail elements. Am I understanding the spec correctly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1337573111 From luhenry at openjdk.org Wed Sep 27 10:09:44 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> Message-ID: On Mon, 25 Sep 2023 12:15:58 GMT, Robbin Ehn wrote: >> Only vector version is included in this patch. >> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 191: > >> 189: if (UseRVV && FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { >> 190: FLAG_SET_DEFAULT(UseChaCha20Intrinsics, true); >> 191: } > > Just below this we may set RVV to false. > I would put this just above "#ifdef COMPILER2" or so. https://github.com/openjdk/jdk/pull/15899/files#diff-7b173d6e5834de13749c8333192fef5a874628a67b90a5d8d06235d507542ac4R255 seems like the ideal place.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335862298 From mli at openjdk.org Wed Sep 27 10:09:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 10:09:44 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> Message-ID: <9enIuGZ7XgdLasOj3GSeeoNmrraOhAczu3G7FwcNmZg=.c87eed20-4d81-4a29-94f5-0ffc9a76f2aa@github.com> On Mon, 25 Sep 2023 13:04:59 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 191: >> >>> 189: if (UseRVV && FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { >>> 190: FLAG_SET_DEFAULT(UseChaCha20Intrinsics, true); >>> 191: } >> >> Just below this we may set RVV to false. >> I would put this just above "#ifdef COMPILER2" or so. > > https://github.com/openjdk/jdk/pull/15899/files#diff-7b173d6e5834de13749c8333192fef5a874628a67b90a5d8d06235d507542ac4R255 seems like the ideal place. Thanks Robbin, Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1335921835 From luhenry at openjdk.org Wed Sep 27 10:13:11 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 27 Sep 2023 10:13:11 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 11:47:40 GMT, Hamlin Li wrote: > Only vector version is included in this patch. > > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` Marked as reviewed by luhenry (Committer).
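The placement comment in this thread ("Just below this we may set RVV to false") describes an ordering constraint: a flag whose default is derived from `UseRVV` should only be computed after the last place that can clear `UseRVV`. A minimal C++ model of that ordering, assuming a simplified `Flag` struct in place of HotSpot's real `FLAG_IS_DEFAULT`/`FLAG_SET_DEFAULT` machinery (`Flag`, `Flags`, `initialize_flags` and `cpu_has_v` are all hypothetical names):

```cpp
#include <cassert>

// Simplified stand-in for a HotSpot product flag: its value plus whether it
// still has its default value (i.e. was not set on the command line).
struct Flag {
  bool value;
  bool is_default;
};

struct Flags {
  Flag UseRVV{false, true};
  Flag UseChaCha20Intrinsics{false, true};
};

// Derive the intrinsic flag only after the final UseRVV decision.
void initialize_flags(Flags& f, bool cpu_has_v) {
  // Step 1: UseRVV may still be cleared here, e.g. when the CPU lacks the V extension.
  if (f.UseRVV.value && !cpu_has_v) {
    f.UseRVV.value = false;
  }
  // Step 2: UseRVV is now final, so the intrinsic default can depend on it.
  if (f.UseRVV.value) {
    if (f.UseChaCha20Intrinsics.is_default) {
      f.UseChaCha20Intrinsics.value = true;
    }
  } else {
    // The stub is vector-only, so it cannot be used without RVV.
    f.UseChaCha20Intrinsics.value = false;
  }
}
```

Running step 2 before step 1 would let `UseChaCha20Intrinsics` be switched on from a `UseRVV` value that is cleared immediately afterwards, which is the hazard the review comment points at.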
------------- PR Review: https://git.openjdk.org/jdk/pull/15899#pullrequestreview-1646110845 From duke at openjdk.org Wed Sep 27 12:04:46 2023 From: duke at openjdk.org (Liming Liu) Date: Wed, 27 Sep 2023 12:04:46 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly Message-ID: As described in [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernels 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. The time spent on the test, in *seconds*, was recorded with `VERBOSE=time` in the table below; the patch brings improvements when the system call is supported and does not hurt when it is not:

Kernel   -XX:-TransparentHugePages   -XX:+TransparentHugePages
         Unpatched    Patched        Unpatched    Patched
4.18     11.30        11.30          0.25         0.25
5.13     0.22         0.22           3.42         3.42
6.1      0.27         0.33           3.54         0.33
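A minimal Linux sketch of the approach described above: ask the kernel to populate the range with `madvise(MADV_POPULATE_WRITE)`, and fall back to touching one word per page with an atomic add of 0 when the call fails (for example on kernels older than 5.14, which reject the unknown advice value with `EINVAL`). `pretouch_range` and `pretouch_by_touching` are hypothetical names, not HotSpot's `os::pretouch_memory`; the fallback uses the GCC/Clang builtin `__atomic_fetch_add` in place of HotSpot's `Atomic::add`, and the range is assumed to be page aligned and non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // value from the Linux UAPI headers (kernel >= 5.14)
#endif

// Fallback: touch one int per page with an atomic add of 0. The add commits
// the page without overwriting a value another thread may store concurrently.
// first and last are page aligned; last is the start of the final page.
static void pretouch_by_touching(char* first, char* last, std::size_t page_size) {
  for (char* cur = first; /* break */; cur += page_size) {
    __atomic_fetch_add(reinterpret_cast<int*>(cur), 0, __ATOMIC_RELAXED);
    if (cur >= last) break;
  }
}

// Pretouch [start, end), which must be page aligned and non-empty.
// Returns true if madvise populated the range, false if the fallback ran.
bool pretouch_range(void* start, void* end, std::size_t page_size) {
  char* first = static_cast<char*>(start);
  char* stop = static_cast<char*>(end);
  std::size_t len = static_cast<std::size_t>(stop - first);
  if (madvise(start, len, MADV_POPULATE_WRITE) == 0) {
    return true;  // the kernel faulted in every page as if written
  }
  // Typically EINVAL on kernels that do not know this advice value.
  pretouch_by_touching(first, stop - page_size, page_size);
  return false;
}
```

Both paths leave existing contents intact, which is why the sketch's test can check that a value written before pretouching survives it.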

------------- Commit messages: - Add a comment about the selection of GC and move the changing argument ahead - Rename pretouch_memory_fallback to pretouch_memory_common - Take the logic of inclusive ranges back for pretouch to avoid overflow - Replace tab with space - 8315923: pretouch_memory by atomic-add-0 fragments huge page unexpectedly Changes: https://git.openjdk.org/jdk/pull/15781/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315923 Stats: 117 lines in 8 files changed: 105 ins; 6 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From pbk at openjdk.org Wed Sep 27 12:04:46 2023 From: pbk at openjdk.org (Peter B. Kessler) Date: Wed, 27 Sep 2023 12:04:46 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: References: Message-ID: <0_bb5A_nBfNRkwlnhCv_hNO8fW7Dt5ktqt-Z3DCy-pk=.e9dcd499-7c06-4283-9e27-690f91c70f05@github.com> On Mon, 18 Sep 2023 07:37:26 GMT, Liming Liu wrote: > As described in [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernels 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. The time spent on the test, in *seconds*, was recorded with `VERBOSE=time` in the table below; the patch brings improvements when the system call is supported and does not hurt when it is not:
>
> Kernel   -XX:-TransparentHugePages   -XX:+TransparentHugePages
>          Unpatched    Patched        Unpatched    Patched
> 4.18     11.30        11.30          0.25         0.25
> 5.13     0.22         0.22           3.42         3.42
> 6.1      0.27         0.33           3.54         0.33
src/hotspot/os/linux/os_linux.cpp line 2914: > 2912: // will initially always use small pages. > 2913: page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size; > 2914: pretouch_memory_fallback(start, end, page_size); This assignment should be to a new variable named `pretouch_page_size` since it will only be used by `pretouch_memory_fallback`, and otherwise modifies the function parameter and also shadows the variable from `class os`. Declaring it `const` would get you extra points from readers, but I accept that is not the style here. src/hotspot/share/runtime/os.cpp line 2122: > 2120: Atomic::add(reinterpret_cast<int*>(cur), 0, memory_order_relaxed); > 2121: cur += page_size; > 2122: } while (cur < end); The previous code was careful to touch only as far as `align_down(static_cast<char*>(end) - 1, page_size)` to avoid the case where the region to be pre-touched ended at the end of the address space. (There is a comment about that.) Your code might compute `cur + page_size` and wrap around to the beginning of the address space. You do not want to debug that, or have someone curse you for making them debug it. Your code might find that `cur < end` is never true if `end == 0` because you do not do the `-1` and the `align_down` and the comparison is unsigned. A style comment: I usually write loops like the old line 2113 as `for ( ; /* break */; cur += page_size) {` to warn the reader to watch for a `break` from the loop. src/hotspot/share/runtime/os.hpp line 226: > 224: static void pd_free_memory(char *addr, size_t bytes, size_t alignment_hint); > 225: static void pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint); > 226: static void pd_pretouch_memory(void *start, void *end, size_t page_size); Should this be next to the declaration of `os::pretouch_memory` to warn the reader that there might be platform-dependent implementations that have to be considered? If you move this method, I would move `pretouch_memory_fallback` also.
src/hotspot/share/runtime/os.hpp line 229: > 227: > 228: // Fallback to this if OS needs no specific treatment > 229: static void pretouch_memory_fallback(void *start, void *end, size_t page_size); Would `pretouch_memory_common` be a better name for this method? The name should reflect why the method exists, and I think of this as common code to avoid having a copy of the implementation in each platform-specific class. You don't "fall back" to this implementation: on most platforms, this is the only implementation. If `class os` had virtual methods, this would be the base class implementation. (Not a big deal.) test/hotspot/jtreg/gc/parallel/TestParallelAlwaysPreTouch.java line 31: > 29: * > 30: * @run main/othervm -XX:+UseParallelGC -XX:ParallelGCThreads=${os.processors} > 31: * -Xlog:startuptime,pagesize,gc+heap=debug I don't think there is anything ParallelGC-specific about the issue. If so, you could leave off the `Use...GC` argument (fewer arguments are better), and maybe add a comment in the test that the issue is present in both G1GC and ParallelGC. Extra points for testing any other collectors that support `-XX:+AlwaysPreTouch`. test/hotspot/jtreg/gc/parallel/TestParallelAlwaysPreTouch.java line 33: > 31: * -Xlog:startuptime,pagesize,gc+heap=debug > 32: * -Xms24G -Xmx24G -XX:+AlwaysPreTouch > 33: * -XX:-UseTransparentHugePages I would move the `-XX:-UseTransparentHugePages` and `-XX:+UseTransparentHugePages` to be the first argument on their command lines, to emphasize the only argument that is changing between the two runs. 
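The overflow concern about os.cpp above can be checked without touching any memory. The sketch below (hypothetical `pages_to_touch` and `align_down`, with addresses modeled as `uintptr_t`) records the pages that an inclusive `[first, last]` iteration visits, where `last = align_down(end - 1, page_size)`. Because the loop exits before incrementing past `last`, `cur + page_size` never wraps, even when the range ends at the top of the address space (encoded here as `end == 0`). By contrast, a `do { ...; cur += page_size; } while (cur < end)` loop with `end == 0` stops after the first page, since the unsigned test `cur < 0` is always false.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// page_size must be a power of two.
constexpr std::uintptr_t align_down(std::uintptr_t x, std::uintptr_t page_size) {
  return x & ~(page_size - 1);
}

// Returns the page addresses a pretouch of [start, end) would touch, without
// touching anything. end == 0 stands for "the range ends at the top of the
// address space" (the one-past-the-end address wrapped to 0).
std::vector<std::uintptr_t> pages_to_touch(std::uintptr_t start, std::uintptr_t end,
                                           std::uintptr_t page_size) {
  std::vector<std::uintptr_t> pages;
  if (start == end) {
    return pages;  // empty range
  }
  const std::uintptr_t first = align_down(start, page_size);
  const std::uintptr_t last = align_down(end - 1, page_size);  // page holding the last byte
  for (std::uintptr_t cur = first; /* break */; cur += page_size) {
    pages.push_back(cur);
    if (cur >= last) {
      break;  // exit before cur + page_size could wrap past the top
    }
  }
  return pages;
}
```

For example, a four-page range ending exactly at the top of a 64-bit address space yields four pages, the last one starting at `UINTPTR_MAX - page_size + 1`.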
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329403622 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329369688 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329389651 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329395341 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329427162 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329425068 From duke at openjdk.org Wed Sep 27 12:04:47 2023 From: duke at openjdk.org (Liming Liu) Date: Wed, 27 Sep 2023 12:04:47 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: <0_bb5A_nBfNRkwlnhCv_hNO8fW7Dt5ktqt-Z3DCy-pk=.e9dcd499-7c06-4283-9e27-690f91c70f05@github.com> References: <0_bb5A_nBfNRkwlnhCv_hNO8fW7Dt5ktqt-Z3DCy-pk=.e9dcd499-7c06-4283-9e27-690f91c70f05@github.com> Message-ID: On Tue, 19 Sep 2023 00:02:12 GMT, Peter B. Kessler wrote: >> As described in [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernels 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. The time spent on the test, in *seconds*, was recorded with `VERBOSE=time` in the table below; the patch brings improvements when the system call is supported and does not hurt when it is not:
>>
>> Kernel   -XX:-TransparentHugePages   -XX:+TransparentHugePages
>>          Unpatched    Patched        Unpatched    Patched
>> 4.18     11.30        11.30          0.25         0.25
>> 5.13     0.22         0.22           3.42         3.42
>> 6.1      0.27         0.33           3.54         0.33
> > src/hotspot/os/linux/os_linux.cpp line 2914: > >> 2912: // will initially always use small pages. >> 2913: page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size; >> 2914: pretouch_memory_fallback(start, end, page_size); > > This assignment should be to a new variable named `pretouch_page_size` since it will only be used by `pretouch_memory_fallback`, and otherwise modifies the function parameter and also shadows the variable from `class os`. Declaring it `const` would get you extra points from readers, but I accept that is not the style here. This code was moved from the Linux-specific code in pretouchTask.cpp. I think it is fine here. > src/hotspot/share/runtime/os.cpp line 2122: > >> 2120: Atomic::add(reinterpret_cast<int*>(cur), 0, memory_order_relaxed); >> 2121: cur += page_size; >> 2122: } while (cur < end); > > The previous code was careful to touch only as far as `align_down(static_cast<char*>(end) - 1, page_size)` to avoid the case where the region to be pre-touched ended at the end of the address space. (There is a comment about that.) Your code might compute `cur + page_size` and wrap around to the beginning of the address space. You do not want to debug that, or have someone curse you for making them debug it. > > Your code might find that `cur < end` is never true if `end == 0` because you do not do the `-1` and the `align_down` and the comparison is unsigned. > > A style comment: I usually write loops like the old line 2113 as `for ( ; /* break */; cur += page_size) {` to warn the reader to watch for a `break` from the loop. Thanks for pointing out the risk of overflow. I think it originates from the call to `align_up` earlier, and the following assertion would guarantee that computing `cur + page_size` does not overflow, provided no overflow happened before. So I brought back the original logic of inclusive ranges here, and renamed the parameters from `start` and `end` to `first` and `last`. The style suggestion is also adopted.
> src/hotspot/share/runtime/os.hpp line 226: > >> 224: static void pd_free_memory(char *addr, size_t bytes, size_t alignment_hint); >> 225: static void pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint); >> 226: static void pd_pretouch_memory(void *start, void *end, size_t page_size); > > Should this be next to the declaration of `os::pretouch_memory` to warn the reader that there might be platform-dependent implementations that have to be considered? If you move this method, I would move `pretouch_memory_fallback` also. I placed it near the similar platform-dependent functions. It should be fine not to treat it specially. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329831774 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329819641 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1329826784 From coleenp at openjdk.org Wed Sep 27 12:22:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 27 Sep 2023 12:22:12 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: On Tue, 26 Sep 2023 12:47:42 GMT, Coleen Phillimore wrote: > This change makes WeakHandle and OopHandle release null out the obj pointer, at the cost of making the release function non-const and some changes that propagated from that. This enables ObjectMonitor code to test for null to see if the obj was already released, and seems like the right thing to do. See comments from related PR in the bug report. > Tested with tier1-4. OopHandles and WeakHandles don't have destructors (or copy constructors and assignment operators).
Places where we release the storage would have to be reworked significantly to have this sort of usage model: e.g., releasing a String from the StringTable is done with a callback from the CHT:

    static void free_node(void* context, void* memory, Value const& value) {
      value.release(StringTable::_oop_storage);
      FreeHeap(memory);
      StringTable::item_removed();
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/15920#issuecomment-1737281090 From aph at openjdk.org Wed Sep 27 12:41:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 27 Sep 2023 12:41:16 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 11:19:29 GMT, Erik Österlund wrote: > > Do we need an ISB on AArch64-specific code? There, the guard value is data, not an immediate field. > > In other words, what instruction has just been patched that we need to make visible? > > On AArch64 we only use synchronous cross-modifying code; we just hide the expensive part in slow paths using an epoch trick that proves that most executions don't need a fence. So that should all be fine. Sometimes I wonder if we should use that trick on x86_64 as well. I don't understand this reply. On AArch64 we don't patch code, we patch data. So why do we need to add a missing ISB to AArch64?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1737316137 From fbredberg at openjdk.org Wed Sep 27 13:00:15 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 27 Sep 2023 13:00:15 GMT Subject: RFR: 8315966: Relativize initial_sp in interpreter frames [v3] In-Reply-To: <2MH8OqpvowA48pPPZst0J_smowCkPPyawen6o4qpuIw=.a433a9bd-6d7e-4f7d-b601-60c4c151164b@github.com> References: <2MH8OqpvowA48pPPZst0J_smowCkPPyawen6o4qpuIw=.a433a9bd-6d7e-4f7d-b601-60c4c151164b@github.com> Message-ID: <1NStq2k2grGtpP_s7I1zmk8Uj23YO0dXEBhYfe5b6p0=.ad622f98-035c-44f9-9f43-8ac9f38ca436@github.com> On Wed, 27 Sep 2023 09:07:23 GMT, Fredrik Bredberg wrote: >> Relativize initial_sp in interpreter frames. >> >> By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. >> >> This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. >> >> Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Removed whitespace (RISC-V only). Thank you all. If no one else has anything to add, I'll integrate (as soon as I can convince a sponsor). 
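To illustrate why a slot stored relative to the frame pointer survives freezing and thawing while an absolute address does not, here is a toy C++ model: a frame's words are copied to a different location (as when a virtual thread's frames are thawed onto another worker thread's stack), and the `initial_sp` slot is read back relative to the new frame pointer. The layout (`StackChunk`, `kInitialSpSlot = -2`, word-indexed `fp`) and the helper names are invented for the example and are not HotSpot's interpreter frame layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using word = std::intptr_t;

// Toy frame storage: a small word array, with fp as an index into it.
// The layout is invented for this example.
constexpr int kFrameWords = 8;
constexpr int kInitialSpSlot = -2;  // the slot lives at fp - 2

struct StackChunk {
  word words[kFrameWords];
  int fp;  // index of the frame pointer within words
};

// Relativized store: keep the distance (in words) from fp, not an address.
void set_initial_sp(StackChunk& c, const word* sp) {
  const word* fp_addr = &c.words[c.fp];
  c.words[c.fp + kInitialSpSlot] = static_cast<word>(sp - fp_addr);
}

// Derelativize on read: fp + offset is valid wherever the chunk now lives.
word* initial_sp(StackChunk& c) {
  return &c.words[c.fp] + c.words[c.fp + kInitialSpSlot];
}
```

Copying the chunk with `memcpy` moves every word unchanged; the stored offset (-5 in the test) still resolves to the right word in the copy, whereas an absolute pointer would still point into the original.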
------------- PR Comment: https://git.openjdk.org/jdk/pull/15815#issuecomment-1737349402 From fbredberg at openjdk.org Wed Sep 27 13:18:26 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 27 Sep 2023 13:18:26 GMT Subject: Integrated: 8315966: Relativize initial_sp in interpreter frames In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 09:00:01 GMT, Fredrik Bredberg wrote: > Relativize initial_sp in interpreter frames. > > By changing the "initial_sp" (AKA "monitor_block_top" or "monitors" on PowerPC) member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles relativization of "initial_sp" and "monitor_block_top" since it's the same slot in interpreter frames (roughly the same as "monitors" on PowerPC). Relativization of other interpreter frame members are handled in other subtasks to JDK-8289296. > > Tested tier1-tier7 on supported platforms. The rest was sanity tested using Qemu. This pull request has now been integrated. 
Changeset: 347bd15e Author: Fredrik Bredberg Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/347bd15e49f5632e16d0ae4dd7240a3648baf539 Stats: 202 lines in 30 files changed: 84 ins; 48 del; 70 mod 8315966: Relativize initial_sp in interpreter frames Reviewed-by: fyang, mdoerr, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/15815 From rehn at openjdk.org Wed Sep 27 13:49:13 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 27 Sep 2023 13:49:13 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> Message-ID: On Tue, 26 Sep 2023 17:38:26 GMT, Hamlin Li wrote: >> It's the paragraph above: >> >> The tail elements during a vector instruction's execution are the elements past the current vector length setting specified in vl. >> >> >> This means they are outside of your working set (body). >> >> >> When a set is marked agnostic, the corresponding set of destination elements in any vector destination operand >> can either retain the value they previously held, or are overwritten with 1s. Within a single vector instruction, each >> destination element can be either left undisturbed or overwritten with 1s, in any combination, and the pattern of >> undisturbed or overwritten with 1s is not required to be deterministic when the instruction is executed with the same >> inputs. >> >> >> Maybe this can't happen here since you work with such a nice size. >> But the arrays result and startState will be initialized to zero when allocated. >> We don't want anything changing bytes which the VM didn't write to, and we must be sure never to write outside of these. >> >> So the tail is what comes after your data; in our case that will be the Java heap, so we must always be sure only to write to the body. >> >> Did that make sense?
> > Thanks for the discussion. Yes, I see your point. > What I mean above is that the Java heap can only be touched when storing back to memory, but a store will `only access memory or raise exceptions for active elements.` That seems to mean that a vector operation might touch tail elements in the vector register group, but a vector store will not touch the memory corresponding to the tail elements. > Am I understanding the spec correctly? I don't see that the spec guarantees that tail memory will never be touched under any circumstances. So I don't know; for now it's better to be safe than sorry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1338640884 From tschatzl at openjdk.org Wed Sep 27 14:16:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 27 Sep 2023 14:16:20 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v11] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 16:40:51 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straightforward >> extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe.
>> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ...
> > Richard Reingruber has updated the pull request incrementally with five additional commits since the last revision: > > - Eliminate special case for scanning the large array end > - First card of large array should be cleared if dirty > - Do all large array scanning in separate method > - Limit stripe size to 1m with at least 8 threads > - Small clean-ups Hi, > > I experimented with the aforementioned read-only card table idea a bit and here is the draft: > > https://github.com/openjdk/jdk/compare/master...albertnetymk:jdk:pgc-precise-obj-arr?expand=1 > > This looks very nice! The code is a lot easier to follow than the baseline and this pr. > > With your draft I found out too that the regressions with just 2 threads come from the remaining `object_start` calls. Larger stripes mean fewer of them. The caching used in your draft is surely better. > > So by default 1 card table byte per 512b card is needed. The shadow card table will require 2M per gigabyte of used old generation. I guess that's affordable. > > Do you think that your solution can be backported? I had a brief look at @albertnetymk's suggestion, a few comments: * it uses another card table - while "just" another 0.2% of the heap, we should try to avoid such regressions. G1 also does not need another card table... maybe some more effort should be put into optimizing that one away. * obviously allocating and freeing during the pause is suboptimal wrt pause time so the prototype should be improved in that regard :) * the copying will stay (if there is a second card table), I would be interested in pause time changes for more throughput'y applications (jbb2005, timefold/optaplanner https://timefold.ai/blog/2023/java-21-performance) * anything can be backported, but the question is whether the individual maintainers of these versions are going to. It does have a good case though which may make it easier to convince maintainers.
Hth, Thomas ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1737486097 From jvernee at openjdk.org Wed Sep 27 14:56:31 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 14:56:31 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v28] In-Reply-To: <34zxg0Er5UaZKlmKxpvQ0QxAv8CeXEp0pSemqPgKquo=.165c5ccd-2873-42e7-aefa-e9ba2e78f661@github.com> References: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> <34zxg0Er5UaZKlmKxpvQ0QxAv8CeXEp0pSemqPgKquo=.165c5ccd-2873-42e7-aefa-e9ba2e78f661@github.com> Message-ID: On Wed, 27 Sep 2023 14:50:32 GMT, Alan Bateman wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix visibility issues >> >> Reviewed-by: mcimadamore >> - Review comments > > test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckHoistingScaledIV.java line 32: > >> 30: * @modules jdk.incubator.vector >> 31: * @compile -source ${jdk.version} TestRangeCheckHoistingScaledIV.java >> 32: * @run main/othervm compiler.rangechecks.TestRangeCheckHoistingScaledIV > > Not important but I assume the @compile line can be removed from a number of tests as it's no longer needed. It was needed for tests that didn't use @enablePreview. 
Ok, I'll go over all the tests that I've changed and see if there are `@compile` tags that can be removed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338741350 From alanb at openjdk.org Wed Sep 27 14:56:29 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 27 Sep 2023 14:56:29 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v28] In-Reply-To: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> References: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> Message-ID: <34zxg0Er5UaZKlmKxpvQ0QxAv8CeXEp0pSemqPgKquo=.165c5ccd-2873-42e7-aefa-e9ba2e78f661@github.com> On Wed, 27 Sep 2023 00:53:25 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. 
https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - Fix visibility issues > > Reviewed-by: mcimadamore > - Review comments src/java.base/share/classes/sun/launcher/LauncherHelper.java line 640: > 638: if (!enableNativeAccess.equals("ALL-UNNAMED")) { > 639: throw new IllegalArgumentException("Only ALL-UNNAMED allowed as value for " + ENABLE_NATIVE_ACCESS); > 640: } I don't think throwing IAE is right here. It should call abort with a key for the error message.
The value of enableNativeAccess can be used as the parameter for the message. test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckHoistingScaledIV.java line 32: > 30: * @modules jdk.incubator.vector > 31: * @compile -source ${jdk.version} TestRangeCheckHoistingScaledIV.java > 32: * @run main/othervm compiler.rangechecks.TestRangeCheckHoistingScaledIV Not important but I assume the @compile line can be removed from a number of tests as it's no longer needed. It was needed for tests that didn't use @enablePreview. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338733430 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338737145 From alanb at openjdk.org Wed Sep 27 15:07:31 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 27 Sep 2023 15:07:31 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v28] In-Reply-To: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> References: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> Message-ID: On Wed, 27 Sep 2023 00:53:25 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. 
A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO...
> > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - Fix visibility issues > > Reviewed-by: mcimadamore > - Review comments src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1103: > 1101: * @throws WrongThreadException if this method is called from a thread {@code T}, > 1102: * such that {@code isAccessibleBy(T) == false}. > 1103: * @throws UnsupportedOperationException if {@code charset} is not a {@linkplain StandardCharsets standard charset}. The caller can fix/avoid the exception by providing another value for the argument so I think IAE is the unchecked exception for this case rather than UOE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338758920 From kvn at openjdk.org Wed Sep 27 15:08:18 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 27 Sep 2023 15:08:18 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. Marked as reviewed by kvn (Reviewer).
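The register split behind 8316125 can be written down as a small lookup. The sketch below is a hypothetical Java model (the class and method names are illustrative, not HotSpot code) of the Windows x64 convention as described in the PR: a callee such as the call stub must preserve xmm6 through xmm15, while xmm0-xmm5 and, on AVX-512 hardware, xmm16-xmm31 are volatile and need no saving.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of which XMM registers a Windows x64 callee must preserve. */
public final class WinX64XmmRegs {
    // xmm0..xmm31 exist when AVX-512 is available (UseAVX >= 3 in HotSpot terms).
    static final int NUM_XMM_AVX512 = 32;

    /** Windows x64: xmm6..xmm15 are nonvolatile (callee-saved); all others are volatile. */
    static boolean isCalleeSaved(int xmm) {
        if (xmm < 0 || xmm >= NUM_XMM_AVX512) {
            throw new IllegalArgumentException("no such register: xmm" + xmm);
        }
        return xmm >= 6 && xmm <= 15;
    }

    /** Registers a stub entered from native code would need to save and restore. */
    static List<Integer> registersToSave() {
        List<Integer> regs = new ArrayList<>();
        for (int i = 0; i < NUM_XMM_AVX512; i++) {
            if (isCalleeSaved(i)) {
                regs.add(i);
            }
        }
        return regs;
    }

    public static void main(String[] args) {
        // xmm16..xmm31 never appear in this list, which is why saving them is unnecessary.
        System.out.println(registersToSave());
    }
}
```

Only the low 128 bits of xmm6-xmm15 are nonvolatile under that convention, so a stub that follows it saves the XMM portion rather than full YMM/ZMM registers.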
------------- PR Review: https://git.openjdk.org/jdk/pull/15688#pullrequestreview-1646906429 From jvernee at openjdk.org Wed Sep 27 16:15:30 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 16:15:30 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v28] In-Reply-To: References: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> Message-ID: On Wed, 27 Sep 2023 15:04:12 GMT, Alan Bateman wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix visibility issues >> >> Reviewed-by: mcimadamore >> - Review comments > > src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1103: > >> 1101: * @throws WrongThreadException if this method is called from a thread {@code T}, >> 1102: * such that {@code isAccessibleBy(T) == false}. >> 1103: * @throws UnsupportedOperationException if {@code charset} is not a {@linkplain StandardCharsets standard charset}. > > The caller can fix/avoid the exception by providing another value for the argument so I think IAE is the unchecked exception for this case rather than UOE. I agree. I'll make the change for the following `Charset`-accepting methods: `MemorySegment::getString(long,Charset)`, `MemorySegment::setString(long,String,Charset)`, and `SegmentAllocator::allocateFrom(String,Charset)`. (Which should be all of them). > src/java.base/share/classes/sun/launcher/LauncherHelper.java line 640: > >> 638: if (!enableNativeAccess.equals("ALL-UNNAMED")) { >> 639: throw new IllegalArgumentException("Only ALL-UNNAMED allowed as value for " + ENABLE_NATIVE_ACCESS); >> 640: } > > I don't think throwing IAE is right here. It should call abort with a key for the error message. The value of enableNativeAccess can be used as the parameter for the message. Thanks for the suggestion! I'll switch this to using `abort` instead.
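The argument check Jorn agrees to can be sketched as follows. This is not the JDK's implementation: the helper name `requireSupportedCharset` and the exact allow-list are assumptions (the six charsets every Java platform must support, plus the UTF-32 family, looked up by name so the sketch also compiles on older JDKs). What it illustrates is Alan's reasoning: the caller can avoid the failure by passing a different argument, so `IllegalArgumentException` is the fitting unchecked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.Set;

public final class CharsetArgCheck {
    // Assumed allow-list, not the JDK's: the six charsets required on every Java platform,
    // plus UTF-32, UTF-32BE and UTF-32LE.
    private static final Set<Charset> SUPPORTED = Set.of(
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE,
            Charset.forName("UTF-32"), Charset.forName("UTF-32BE"), Charset.forName("UTF-32LE"));

    /** Hypothetical helper: reject an unsupported charset with IAE rather than UOE. */
    static Charset requireSupportedCharset(Charset charset) {
        Objects.requireNonNull(charset, "charset");
        if (!SUPPORTED.contains(charset)) {
            // The caller can fix this by passing another argument, hence IllegalArgumentException.
            throw new IllegalArgumentException("Unsupported charset: " + charset.name());
        }
        return charset;
    }

    public static void main(String[] args) {
        System.out.println(requireSupportedCharset(StandardCharsets.UTF_8)); // prints UTF-8
    }
}
```

The same check would be called at the top of each `Charset`-accepting method, so all of them fail the same way.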
Side note: I don't believe I have to add all the different error message translations, right? Only the English version? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338855648 PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338857229 From sviswanathan at openjdk.org Wed Sep 27 16:18:16 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 27 Sep 2023 16:18:16 GMT Subject: RFR: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: <-Cjyl5DiyAuGEmY6VPTyIZQqN1iarlbImScHLdlNx90=.dcce128a-5bf7-4d14-bb71-a6eafe3b2066@github.com> On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15688#pullrequestreview-1647106729 From vlivanov at openjdk.org Wed Sep 27 16:18:18 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 27 Sep 2023 16:18:18 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 06:59:30 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases.
>> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches; I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. >> >> The current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Denser AArch64 The proposed approach looks very promising (especially from a backporting perspective) as a stop-gap mitigation for the scalability issue. Speaking of code size increase, extracting the slow path into a stub should alleviate the risks. It only affects C2, because C1 already keeps that logic in a stub. src/hotspot/share/runtime/globals.hpp line 2003: > 2001: range(0, UINT_MAX) \ > 2002: \ > 2003: product(uint, SecondarySuperMissBackoff, 1000, EXPERIMENTAL, \ Should it be marked DIAGNOSTIC instead? The functionality is turned on by default. If it were 0 by default, EXPERIMENTAL would have been well-justified.
src/hotspot/share/runtime/javaThread.cpp line 417: > 415: _vm_result_2(nullptr), > 416: > 417: _backoff_secondary_super_miss(0), Why don't you set it to `SecondarySuperMissBackoff` instead? With the proposed shape of `MacroAssembler::check_klass_subtype_slow_path()` (`if ((x = (x-1)) >= 0) {... slow path ... }`, an overflow happens on the first access. It has an unfortunate consequence that '== 0' doesn't work anymore and, moreover, signed comparison is needed. ------------- PR Review: https://git.openjdk.org/jdk/pull/15718#pullrequestreview-1647009070 PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338801254 PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338850326 From shade at openjdk.org Wed Sep 27 16:23:20 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 16:23:20 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: References: Message-ID: <6KXwQYCE8EE4XzW7gCXWj6_Qt5lXjp0CtWXhImf-hic=.7befb81e-2f8b-4cb7-a9eb-f0c1d95a414c@github.com> On Wed, 27 Sep 2023 15:34:13 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Denser AArch64 > > src/hotspot/share/runtime/globals.hpp line 2003: > >> 2001: range(0, UINT_MAX) \ >> 2002: \ >> 2003: product(uint, SecondarySuperMissBackoff, 1000, EXPERIMENTAL, \ > > Should it be marked DIAGNOSTIC instead? The functionality is turned on by default. It it were 0 by default, EXPERIMENTAL would have been well-justified. I don't think the discriminating factor for calling an option "diagnostic" or "experimental" is its default value. Rather it is its target use. As per `globals.hpp`: // DIAGNOSTIC options are not meant for VM tuning or for product modes. // They are to be used for VM quality assurance or field diagnosis // of VM bugs. 
// EXPERIMENTAL flags are in support of features that may not be // an officially supported part of a product, but may be available // for experimenting with. They could, for example, be performance // features that may not have undergone full or rigorous QA, but which may // help performance in some cases and released for experimentation // by the community of users and developers. ...and this one is obviously the experimental performance feature. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338871123 From alanb at openjdk.org Wed Sep 27 16:28:27 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 27 Sep 2023 16:28:27 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v28] In-Reply-To: References: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> Message-ID: <-rSkX6AyCfpiqcZp0Hadw0js8mWAScaUlsvgj4Ng1HE=.9090bf00-70bc-4bb9-8bbb-faa6a2ad8805@github.com> On Wed, 27 Sep 2023 16:12:46 GMT, Jorn Vernee wrote: > Side note: I don't believe I have to add all the different error message translations right? Only the English version? That's right, the translations will be updated towards the end of the release. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338874028 From jvernee at openjdk.org Wed Sep 27 16:28:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 16:28:25 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v29] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. 
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - Use IAE instead of UOE for unsupported char sets - Use abort instead of IEA when encountering wrong value for ENA attrib. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/f6ab4dc5..ea1b9c5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=27-28 Stats: 18 lines in 7 files changed: 3 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From shade at openjdk.org Wed Sep 27 16:45:07 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 16:45:07 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v6] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64.
More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches; I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > The current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Init with backoff right away - x86 cleanup - Denser AArch64 - Cleaner AArch64 code - Use proper 32-bit stores on AArch64 - PPC version - Revert "WIP x86_32" This reverts commit dc37d25b5ef232e2c8b0ac9c966c41a1ae3cca82. - WIP x86_32 - Revert ARM/PPC/RISC-V/S390 development stubs - ...
and 6 more: https://git.openjdk.org/jdk/compare/0bc5653f...bc6964ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/81a0ddd2..bc6964ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=04-05 Stats: 71316 lines in 1463 files changed: 25882 ins; 12613 del; 32821 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From vlivanov at openjdk.org Wed Sep 27 16:45:10 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 27 Sep 2023 16:45:10 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: <6KXwQYCE8EE4XzW7gCXWj6_Qt5lXjp0CtWXhImf-hic=.7befb81e-2f8b-4cb7-a9eb-f0c1d95a414c@github.com> References: <6KXwQYCE8EE4XzW7gCXWj6_Qt5lXjp0CtWXhImf-hic=.7befb81e-2f8b-4cb7-a9eb-f0c1d95a414c@github.com> Message-ID: On Wed, 27 Sep 2023 16:20:50 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/runtime/globals.hpp line 2003: >> >>> 2001: range(0, UINT_MAX) \ >>> 2002: \ >>> 2003: product(uint, SecondarySuperMissBackoff, 1000, EXPERIMENTAL, \ >> >> Should it be marked DIAGNOSTIC instead? The functionality is turned on by default. If it were 0 by default, EXPERIMENTAL would have been well-justified. > > I don't think the discriminating factor for calling an option "diagnostic" or "experimental" is its default value. Rather it is its target use. > > As per `globals.hpp`: > > > // DIAGNOSTIC options are not meant for VM tuning or for product modes. > // They are to be used for VM quality assurance or field diagnosis > // of VM bugs. > > // EXPERIMENTAL flags are in support of features that may not be > // an officially supported part of a product, but may be available > // for experimenting with.
They could, for example, be performance > // features that may not have undergone full or rigorous QA, but which may > // help performance in some cases and released for experimentation > // by the community of users and developers. > > > ...and this one is obviously the experimental performance feature. I referred to default value because in your particular case it controls whether the feature is turned on or off. Calling a feature experimental when it is unconditionally turned on in product builds looks a bit weird, doesn't it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338903230 From shade at openjdk.org Wed Sep 27 16:45:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 16:45:16 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 16:08:08 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Denser AArch64 > > src/hotspot/share/runtime/javaThread.cpp line 417: > >> 415: _vm_result_2(nullptr), >> 416: >> 417: _backoff_secondary_super_miss(0), > > Why don't you set it to `SecondarySuperMissBackoff` instead? > > With the proposed shape of `MacroAssembler::check_klass_subtype_slow_path()` (`if ((x = (x-1)) >= 0) {... slow path ... }`, an overflow happens on the first access. It has an unfortunate consequence that '== 0' doesn't work anymore and, moreover, signed comparison is needed. Yeah, I guess there is no point in not initializing to `SSMB` right away. I was thinking that the first update should be accepted, as slowpath is probably correct in the optimistic case, and only the bad cases should pay. But given how this counter is shared by all callsites, that point is likely moot. Changed in new commit. 
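To make the counter discussion concrete, here is a simplified Java model of a thread-local backoff. It is an illustration under stated assumptions, not the PR's generated code: the class and method names are hypothetical, the counter is assumed to start at the backoff value (as in the revised PR), and the cache is assumed to be updated only once the counter has run down to zero. Testing for zero before decrementing means the counter never wraps around, which avoids the overflow and signed-comparison concern Vladimir raised, and a backoff of `0` degenerates to updating on every miss, matching the description of `0` as disabling the mitigation.

```java
/** Hypothetical model of a per-thread backoff for secondary_super_cache updates. */
public final class MissBackoff {
    private final int backoff; // stands in for the SecondarySuperMissBackoff value
    private int remaining;     // stands in for the per-thread counter

    MissBackoff(int backoff) {
        if (backoff < 0) {
            throw new IllegalArgumentException("backoff must be >= 0");
        }
        this.backoff = backoff;
        this.remaining = backoff; // initialize with the backoff right away
    }

    /** Called on each slow-path miss; returns true if this miss should update the cache. */
    boolean onMiss() {
        if (remaining == 0) {
            remaining = backoff; // re-arm; zero is tested first, so the counter never wraps
            return true;
        }
        remaining--;
        return false;
    }

    /** Counts cache updates over a run of consecutive misses on one thread. */
    static int updatesAfter(int backoff, int misses) {
        MissBackoff b = new MissBackoff(backoff);
        int updates = 0;
        for (int i = 0; i < misses; i++) {
            if (b.onMiss()) {
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        System.out.println(updatesAfter(0, 5));       // backoff 0: every miss updates
        System.out.println(updatesAfter(3, 8));       // one update per 4 misses
        System.out.println(updatesAfter(1000, 1000)); // first 1000 misses are skipped
    }
}
```

In this model a backoff of N yields one cache update per N + 1 misses; the thread does not say whether the real patch uses exactly that cadence.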
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338899835 From jvernee at openjdk.org Wed Sep 27 16:50:33 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 16:50:33 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v30] In-Reply-To: References: Message-ID: <7SpZ55G-FXPaGDEborDn2ZhxF6EUPUdG6J1p56GLYo0=.603f7708-0bd0-4ab9-992a-6aabdc216cc0@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. 
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: drop unneeded @compile tags from jtreg tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/ea1b9c5f..2bc0a650 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=28-29 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Wed Sep 27 16:50:34 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Sep 2023 16:50:34 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v28] In-Reply-To: References: <9NvYUl5F19FDLsn14TCcO34nCEZZIt07H8iLHrpTbgY=.d375318e-73a4-43df-9526-264a0ee043bc@github.com> <34zxg0Er5UaZKlmKxpvQ0QxAv8CeXEp0pSemqPgKquo=.165c5ccd-2873-42e7-aefa-e9ba2e78f661@github.com> Message-ID: On Wed, 27 Sep 2023 14:52:52 GMT, Jorn Vernee wrote: >> test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckHoistingScaledIV.java line 32: >> >>> 30: * @modules jdk.incubator.vector >>> 31: * @compile -source ${jdk.version} TestRangeCheckHoistingScaledIV.java >>> 32: * @run main/othervm compiler.rangechecks.TestRangeCheckHoistingScaledIV >> >> Not important but I assume the @compile line can be removed from a number of tests as it's no longer needed. It was needed for tests that didn't use @enablePreview. > > Ok, I'll go over all the tests that I've changed and see if there are `@compile` tags that can be removed Besides the `compiler/rangechecks/TestRangeCheckHoistingScaledIV` test I found one other test that uses `@compile` in this manner: `java/lang/Thread/jni/AttachCurrentThread/AttachTest`. I've amended both. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1338910373 From shade at openjdk.org Wed Sep 27 17:08:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 17:08:16 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: References: <6KXwQYCE8EE4XzW7gCXWj6_Qt5lXjp0CtWXhImf-hic=.7befb81e-2f8b-4cb7-a9eb-f0c1d95a414c@github.com> Message-ID: On Wed, 27 Sep 2023 16:39:22 GMT, Vladimir Ivanov wrote: >> I don't think the discriminating factor for calling an option "diagnostic" or "experimental" is its default value. Rather it is its target use. >> >> As per `globals.hpp`: >> >> >> // DIAGNOSTIC options are not meant for VM tuning or for product modes. >> // They are to be used for VM quality assurance or field diagnosis >> // of VM bugs. >> >> // EXPERIMENTAL flags are in support of features that may not be >> // an officially supported part of a product, but may be available >> // for experimenting with. They could, for example, be performance >> // features that may not have undergone full or rigorous QA, but which may >> // help performance in some cases and released for experimentation >> // by the community of users and developers. >> >> >> ...and this one is obviously the experimental performance feature. > > I referred to default value because in your particular case it controls whether the feature is turned on or off. Calling a feature experimental when it is unconditionally turned on in product builds looks a bit weird, doesn't it? Well, maybe :) One might ask another question: if I am experimenting with the value of this option, by switching from non-zero default to another non-zero value for performance evaluation in the field, is it really "diagnostic"? I think "experimental" captures the whole thing better. But I can change to diagnostic, if you insist. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338935617 From igavrilin at openjdk.org Wed Sep 27 17:10:29 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Wed, 27 Sep 2023 17:10:29 GMT Subject: Integrated: 8316743: RISC-V: Change UseVectorizedMismatchIntrinsic option result to warning In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 14:17:40 GMT, Ilya Gavrilin wrote: > Please review this small change for the UseVectorizedMismatchIntrinsic option. > On RISC-V we do not have a VectorizedMismatch intrinsic, so `void LIRGenerator::do_vectorizedMismatch(Intrinsic* x)` produces a fatal error when this option is turned on. > Other similar options (like -XX:+UseCRC32Intrinsics) produce only a warning: https://github.com/openjdk/jdk/blob/c90d63105ca774c047d5f5a4348aa657efc57953/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L150-L183 > On other platforms where VectorizedMismatch is unimplemented, we also get a warning. This pull request has now been integrated. Changeset: 750da001 Author: Ilya Gavrilin Committer: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/750da0012931656cfd55f3e67c3f49ad7363ab8e Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod 8316743: RISC-V: Change UseVectorizedMismatchIntrinsic option result to warning Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/15890 From vlivanov at openjdk.org Wed Sep 27 17:37:12 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 27 Sep 2023 17:37:12 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: References: <6KXwQYCE8EE4XzW7gCXWj6_Qt5lXjp0CtWXhImf-hic=.7befb81e-2f8b-4cb7-a9eb-f0c1d95a414c@github.com> Message-ID: <9M4Rm0qPvucDjCu3czcImrmFhMKyRZpXLDGTA_3lrag=.10f3efec-a660-4164-bb9a-802959dfd025@github.com> On Wed, 27 Sep 2023 17:04:54 GMT, Aleksey Shipilev wrote: >> I referred to default value because in your particular case it controls whether the feature is turned on or off.
Calling a feature experimental when it is unconditionally turned on in product builds looks a bit weird, doesn't it? > > Well, maybe :) One might ask another question: if I am experimenting with the value of this option, by switching from non-zero default to another non-zero value for performance evaluation in the field, is it really "diagnostic"? I think "experimental" captures the whole thing better. But I can change to diagnostic, if you insist. A better question to ask is "Why do I have to specify -XX:+UnlockExperimentalVMOptions to turn the feature OFF?". Diagnosing a performance anomaly may involve both turning the logic off or increasing the limit. So, I do think that diagnostic is a better fit here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338970246 From vlivanov at openjdk.org Wed Sep 27 17:37:13 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 27 Sep 2023 17:37:13 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: <9M4Rm0qPvucDjCu3czcImrmFhMKyRZpXLDGTA_3lrag=.10f3efec-a660-4164-bb9a-802959dfd025@github.com> References: <6KXwQYCE8EE4XzW7gCXWj6_Qt5lXjp0CtWXhImf-hic=.7befb81e-2f8b-4cb7-a9eb-f0c1d95a414c@github.com> <9M4Rm0qPvucDjCu3czcImrmFhMKyRZpXLDGTA_3lrag=.10f3efec-a660-4164-bb9a-802959dfd025@github.com> Message-ID: On Wed, 27 Sep 2023 17:33:43 GMT, Vladimir Ivanov wrote: >> Well, maybe :) One might ask another question: if I am experimenting with the value of this option, by switching from non-zero default to another non-zero value for performance evaluation in the field, is it really "diagnostic"? I think "experimental" captures the whole thing better. But I can change to diagnostic, if you insist. > > A better question to ask is "Why do I have to specify -XX:+UnlockExperimentalVMOptions to turn the feature OFF?". Diagnosing a performance anomaly may involve both turning the logic off or increasing the limit. 
So, I do think that diagnostic is a better fit here. Another question: should the flag be platform-specific? I'd expect different platforms may eventually settle on different defaults (e.g., platforms w/o support should set it to 0 right away) and, also, it would be better to print warnings when users try to explicitly set the flag to a non-zero value when proper support is missing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1338971022 From vlivanov at openjdk.org Wed Sep 27 18:00:17 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 27 Sep 2023 18:00:17 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v6] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 16:45:07 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short.
I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. >> >> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8316180-backoff-secondary-super > - Init with backoff right away > - x86 cleanup > - Denser AArch64 > - Cleaner AArch64 code > - Use proper 32-bit stores on AArch64 > - PPC version > - Revert "WIP x86_32" > > This reverts commit dc37d25b5ef232e2c8b0ac9c966c41a1ae3cca82. > - WIP x86_32 > - Revert ARM/PPC/RISC-V/S390 development stubs > - ... and 6 more: https://git.openjdk.org/jdk/compare/00ac2faf...bc6964ad I see the following disclaimer in the description: > Work in progress, submitting for broader attention. What are the next steps and, more broadly, what's left to get the PR finalized? Especially, it's not clear how much performance testing it went through so far.
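The backoff idea can be modelled in plain C++. The sketch below is hypothetical: `BackoffThreadState`, `should_update_cache` and `count_updates` are invented names, and the actual change emits equivalent logic directly in generated code for x86_64 and AArch64. It assumes that each thread counts down on misses and may write the shared `secondary_super_cache` once every `backoff` misses. Under that assumption, a value of `0` behaves like the unmitigated "always update" path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a per-thread backoff for secondary_super_cache writes.
// The names below are illustrative; they are not HotSpot identifiers.
struct BackoffThreadState {
  // Misses remaining before this thread may write the shared cache again.
  // Initialized with the backoff value right away.
  uint32_t counter;
  explicit BackoffThreadState(uint32_t backoff) : counter(backoff) {}
};

// Called on a secondary-supers miss that would normally update the shared cache.
// Returns true if this miss should perform the (contended) shared write.
// The countdown only touches thread-local state, so it causes no shared-cache-line traffic.
bool should_update_cache(BackoffThreadState& t, uint32_t backoff) {
  if (t.counter > 1) {
    t.counter--;        // still backing off: skip the shared write
    return false;
  }
  t.counter = backoff;  // re-arm the backoff and allow one shared write
  return true;          // hence backoff 0 or 1 means "always update"
}

// Counts how many of `misses` consecutive misses would write the shared cache.
uint32_t count_updates(uint32_t backoff, uint32_t misses) {
  BackoffThreadState t(backoff);
  uint32_t updates = 0;
  for (uint32_t i = 0; i < misses; i++) {
    if (should_update_cache(t, backoff)) {
      updates++;
    }
  }
  return updates;
}
```

Under these assumptions, the test value of `1000` lets a thread that keeps missing write the shared cache only once per thousand misses. That cuts the cache-line ping-pong between threads that all update the same `secondary_super_cache` slot.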
------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1737844987 From ayang at openjdk.org Wed Sep 27 18:19:17 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 27 Sep 2023 18:19:17 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v6] In-Reply-To: References: Message-ID: <3G9kv0_H2r1AbObnWLCj00kvPYrzFI7NfcQDoQbdCHo=.608c2eea-dd8d-4f8d-a1ba-956f0823e5ff@github.com> On Wed, 27 Sep 2023 07:23:39 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies the code root (remembered) set to use the CHT as its internal representation. >> >> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and enables (parallel) bulk unregistering of nmethods during code cache unloading, improving performance of various pauses that deal with code root sets. >> >> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: >> >> During collection pauses: >> >> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms >> [..] >> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 >> [...] >> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 >> >> >> Code root scan now reduces to ~22ms max on average in this case. >> >> Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`): >> >> Clear Exception Caches 35,5ms >> Unregister NMethods 598,5ms <---- this is nmethod unregistering. >> Unregister Old NMethods 3,0ms >> CodeBlob flush 41,1ms >> CodeCache free 5730,3ms >> >> >> With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part.
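The "automatic balancing" of the code root scan comes from workers claiming pieces of the set dynamically, instead of each worker owning a fixed share. Below is a minimal sketch of that claiming pattern. It assumes the set is a table of buckets and that the workers share an atomic cursor. The names `parallel_scan`, `buckets` and `chunk` are hypothetical, and this is not the `ConcurrentHashTable` API.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: each entry holds the number of nmethods in one bucket,
// and "scanning" a bucket just adds that number to the result.
size_t parallel_scan(const std::vector<size_t>& buckets, unsigned workers, size_t chunk) {
  if (chunk == 0) chunk = 1;          // a zero-sized claim would never advance
  std::atomic<size_t> next{0};        // shared cursor: first unclaimed bucket
  std::atomic<size_t> scanned{0};
  auto work = [&]() {
    size_t local = 0;
    for (;;) {
      // Claim the next chunk of buckets; a fast worker simply claims more chunks.
      size_t start = next.fetch_add(chunk);
      if (start >= buckets.size()) break;
      size_t end = std::min(start + chunk, buckets.size());
      for (size_t i = start; i < end; i++) local += buckets[i];
    }
    scanned.fetch_add(local);
  };
  std::vector<std::thread> threads;
  for (unsigned w = 0; w < workers; w++) threads.emplace_back(work);
  for (auto& t : threads) t.join();
  return scanned.load();
}
```

A worker that finishes its chunk early claims the next one. A single large code root set therefore does not have to be scanned by one worker while the others wait in termination, which is presumably what the `Termination` times in the log above reflect.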
>> >> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: >> >> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 >> >> >> Some random comments: >> * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep the previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - Split off bulk removal of dead nmethods from code root sets > - Initial comments from albert The title should probably be revised to something like "use conc-hashtable for code-root", to better match the content of the PR. (For example, I don't get the "imbalanced" part after going through the diff.) Could you also update the perf numbers now that only one optimization is included? src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 45: > 43: class G1CodeRootSetHashTableConfig : public StackObj { > 44: public: > 45: using Value = G1CodeRootSetHashTableValue; Could one use `nmethod*` here directly? Having one extra indirection/layer makes it harder to follow. src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 82: > 80: uintx get_hash() const; > 81: bool equals(G1CodeRootSetHashTableValue* value); > 82: bool is_dead(G1CodeRootSetHashTableValue* value) const { return false; } I wonder if `(...)` works, since the arg is unused. (Inspired by `struct NOP` in conc-hashtable.) src/hotspot/share/gc/g1/g1RemSet.cpp line 825: > 823: > 824: // Scan code root remembered sets. > 825: { Without the claim logic, all workers will scan the code roots.
Why do multiple workers need to scan the same set of code roots repeatedly? I thought once per region is enough. ------------- PR Review: https://git.openjdk.org/jdk/pull/15811#pullrequestreview-1647285080 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1339005262 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1339007198 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1339017373 From shade at openjdk.org Wed Sep 27 18:27:00 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 18:27:00 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v7] In-Reply-To: References: Message-ID: <2imokRHhQANZwG0V2mHaqfqCD-aoCgC-TI2bSxz1k9k=.54af038c-9f44-4879-826b-183b54eda82c@github.com> > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short.
I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Option is diagnostic, platform-dependent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/bc6964ad..4ef74c15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=05-06 Stats: 19 lines in 8 files changed: 18 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From shade at openjdk.org Wed Sep 27 18:27:02 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 18:27:02 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v5] In-Reply-To: References: <6KXwQYCE8EE4XzW7gCXWj6_Qt5lXjp0CtWXhImf-hic=.7befb81e-2f8b-4cb7-a9eb-f0c1d95a414c@github.com> <9M4Rm0qPvucDjCu3czcImrmFhMKyRZpXLDGTA_3lrag=.10f3efec-a660-4164-bb9a-802959dfd025@github.com> Message-ID: On Wed, 27 Sep 2023 17:34:19 GMT, Vladimir Ivanov wrote: >> A better question to ask is "Why do I have to specify -XX:+UnlockExperimentalVMOptions to turn the feature OFF?". Diagnosing a performance anomaly may involve both turning the logic off or increasing the limit. So, I do think that diagnostic is a better fit here. > > Another question: should the flag be platform-specific?
I'd expect different platforms may eventually settle on different defaults (e.g., platforms w/o support should set it to 0 right away) and, also, it would be better to print warnings when users try to explicitly set the flag to a non-zero value when proper support is missing. All right, diagnostic platform-dependent it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15718#discussion_r1339026024 From shade at openjdk.org Wed Sep 27 18:28:13 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 18:28:13 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v6] In-Reply-To: References: Message-ID: <17_uqfnizaJ93-eAhf9C0n5HGG-hGo8yxaIulgxM9q4=.6d950430-46b4-42a2-ac86-c84ae06df431@github.com> On Wed, 27 Sep 2023 17:57:46 GMT, Vladimir Ivanov wrote: > What are the next steps and, more broadly, what's left to get the PR finalized? Especially, it's not clear how much performance testing it went through so far. I was waiting on Derek White to publish their benchmarks, so that we can decide reasonable defaults. While it seems too early to run larger-scale benchmarks, feel free to give it a spin. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1737879284 From shade at openjdk.org Wed Sep 27 19:46:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Sep 2023 19:46:39 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v8] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out.
The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments.
> > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Correct type for flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/4ef74c15..8be561d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=06-07 Stats: 7 lines in 7 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From rrich at openjdk.org Wed Sep 27 20:08:14 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 27 Sep 2023 20:08:14 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v11] In-Reply-To: References: Message-ID: <2aKqY0mXRh_PadlRzpYuepecYMuQwL_aSZCLGYhYx48=.5552ff98-9008-4ff8-ad0d-ebd334b65e19@github.com> On Tue, 26 Sep 2023 16:40:51 GMT, Richard Reingruber wrote: >> This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues, which can lead to inverse scaling, as demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm for sharing the scanning of large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe.
This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`). >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a microbenchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe, which makes it less efficient. The price for parallelization is paid ...
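The core invariant of the stripe-based sharing is that each worker scans only the part of a large array that falls inside its own stripe, so every element is covered exactly once. The sketch below models just that intersection. The heap is addressed in words, and `array_slice_for_stripe` and `total_scanned` are hypothetical names. It deliberately ignores two parts of the PR: the precise card marks that limit scanning to dirty regions, and the special handling of the array end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical model: stripe i covers heap words [i * stripe_words, (i + 1) * stripe_words).
// Returns the half-open range of the array [obj_start, obj_end) that the worker
// owning `stripe` scans; an empty range (first == second) means no work.
std::pair<size_t, size_t> array_slice_for_stripe(size_t obj_start, size_t obj_end,
                                                 size_t stripe, size_t stripe_words) {
  size_t stripe_start = stripe * stripe_words;
  size_t stripe_end = stripe_start + stripe_words;
  size_t lo = std::max(obj_start, stripe_start);
  size_t hi = std::min(obj_end, stripe_end);
  if (lo >= hi) return std::make_pair(lo, lo);  // array does not overlap this stripe
  return std::make_pair(lo, hi);
}

// Sums the slices of all stripes the array touches; equals the array length
// exactly when the per-stripe slices partition the array.
size_t total_scanned(size_t obj_start, size_t obj_end, size_t stripe_words) {
  size_t total = 0;
  for (size_t s = obj_start / stripe_words; s * stripe_words < obj_end; s++) {
    std::pair<size_t, size_t> r = array_slice_for_stripe(obj_start, obj_end, s, stripe_words);
    total += r.second - r.first;
  }
  return total;
}
```

For an array spanning words 100 to 1000 with 128-word stripes, stripe 0 scans `[100, 128)`, stripes 1 to 6 scan 128 words each, and stripe 7 scans `[896, 1000)`, for 900 words in total.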
> > Richard Reingruber has updated the pull request incrementally with five additional commits since the last revision: > > - Eliminate special case for scanning the large array end > - First card of large array should be cleared if dirty > - Do all large array scanning in separate method > - Limit stripe size to 1m with at least 8 threads > - Small clean-ups Could the following be a scheme with just one card table? * A thread clears only cards on stripes it owns * Problematic case: non-array object O starts in stripe S0 and reaches into S1 * Cards of O are imprecisely marked We could iterate all stripes pre-scavenge (in parallel?), marking the first card dirty iff a non-array object starts in a previous stripe and the card with the object start is marked dirty. This would allow sharing the clearing and scanning of non-array objects based on stripes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1738004579 From dchuyko at openjdk.org Wed Sep 27 21:30:42 2023 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 27 Sep 2023 21:30:42 GMT Subject: RFR: 8309271: A way to align already compiled methods with compiler directives [v7] In-Reply-To: References: Message-ID: <-xQ7n8u1xDWpcgMZJvaYl2LBTFTnf-Q7UlsAN2W8f6U=.2253a02d-e496-4276-8aa1-8be2dd869216@github.com> > Compiler Control (https://openjdk.org/jeps/165) provides method-context dependent control of the JVM compilers (C1 and C2). The active directive stack is built from the directive files passed with the `-XX:CompilerDirectivesFile` diagnostic command-line option and the Compiler.add_directives diagnostic command. It is also possible to clear all directives or remove the top from the stack. > > A matching directive will be applied at method compilation time when such compilation is started. If directives are added or changed, but compilation does not start, then the state of compiled methods doesn't correspond to the rules.
This is not an error, and it happens in long-running applications when directives are added or removed after compilation of methods that could be matched. For example, the user decides that C2 compilation needs to be disabled for some method due to a compiler bug and issues such a directive, but this does not affect the application behavior. In such a case, the target application needs to be restarted, and such an operation can have high costs and risks. Another goal is testing/debugging compilers. > > It would be convenient to optionally reconcile at least existing matching nmethods with the current stack of compiler directives (bypassing inlined methods). > > A natural way to eliminate the discrepancy between the result of compilation and the broken rule is to discard the compilation result, i.e. deoptimization. Prior to that we can try to re-compile the method, letting the compile broker perform it and take the new directives stack into account. Re-compilation helps to prevent hot methods from being executed in the interpreter. > > A new flag `-r` has been introduced for some directives related to compile commands: `Compiler.add_directives`, `Compiler.remove_directives`, `Compiler.clear_directives`. The default behavior has not changed (no flag). If the new flag is present, the command scans already compiled methods and queues methods that have any active non-default matching compiler directives for re-compilation if possible, otherwise marks them for deoptimization. There is currently no distinction as to which directives are found. In particular, this means that if there are rules for inlining into some method, it will be refreshed. On the other hand, if there are rules for a method and it was inlined, top-level methods won't be refreshed, but this can be achieved by having rules for them. > > In addition, a new diagnostic command `Compiler.replace_directives` has been added for ... Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 25 commits: - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - jcheck - Unnecessary import - force_update->refresh - Merge branch 'openjdk:master' into compiler-directives-force-update - Use only top directive for add/remove; better mutex rank definition; texts - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Safe handling of non-java methods - ... and 15 more: https://git.openjdk.org/jdk/compare/750da001...e451f509 ------------- Changes: https://git.openjdk.org/jdk/pull/14111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14111&range=06 Stats: 372 lines in 15 files changed: 339 ins; 3 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/14111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14111/head:pull/14111 PR: https://git.openjdk.org/jdk/pull/14111 From mli at openjdk.org Wed Sep 27 22:25:52 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 27 Sep 2023 22:25:52 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> Message-ID: On Wed, 27 Sep 2023 13:46:52 GMT, Robbin Ehn wrote: >> Thanks for the discussion. Yes, I see your point. >> What I meant above is that the Java heap can only be touched when storing back to memory, but stores `only access memory or raise exceptions for active elements.`, which seems to mean that a vector operation might touch tail elements in a vector register group, but a vector store will not touch the memory corresponding to the tail elements. >> Am I understanding the spec correctly? > > I don't see that the spec guarantees that under no circumstances will tail memory be touched.
> So I don't know; for now it's better to be safe than sorry. Sure, in fact vsetvli was already modified to use default arguments as you suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1338662157 From pli at openjdk.org Wed Sep 27 22:32:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 22:32:10 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: References: <6IkvVTm9e60qXwaID0EihRXlUielrryBWoTmYAp3PuU=.c624b13d-bc6d-4c79-86a6-72bda016b50f@github.com> Message-ID: On Mon, 3 Jul 2023 14:44:22 GMT, Emanuel Peter wrote: >> Good catch! What do you think of getting rid of `_slp` completely in the `SWPointer` refactoring? > > I think that would be optimal, if it is possible. I would maybe call it a `CLPointer`, for counted-loop-pointer? And only have a reference to the `_lpt` / `cl`. Eventually, we may want to even allow non-counted loops, but that is really for the future. Done in the `SWPointer` refactoring patch. >> Exactly! I have tried supporting some basic strided accesses. The code is not included in this patch as it's not that beneficial on some CPUs and requires more C2 refactorings. > > Great, you should probably leave that to a future RFE anyway. Closing this comment, as the feature will not be included in this patch.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338274345 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338281001 From pli at openjdk.org Wed Sep 27 22:32:45 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 22:32:45 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: <2Pgw8cvXHt5MbQOSiD9C_pIIyE5peaxJDbwI_w-9XJY=.1b09b1e6-e4be-48ea-b054-9d2dac2dbf30@github.com> References: <2Pgw8cvXHt5MbQOSiD9C_pIIyE5peaxJDbwI_w-9XJY=.1b09b1e6-e4be-48ea-b054-9d2dac2dbf30@github.com> Message-ID: On Tue, 4 Jul 2023 11:57:46 GMT, Emanuel Peter wrote: >> Yes, we have tried supporting type conversions (between different type sizes) but current solution is not mature and not included in this patch. So this limitation is added here. > > Ok, fine. Leave that for the future. I close this comment as the feature will not be included in this patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338281456 From pli at openjdk.org Wed Sep 27 22:31:55 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 22:31:55 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> References: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> Message-ID: On Mon, 3 Jul 2023 08:09:30 GMT, Pengfei Li wrote: >> I'd also move this to some static functions in a potential "autovectorization.hpp", and move `_vector_loop_debug` there, together with all its `is_trace...` accessors. > > I agree current code here is a bit ugly. I will try to make it better in `SWPointer` refactoring. Close above comment and leave this open to remind that we should also move `_vector_loop_debug` and all `is_trace...` accessors in a later patch. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338271886 From pli at openjdk.org Wed Sep 27 22:32:32 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 22:32:32 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 14:51:55 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 64: > >> 62: >> 63: if (!cl->is_valid_counted_loop(T_INT)) { >> 64: trace_msg(nullptr, "Loop is not a valid counted loop"); > > Would it help to dump the loop head here? Just that one knows which loop is being rejected here? Done > src/hotspot/share/opto/vmaskloop.cpp line 68: > >> 66: } >> 67: if (abs(cl->stride_con()) != 1) { >> 68: trace_msg(nullptr, "Loop has unsupported stride value"); > > Dump loop head and the stride Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338290086 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338290316 From pli at openjdk.org Wed Sep 27 22:32:58 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 22:32:58 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Tue, 4 Jul 2023 02:23:32 GMT, Pengfei Li wrote: >> Suggested solution: track the last memory state per slice, just like I recently did in `SuperWord::schedule_reorder_memops` with `current_state_in_slice`. > > I'm not quite familiar with memory slice. Will do more investigation and come back later. After some study, I know my current approach does create some unnecessary memory dependence. Will fix this after some refactoring work. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338286246 From pli at openjdk.org Wed Sep 27 22:33:03 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 27 Sep 2023 22:33:03 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: References: Message-ID: On Tue, 4 Jul 2023 02:40:51 GMT, Pengfei Li wrote: >> test/hotspot/jtreg/compiler/vectorization/runner/ArrayInvariantFillTest.java line 69: >> >>> 67: @Test >>> 68: @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, >>> 69: applyIf = {"OptimizeFill", "false"}, >> >> This seems unrelated. Why did you have to add this? > > Will cleanup this in JDK-8309697. This is already fixed in another patch. Will close this comment. >> test/hotspot/jtreg/compiler/vectorization/runner/VectorizationTestRunner.java line 84: >> >>> 82: TestFramework irTest = new TestFramework(klass); >>> 83: // Add extra VM options to enable more auto-vectorization chances >>> 84: irTest.addFlags("-XX:-OptimizeFill"); >> >> Aha, you removed this too. Fair enough. But since the runner is currently requiring everything to be `flagless`, now I cannot actually force `-XX:-OptimizeFill` from the outside. And that means that potentially the tests are never actually run with `OptimizeFill` off, and we never actually can check the IR rules. We lose test coverage. That makes me a bit nervous. >> >> Suggestion: if tests actually require the flag off to execute the IR rule, then we should have two scenarios, one where the flag is on, and one when it is off. > > Again, will cleanup this in JDK-8309697. It's already cleaned up in another patch. Will close it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338288996 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1338289635 From cslucas at openjdk.org Wed Sep 27 23:34:10 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 27 Sep 2023 23:34:10 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v2] In-Reply-To: References: Message-ID: > ### Description > > Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. > > Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. > > The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. > > ### Benchmarking > > **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. > **Note 2:** Margin of error was negligible.
> > | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | > |--------------------------------------|------------------|-------------------| > | TestTrapAfterMerge | 19.515 | 13.386 | > | TestArgEscape | 33.165 | 33.254 | > | TestCallTwoSide | 70.547 | 69.427 | > | TestCmpAfterMerge | 16.400 | 2.984 | > | TestCmpMergeWithNull_Second | 27.204 | 27.293 | > | TestCmpMergeWithNull | 8.248 | 4.920 | > | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | > | TestCondAfterMergeWithNull | 6.265 | 5.078 | > | TestCondLoadAfterMerge | 12.713 | 5.163 | > | TestConsecutiveSimpleMerge | 30.863 | 4.068 | > | TestDoubleIfElseMerge | 16.069 | 2.444 | > | TestEscapeInCallAfterMerge | 23.111 | 22.924 | > | TestGlobalEscape | 14.459 | 14.425 | > | TestIfElseInLoop | 246.061 | 42.786 | > | TestLoadAfterLoopAlias | 45.808 | 45.812 | > | TestLoadAfterTrap | 28.370 | ... Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix typo in test. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15825/files - new: https://git.openjdk.org/jdk/pull/15825/files/d1197055..e8e9c13d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From dholmes at openjdk.org Thu Sep 28 01:49:24 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 28 Sep 2023 01:49:24 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: On Wed, 27 Sep 2023 12:19:31 GMT, Coleen Phillimore wrote: > OopHandles and WeakHandles don't have destructors Hmmm okay - it seems fragile to have a 
psuedo-destructor in `release()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15920#issuecomment-1738324423 From dholmes at openjdk.org Thu Sep 28 02:07:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 28 Sep 2023 02:07:23 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v5] In-Reply-To: <73sTfC_S_JkdiAYBv-CCB58NfWO_Q1RSgG43wJNutI8=.05080f32-0bd1-4a1e-a252-9bd4142f1fed@github.com> References: <73sTfC_S_JkdiAYBv-CCB58NfWO_Q1RSgG43wJNutI8=.05080f32-0bd1-4a1e-a252-9bd4142f1fed@github.com> Message-ID: On Wed, 27 Sep 2023 08:48:27 GMT, Afshin Zafari wrote: >> src/hotspot/share/gc/parallel/mutableNUMASpace.hpp line 1: >> >>> 1: /* >> >> This seems an unrelated change. > > This change came after fixing a merge conflict. > In `mutableNUMASpace.cpp`, at lines 163, 182, 202 and 586 the `find` function is called in this way: > > int i = lgrp_spaces()->find(&lgrp_id, LGRPSpace::equals); > > where `lgrp_id` is `int`. Therefore, the `LGRPSpace::equals` has to take an `int*` in its first argument. The definition of `find` is: > > int find(T* token, bool f(T*, const E&)) const { After JDK-8316115 `lgrp_id` is `uint`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1339427450 From jwaters at openjdk.org Thu Sep 28 03:12:03 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 28 Sep 2023 03:12:03 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v6] In-Reply-To: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: > We should set the -permissive- flag for the Microsoft Visual C compiler, as was requested by the now backed out [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). 
Doing so makes the Visual C compiler much less accepting of ill formed code, which will improve code quality on Windows in the future. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'openjdk:master' into patch-10 - Merge branch 'master' into patch-10 - Document changes in awt_DnDDS.cpp - Remove negation in os_windows.cpp - Mismatched declaration in D3DGlyphCache.cpp - Fields in awt_TextComponent.cpp - reinterpret_cast needed in AccessBridgeJavaEntryPoints.cpp - Qualifiers in awt_PrintDialog.h should be removed - Likewise for awt_DnDDT.cpp - awt_ole.h include order issue in awt_DnDDS.cpp - ... and 16 more: https://git.openjdk.org/jdk/compare/84390dd0...1e2b39f9 ------------- Changes: https://git.openjdk.org/jdk/pull/15096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15096&range=05 Stats: 802 lines in 17 files changed: 171 ins; 127 del; 504 mod Patch: https://git.openjdk.org/jdk/pull/15096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15096/head:pull/15096 PR: https://git.openjdk.org/jdk/pull/15096 From jwaters at openjdk.org Thu Sep 28 03:21:40 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 28 Sep 2023 03:21:40 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v6] In-Reply-To: References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: On Thu, 28 Sep 2023 03:12:03 GMT, Julian Waters wrote: >> We should set the -permissive- flag for the Microsoft Visual C compiler, as was requested by the now backed out [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). Doing so makes the Visual C compiler much less accepting of ill formed code, which will improve code quality on Windows in the future. > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 26 commits: > > - Merge branch 'openjdk:master' into patch-10 > - Merge branch 'master' into patch-10 > - Document changes in awt_DnDDS.cpp > - Remove negation in os_windows.cpp > - Mismatched declaration in D3DGlyphCache.cpp > - Fields in awt_TextComponent.cpp > - reinterpret_cast needed in AccessBridgeJavaEntryPoints.cpp > - Qualifiers in awt_PrintDialog.h should be removed > - Likewise for awt_DnDDT.cpp > - awt_ole.h include order issue in awt_DnDDS.cpp > - ... and 16 more: https://git.openjdk.org/jdk/compare/84390dd0...1e2b39f9 closing and deleting for now ------------- PR Comment: https://git.openjdk.org/jdk/pull/15096#issuecomment-1738377203 From jwaters at openjdk.org Thu Sep 28 03:21:41 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 28 Sep 2023 03:21:41 GMT Subject: Withdrawn: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler In-Reply-To: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: <2z8rwkhhnrcVYMmAFv0iyc9bYapjmtEtCEnlrlfFxWQ=.10d537c9-43f3-47ce-9e3e-fff1aaaadfb5@github.com> On Tue, 1 Aug 2023 01:52:24 GMT, Julian Waters wrote: > We should set the -permissive- flag for the Microsoft Visual C compiler, as was requested by the now backed out [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). Doing so makes the Visual C compiler much less accepting of ill formed code, which will improve code quality on Windows in the future. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/15096 From djelinski at openjdk.org Thu Sep 28 04:19:30 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Thu, 28 Sep 2023 04:19:30 GMT Subject: Integrated: 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 In-Reply-To: References: Message-ID: On Tue, 12 Sep 2023 17:10:38 GMT, Daniel Jeliński wrote: > Please review this patch that removes saving of xmm16-xmm31 registers from Windows call_stub. > > Windows ABI only mandates saving xmm6-xmm15, which we continue to do here. > > No new tests. Mach5 tier1-5 builds and tests clean. This pull request has now been integrated. Changeset: 384d2ea6 Author: Daniel Jeliński URL: https://git.openjdk.org/jdk/commit/384d2ea6d10017299a6d538bc86c17e3b8443cd9 Stats: 26 lines in 3 files changed: 0 ins; 17 del; 9 mod 8316125: Windows call_stub unnecessarily saves xmm16-31 when UseAVX>=3 Reviewed-by: jvernee, kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/15688 From kbarrett at openjdk.org Thu Sep 28 05:29:22 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 28 Sep 2023 05:29:22 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: On Tue, 26 Sep 2023 12:47:42 GMT, Coleen Phillimore wrote: > This change makes WeakHandle and OopHandle release null out the obj pointer, at the cost of making the release function non-const and some changes that propagated from that. This enables ObjectMonitor code to test for null to see if the obj was already released, and seems like the right thing to do. See comments from related PR in the bug report. > Tested with tier1-4. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer).
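To illustrate the behavior described in 8309599 (release() nulls out the object pointer so later code can test for null), here is a minimal sketch. `SlotStorage` and `HandleSketch` are hypothetical stand-ins, not HotSpot's `OopStorage`, `OopHandle` or `WeakHandle`; the point is only why `release()` has to become non-const once it clears the pointer.

```cpp
#include <cassert>

// Hypothetical stand-in for an OopStorage-like allocator of handle slots.
struct SlotStorage {
  int live = 0;
  void** allocate() {
    live++;
    return new void*(nullptr);
  }
  void release(void** slot) {
    assert(slot != nullptr);
    live--;
    delete slot;
  }
};

// Simplified handle in the spirit of OopHandle/WeakHandle (hypothetical).
// It has no destructor, so release() is the only cleanup point.
class HandleSketch {
  void** _obj = nullptr;

 public:
  HandleSketch() = default;
  HandleSketch(SlotStorage* storage, void* value) : _obj(storage->allocate()) {
    *_obj = value;
  }

  bool is_empty() const { return _obj == nullptr; }
  void* resolve() const { return is_empty() ? nullptr : *_obj; }

  // Non-const: after handing the slot back to storage, clear _obj so later
  // code can test is_empty() instead of following a dangling slot pointer.
  // A second release() is then a harmless no-op.
  void release(SlotStorage* storage) {
    if (_obj != nullptr) {
      storage->release(_obj);
      _obj = nullptr;
    }
  }
};
```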
PR Review: https://git.openjdk.org/jdk/pull/15920#pullrequestreview-1647918875 From dholmes at openjdk.org Thu Sep 28 06:20:25 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 28 Sep 2023 06:20:25 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 07:37:26 GMT, Liming Liu wrote: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` in the table below, and found that the patch gives improvements when the system call is supported, while it does not hurt when the system call is not supported:
>
> | Kernel | -XX:-TransparentHugePages, unpatched | -XX:-TransparentHugePages, patched | -XX:+TransparentHugePages, unpatched | -XX:+TransparentHugePages, patched |
> |--------|------|------|------|------|
> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |
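A minimal sketch of the approach described above, assuming Linux and a page-aligned range: ask the kernel to populate the range with `madvise(MADV_POPULATE_WRITE)` and fall back to touching pages only when that call fails. The function name `pretouch` and the plain one-write-per-page fallback are illustrative assumptions; neither HotSpot's actual code nor its previous atomic-add-0 loop is reproduced here. The value 23 is the Linux UAPI constant, defined here only for older headers.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// MADV_POPULATE_WRITE was added in Linux 5.14; older headers may lack it.
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

// Pre-fault [start, start + len) for writing. Returns true if the kernel did
// it via madvise, false if it fell back to touching each page by hand.
// start and len are assumed to be page aligned.
bool pretouch(char* start, size_t len) {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  assert(len % page == 0);
  if (madvise(start, len, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  // madvise failed (e.g. EINVAL for unknown advice on kernels before 5.14):
  // write one byte per page so every page gets faulted in.
  for (char* p = start; p < start + len; p += page) {
    volatile char* v = p;
    *v = *v;
  }
  return false;
}
```

Either path leaves the memory contents unchanged; only how the pages get faulted in differs.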
I can't comment on the actual use of MADV_POPULATE_WRITE (probably need to wait for @tstuefe to get back for that) nor the flags usage in the test, but the general refactoring looks okay. Thanks test/hotspot/jtreg/gc/parallel/TestParallelAlwaysPreTouch.java line 29: > 27: * @requires vm.gc.Parallel & os.family == "linux" & os.maxMemory > 30G > 28: * @summary Check if parallel pretouch performs normally with and without THP. > 29: * @comment The test is not ParallelGC-specific, but a multi-threaded GC is \ No need for a line continuation character - \ ------------- PR Review: https://git.openjdk.org/jdk/pull/15781#pullrequestreview-1647974313 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1339579908 From alanb at openjdk.org Thu Sep 28 07:17:35 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 28 Sep 2023 07:17:35 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v30] In-Reply-To: <7SpZ55G-FXPaGDEborDn2ZhxF6EUPUdG6J1p56GLYo0=.603f7708-0bd0-4ab9-992a-6aabdc216cc0@github.com> References: <7SpZ55G-FXPaGDEborDn2ZhxF6EUPUdG6J1p56GLYo0=.603f7708-0bd0-4ab9-992a-6aabdc216cc0@github.com> Message-ID: On Wed, 27 Sep 2023 16:50:33 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. 
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedO... 
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > drop unneeded @compile tags from jtreg tests Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1648059254 From rrich at openjdk.org Thu Sep 28 07:41:18 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 28 Sep 2023 07:41:18 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v12] In-Reply-To: References: Message-ID: <50VtEqmFaxK4NnbXqU54rQW8R1YrGDa6HukQOuniupE=.5a5365f1-546a-4c48-a763-9248346c6593@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. 
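To make the striping idea concrete, here is a simplified sketch of how a large array could be split by stripe: each worker scans only the part of the array that lies inside its own stripe, so the array's work is divided by address range instead of being stolen from one queue. It ignores card dirtiness and the special handling of the array's end described above; `Range`, `scan_range_for_stripe` and the word-index model are hypothetical, not `PSCardTable` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range of word indices [begin, end).
struct Range {
  std::size_t begin;
  std::size_t end;
  bool empty() const { return begin >= end; }
};

// Part of the array `arr` that falls into stripe number `stripe`, where
// stripes tile the address space in chunks of `stripe_words`, starting at 0.
Range scan_range_for_stripe(Range arr, std::size_t stripe, std::size_t stripe_words) {
  assert(stripe_words > 0);
  Range s{stripe * stripe_words, (stripe + 1) * stripe_words};
  return Range{std::max(arr.begin, s.begin), std::min(arr.end, s.end)};
}
```

The per-stripe pieces are disjoint and, taken together, cover the array exactly once, which is what lets several workers share one large array without coordinating.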
> > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Remove stripe size adaptations and cache potentially expensive start array queries ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/d75bd60a..50737dda Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=10-11 Stats: 49 lines in 3 files changed: 20 ins; 27 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From azafari at openjdk.org Thu Sep 28 09:49:05 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 28 Sep 2023 09:49:05 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6] In-Reply-To: References: Message-ID: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> > The `find` method now is > ```C++ > template > int find(T* token, bool f(T*, E)) const { > ... > > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. 
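A sketch of what a comparator-taking `find` as a member template looks like, using the `int find(T* token, bool f(T*, const E&)) const` shape quoted later in this thread. `GrowableArraySketch`, `Space` and `space_has_id` are hypothetical stand-ins for `GrowableArray`, `LGRPSpace` and `LGRPSpace::equals`, not HotSpot code.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for HotSpot's GrowableArray<E>, for illustration only.
template <typename E>
class GrowableArraySketch {
  std::vector<E> _data;

 public:
  int length() const { return static_cast<int>(_data.size()); }
  void append(const E& e) { _data.push_back(e); }
  const E& at(int i) const {
    assert(0 <= i && i < length());
    return _data[i];
  }

  // Comparator-taking find as a member template: the token type T is
  // deduced per call and is independent of the element type E.
  template <typename T>
  int find(T* token, bool f(T*, const E&)) const {
    for (int i = 0; i < length(); i++) {
      if (f(token, at(i))) {
        return i;
      }
    }
    return -1;
  }
};

// Example comparator in the style of LGRPSpace::equals (hypothetical).
struct Space {
  unsigned lgrp_id;
};

inline bool space_has_id(unsigned* id, Space* const& s) {
  return s->lgrp_id == *id;
}
```

In this sketch, T is deduced from both the token and the comparator, so passing an `int*` token together with `space_has_id` would not compile (T would be both `int` and `unsigned`). That is one way to see why the token needs a `uint*` once the comparator takes one.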
Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: first arg of `find` casted to `uint*` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15418/files - new: https://git.openjdk.org/jdk/pull/15418/files/71d320f9..9c548660 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=04-05 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From azafari at openjdk.org Thu Sep 28 10:03:26 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 28 Sep 2023 10:03:26 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v5] In-Reply-To: References: <73sTfC_S_JkdiAYBv-CCB58NfWO_Q1RSgG43wJNutI8=.05080f32-0bd1-4a1e-a252-9bd4142f1fed@github.com> Message-ID: <8gDQAzccCv2KWNfc8TRFPGodObQSYErt2qRpeX-04_U=.48cb406b-5615-432b-893b-4526af7346cc@github.com> On Thu, 28 Sep 2023 02:04:12 GMT, David Holmes wrote: >> This change came after fixing a merge conflict. >> In `mutableNUMASpace.cpp`, at lines 163, 182, 202 and 586 the `find` function is called in this way: >> >> int i = lgrp_spaces()->find(&lgrp_id, LGRPSpace::equals); >> >> where `lgrp_id` is `int`. Therefore, the `LGRPSpace::equals` has to take an `int*` in its first argument. The definition of `find` is: >> >> int find(T* token, bool f(T*, const E&)) const { > > After JDK-8316115 `lgrp_id` is `uint`. Even after the JDK-8316115, the local `lgrp_id` is defined as `int` and compared with `-1` in a few lines before calling the `find`. The local `int` definition is kept as it is, but the pointer to it is casted to `uint*`. 
Maybe @albertnetymk has to double check if this is not overlooked in JDK-8316115 fix, since the line `int lgrp_id = thr->lgrp_id();` (dates back to 15 years ago) is casting now from `uint` to `int` and comparing with `-1`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1339860082 From mdoerr at openjdk.org Thu Sep 28 10:38:52 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 28 Sep 2023 10:38:52 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: > I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. > Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Pass may_be_unordered information to lightweight_unlock. - Merge remote-tracking branch 'origin' into 8316746_lock_stack - Add x86_64 and aarch64 implementation. 
- 8316746: Top of lock-stack does not match the unlocked object ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15903/files - new: https://git.openjdk.org/jdk/pull/15903/files/83da590b..2ad32839 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15903&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15903&range=01-02 Stats: 5179 lines in 187 files changed: 4147 ins; 536 del; 496 mod Patch: https://git.openjdk.org/jdk/pull/15903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15903/head:pull/15903 PR: https://git.openjdk.org/jdk/pull/15903 From mdoerr at openjdk.org Thu Sep 28 10:38:52 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 28 Sep 2023 10:38:52 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v2] In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 21:14:11 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add x86_64 and aarch64 implementation. The lock order cannot be guaranteed for OSR compilation (see JBS discussion). Hence, I'm passing this information and allow reordering only in this case (currently testing on PPC64). Other platforms will follow after more testing and feedback. 
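The fallback discussed here (when the top of the lock-stack does not match the object being unlocked, take the slow path instead of asserting) can be sketched with a toy lock-stack. `LockStackSketch` is hypothetical and much simpler than HotSpot's `LockStack`, and `slow_unlock` only removes the entry; the VM's actual slow path is not modeled.

```cpp
#include <cassert>

// Simplified per-thread lock-stack for lightweight locking, for illustration
// only (not HotSpot's LockStack). Locked objects are identified by address.
class LockStackSketch {
  static constexpr int kCapacity = 8;
  const void* _elems[kCapacity];
  int _top = 0;

 public:
  int size() const { return _top; }

  void push(const void* obj) {
    assert(_top < kCapacity);
    _elems[_top++] = obj;
  }

  // Fast path: only succeeds if obj is on top of the lock-stack. If it is not
  // (locks released out of order), report failure instead of asserting, so
  // the caller can take the slow path.
  bool try_fast_unlock(const void* obj) {
    if (_top == 0 || _elems[_top - 1] != obj) {
      return false;
    }
    _top--;
    return true;
  }

  // Stand-in for the slow path: remove obj wherever it is, keeping the
  // relative order of the remaining entries.
  bool slow_unlock(const void* obj) {
    for (int i = _top - 1; i >= 0; i--) {
      if (_elems[i] == obj) {
        for (int j = i; j < _top - 1; j++) {
          _elems[j] = _elems[j + 1];
        }
        _top--;
        return true;
      }
    }
    return false;
  }

  bool unlock(const void* obj) {
    return try_fast_unlock(obj) || slow_unlock(obj);
  }
};
```

Unlocking in reverse order only ever uses the fast path. An out-of-order unlock fails the top-of-stack check and falls through to `slow_unlock`.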
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1738889945 From coleenp at openjdk.org Thu Sep 28 12:03:36 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 28 Sep 2023 12:03:36 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: <5NUy8Uh3SiwETfwe9Min5PXSqcfAlLunyd1HJIwo4GY=.f4304324-c158-4852-9c56-4398f4b84ca8@github.com> On Thu, 28 Sep 2023 01:46:37 GMT, David Holmes wrote: > Hmmm okay - it seems fragile to have a psuedo-destructor in release(). I don't know what this comment means. It was fragile to *not* have release destroy the _obj pointer, which was the cause of the original confusion and problems while fixing the object monitor deflation code. Thanks Kim and David for the code reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15920#issuecomment-1739005387 From coleenp at openjdk.org Thu Sep 28 12:03:37 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 28 Sep 2023 12:03:37 GMT Subject: Integrated: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> Message-ID: On Tue, 26 Sep 2023 12:47:42 GMT, Coleen Phillimore wrote: > This change makes WeakHandle and OopHandle release null out the obj pointer, at the cost of making the release function non-const and some changes that propagated from that. This enables ObjectMonitor code to test for null to see if the obj was already released, and seems like the right thing to do. See comments from related PR in the bug report. > Tested with tier1-4. This pull request has now been integrated. 
Changeset: 0c55887b Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/0c55887bfb131501a26ba431919d94f2ba08a6c1 Stats: 13 lines in 8 files changed: 2 ins; 1 del; 10 mod 8309599: WeakHandle and OopHandle release should clear obj pointer Reviewed-by: dholmes, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/15920 From jvernee at openjdk.org Thu Sep 28 12:05:35 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 28 Sep 2023 12:05:35 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v31] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. 
This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 59 commits: - Merge branch 'master' into JEP22 - drop unneeded @compile tags from jtreg tests - Use IAE instead of UOE for unsupported char sets - Use abort instead of IEA when encountering wrong value for ENA attrib. - Fix visibility issues Reviewed-by: mcimadamore - Review comments - fix typos - Tweak support for restricted methods Reviewed-by: jvernee, pminborg - Split note about byte order/alignment out of header - review comments - ... 
and 49 more: https://git.openjdk.org/jdk/compare/3481ecb2...72650c44 ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=30 Stats: 4352 lines in 258 files changed: 2211 ins; 1190 del; 951 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From rkennke at openjdk.org Thu Sep 28 12:30:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 28 Sep 2023 12:30:27 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 4071: > 4069: // Check if the top of the lock-stack matches the unlocked object. > 4070: addi(temp, temp, -oopSize); > 4071: if (may_be_unordered) { I guess there is no need to call the slow-path here. Simply not emitting the assertion should be good enough, right?
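To make the invariant under discussion concrete, here is a simplified, hypothetical model of a per-thread lock-stack in C++. It is not HotSpot's `LockStack` or the generated PPC64/x86_64/aarch64 code; the names `ToyLockStack`, `try_fast_unlock`, `slow_unlock`, and `unlock` are assumptions for illustration. The fast path succeeds only when the object being unlocked is on top of the stack; on a mismatch it reports failure so that a slow path can remove the entry wherever it sits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical, simplified model of a per-thread lock-stack used by
// lightweight locking. Objects are represented by opaque pointers.
class ToyLockStack {
 public:
  void push(const void* obj) { _stack.push_back(obj); }

  // Fast path: succeeds only if obj is the most recently locked object.
  // Returns false when the top does not match, so the caller can fall
  // back to the slow path instead of asserting.
  bool try_fast_unlock(const void* obj) {
    if (_stack.empty() || _stack.back() != obj) {
      return false;
    }
    _stack.pop_back();
    return true;
  }

  // Slow path: tolerate out-of-order unlocking by removing the most
  // recent entry for obj, wherever it sits in the stack.
  bool slow_unlock(const void* obj) {
    auto it = std::find(_stack.rbegin(), _stack.rend(), obj);
    if (it == _stack.rend()) {
      return false;  // obj is not held on this lock-stack
    }
    // std::next(it).base() is the forward iterator to the element *it.
    _stack.erase(std::next(it).base());
    return true;
  }

  bool contains(const void* obj) const {
    return std::find(_stack.begin(), _stack.end(), obj) != _stack.end();
  }
  std::size_t size() const { return _stack.size(); }

 private:
  std::vector<const void*> _stack;
};

// Unlock obj, taking the slow path only when the top does not match.
bool unlock(ToyLockStack& ls, const void* obj) {
  return ls.try_fast_unlock(obj) || ls.slow_unlock(obj);
}
```

Whether a top-of-stack mismatch should route to a slow path like this, or whether it is enough to skip the debug assertion when `may_be_unordered` is set, is the open question in this sub-thread; the sketch only shows what the slow-path option implies.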
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15903#discussion_r1340065560 From jvernee at openjdk.org Thu Sep 28 13:33:32 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 28 Sep 2023 13:33:32 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v32] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. 
https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray` by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: review @enablePreview from java/foreign/TestRestricted test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/72650c44..17dacbbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=30-31 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From rehn at openjdk.org Thu Sep 28 14:51:31 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 28 Sep 2023 14:51:31 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 11:47:40 GMT, Hamlin Li wrote: > Only vector version is included in this patch.
> > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` I had a look at the added registers; I think you can revert that. src/hotspot/cpu/riscv/assembler_riscv.hpp line 150: > 148: constexpr Register t5 = x30; > 149: constexpr Register t6 = x31; > 150: In your case it doesn't look like we need them, so I think you should revert these changes, as we may want to reserve one of those registers for something in the future. I don't think we should start using them lightly. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4326: > 4324: const Register tmp_addr = t1; > 4325: const Register length = t2; > 4326: const Register avl = t3; There seems to be no overlap with loop/t0. So avl can just be t0? No need for a fourth temp reg? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4327: > 4325: const Register length = t2; > 4326: const Register avl = t3; > 4327: const Register stride = t4; There seems to be no overlap with loop/t0. So avl can just be t0? No need for a fourth/fifth temp reg? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4339: > 4337: // in java level. > 4338: __ mv(avl, 16); > 4339: __ vsetvli(length, avl, Assembler::e32, Assembler::m1); Here avl can be t0, no? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4377: > 4375: > 4376: // Store result to key stream > 4377: __ mv(stride, 64); Here stride can be t0, no? ------------- Changes requested by rehn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15899#pullrequestreview-1649045488 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1340264742 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1340273743 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1340274188 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1340267856 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1340267577 From tschatzl at openjdk.org Thu Sep 28 15:32:11 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 28 Sep 2023 15:32:11 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v7] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that modifies the code root (remembered) set to use the CHT as internal representation. > > This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading, improving performance of various pauses that deal with code root sets. > > With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: > > During collection pauses: > > [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms > [..] > [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 > [...] > [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case.
> > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comment: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15811/files - new: https://git.openjdk.org/jdk/pull/15811/files/6e7bacd8..e025a07b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=05-06 Stats: 20 lines in 1 file changed: 0 ins; 7 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/15811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811 PR: https://git.openjdk.org/jdk/pull/15811 From tschatzl at openjdk.org Thu Sep 28 15:32:17 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 28 Sep 2023 15:32:17 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v6] In-Reply-To: <3G9kv0_H2r1AbObnWLCj00kvPYrzFI7NfcQDoQbdCHo=.608c2eea-dd8d-4f8d-a1ba-956f0823e5ff@github.com> References: <3G9kv0_H2r1AbObnWLCj00kvPYrzFI7NfcQDoQbdCHo=.608c2eea-dd8d-4f8d-a1ba-956f0823e5ff@github.com> Message-ID: On Wed, 27 Sep 2023 18:16:09 GMT, Albert Mingkun Yang wrote: >The title should probably be revised, sth like "use conc-hashtable for code-root", to better match the content of the PR. 
(For example, I don't get the "imbalanced" part after going through the diff.) > The title is still correct as mentioned in the other comment. This is corroborated by the test results from the (now) attached test case. >Could you also update the perf number now that only one optimization is included? Will fix. > src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 45: > >> 43: class G1CodeRootSetHashTableConfig : public StackObj { >> 44: public: >> 45: using Value = G1CodeRootSetHashTableValue; > > Could one use `nmethod*` here directly? Having one extra indirection/layer makes it harder to follow. I used the same style as for the `CardSet` implementation. Fixed though, because I do not care that much (although I do not really like having `nmethod**` everywhere instead). > src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 82: > >> 80: uintx get_hash() const; >> 81: bool equals(G1CodeRootSetHashTableValue* value); >> 82: bool is_dead(G1CodeRootSetHashTableValue* value) const { return false; } > > I wonder if `(...)` works, since the arg is unused. (Inspired by `struct NOP` in conc-hashtable.) Something is missing here. You probably mean something like: ``` bool is_dead(G1CodeRootSetHashTableValue*) const``` We do not do that elsewhere either. We do not use `(...)` elsewhere either (in gc code at least), and I do not want to start this kind of discussion as part of this change. > src/hotspot/share/gc/g1/g1RemSet.cpp line 825: > >> 823: >> 824: // Scan code root remembered sets. >> 825: { > > Without the claim-logic, all workers will scan code-root. Why is it needed that multiple workers scan the same set of code-root repeatedly? I thought once is enough per region. Every hashtable has its internal claim now. So to have (eventually) multiple threads help with a single hashtable, all threads will at least have to visit each hashtable and check its current claim value to see whether it has been fully claimed (processed).
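The claiming scheme described above (every worker visits every code root table, while the buckets inside each table are handed out through that table's own claim counter) can be sketched as follows. This is a hypothetical single-file model: `ToyCodeRootTable`, `kChunk`, `scan_table`, and `scan_all` are illustrative names, not G1's `G1CodeRootSet` or the ConcurrentHashTable API, and the chunk size is an arbitrary assumption. It shows why a worker reaching an already exhausted table pays only for one atomic load.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one region's code root table: a bucket array
// plus an internal claim counter shared by all workers.
struct ToyCodeRootTable {
  explicit ToyCodeRootTable(std::size_t buckets) : visits(buckets, 0) {}
  std::vector<int> visits;             // how often each bucket was scanned
  std::atomic<std::size_t> claim{0};   // next unclaimed bucket index
};

constexpr std::size_t kChunk = 4;  // buckets claimed per fetch_add (assumed)

// One worker's pass over a table: claim chunks until the table is exhausted.
// Returns the number of buckets this worker scanned.
std::size_t scan_table(ToyCodeRootTable& t) {
  const std::size_t n = t.visits.size();
  std::size_t scanned = 0;
  while (true) {
    // Cheap early exit: a fully claimed table costs one load per worker.
    if (t.claim.load(std::memory_order_relaxed) >= n) break;
    std::size_t start = t.claim.fetch_add(kChunk, std::memory_order_relaxed);
    if (start >= n) break;
    std::size_t end = std::min(start + kChunk, n);
    for (std::size_t i = start; i < end; i++) {
      t.visits[i]++;  // "scan" the nmethods in bucket i (disjoint per worker)
      scanned++;
    }
  }
  return scanned;
}

// Every worker visits every table; the per-table claim prevents any bucket
// from being scanned twice, without a separate per-region claim step.
std::size_t scan_all(const std::vector<ToyCodeRootTable*>& tables) {
  std::size_t scanned = 0;
  for (ToyCodeRootTable* t : tables) scanned += scan_table(*t);
  return scanned;
}
```

Calling `scan_all` from several worker threads would divide the buckets between them; run sequentially, the first call does all the work and later calls return after one load per table.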
------------- PR Comment: https://git.openjdk.org/jdk/pull/15811#issuecomment-1739524187 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1340324344 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1340324873 PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1340325412 From dchuyko at openjdk.org Thu Sep 28 15:39:17 2023 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Thu, 28 Sep 2023 15:39:17 GMT Subject: RFR: 8309271: A way to align already compiled methods with compiler directives [v8] In-Reply-To: References: Message-ID: > Compiler Control (https://openjdk.org/jeps/165) provides method-context dependent control of the JVM compilers (C1 and C2). The active directive stack is built from the directive files passed with the `-XX:CompilerDirectivesFile` diagnostic command-line option and the Compiler.add_directives diagnostic command. It is also possible to clear all directives or remove the top from the stack. > > A matching directive will be applied at method compilation time when such compilation is started. If directives are added or changed, but compilation does not start, then the state of compiled methods doesn't correspond to the rules. This is not an error, and it happens in long running applications when directives are added or removed after compilation of methods that could be matched. For example, the user decides that C2 compilation needs to be disabled for some method due to a compiler bug, issues such a directive but this does not affect the application behavior. In such case, the target application needs to be restarted, and such an operation can have high costs and risks. Another goal is testing/debugging compilers. > > It would be convenient to optionally reconcile at least existing matching nmethods to the current stack of compiler directives (so bypass inlined methods). 
> > The natural way to eliminate the discrepancy between the result of compilation and the broken rule is to discard the compilation result, i.e. deoptimization. Prior to that, we can try to re-compile the method, letting the compile broker perform it while taking the new directives stack into account. Re-compilation helps to prevent hot methods from executing in the interpreter. > > A new flag `-r` has been introduced for some diagnostic commands related to compiler directives: `Compiler.add_directives`, `Compiler.remove_directives`, `Compiler.clear_directives`. The default behavior has not changed (no flag). If the new flag is present, the command scans already compiled methods and puts methods that have any active non-default matching compiler directives up for re-compilation if possible, otherwise marks them for deoptimization. There is currently no distinction as to which directives are found. In particular, this means that if there are rules for inlining into some method, it will be refreshed. On the other hand, if there are rules for a method and it was inlined, top-level methods won't be refreshed, but this can be achieved by having rules for them. > > In addition, a new diagnostic command, `Compiler.replace_directives`, has been added for ... Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - jcheck - Unnecessary import - force_update->refresh - Merge branch 'openjdk:master' into compiler-directives-force-update - Use only top directive for add/remove; better mutex rank definition; texts - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - ... 
and 16 more: https://git.openjdk.org/jdk/compare/fc989986...d95d2609 ------------- Changes: https://git.openjdk.org/jdk/pull/14111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14111&range=07 Stats: 372 lines in 15 files changed: 339 ins; 3 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/14111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14111/head:pull/14111 PR: https://git.openjdk.org/jdk/pull/14111 From ayang at openjdk.org Thu Sep 28 16:38:28 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 28 Sep 2023 16:38:28 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v6] In-Reply-To: References: <3G9kv0_H2r1AbObnWLCj00kvPYrzFI7NfcQDoQbdCHo=.608c2eea-dd8d-4f8d-a1ba-956f0823e5ff@github.com> Message-ID: On Thu, 28 Sep 2023 15:22:46 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 82: >> >>> 80: uintx get_hash() const; >>> 81: bool equals(G1CodeRootSetHashTableValue* value); >>> 82: bool is_dead(G1CodeRootSetHashTableValue* value) const { return false; } >> >> I wonder if `(...)` works, since the arg is unused. (Inspired by `struct NOP` in conc-hashtable.) > > Something is missing here. You probably mean something like: > > ``` bool is_dead(G1CodeRootSetHashTableValue*) const``` > > We do not do that elsewhere either. We do not use `(...)` elsewhere either (in gc code at least), and I do not want to start this kind of discussion as part of this change. I meant `bool is_dead(...) const { return false; }`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15811#discussion_r1340413820 From mli at openjdk.org Thu Sep 28 16:54:14 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 28 Sep 2023 16:54:14 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: > Only vector version is included in this patch.
> > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: revert adding t3-t6 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15899/files - new: https://git.openjdk.org/jdk/pull/15899/files/100d7c61..fc19cb23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15899&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15899&range=00-01 Stats: 40 lines in 2 files changed: 9 ins; 7 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/15899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15899/head:pull/15899 PR: https://git.openjdk.org/jdk/pull/15899 From mli at openjdk.org Thu Sep 28 16:54:16 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 28 Sep 2023 16:54:16 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: <55D3pxr3rzzXVQiDf__qRCHxAlKgDTUT6R1D4Z0HJQg=.340f02d9-1221-407d-8ff1-6bfdff1bff3e@github.com> On Thu, 28 Sep 2023 14:47:08 GMT, Robbin Ehn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> revert adding t3-t6 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4327: > >> 4325: const Register length = t2; >> 4326: const Register avl = t3; >> 4327: const Register stride = t4; > > There seems to be no overlap with loop/t0. > So avl can just be t0? No need for a fourth/fifth temp reg? Thanks Robbin, I've reverted adding t3-t6.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1340423091 From sspitsyn at openjdk.org Thu Sep 28 16:54:37 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 28 Sep 2023 16:54:37 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6] In-Reply-To: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> References: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> Message-ID: On Thu, 28 Sep 2023 09:49:05 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > first arg of `find` casted to `uint*` Marked as reviewed by sspitsyn (Reviewer). The serviceability files look good. By being paranoid I'd suggest to run more tiers, eg. 3-4. ------------- PR Review: https://git.openjdk.org/jdk/pull/15418#pullrequestreview-1649318449 PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1739688410 From ayang at openjdk.org Thu Sep 28 17:14:30 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 28 Sep 2023 17:14:30 GMT Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v7] In-Reply-To: References: Message-ID: <6QZpwgtss6Q4BBNU0V2RcSr31QVvT9-Dvz8N64S_kaw=.f91a637c-cae8-4497-993a-802c0d6aa529@github.com> On Thu, 28 Sep 2023 15:32:11 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that modifies the code root (remembered) set to use the CHT as internal representation. 
>> >> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading, improving performance of various pauses that deal with code root sets. >> >> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues: >> >> During collection pauses: >> >> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms >> [..] >> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16 >> [...] >> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 >> >> >> Code root scan now reduces to ~22ms max on average in this case. >> >> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: >> >> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 >> >> >> Some random comment: >> * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review > Every hashtable has its internal claim now. I overlooked that. Thank you for the explanation. ------------- Marked as reviewed by ayang (Reviewer).
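Regarding the `is_dead` signature discussed earlier in this review: both spellings below compile as C++ for a callback that ignores its argument. This is a standalone sketch with hypothetical stand-in types (`Value`, `ConfigUnnamed`, `ConfigEllipsis`), not the actual `G1CodeRootSetHashTableConfig`.

```cpp
#include <cassert>

struct Value {};  // hypothetical stand-in for the table's value type

// Style 1: keep the parameter type, omit the name. Callers are still
// type-checked, and no "unused parameter" warning is emitted.
struct ConfigUnnamed {
  bool is_dead(Value*) const { return false; }
};

// Style 2: a C-style ellipsis accepts any argument list (even none).
// Type checking is lost, and passing non-trivially-copyable class
// objects through ... is only conditionally supported, so this suits
// pointer arguments best.
struct ConfigEllipsis {
  bool is_dead(...) const { return false; }
};
```

The unnamed-parameter form keeps argument type checking; the ellipsis form is convenient for generic no-op helpers but silently accepts mistaken arguments.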
PR Review: https://git.openjdk.org/jdk/pull/15811#pullrequestreview-1649356965 From dnsimon at openjdk.org Thu Sep 28 20:43:58 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 28 Sep 2023 20:43:58 GMT Subject: RFR: 8283689: Update the foreign linker VM implementation [v23] In-Reply-To: References: Message-ID: On Tue, 17 May 2022 15:53:05 GMT, Jorn Vernee wrote: >> Hi, >> >> This PR updates the VM implementation of the foreign linker, by bringing over commits from the panama-foreign repo. >> >> This is split off from the main JEP integration for 19, since we have limited resources to handle this. As such, this PR might fall over to 20, but it would be nice if we could get it into 19. >> >> I've written up an overview of the Linker architecture here: http://cr.openjdk.java.net/~jvernee/docs/FL_Overview.html it might be useful to read that first. >> >> This patch moves from the "legacy" implementation, to what is currently implemented in the panama-foreign repo, except for replacing the use of method handle combinators with ASM. That will come in a later path. To recap. This PR contains the following changes: >> >> 1. VM stubs for downcalls are now generated up front, instead of lazily by C2 [1]. >> 2. the VM support for upcalls/downcalls now support all possible call shapes. And VM stubs and Java code implementing the buffered invocation strategy has been removed [2], [3], [4], [5]. >> 3. The existing C2 intrinsification support for the `linkToNative` method handle linker was no longer needed and has been removed [6] (support might be re-added in another form later). >> 4. Some other cleanups, such as: OptimizedEntryBlob (for upcalls) now implements RuntimeBlob directly. Binding to java classes has been rewritten to use javaClasses.h/cpp (this wasn't previously possible due to these java classes being in an incubator module) [7], [8], [9]. >> >> While the patch mostly consists of VM changes, there are also some Java changes to support (2). 
>> >> The original commit structure has been mostly retained, so it might be useful to look at a specific commit, or the corresponding patch in the [panama-foreign](https://github.com/openjdk/panama-foreign/pulls?q=is%3Apr) repo as well. I've also left some inline comments to explain some of the changes, which will hopefully make reviewing easier. >> >> Testing: Tier1-4 >> >> Thanks, >> Jorn >> >> [1]: https://github.com/openjdk/jdk/pull/7959/commits/048b88156814579dca1f70742061ad24942fd358 >> [2]: https://github.com/openjdk/jdk/pull/7959/commits/2fbbef472b4c2b4fee5ede2f18cd81ab61e88f49 >> [3]: https://github.com/openjdk/jdk/pull/7959/commits/8a957a4ed9cc8d1f708ea8777212eb51ab403dc3 >> [4]: https://github.com/openjdk/jdk/pull/7959/commits/35ba1d964f1de4a77345dc58debe0565db4b0ff3 >> [5]: https://github.com/openjdk/jdk/pull/7959/commits/4e72aae22920300c5ffa16fed805b62ed9092120 >> [6]: https://github.... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 105 commits: > > - Merge branch 'master' into JEP-19-VM-IMPL2 > - ifdef NOT_PRODUCT -> ifndef PRODUCT > - Missing ASSERT -> NOT_PRODUCT > - Cleanup UL usage > - Fix failure with SPEC disabled (accidentally dropped change) > - indentation > - fix space > - Merge branch 'master' into JEP-19-VM-IMPL2 > - Undo spurious changes. > - Merge branch 'JEP-19-VM-IMPL2' of https://github.com/JornVernee/jdk into JEP-19-VM-IMPL2 > - ... and 95 more: https://git.openjdk.org/jdk/compare/af07919e...c3c1421b src/hotspot/cpu/aarch64/universalNativeInvoker_aarch64.cpp line 105: > 103: > 104: RuntimeStub* stub = > 105: RuntimeStub::new_runtime_stub("nep_invoker_blob", Is it acceptable for a VM fatal error to occur when the `RuntimeStub` cannot be allocated due to a (temporarily?) full code cache? If not, then you may want to do something like I'm doing in https://github.com/openjdk/jdk/pull/15970. 
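As background for the question above, one way to avoid a fatal error on a full code cache is to make stub allocation report failure and let the caller back out. The sketch below is hypothetical: `ToyCodeCache`, `allocate_stub`, and `make_invoker` are illustrative names, not `RuntimeStub::new_runtime_stub`, and it does not describe what PR 15970 actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical fixed-capacity code cache.
struct ToyCodeCache {
  std::size_t capacity;
  std::size_t used = 0;

  // Returns the offset of the new stub, or std::nullopt when the cache
  // is full, instead of terminating the process.
  std::optional<std::size_t> allocate_stub(std::size_t size) {
    if (used + size > capacity) return std::nullopt;
    std::size_t offset = used;
    used += size;
    return offset;
  }
};

// The caller turns an allocation failure into an ordinary error result
// rather than treating it as a fatal VM error.
std::string make_invoker(ToyCodeCache& cache, std::size_t size) {
  std::optional<std::size_t> stub = cache.allocate_stub(size);
  if (!stub) return "error: code cache is full";
  return "stub at offset " + std::to_string(*stub);
}
```

In a real VM the caller would still need a sensible fallback (for example, reporting an error to the Java code that requested the downcall); that choice is outside this sketch.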
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/7959#discussion_r1340644955 From pchilanomate at openjdk.org Thu Sep 28 21:54:41 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 28 Sep 2023 21:54:41 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame Message-ID: Please review the following patch. As explained in the bug comments, the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that replacing fr->link() with fr->sender_sp() won't work. The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue occurs not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc.), so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender(). I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers 1-4 in mach5. I'll run the upper tiers too.
Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/15972/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15972&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316309 Stats: 150 lines in 6 files changed: 144 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15972/head:pull/15972 PR: https://git.openjdk.org/jdk/pull/15972 From dlong at openjdk.org Fri Sep 29 01:19:16 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 29 Sep 2023 01:19:16 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v2] In-Reply-To: References: Message-ID: <8NGDMPaqgayziUbaCziBd5ra9L588Vj41J4KW_Jq1iw=.686e04db-b0a0-4f49-a00c-1508a63cacd0@github.com> On Thu, 28 Sep 2023 10:32:02 GMT, Martin Doerr wrote: > The lock order cannot be guaranteed for OSR compilation (see JBS discussion). There won't be an nmethod with an OSR entry point if the locks aren't nested correctly, so I still don't understand what is going wrong and why this change is necessary. The C2 assembly code output in the hs_err file looks correct. How could the locks in the interpreter frame get out of order? Allowing C2 to unlock locks out of order seems like the wrong solution. I think it could have unintended consequences because this is a basic invariant for the compilers. If this problem can somehow be caused by JVMTI, then whatever unsafe operation JVMTI is doing should probably invalidate OSR entry points for the method.
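Dean's argument relies on compiled code only ever seeing properly nested (LIFO) monitor operations. The hypothetical checker below states that property over a sequence of enter/exit events; `MonitorEvent` and `is_properly_nested` are illustrative names, and this is not the bytecode analysis C2 actually performs before creating an OSR entry.

```cpp
#include <cassert>
#include <vector>

struct MonitorEvent {
  bool enter;  // true for monitorenter, false for monitorexit
  int obj;     // identity of the locked object
};

// Returns true if every exit releases the most recently entered,
// still-held monitor, and no monitor is left held at the end.
bool is_properly_nested(const std::vector<MonitorEvent>& events) {
  std::vector<int> held;
  for (const MonitorEvent& e : events) {
    if (e.enter) {
      held.push_back(e.obj);
    } else {
      if (held.empty() || held.back() != e.obj) return false;
      held.pop_back();
    }
  }
  return held.empty();
}
```

An exit that does not match the most recently entered monitor corresponds to the situation reported as "Top of lock-stack does not match the unlocked object".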
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1740169908 From dholmes at openjdk.org Fri Sep 29 03:15:42 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 29 Sep 2023 03:15:42 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: <5NUy8Uh3SiwETfwe9Min5PXSqcfAlLunyd1HJIwo4GY=.f4304324-c158-4852-9c56-4398f4b84ca8@github.com> References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> <5NUy8Uh3SiwETfwe9Min5PXSqcfAlLunyd1HJIwo4GY=.f4304324-c158-4852-9c56-4398f4b84ca8@github.com> Message-ID: On Thu, 28 Sep 2023 11:58:23 GMT, Coleen Phillimore wrote: > > Hmmm okay - it seems fragile to have a pseudo-destructor in release(). > > I don't know what this comment means. Object lifetimes should be well managed such that you can't use an object after it has been "destroyed". Methods like `release()` effectively nuke the internals of the object but the object is still available to be (mis)used. Before this fix `release` left a dangling `_obj` pointer, but that wouldn't be an issue if the handle itself could not be used after being released. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15920#issuecomment-1740244084 From lmesnik at openjdk.org Fri Sep 29 03:56:50 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 29 Sep 2023 03:56:50 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame In-Reply-To: References: Message-ID: <_bcgcJi0-GKncB1R89sQZkZTsbv9JuwLvYu-WycITPY=.f7eada18-56ea-4774-b44c-54af193957e0@github.com> On Thu, 28 Sep 2023 21:07:09 GMT, Patricio Chilano Mateo wrote: > Please review the following patch. As explained in the bug comments, the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame.
Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). > > The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work. The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue is not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc) so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender(). > > I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers1-4 in mach5. I'll run the upper tiers too. > > Thanks, > Patricio Changes requested by lmesnik (Reviewer). test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 39: > 37: * @requires os.family != "windows" > 38: * @library /test/lib > 39: * @run main/othervm StackWalkNativeToJava I think it should be driver instead of main/othervm here. test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 67: > 65: commands.add("StackWalkNativeToJava$TestNativeToJavaNative"); > 66: > 67: ProcessBuilder pb = ProcessTools.createJavaProcessBuilder(commands); The test ignores any external VM flags.
Please add @requires vm.flagless to the test header so that this test is not run with any additional VM flags. ------------- PR Review: https://git.openjdk.org/jdk/pull/15972#pullrequestreview-1650000952 PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1340863304 PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1340864555 From dholmes at openjdk.org Fri Sep 29 04:32:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 29 Sep 2023 04:32:17 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 21:07:09 GMT, Patricio Chilano Mateo wrote: > Please review the following patch. As explained in the bug comments the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). > > The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work. The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue is not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc) so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender().
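The Linux adjustment described in the patch amounts to simple pointer arithmetic. The following is a hypothetical standalone C++ restatement (the function `sender_unextended_sp` is not HotSpot code), assuming word-sized stack slots, a CodeBlob caller whose FP points at its saved FP/LR pair at the top of its frame, and `frame::metadata_words` equal to 2 on AArch64:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical restatement of the proposed linux-aarch64 computation of
// the sender's unextended SP (all sizes in words).
//   link           - caller's FP, read from the current C frame's frame record
//   frame_size     - the caller CodeBlob's frame size, or 0 if there is none
//   metadata_words - size of the saved FP/LR pair (assumed 2 on AArch64)
std::intptr_t* sender_unextended_sp(std::intptr_t* link, int frame_size,
                                    int metadata_words) {
  assert(frame_size >= 0 && metadata_words >= 0);
  const bool use_codeblob = frame_size > 0;
  // For a CodeBlob frame the saved FP/LR pair sits at the top (highest
  // address) of the frame, so link + metadata_words is the end of that
  // frame and its SP lies frame_size words below it.
  // Otherwise (a C/C++ caller) link itself is used, as before the fix.
  return use_codeblob ? link + metadata_words - frame_size : link;
}
```

With `link` at word 10 of a simulated stack, a 6-word frame and 2 metadata words give an SP at word 6; with no CodeBlob the result is `link` unchanged.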
> > I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers1-4 in mach5. I'll run the upper tiers too. > > Thanks, > Patricio src/hotspot/share/utilities/vmError.cpp line 434: > 432: return invalid; > 433: } > 434: if (fr.is_interpreted_frame() || (fr.cb() != nullptr && fr.cb()->frame_size() > 0)) { This part of the fix is unclear to me. How do the old conditions relate to the new ones? test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 84: > 82: public void callNativeMethod() throws Exception { > 83: Object obj = new Object(); > 84: obj.wait(); Just to be clear, the aim here is to call a native method that will complete by throwing an exception, so you can abort the VM. A comment to that effect would be good. Thanks test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 116: > 114: } > 115: > 116: public void callVMMethod() throws Exception { Again a comment outlining how you expect this to abort the VM would be good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1340877413 PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1340876167 PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1340876631 From rehn at openjdk.org Fri Sep 29 06:33:05 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 29 Sep 2023 06:33:05 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 16:54:14 GMT, Hamlin Li wrote: >> Only vector version is included in this patch.
>> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert adding t3-t6 @gctony hopefully we can agree on the register :) As I said, adding new temps in assembler_riscv.hpp will make these registers very hard to 'reclaim' in a few years. So I think we should try to keep some available for future features and such. Can you look and see if @Hamlin-Li's latest changes are okay? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15899#issuecomment-1740374812 From dholmes at openjdk.org Fri Sep 29 06:55:40 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 29 Sep 2023 06:55:40 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free Message-ID: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64).
Testing: - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 Thanks ------------- Commit messages: - 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free Changes: https://git.openjdk.org/jdk/pull/15977/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15977&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314294 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15977.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15977/head:pull/15977 PR: https://git.openjdk.org/jdk/pull/15977 From kbarrett at openjdk.org Fri Sep 29 07:11:11 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Sep 2023 07:11:11 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: References: Message-ID: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> On Mon, 18 Sep 2023 07:37:26 GMT, Liming Liu wrote: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` in the table below. The patch gives an improvement when the system call is supported, and does not hurt when it is not:
>
> Kernel  -XX:-TransparentHugePages   -XX:+TransparentHugePages   (time in seconds)
>         Unpatched    Patched        Unpatched    Patched
> 4.18    11.30        11.30          0.25         0.25
> 5.13    0.22         0.22           3.42         3.42
> 6.1     0.27         0.33           3.54         0.33
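The approach in the PR description (ask the kernel to populate the range for writing, otherwise touch every page) can be sketched in a few lines of C++. This is a hypothetical, single-threaded illustration (`pretouch` and `touch_each_page` are not HotSpot functions); it assumes Linux, defines `MADV_POPULATE_WRITE` as 23 when the system headers predate it, and uses the GCC/Clang `__atomic_fetch_add` builtin in place of HotSpot's `Atomic::add`. It does not address Kim's question about how this interacts with the parallel `PretouchTask`.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // Linux 5.14+ value; older headers lack it
#endif

// Fallback: touch one int per page with an atomic add of zero, so each page
// is faulted in for writing without changing any concurrently written value
// (the same idea as the existing atomic-add-0 pretouch loop).
static void touch_each_page(char* start, std::size_t bytes, std::size_t page) {
  for (std::size_t off = 0; off < bytes; off += page) {
    __atomic_fetch_add(reinterpret_cast<int*>(start + off), 0, __ATOMIC_RELAXED);
  }
}

// Hypothetical pretouch: prefer MADV_POPULATE_WRITE and fall back to touching
// pages if madvise fails (e.g. EINVAL on kernels older than 5.14).
// `start` must be page aligned. Returns true if madvise did the work.
bool pretouch(char* start, std::size_t bytes) {
  const long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  if (madvise(start, bytes, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  touch_each_page(start, bytes, static_cast<std::size_t>(page));
  return false;
}
```

The return value only reports which path was taken; on either path every page in the range has been faulted in for writing.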
PretouchTask attempts to parallelize the pretouching. How well does that work with the use of MADV_POPULATE_WRITE? src/hotspot/share/gc/shared/pretouchTask.cpp line 75: > 73: // initially always use small pages. > 74: page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size; > 75: #endif I never liked this, so happy to see it gone. src/hotspot/share/runtime/os.cpp line 2117: > 2115: } > 2116: > 2117: void os::pretouch_memory_common(void *first, void *last, size_t page_size) { Suggest asserting `first` and `last` are `page_size` aligned. Also maybe assert `first <= last` here too. src/hotspot/share/runtime/os.cpp line 2119: > 2117: void os::pretouch_memory_common(void *first, void *last, size_t page_size) { > 2118: for (char *cur = static_cast<char *>(first); /* break */; cur += page_size) { > 2119: Atomic::add(reinterpret_cast<int *>(cur), 0, memory_order_relaxed); Throughout, HotSpot style is generally to cuddle a ptr-operator with the type, e.g. `char* foo` rather than `char *foo`, and similarly in the casts. src/hotspot/share/runtime/os.hpp line 226: > 224: static void pd_free_memory(char *addr, size_t bytes, size_t alignment_hint); > 225: static void pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint); > 226: static void pd_pretouch_memory(void *first, void *last, size_t page_size); I wish we had a better pattern than a (usually identical) version of a function for each platform, but this is consistent with how similar things are done elsewhere. Good that you provide a common helper. So okay. And yes, I see that the existing nearby declarations are inconsistent with what I suggested is usual HotSpot style regarding the ptr-operator placement. This file is particularly inconsistent in that respect, sometimes even in the same declaration. But the substantial majority are type cuddled. In this particular case I'd not object to being consistent with the adjacent code.
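Kim's suggested checks fit naturally into a standalone version of the common helper. The sketch below is hypothetical: it uses plain `assert` and the GCC/Clang `__atomic_fetch_add` builtin in place of HotSpot's `assert` and `Atomic::add`, follows the cuddled `char* foo` style Kim recommends, and treats `[first, last]` as the inclusive range of pages to touch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical standalone pretouch helper with the checks suggested in
// review: page_size is a power of two, both bounds are page aligned, and
// first <= last. [first, last] are the first and last pages to touch.
void pretouch_memory_common(void* first, void* last, std::size_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  assert(reinterpret_cast<std::uintptr_t>(first) % page_size == 0);
  assert(reinterpret_cast<std::uintptr_t>(last) % page_size == 0);
  assert(first <= last);
  char* const end = static_cast<char*>(last);
  // Test for the last page before advancing, so cur never moves past
  // `last` (no overflow if the range ends at the top of the address space).
  for (char* cur = static_cast<char*>(first); /* break */; cur += page_size) {
    __atomic_fetch_add(reinterpret_cast<int*>(cur), 0, __ATOMIC_RELAXED);
    if (cur == end) break;
  }
}
```

Because the loop adds zero, existing contents are preserved; only the page-fault side effect matters.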
test/hotspot/jtreg/gc/parallel/TestParallelAlwaysPreTouch.java line 49: > 47: public class TestParallelAlwaysPreTouch { > 48: public static void main(String[] args) throws Exception { > 49: // everything should happen before entry point This isn't really testing anything beyond not crashing with the given options, and giving some log messages for manual examination. It would be better if some examination of the log messages could be performed to verify expected behavior. ------------- PR Review: https://git.openjdk.org/jdk/pull/15781#pullrequestreview-1650014869 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1340977019 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1340872840 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1340872628 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1340878255 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1340947249 From shade at openjdk.org Fri Sep 29 07:16:03 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 29 Sep 2023 07:16:03 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> Message-ID: On Fri, 29 Sep 2023 06:48:05 GMT, David Holmes wrote: > To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. > > It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64). > > Testing: > - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 > > Thanks Looks reasonable. 
I clicked through some of the os::{malloc,realloc,free} implementations, and nothing pops out as requiring the VM mode. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15977#pullrequestreview-1650197945 From dlong at openjdk.org Fri Sep 29 07:50:04 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 29 Sep 2023 07:50:04 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> Message-ID: On Fri, 29 Sep 2023 06:48:05 GMT, David Holmes wrote: > To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. > > It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64). > > Testing: > - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 > > Thanks Looks good. ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15977#pullrequestreview-1650246477 From mcimadamore at openjdk.org Fri Sep 29 08:20:03 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 29 Sep 2023 08:20:03 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> Message-ID: On Fri, 29 Sep 2023 06:48:05 GMT, David Holmes wrote: > To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. > > It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64). > > Testing: > - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 > > Thanks Thanks for taking care of this @dholmes-ora. Do you know if Unsafe::copyMemory or Unsafe::setMemory can also receive the same treatment? These are bulk operations, so they are less sensitive to the transition cost - but for small copies it can still be a factor. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15977#issuecomment-1740494281 From ngasson at openjdk.org Fri Sep 29 08:20:37 2023 From: ngasson at openjdk.org (Nick Gasson) Date: Fri, 29 Sep 2023 08:20:37 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 Message-ID: Building a fastdebug image on a machine without LSE (e.g.
A72) or explicitly disabling LSE results in: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (0xe0000000), pid=64585, tid=64619 # stop: Header is not fast-locked # # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190] # When UseLSE is false `MacroAssembler::cmpxchg()` uses rscratch1 as a temporary to store the result of the store-exclusive instruction. However rscratch1 may also be one of the registers passed as t1 or t2 to `MacroAssembler::lightweight_lock()` and holding a live value which is then clobbered. Fixed by ensuring rscratch1 is never passed as one of these temporaries. ------------- Commit messages: - 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 Changes: https://git.openjdk.org/jdk/pull/15978/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15978&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316880 Stats: 29 lines in 8 files changed: 8 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/15978.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15978/head:pull/15978 PR: https://git.openjdk.org/jdk/pull/15978 From rkennke at openjdk.org Fri Sep 29 09:59:01 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 29 Sep 2023 09:59:01 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 08:12:06 GMT, Nick Gasson wrote: > Building a fastdebug image on a machine without LSE (e.g. 
A72) or explicitly disabling LSE results in: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (0xe0000000), pid=64585, tid=64619 > # stop: Header is not fast-locked > # > # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190] > # > > > When UseLSE is false `MacroAssembler::cmpxchg()` uses rscratch1 as a temporary to store the result of the store-exclusive instruction. However rscratch1 may also be one of the registers passed as t1 or t2 to `MacroAssembler::lightweight_lock()` and holding a live value which is then clobbered. Fixed by ensuring rscratch1 is never passed as one of these temporaries. Looks good to me. Would it make any sense to only allocate the extra register when running with +UseLSE? ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15978#pullrequestreview-1650453456 From ngasson at openjdk.org Fri Sep 29 10:06:04 2023 From: ngasson at openjdk.org (Nick Gasson) Date: Fri, 29 Sep 2023 10:06:04 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 09:56:17 GMT, Roman Kennke wrote: > Would it make any sense to only allocate the extra register when running with +UseLSE? In C1? I thought about that but the benefit of always allocating it is that it reduces the differences between LSE and non-LSE modes. 
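Nick's description explains why a temporary that aliases `rscratch1` breaks only when UseLSE is off. A debug-only check that the temporaries are distinct from the scratch registers catches that mistake in either mode. The sketch below is a hypothetical model (`Reg`, `all_different` and `check_lock_temps` are illustrative names, not HotSpot's `Register` type or `assert_different_registers`), assuming `rscratch1`/`rscratch2` are r8/r9 as on AArch64:

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for an assembler register: just an encoding.
// This is not HotSpot's Register type.
struct Reg {
  int encoding;
};

// Assumed mapping: on AArch64 HotSpot, rscratch1/rscratch2 are r8/r9.
constexpr Reg rscratch1{8};
constexpr Reg rscratch2{9};

// True if no two registers in the list share an encoding, in the spirit
// of a "different registers" debug assertion.
bool all_different(std::initializer_list<Reg> regs) {
  for (const Reg* i = regs.begin(); i != regs.end(); ++i) {
    for (const Reg* j = i + 1; j != regs.end(); ++j) {
      if (i->encoding == j->encoding) return false;
    }
  }
  return true;
}

// Debug check for a lock macro whose non-LSE cmpxchg path clobbers
// rscratch1: none of the temporaries may alias the scratch registers.
void check_lock_temps(Reg obj, Reg hdr, Reg t1, Reg t2) {
  assert(all_different({obj, hdr, t1, t2, rscratch1, rscratch2}));
  (void)obj; (void)hdr; (void)t1; (void)t2;  // unused when NDEBUG is set
}
```

Passing `rscratch1` as `t1` makes `all_different` return false, so a debug build fails at code-generation time instead of producing code that only misbehaves with -XX:-UseLSE.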
------------- PR Comment: https://git.openjdk.org/jdk/pull/15978#issuecomment-1740635624 From rkennke at openjdk.org Fri Sep 29 10:20:04 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 29 Sep 2023 10:20:04 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 10:03:08 GMT, Nick Gasson wrote: > > Would it make any sense to only allocate the extra register when running with +UseLSE? > > In C1? I thought about that but the benefit of always allocating it is that it reduces the differences between LSE and non-LSE modes. Right. Good then! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15978#issuecomment-1740652422 From rkennke at openjdk.org Fri Sep 29 11:34:00 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 29 Sep 2023 11:34:00 GMT Subject: RFR: 8309599: WeakHandle and OopHandle release should clear obj pointer In-Reply-To: References: <7iiMlW8hlRFlFh8GdIwbphR86Hh3G7L5tA-f_VkxuC8=.fa35078c-c776-4041-841a-5d6c69ba4e06@github.com> <5NUy8Uh3SiwETfwe9Min5PXSqcfAlLunyd1HJIwo4GY=.f4304324-c158-4852-9c56-4398f4b84ca8@github.com> Message-ID: On Fri, 29 Sep 2023 02:56:07 GMT, David Holmes wrote: > > > Hmm okay - it seems fragile to have a pseudo-destructor in release(). > > > > > > I don't know what this comment means. > > Object lifetimes should be well managed such that you can't use an object after it has been "destroyed". Methods like `release()` effectively nuke the internals of the object but the object is still available to be (mis)used. Before this fix `release` left a dangling `_obj` pointer, but that wouldn't be an issue if the handle itself could not be used after being released.
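To make the dangling-pointer concern concrete: if `release()` frees the storage but leaves the pointer in place, a later (mis)use reads freed memory, whereas clearing it turns that misuse into a detectable null. A hypothetical C++ model (`HandleModel` is illustrative, not HotSpot's `OopHandle` or `WeakHandle`, and it uses plain heap memory instead of an OopStorage slot):

```cpp
#include <cassert>

// Hypothetical handle type illustrating the change under review. release()
// frees the slot and then clears the pointer, so a later use sees nullptr
// (and trips the assert in debug builds) instead of a dangling address.
// Like the handles discussed, release is explicit; there is no destructor.
template <typename T>
class HandleModel {
 public:
  explicit HandleModel(T* slot) : _obj(slot) {}
  HandleModel(const HandleModel&) = delete;             // no shared ownership
  HandleModel& operator=(const HandleModel&) = delete;

  bool is_empty() const { return _obj == nullptr; }

  T& resolve() const {
    assert(_obj != nullptr && "use of a released handle");
    return *_obj;
  }

  void release() {
    delete _obj;     // return the slot (here: plain heap memory)
    _obj = nullptr;  // the fix: do not leave a dangling pointer behind
  }

 private:
  T* _obj;
};
```

Releasing twice is harmless in this model because deleting a null pointer is a no-op.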
This change additionally clear the field when the object becomes unreachable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15920#issuecomment-1740741841 From luhenry at openjdk.org Fri Sep 29 12:01:27 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 29 Sep 2023 12:01:27 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 16:54:14 GMT, Hamlin Li wrote: >> Only vector version is included in this patch. >> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert adding t3-t6 Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15899#pullrequestreview-1650639784 From aph at openjdk.org Fri Sep 29 13:28:14 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 13:28:14 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 21:07:09 GMT, Patricio Chilano Mateo wrote: > Please review the following patch. As explained in the bug comments the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). > > The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work. 
The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue is not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc) so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender(). > > I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers1-4 in mach5. I'll run the upper tiers too. > > Thanks, > Patricio src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 170: > 168: bool use_codeblob = cb != nullptr && cb->frame_size() > 0; > 169: assert(!use_codeblob || !Interpreter::contains(pc), "should not be an interpreter frame"); > 170: intptr_t* sender_sp = use_codeblob ? (fr->link() + frame::metadata_words - cb->frame_size()) : fr->link(); Is this assuming that, if the caller is a native frame, the current FP will point to the lowest word in the caller's stack frame? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341350076 From aph at openjdk.org Fri Sep 29 13:28:41 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 13:28:41 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 08:12:06 GMT, Nick Gasson wrote: > Building a fastdebug image on a machine without LSE (e.g. 
A72) or explicitly disabling LSE results in: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (0xe0000000), pid=64585, tid=64619 > # stop: Header is not fast-locked > # > # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190] > # > > > When UseLSE is false `MacroAssembler::cmpxchg()` uses rscratch1 as a temporary to store the result of the store-exclusive instruction. However rscratch1 may also be one of the registers passed as t1 or t2 to `MacroAssembler::lightweight_lock()` and holding a live value which is then clobbered. Fixed by ensuring rscratch1 is never passed as one of these temporaries. People unfamiliar with the platform conventions are going to keep getting this wrong. The scratch registers are used in macros: that's what they are for. 
Please add:

diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
index e3df32ed602..4dbbedc123e 100644
--- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
@@ -2735,6 +2735,10 @@ void MacroAssembler::cmpxchg(Register addr, Register expected,
     mov(result, expected);
     lse_cas(result, new_val, addr, size, acquire, release, /*not_pair*/ true);
     compare_eq(result, expected, size);
+#ifdef ASSERT
+    // Poison rscratch1
+    mov(rscratch1, 0x1f1f1f1f1f1f1f1f);
+#endif
   } else {
     Label retry_load, done;
     prfm(Address(addr), PSTL1STRM);
diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
index 51faab3d73b..c72a478949c 100644
--- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
@@ -6315,7 +6315,7 @@ void MacroAssembler::double_move(VMRegPair src, VMRegPair dst, Register tmp) {
 // - t1, t2: temporary registers, will be destroyed
 void MacroAssembler::lightweight_lock(Register obj, Register hdr, Register t1, Register t2, Label& slow) {
   assert(LockingMode == LM_LIGHTWEIGHT, "only used with new lightweight locking");
-  assert_different_registers(obj, hdr, t1, t2);
+  assert_different_registers(obj, hdr, t1, t2, rscratch1, rscratch2);
 
   // Check if we would have space on lock-stack for the object.
   ldrw(t1, Address(rthread, JavaThread::lock_stack_top_offset()));

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 107: > 105: } else { > 106: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); > 107: lightweight_lock(oop, disp_hdr, tmp, rscratch2, no_count); Please use an allocated scratch register.
src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 904: > 902: ldr(header_reg, Address(obj_reg, oopDesc::mark_offset_in_bytes())); > 903: tbnz(header_reg, exact_log2(markWord::monitor_value), slow_case); > 904: lightweight_unlock(obj_reg, header_reg, swap_reg, rscratch2, slow_case); And here. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1816: > 1814: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); > 1815: __ ldr(swap_reg, Address(obj_reg, oopDesc::mark_offset_in_bytes())); > 1816: __ lightweight_lock(obj_reg, swap_reg, tmp, rscratch2, slow_path_lock); And here. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1957: > 1955: __ ldr(old_hdr, Address(obj_reg, oopDesc::mark_offset_in_bytes())); > 1956: __ tbnz(old_hdr, exact_log2(markWord::monitor_value), slow_path_unlock); > 1957: __ lightweight_unlock(obj_reg, old_hdr, swap_reg, rscratch2, slow_path_unlock); And here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15978#issuecomment-1740830043 PR Comment: https://git.openjdk.org/jdk/pull/15978#issuecomment-1740847447 PR Review Comment: https://git.openjdk.org/jdk/pull/15978#discussion_r1341316596 PR Review Comment: https://git.openjdk.org/jdk/pull/15978#discussion_r1341316893 PR Review Comment: https://git.openjdk.org/jdk/pull/15978#discussion_r1341317694 PR Review Comment: https://git.openjdk.org/jdk/pull/15978#discussion_r1341317842 From ayang at openjdk.org Fri Sep 29 14:03:17 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 29 Sep 2023 14:03:17 GMT Subject: RFR: 8317314: Remove unimplemented ObjArrayKlass::oop_oop_iterate_elements_bounded Message-ID: Trivial removal of dead code.
------------- Commit messages: - trivial Changes: https://git.openjdk.org/jdk/pull/15985/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15985&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317314 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15985.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15985/head:pull/15985 PR: https://git.openjdk.org/jdk/pull/15985 From mdoerr at openjdk.org Fri Sep 29 14:29:33 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 29 Sep 2023 14:29:33 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Thanks for looking at it! I'm currently using this PR as a workaround until a better fix is found (without integrating it). I have come to the same conclusion that JIT compilers reject methods which may possibly contain unbalanced monitors.
What I can observe is that the test passes with this:

diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp
index 36a87cd8b0d..385dc54f947 100644
--- a/src/hotspot/share/opto/parse1.cpp
+++ b/src/hotspot/share/opto/parse1.cpp
@@ -222,7 +222,7 @@ void Parse::load_interpreter_state(Node* osr_buf) {
   assert(jvms()->monitor_depth() == 0, "should be no active locks at beginning of osr");
   int mcnt = osr_block->flow()->monitor_count();
   Node *monitors_addr = basic_plus_adr(osr_buf, osr_buf, (max_locals+mcnt*2-1)*wordSize);
-  for (index = 0; index < mcnt; index++) {
+  for (index = mcnt; (--index) >= 0;) {
     // Make a BoxLockNode for the monitor.
     Node *box = _gvn.transform(new BoxLockNode(next_monitor()));

I think the lowest address contains the latest monitor (because the stack grows downwards), which should be pushed last. Is this currently wrong? Note that x86_64 doesn't have the lock checks, so we don't know if it's affected, too. The test may work even if it's wrong because we unlock both objects. Other locking modes don't really care about the order, so the test simply passes even though it's wrong (at least on PPC64). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1740943594 From aph at openjdk.org Fri Sep 29 14:43:14 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 14:43:14 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 14:03:30 GMT, Damon Fenacci wrote: > # Issue > An intermittent _Memory Pool not found_ error has been noticed when running a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache.
> > ## Origin > The issue originates from the fact that the aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_. > > More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved > https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182 > and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by`, to be precise). https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537 > The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM's weak memory order). This is a problem, since `committed` must never be less than `used`. > > # Solution > > To avoid this situation we must ensure that values used to calculate `committed` are actually saved before the values used to calculate `used`, and that the opposite order holds for reading. To enforce this we acquire a `CodeCache_lock` while reading `used` and `committed` in `CodeHeapPool::get_memory_usage` (which should actually be the convention when accessing CodeCache data). src/hotspot/share/services/memoryPool.cpp line 182: > 180: MemoryUsage CodeHeapPool::get_memory_usage() { > 181: OrderAccess::loadload(); > 182: size_t used = used_in_bytes(); This doesn't look quite right. A `loadload` controls the ordering between two accesses.
If you want to make sure that you don't see an old version of `committed` with a new version of `used` then the `loadload` must be _between_ the two loads. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1330286272 From aph at openjdk.org Fri Sep 29 14:43:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 14:43:15 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 06:27:58 GMT, Damon Fenacci wrote: >> Also, for clarity, I'd make `used` volatile, and access it with `Atomic::release_store()` and `load_acquire()`. That should keep everything straight, assuming there aren't any more ordering failures. > >> This doesn't look quite right. A `loadload` controls the ordering between two accesses. If you want to make sure that you don't see an old version of `committed` with a new version of `used` then the `loadload` must be _between_ the two loads. > > @theRealAph thanks for your comments. You're right: I've just moved it down one line. > >> Also, for clarity, I'd make `used` volatile, and access it with `Atomic::release_store()` and `load_acquire()`. That should keep everything straight, assuming there aren't any more ordering failures. > > I thought about doing so (@dholmes-ora's suggestion) but we realised that `release_store` would probably need to be put in `CodeCache::allocate` whereas `load_acquire` probably in `CodeHeapPool::get_memory_usage` but then we'd need a reference to be shared between the 2 places and that wouldn't make it very clean (basically there would be "nowhere to put the acquire/release at the low-level"). Firstly, this code looks horribly racy. There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work.
It makes no sense to have a memory fence without explicitly saying which accesses it's ordering. In this patch that's now clear on the reader side, but not on the writer side. For the future maintainer of the code, we need to be able to see that. What do you mean by "we'd need a reference to be shared between the 2 places"? The use of naked storestore fences is a black-belt-ninja thing, and almost certainly not what you want here. See https://hboehm.info/c++mm/no_write_fences.html. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1331549018 From dfenacci at openjdk.org Fri Sep 29 14:43:15 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Sep 2023 14:43:15 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 15:09:31 GMT, Andrew Haley wrote: >> src/hotspot/share/services/memoryPool.cpp line 182: >> >>> 180: MemoryUsage CodeHeapPool::get_memory_usage() { >>> 181: OrderAccess::loadload(); >>> 182: size_t used = used_in_bytes(); >> >> This doesn't look quite right. A `loadload` controls the ordering between two accesses. If you want to make sure that you don't see an old version of `committed` with a new version of `used` then the `loadload` must be _between_ the two loads. > > Also, for clarity, I'd make `used` volatile, and access it with `Atomic::release_store()` and `load_acquire()`. That should keep everything straight, assuming there aren't any more ordering failures. > This doesn't look quite right. A `loadload` controls the ordering between two accesses. If you want to make sure that you don't see an old version of `committed` with a new version of `used` then the `loadload` must be _between_ the two loads. @theRealAph thanks for your comments. You're right: I've just moved it down one line.
> Also, for clarity, I'd make `used` volatile, and access it with `Atomic::release_store()` and `load_acquire()`. That should keep everything straight, assuming there aren't any more ordering failures. I thought about doing so (@dholmes-ora's suggestion) but we realised that `release_store` would probably need to be put in `CodeCache::allocate` whereas `load_acquire` probably in `CodeHeapPool::get_memory_usage` but then we'd need a reference to be shared between the 2 places and that wouldn't make it very clean (basically there would be "nowhere to put the acquire/release at the low-level"). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1331062763 From dfenacci at openjdk.org Fri Sep 29 14:43:14 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Sep 2023 14:43:14 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 Message-ID: # Issue An intermittent _Memory Pool not found_ error has been noticed when running a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache. ## Origin The issue originates from the fact that the aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_.
More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182 and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by`, to be precise). https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537 The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM's weak memory order). This is a problem, since `committed` must never be less than `used`. # Solution To avoid this situation we must ensure that values used to calculate `committed` are actually saved before the values used to calculate `used`, and that the opposite order holds for reading. To enforce this we acquire a `CodeCache_lock` while reading `used` and `committed` in `CodeHeapPool::get_memory_usage` (which should actually be the convention when accessing CodeCache data).
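A minimal, portable C++ sketch of the locking approach described in the Solution above. It assumes a `std::mutex` stands in for `CodeCache_lock` and a hypothetical `CodeHeapModel` stands in for the real code heap; it is not the actual HotSpot patch. The writer expands `committed` and bumps `used` inside one critical section, and the reader takes the same lock, so a snapshot can never pair a new `used` with a stale `committed`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a code heap; not HotSpot's CodeHeap class.
struct CodeHeapModel {
  std::mutex lock;            // plays the role of CodeCache_lock
  std::size_t committed = 0;  // grown by the expand step
  std::size_t used = 0;       // grown by the allocate step

  // Writer: expand first (if needed), then allocate, all under the lock.
  void allocate(std::size_t size) {
    std::lock_guard<std::mutex> guard(lock);
    if (used + size > committed) {
      committed = used + size;  // models heap->expand_by()
    }
    used += size;               // models heap->allocate()
  }
};

struct UsageSnapshot {
  std::size_t used;
  std::size_t committed;
};

// Reader: hold the same lock so both values come from one consistent state.
UsageSnapshot get_memory_usage(CodeHeapModel& heap) {
  std::lock_guard<std::mutex> guard(heap.lock);
  return UsageSnapshot{heap.used, heap.committed};
}

// Races one allocating thread against the reader and reports whether every
// snapshot satisfied used <= committed.
bool snapshots_always_consistent(int iterations) {
  CodeHeapModel heap;
  std::thread writer([&heap, iterations] {
    for (int i = 0; i < iterations; i++) heap.allocate(16);
  });
  bool ok = true;
  for (int i = 0; i < iterations; i++) {
    UsageSnapshot s = get_memory_usage(heap);
    if (s.used > s.committed) ok = false;
  }
  writer.join();
  return ok;
}
```

With a lock on both sides there is no fence placement to reason about: whatever ordering the hardware uses inside the writer's critical section, the reader only ever sees the state before or after it.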
------------- Commit messages: - Revert "JDK-8269393: force store/load order" - Revert "JDK-8269393: fix position of load barrier" - Revert "JDK-8269393: revert used line indentation to match original" - JDK-8269393: revert used line indentation to match original - JDK-8269393: fix position of load barrier - JDK-8269393: force store/load order - JDK-8269393: Some vmTestbase/vm/mlvm/meth/stress/compiler tests fail with 'MemoryPool not found' Changes: https://git.openjdk.org/jdk/pull/15819/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15819&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8269393 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15819.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15819/head:pull/15819 PR: https://git.openjdk.org/jdk/pull/15819 From dfenacci at openjdk.org Fri Sep 29 14:43:15 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Sep 2023 14:43:15 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 12:26:17 GMT, Andrew Haley wrote: >> Firstly, this code looks horribly racy. There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work. >> >> It makes no sense to have a memory fence without explicitly saying which accesses it's ordering. In this patch that's now clear on the reader side, but not on the writer side. For the future maintainer of the code, we need to be able to see that. >> >> What do you mean by "we'd need a reference to be shared between the 2 places"? >> >> The use of naked storestore fences is a black-belt-ninja thing, and almost certainly not what you want here. See https://hboehm.info/c++mm/no_write_fences.html. > > Aha! I think it's a bug.
Here: > > > // Track memory usage statistic after releasing CodeCache_lock > MemoryService::track_code_cache_memory_usage(); > > > calls > > > // Track the peak memory usage > pool->record_peak_memory_usage(); > > > calls > > > // Caller in JDK is responsible for synchronization - > // acquire the lock for this memory pool before calling VM > MemoryUsage usage = get_memory_usage(); > > > Oops. I can't see the lock being acquired. > There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work Acquiring a CodeCache lock when getting `used` and `committed` here https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182 was actually the solution in the first commit https://github.com/openjdk/jdk/pull/15819/commits/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3 but then, asking about the expensiveness of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was the order in which stores and loads happen, made me think barriers would be preferable, something I now very much doubt. You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right? > ``` > // Caller in JDK is responsible for synchronization - > // acquire the lock for this memory pool before calling VM > MemoryUsage usage = get_memory_usage(); > ``` > > Oops. I can't see the lock being acquired. I'm wondering if there are more places where this would be needed... > What do you mean by "we'd need a reference to be shared between the 2 places"? What I meant is mainly that in this specific case the _storing_ and _loading_ happen in very different places in the code.
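For comparison, a hedged sketch of what a correctly paired fence version would have to look like, written with standard C++ fences rather than HotSpot's `OrderAccess`. C++ has no exact storestore/loadload fences; release and acquire fences are the closest standard equivalents and are somewhat stronger. The counter names are hypothetical. The sketch encodes the two review points: the reader's fence sits between the load of `used` and the load of `committed`, and the writer needs a matching fence between its two stores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical counters; relaxed atomics so that only the fences order them.
std::atomic<std::size_t> committed_bytes{0};
std::atomic<std::size_t> used_bytes{0};

// Writer: publish the larger committed size first, then the new used size.
void expand_then_allocate(std::size_t new_committed, std::size_t new_used) {
  committed_bytes.store(new_committed, std::memory_order_relaxed);
  // Orders the committed store before the used store (storestore and more).
  std::atomic_thread_fence(std::memory_order_release);
  used_bytes.store(new_used, std::memory_order_relaxed);
}

// Reader: load in the opposite order, with the fence between the two loads.
void read_usage(std::size_t& used, std::size_t& committed) {
  used = used_bytes.load(std::memory_order_relaxed);
  // Orders the used load before the committed load (loadload and more).
  std::atomic_thread_fence(std::memory_order_acquire);
  committed = committed_bytes.load(std::memory_order_relaxed);
}

// Races a writer that grows both counters against the reader; returns
// whether every observed pair satisfied used <= committed.
bool run_race(int steps) {
  committed_bytes.store(0);
  used_bytes.store(0);
  std::thread writer([steps] {
    for (int i = 1; i <= steps; i++) {
      std::size_t n = static_cast<std::size_t>(i) * 64;
      expand_then_allocate(n, n);
    }
  });
  bool ok = true;
  for (int i = 0; i < steps; i++) {
    std::size_t used = 0;
    std::size_t committed = 0;
    read_usage(used, committed);
    if (used > committed) ok = false;
  }
  writer.join();
  return ok;
}
```

Even when correct, nothing in this code ties the two fences together except the comments. That is the maintainability concern raised in the review about naked fences.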
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1331633626 From aph at openjdk.org Fri Sep 29 14:43:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 14:43:15 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 15:04:27 GMT, Andrew Haley wrote: >> # Issue >> An intermittent _Memory Pool not found_ error has been noticed when running a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache. >> >> ## Origin >> The issue originates from the fact that the aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_. >> >> More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved >> https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182 >> and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by`, to be precise). https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537 >> The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM's weak memory order).
This is a problem, since `committed` must never be less than `used`. >> >> # Solution >> >> To avoid this situation we must ensure that values used to calculate `committed` are actually saved before the values used to calculate `used`, and that the opposite order holds for reading. To enforce this we acquire a `CodeCache_lock` while reading `used` and `committed` in `CodeHeapPool::get_memory_usage` (which should actually be the convention when accessing CodeCache data). > > src/hotspot/share/services/memoryPool.cpp line 182: > >> 180: MemoryUsage CodeHeapPool::get_memory_usage() { >> 181: OrderAccess::loadload(); >> 182: size_t used = used_in_bytes(); > > This doesn't look quite right. A `loadload` controls the ordering between two accesses. If you want to make sure that you don't see an old version of `committed` with a new version of `used` then the `loadload` must be _between_ the two loads. Also, for clarity, I'd make `used` volatile, and access it with `Atomic::release_store()` and `load_acquire()`. That should keep everything straight, assuming there aren't any more ordering failures. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1330293024 From aph at openjdk.org Fri Sep 29 14:43:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 14:43:15 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 12:22:50 GMT, Andrew Haley wrote: >>> This doesn't look quite right. A `loadload` controls the ordering between two accesses. If you want to make sure that you don't see an old version of `committed` with a new version of `used` then the `loadload` must be _between_ the two loads. >> >> @theRealAph thanks for your comments. You're right: I've just moved it down one line.
>> >>> Also, for clarity, I'd make `used` volatile, and access it with `Atomic::release_store()` and `load_acquire()`. That should keep everything straight, assuming there aren't any more ordering failures. >> >> I thought about doing so (@dholmes-ora's suggestion) but we realised that `release_store` would probably need to be put in `CodeCache::allocate` whereas `load_acquire` probably in `CodeHeapPool::get_memory_usage` but then we'd need a reference to be shared between the 2 places and that wouldn't make it very clean (basically there would be "nowhere to put the acquire/release at the low-level"). > > Firstly, this code looks horribly racy. There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work. > > It makes no sense to have a memory fence without explicitly saying which accesses it's ordering. In this patch that's now clear on the reader side, but not on the writer side. For the future maintainer of the code, we need to be able to see that. > > What do you mean by "we'd need a reference to be shared between the 2 places"? > > The use of naked storestore fences is a black-belt-ninja thing, and almost certainly not what you want here. See https://hboehm.info/c++mm/no_write_fences.html. Aha! I think it's a bug. Here:

    // Track memory usage statistic after releasing CodeCache_lock
    MemoryService::track_code_cache_memory_usage();

calls

    // Track the peak memory usage
    pool->record_peak_memory_usage();

calls

    // Caller in JDK is responsible for synchronization -
    // acquire the lock for this memory pool before calling VM
    MemoryUsage usage = get_memory_usage();

Oops. I can't see the lock being acquired.
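The `Atomic::release_store()`/`load_acquire()` suggestion quoted in this message attaches the ordering to a specific access instead of using naked fences. A portable sketch of that idea follows. It assumes `std::atomic` release/acquire operations as stand-ins for the HotSpot `Atomic` calls and uses a hypothetical `PoolCounters` struct in place of the real code heap fields. It only illustrates the ordering contract and does not address the missing pool lock identified above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical counters. `used` is the publication point: it is written with
// a release store and read with an acquire load. `committed` is atomic only
// to avoid a C++ data race; its ordering comes from the accesses to `used`.
struct PoolCounters {
  std::atomic<std::size_t> committed{0};
  std::atomic<std::size_t> used{0};
};

// Writer: update committed, then release-store used.
void grow(PoolCounters& c, std::size_t new_committed, std::size_t new_used) {
  c.committed.store(new_committed, std::memory_order_relaxed);
  c.used.store(new_used, std::memory_order_release);
}

// Reader: acquire-load used first, then read committed.
bool usage_consistent(const PoolCounters& c) {
  std::size_t used = c.used.load(std::memory_order_acquire);
  std::size_t committed = c.committed.load(std::memory_order_relaxed);
  return used <= committed;
}

// Races one writer against the reader; true if no inconsistent pair was seen.
bool run_race(int steps) {
  PoolCounters c;
  std::thread writer([&c, steps] {
    for (int i = 1; i <= steps; i++) {
      std::size_t n = static_cast<std::size_t>(i) * 64;
      grow(c, n + 64, n);  // committed stays one step ahead of used
    }
  });
  bool ok = true;
  for (int i = 0; i < steps; i++) {
    if (!usage_consistent(c)) ok = false;
  }
  writer.join();
  return ok;
}
```

The reader must still load `used` before `committed`. An acquire load only orders the accesses that come after it in program order.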
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1331553245 From aph at openjdk.org Fri Sep 29 14:43:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 14:43:15 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 13:27:33 GMT, Damon Fenacci wrote: >> Aha! I think it's a bug. Here: >> >> >> // Track memory usage statistic after releasing CodeCache_lock >> MemoryService::track_code_cache_memory_usage(); >> >> >> calls >> >> >> // Track the peak memory usage >> pool->record_peak_memory_usage(); >> >> >> calls >> >> >> // Caller in JDK is responsible for synchronization - >> // acquire the lock for this memory pool before calling VM >> MemoryUsage usage = get_memory_usage(); >> >> >> Oops. I can't see the lock being acquired. > >> There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work > > Acquiring a CodeCache lock when getting `used` and `committed` here > https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182 > was actually the solution in the first commit https://github.com/openjdk/jdk/pull/15819/commits/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3 but then, asking about the expensiveness of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was the order in which stores and loads happen, made me think barriers would be preferable, something I now very much doubt. > You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right?
> >> ``` >> // Caller in JDK is responsible for synchronization - >> // acquire the lock for this memory pool before calling VM >> MemoryUsage usage = get_memory_usage(); >> ``` >> >> Oops. I can't see the lock being acquired. > > I'm wondering if there are more places where this would be needed... > > >> What do you mean by "we'd need a reference to be shared between the 2 places"? > > What I meant is mainly that in this specific case the _storing_ and _loading_ happen in very different places in the code. > > There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work > > Acquiring a CodeCache lock when getting `used` and `committed` here > > https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182 > > was actually the solution in the first commit [1060bb0](https://github.com/openjdk/jdk/commit/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3) but then, asking about the expensiveness of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was the order in which stores and loads happen, made me think barriers would be preferable, something I now very much doubt. > You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right? That seems to be the established convention. And I wonder if there are other problems caused by accessing data in this way, outside the CodeCache lock. > > ``` > > // Caller in JDK is responsible for synchronization - > > // acquire the lock for this memory pool before calling VM > > MemoryUsage usage = get_memory_usage(); > > ``` > > Oops. I can't see the lock being acquired. > > I'm wondering if there are more places where this would be needed... The comment makes it pretty clear that `get_memory_usage()` really needs to hold the lock for this memory pool.
I very strongly suspect this is the real cause of your problem. > > What do you mean by "we'd need a reference to be shared between the 2 places"? > > What I meant is mainly that in this specific case the _storing_ and _loading_ happen in very different places in the code. Sure, it's not best practice. Better than an (apparently) randomly-placed fence, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1331707703 From dfenacci at openjdk.org Fri Sep 29 14:43:16 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Sep 2023 14:43:16 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: <5xTdbbB8OjoyCCNMLzTGXrgnPsNo_bw5NfVTX3C-ZsA=.4af7af8b-0596-4ec0-adb9-84dad2bd1646@github.com> On Wed, 20 Sep 2023 14:16:44 GMT, Andrew Haley wrote: >>> There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work >> >> Acquiring a CodeCache lock when getting `used` and `committed` here >> https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182 >> was actually the solution in the first commit https://github.com/openjdk/jdk/pull/15819/commits/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3 but then, asking about the expensiveness of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was the order in which stores and loads happen, made me think barriers would be preferable, something I now very much doubt. >> You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right?
>> >>> ``` >>> // Caller in JDK is responsible for synchronization - >>> // acquire the lock for this memory pool before calling VM >>> MemoryUsage usage = get_memory_usage(); >>> ``` >>> >>> Oops. I can't see the lock being acquired. >> >> I'm wondering if there are more places where this would be needed... >> >> >>> What do you mean by "we'd need a reference to be shared between the 2 places"? >> >> What I meant is mainly that in this specific case the _storing_ and _loading_ happen in very different places in the code. > >> > There is a CodeCache lock, but it seems that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work >> >> Acquiring a CodeCache lock when getting `used` and `committed` here >> >> https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182 >> >> was actually the solution in the first commit [1060bb0](https://github.com/openjdk/jdk/commit/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3) but then, asking about the expensiveness of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was the order in which stores and loads happen, made me think barriers would be preferable, something I now very much doubt. >> You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right? > > That seems to be the established convention. And I wonder if there are other problems caused by accessing data in this way, outside the CodeCache lock. > >> > ``` >> > // Caller in JDK is responsible for synchronization - >> > // acquire the lock for this memory pool before calling VM >> > MemoryUsage usage = get_memory_usage(); >> > ``` >> > Oops. I can't see the lock being acquired. >> >> I'm wondering if there are more places where this would be needed...
> > The comment makes it pretty clear that `get_memory_usage()` really needs to hold the lock for this memory pool. I very strongly suspect this is the real cause of your problem. > >> > What do you mean by "we'd need a reference to be shared between the 2 places"? >> >> What I meant is mainly that in this specific case the _storing_ and _loading_ happen in very different places in the code. > > Sure, it's not best practice. Better than an (apparently) randomly-placed fence, though. I've reverted to acquiring the `CodeCache_lock` while reading `used` and `committed`. I thought about acquiring it higher up in the call stack, but for this problematic case it is not really possible: in our case `get_memory_usage` is called from here https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/management.cpp#L592-L603 which is not really related to CodeCache (the wrong place to acquire a `CodeCache_lock`), and it gets called from outside the VM. The other problem I thought might happen is a deadlock due to `get_memory_usage` being called after the current thread has already acquired the `CodeCache_lock`. This doesn't seem to be the case (checked by inspecting possible call stacks and running tier 1-5+ tests). Acquiring the lock conditionally (e.g. `CompiledMethod_lock->owned_by_self() ? nullptr : CompiledMethod_lock`, a pattern used for other locks in the code) might guarantee that this doesn't happen, but it seems to "pollute" the code unnecessarily.
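The conditional-locking idiom mentioned above can be sketched portably as follows. `OwnedMutex`, `ConditionalLocker`, and `code_cache_lock` are hypothetical names, not HotSpot classes. The sketch only assumes what the `owned_by_self() ? nullptr : lock` expression implies: a scoped locker handed `nullptr` does nothing, so a thread that already holds the lock does not try to take it again.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex that remembers its owning thread, so a thread can ask
// whether it already holds the lock (analogous to owned_by_self()).
class OwnedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
  }
  void unlock() {
    owner_.store(std::thread::id(), std::memory_order_relaxed);
    m_.unlock();
  }
  bool owned_by_self() const {
    return owner_.load(std::memory_order_relaxed) == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Scoped locker that is a no-op when handed nullptr.
class ConditionalLocker {
 public:
  explicit ConditionalLocker(OwnedMutex* m) : m_(m) {
    if (m_ != nullptr) m_->lock();
  }
  ~ConditionalLocker() {
    if (m_ != nullptr) m_->unlock();
  }
  ConditionalLocker(const ConditionalLocker&) = delete;
  ConditionalLocker& operator=(const ConditionalLocker&) = delete;

 private:
  OwnedMutex* m_;
};

OwnedMutex code_cache_lock;  // hypothetical stand-in for CodeCache_lock
int guarded_reads = 0;

// Safe whether or not the calling thread already holds code_cache_lock:
// it only locks when it is not already the owner, so it cannot self-deadlock.
void read_usage_conditionally_locked() {
  ConditionalLocker guard(code_cache_lock.owned_by_self() ? nullptr : &code_cache_lock);
  guarded_reads++;  // stands in for reading used/committed under the lock
}
```

The cost is an ownership check and a branch at every such call site, which is why the idiom was judged unnecessary once the possible call stacks had been checked.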
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1335460111 From rkennke at openjdk.org Fri Sep 29 14:50:51 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 29 Sep 2023 14:50:51 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v21] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits: - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Update comment about mark-word layout - Merge branch 'JDK-8305896' into JDK-8305898 - Fix tests on 32bit builds - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - ... 
and 23 more: https://git.openjdk.org/jdk/compare/2d068012...dbb74fb0 ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=20 Stats: 101 lines in 8 files changed: 85 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From tonyp at openjdk.org Fri Sep 29 14:57:12 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 29 Sep 2023 14:57:12 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 14:40:31 GMT, Robbin Ehn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> revert adding t3-t6 > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 150: > >> 148: constexpr Register t5 = x30; >> 149: constexpr Register t6 = x31; >> 150: > > In your case it doesn't look like we need them? > > So I think you should revert these changes. > As we may want to reserve one of those registers for something in the future. > I don't think we should just start using them lightly. @robehn Not sure I understand this argument. We can still use the registers using `x[28-31]`. Why not give them their more informative names? Also, I do use them in the MD5 intrinsic, FWIW.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1341461546 From tonyp at openjdk.org Fri Sep 29 14:57:13 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 29 Sep 2023 14:57:13 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: <3W75sRp2lXw-nd21_8yX7T2ZAF5WA63QfuWp-GOsqag=.9ba45dd3-5838-4aaf-b81a-2248bfc3fa66@github.com> References: <3W75sRp2lXw-nd21_8yX7T2ZAF5WA63QfuWp-GOsqag=.9ba45dd3-5838-4aaf-b81a-2248bfc3fa66@github.com> Message-ID: On Wed, 27 Sep 2023 10:04:29 GMT, Hamlin Li wrote: >> It's only a code convention in HotSpot. > > Thanks for the confirmation! > It's only a code convention in HotSpot. Correct. Apologies for not making it clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1341441775 From tonyp at openjdk.org Fri Sep 29 14:57:14 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 29 Sep 2023 14:57:14 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> Message-ID: On Wed, 27 Sep 2023 14:00:23 GMT, Hamlin Li wrote: >> I don't see that the spec guarantees that tail memory will never be touched. >> So I don't know, for now it's better to be safe than sorry. > > Sure, in fact vsetvli was already modified to use default arguments as you suggested. > Maybe this can't happen here since you work with such a nice size. Yeah, but given we don't know what the vector register size is I don't think we can assume anything about how "nice" the size is. ;-) I agree with Robbin here, I think mu / tu is the right approach.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1341447189 From tonyp at openjdk.org Fri Sep 29 14:57:14 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 29 Sep 2023 14:57:14 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: <_pqOBOgIpb6t8zMJWw_SHOZUsYO9_dQgPEJ3gddF_Gw=.22dfd84d-bf70-41b9-af21-169f09517213@github.com> <2gy2rzM2B7I7oHf-pVmKTbEkmZnwRznNinCAnDRDQNg=.57e9d123-c45b-48de-81db-48a9f4556d1d@github.com> Message-ID: <88hsCK7oRmnsEF5BZsIvLi0-UGY3lJP_xwjjzHglD3U=.d929769c-9dcf-47c8-b7c7-413ee5034c75@github.com> On Fri, 29 Sep 2023 14:37:11 GMT, Antonios Printezis wrote: >> Sure, in fact vsetvli was already modified to use default arguments as you suggested. > >> Maybe this can't happen here since you work with such a nice size. > > Yeah, but given we don't know what the vector register size is I don't think we can assume anything about how "nice" the size is. ;-) I agree with Robbin here, I think mu / tu is the right approach. > I don't think ma/ta will ever be correct when working on the Java heap. What if we know that the operation is definitely touching memory that's inside the object? And why is it different from working on, say, the C-heap? You can have other objects on either side in both cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1341454992 From dcubed at openjdk.org Fri Sep 29 14:57:50 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 29 Sep 2023 14:57:50 GMT Subject: RFR: 8317314: Remove unimplemented ObjArrayKlass::oop_oop_iterate_elements_bounded In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 12:34:20 GMT, Albert Mingkun Yang wrote: > Trivial: removing dead code. Thumbs up. This is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15985#pullrequestreview-1650938686 From ayang at openjdk.org Fri Sep 29 15:37:13 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 29 Sep 2023 15:37:13 GMT Subject: RFR: 8317318: Serial: Change GenCollectedHeap to SerialHeap in whitebox Message-ID: Use a more precise type for Serial GC. I also added a `ShouldNotReachHere()` there, because using serial-heap when serial-gc is not used seems problematic. ------------- Commit messages: - s1-prims Changes: https://git.openjdk.org/jdk/pull/15988/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15988&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317318 Stats: 10 lines in 1 file changed: 7 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15988/head:pull/15988 PR: https://git.openjdk.org/jdk/pull/15988 From pchilanomate at openjdk.org Fri Sep 29 16:35:18 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 29 Sep 2023 16:35:18 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: > Please review the following patch. As explained in the bug comments, the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). > > The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work.
The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue is not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc) so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender(). > > I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers1-4 in mach5. I'll run the upper tiers too. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - add comment to tests - use driver + @requires vm.flagless ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15972/files - new: https://git.openjdk.org/jdk/pull/15972/files/eceeed15..8757025e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15972&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15972&range=00-01 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15972/head:pull/15972 PR: https://git.openjdk.org/jdk/pull/15972 From pchilanomate at openjdk.org Fri Sep 29 16:35:32 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 29 Sep 2023 16:35:32 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 13:14:28 GMT, Andrew Haley wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to tests >> - 
use driver + @requires vm.flagless > > src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 170: > >> 168: bool use_codeblob = cb != nullptr && cb->frame_size() > 0; >> 169: assert(!use_codeblob || !Interpreter::contains(pc), "should not be an interpreter frame"); >> 170: intptr_t* sender_sp = use_codeblob ? (fr->link() + frame::metadata_words - cb->frame_size()) : fr->link(); > > Is this assuming that, if the caller is a native frame, the current FP will point to the lowest word in the caller's stack frame? If this is a native frame the current FP would point to the current frame's lowest address. The value stored there would be the sender's FP. If the sender is also a native frame, then that value would just point to the lowest address of that frame. If the sender is a frame associated with some CodeBlob and we know its size, then the sender's FP would point two words below the highest address of that frame (unless the sender's FP value is wrong, but then _unextended_sp would be wrong anyway if we set it to the sender's FP as the old code did) and we can calculate the actual _unextended_sp so that when getting the sender of that frame we don't crash.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341523258 From pchilanomate at openjdk.org Fri Sep 29 16:35:46 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 29 Sep 2023 16:35:46 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 04:29:34 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to tests >> - use driver + @requires vm.flagless > > src/hotspot/share/utilities/vmError.cpp line 434: > >> 432: return invalid; >> 433: } >> 434: if (fr.is_interpreted_frame() || (fr.cb() != nullptr && fr.cb()->frame_size() > 0)) { > > This part of the fix is unclear to me. How do the old conditions relate to the new ones? The second part of the condition includes the previous checks for is_compiled_frame(), is_native_frame(), is_runtime_frame() plus any other frame that would use sender_for_compiled_frame() when calling frame::sender(), like the safepoint stub. > test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 84: > >> 82: public void callNativeMethod() throws Exception { >> 83: Object obj = new Object(); >> 84: obj.wait(); > > Just to be clear, the aim here is to call a native method that will complete by throwing an exception, so you can abort the VM. A comment to that effect would be good. Thanks Done. > test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 116: > >> 114: } >> 115: >> 116: public void callVMMethod() throws Exception { > > Again a comment outlining how you expect this to abort the VM would be good. Done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341496808 PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341496264 PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341496366 From pchilanomate at openjdk.org Fri Sep 29 16:35:51 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 29 Sep 2023 16:35:51 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: <_bcgcJi0-GKncB1R89sQZkZTsbv9JuwLvYu-WycITPY=.f7eada18-56ea-4774-b44c-54af193957e0@github.com> References: <_bcgcJi0-GKncB1R89sQZkZTsbv9JuwLvYu-WycITPY=.f7eada18-56ea-4774-b44c-54af193957e0@github.com> Message-ID: On Fri, 29 Sep 2023 03:49:50 GMT, Leonid Mesnik wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to tests >> - use driver + @requires vm.flagless > > test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 39: > >> 37: * @requires os.family != "windows" >> 38: * @library /test/lib >> 39: * @run main/othervm StackWalkNativeToJava > > I think it should be driver instead of main/othervm here. Fixed. > test/hotspot/jtreg/runtime/ErrorHandling/StackWalkNativeToJava.java line 67: > >> 65: commands.add("StackWalkNativeToJava$TestNativeToJavaNative"); >> 66: >> 67: ProcessBuilder pb = ProcessTools.createJavaProcessBuilder(commands); > > The test ignores any external VM flags. Please add > @requires vm.flagless > to the test header so that this test does not run with any additional VM flags. Done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341495319 PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341495528 From lmesnik at openjdk.org Fri Sep 29 18:41:16 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 29 Sep 2023 18:41:16 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 16:35:18 GMT, Patricio Chilano Mateo wrote: >> Please review the following patch. As explained in the bug comments, the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). >> >> The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work. The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue is not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc) so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender(). >> >> I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too.
I tested the fix by running the new test and also running tiers1-4 in mach5. I'll run the upper tiers too. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add comment to tests > - use driver + @requires vm.flagless Thanks for the changes. The test looks good to me. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15972#pullrequestreview-1651256046 From aph at openjdk.org Fri Sep 29 18:41:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 29 Sep 2023 18:41:16 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> On Fri, 29 Sep 2023 15:39:04 GMT, Patricio Chilano Mateo wrote: > If this is a native frame the current FP would point to the current frame's lowest address. The value stored there would be the sender's FP. Yes to both of those. > If the sender is also a native frame, then that value would just point to the lowest address of that frame. Maybe. I think that's the way GCC works, but not the ABI. All the ABI guarantees is that the frame pointers form a chain. And I don't know that GCC always does it this way, e.g. with variable-size arrays. At least this needs a comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341638839 From pchilanomate at openjdk.org Fri Sep 29 18:51:58 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 29 Sep 2023 18:51:58 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> References: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> Message-ID: On Fri, 29 Sep 2023 17:30:58 GMT, Andrew Haley wrote: >> If this is a native frame the current FP would point to the current frame's lowest address. The value stored there would be the sender's FP. If the sender is also a native frame, then that value would just point to the lowest address of that frame. If the sender is a frame associated with some CodeBlob and we know its size, then the sender's FP would point two words below the highest address of that frame (unless the sender's FP value is wrong, but then _unextended_sp would be wrong anyway if we set it to the sender's FP as the old code did) and we can calculate the actual _unextended_sp so that when getting the sender of that frame we don't crash. > >> If this is a native frame the current FP would point to the current frame's lowest address. The value stored there would be the sender's FP. > > Yes to both of those. > >> If the sender is also a native frame, then that value would just point to the lowest address of that frame. > > Maybe. I think that's the way GCC works, but not the ABI. All the ABI guarantees is that the frame pointers form a chain. And I don't know that GCC always does it this way, e.g. with variable-size arrays. At least this needs a comment. Ah, is your concern about setting the right _unextended_sp/_sp value when the caller is also a native frame?
If that's the case then yes we would need to know where the frame records are stored to know what to do. I guess the assumption was already that they were stored at the lowest address and that's why we pass fr->link() for the sp when constructing the sender frame. But in any case that sp value is not used to get the sender so even if the value is not correct we won't crash. If that was your concern I can add a comment for that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1341698871 From rehn at openjdk.org Fri Sep 29 19:30:22 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 29 Sep 2023 19:30:22 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 14:50:13 GMT, Antonios Printezis wrote: >> src/hotspot/cpu/riscv/assembler_riscv.hpp line 150: >> >>> 148: constexpr Register t5 = x30; >>> 149: constexpr Register t6 = x31; >>> 150: >> >> In your case it doesn't look like we need them? >> >> So I think you should revert these changes. >> As we may want to reserve one of those registers for something in the future. >> I don't think we should take lightly on just start using them. > > @robehn Not sure I understand this argument. We can still use the registers using `x[28-31]`. Why not give them their more informative name? Also, I do use them in the MD5 intrinsic, FWIW. Yes, I see what you mean. I think I was just wrong. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1341727249 From dlong at openjdk.org Sat Sep 30 03:09:56 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 30 Sep 2023 03:09:56 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. 
>> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Hmmm, I believe you're right. If only C2 is affected and not C1, then we should fix it in load_interpreter_state(). If C1 is also affected we can fix it in OSR_migration_begin(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1741641760 From dlong at openjdk.org Sat Sep 30 03:18:43 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 30 Sep 2023 03:18:43 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: <6Ct6_QIRvGXRb_o2hP8W4o9O8MgzbmXOcCsDqf_Tsx4=.a9cbce3d-758b-4544-9af3-5997c4b8f70c@github.com> On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. 
> - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Please go with your load_interpreter_state() fix. You've convinced me it is correct :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1741643626 From fbredberg at openjdk.org Sat Sep 30 12:48:27 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Sat, 30 Sep 2023 12:48:27 GMT Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) Message-ID: Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames. By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members are handled in other subtasks to [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296). It has been sanity tested onr PowerPC using Qemu. 
------------- Commit messages: - 8316523: Relativize esp in interpreter frames (PowerPC only) - 8316523_relativize_esp_on_ppc Changes: https://git.openjdk.org/jdk/pull/15999/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15999&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316523 Stats: 23 lines in 4 files changed: 16 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15999/head:pull/15999 PR: https://git.openjdk.org/jdk/pull/15999 From fbredberg at openjdk.org Sat Sep 30 14:51:39 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Sat, 30 Sep 2023 14:51:39 GMT Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) In-Reply-To: References: Message-ID: On Sat, 30 Sep 2023 11:57:49 GMT, Fredrik Bredberg wrote: > Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames. > > By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is because we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. > > This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks to [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296). > > It has been sanity tested on PowerPC using Qemu. Since this is a PowerPC-only PR, I think this needs to be reviewed by @TheRealMDoerr.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15999#issuecomment-1741782298 From duke at openjdk.org Sat Sep 30 15:52:55 2023 From: duke at openjdk.org (ExE Boss) Date: Sat, 30 Sep 2023 15:52:55 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v32] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 13:33:32 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. 
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > review @enablePreview from java/foreign/TestRestricted test Maybe include [openjdk/panama-foreign#864](https://github.com/openjdk/panama-foreign/pull/864)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1741795291 From jvernee at openjdk.org Sat Sep 30 16:22:56 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Sat, 30 Sep 2023 16:22:56 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v32] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 13:33:32 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1.
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > review @enablePreview from java/foreign/TestRestricted test > Maybe include [openjdk/panama-foreign#864](https://github.com/openjdk/panama-foreign/pull/864)? The leftover implementation-only changes will be ported over separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1741806316 From ayang at openjdk.org Sat Sep 30 17:27:42 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 30 Sep 2023 17:27:42 GMT Subject: RFR: 8317318: Serial: Change GenCollectedHeap to SerialHeap in whitebox [v2] In-Reply-To: References: Message-ID: > Use a more precise type for Serial GC. I also added a `ShouldNotReachHere()` there, because using serial-heap when serial-gc is not used seems problematic. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: s1-prims ------------- Changes: https://git.openjdk.org/jdk/pull/15988/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15988&range=01 Stats: 11 lines in 1 file changed: 8 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15988/head:pull/15988 PR: https://git.openjdk.org/jdk/pull/15988

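| Kernel | `-XX:-TransparentHugePages`, unpatched | `-XX:-TransparentHugePages`, patched | `-XX:+TransparentHugePages`, unpatched | `-XX:+TransparentHugePages`, patched |
|---|---|---|---|---|
| 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
| 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
| 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |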