From gcao at openjdk.org Mon Sep 2 01:26:26 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 2 Sep 2024 01:26:26 GMT Subject: RFR: 8339298: Remove unused function declaration poll_for_safepoint In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 07:03:57 GMT, Gui Cao wrote: > Hi, I noticed that there are two unused function declarations here, in the historical version they were used without UseCompilerSafepoints, now the unused UseCompilerSafepoints have been removed, but the function declarations may have forgotten to be removed. > > ### Testing > - [x] release & fastdebug build OK on linux-aarch64 > - [x] release & fastdebug build OK on linux-riscv64 Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20785#issuecomment-2323601152 From gcao at openjdk.org Mon Sep 2 01:26:26 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 2 Sep 2024 01:26:26 GMT Subject: Integrated: 8339298: Remove unused function declaration poll_for_safepoint In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 07:03:57 GMT, Gui Cao wrote: > Hi, I noticed that there are two unused function declarations here, in the historical version they were used without UseCompilerSafepoints, now the unused UseCompilerSafepoints have been removed, but the function declarations may have forgotten to be removed. > > ### Testing > - [x] release & fastdebug build OK on linux-aarch64 > - [x] release & fastdebug build OK on linux-riscv64 This pull request has now been integrated. 
Changeset: 9d7d85a6 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/9d7d85a6aa20ed95166f5f2f951597bca1fde841 Stats: 4 lines in 2 files changed: 0 ins; 4 del; 0 mod 8339298: Remove unused function declaration poll_for_safepoint Reviewed-by: fyang, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/20785 From duke at openjdk.org Mon Sep 2 02:38:17 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 2 Sep 2024 02:38:17 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method In-Reply-To: <-b1JTFIJbwhOHz4a5fasgVmF-aeaOuUuD-UWvNr_XSs=.1e9fa592-c984-40e7-b317-21a072e583b9@github.com> References: <-b1JTFIJbwhOHz4a5fasgVmF-aeaOuUuD-UWvNr_XSs=.1e9fa592-c984-40e7-b317-21a072e583b9@github.com> Message-ID: On Fri, 30 Aug 2024 22:39:04 GMT, Vladimir Ivanov wrote: >> I found that C1 sometimes misses the type profile, and I wrote a test to demonstrate it. >> It happens under these conditions: >> 1. It's a virtual call and the callee is a final method, so C1 treats it as statically bound. >> 2. The interpreter never touches the call site, so it does not add a type profile. >> 3. In the C1 compilation, the callee is inlined based on CHA. >> 4. In the C2 compilation, the CHA assumption is broken, but the type profile is missing, so C2 cannot inline it. > > test/hotspot/jtreg/compiler/cha/cha_control.txt line 1: > >> 1: [ > > Currently, the prevalent way to specify compiler directives is through the WhiteBox API at runtime (through `WhiteBox.addCompilerDirective(String directive)`). Please follow the same pattern here. I find it more convenient to reason about test logic when all the pieces are present in a single place. Thanks for your suggestion. I will change the test case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1740267415 From rcastanedalo at openjdk.org Mon Sep 2 06:38:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 2 Sep 2024 06:38:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 13:23:32 GMT, Feilong Jiang wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision: >> >> - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' >> - Remark relation between compiler optimization and barrier filter >> - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' >> - Replace 'the null' with 'null' in comment >> - Remove redundant redefinitions of '__' >> - Replace 'already dirty' with 'young' in post-barrier fast path comment > > risc-v port looks good too. > OK, if there are no objections from @feilongjiang or @snazarkin within a couple of days I will prepare an update to jdk-24+13. @TheRealMDoerr done (commit 4ee450a). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2323921726 From rcastanedalo at openjdk.org Mon Sep 2 06:38:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 2 Sep 2024 06:38:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v12] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. 
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute ports for these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. 
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge jdk-24+13 - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' - Remark relation between compiler optimization and barrier filter - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' - Replace 'the null' with 'null' in comment - Remove redundant redefinitions of '__' - Replace 'already dirty' with 'young' in post-barrier fast path comment - Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names - Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP - Assert that no implicit null checks are generated for memory accesses with barriers - ... 
and 8 more: https://git.openjdk.org/jdk/compare/52ffcda1...4ee450ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/57adcfb0..4ee450ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=10-11 Stats: 30577 lines in 938 files changed: 18592 ins; 8033 del; 3952 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From enikitin at openjdk.org Mon Sep 2 07:46:32 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Mon, 2 Sep 2024 07:46:32 GMT Subject: RFR: 8339366: [jittester] Make it possible to generate tests without execution Message-ID: This PR: 1. Extracts IR tree generation from execution (left in the Automatic) into a dedicated class IRTreeGenerator; 2. Introduces a generation result record (named IRTreeGenerator.Test). The record contains main and private classes along with the random seed used for their generation; 3. Creates CLI-wrapper classes for Java and ByteCode generators to allow generation-only execution; 4. Add a repeating option to the configuration - to make it possible to specify several main class names. 
Sample usage: java -cp build/classes --add-opens java.base/java.util=ALL-UNNAMED \ jdk.test.lib.jittester.JavaCodeGenerator \ -k Test_0 -k Test_1 -k Test_10 ------------- Commit messages: - 8339366: [jittester] Make it possible to generate tests without execution Changes: https://git.openjdk.org/jdk/pull/20806/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20806&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339366 Stats: 247 lines in 7 files changed: 167 ins; 59 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20806.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20806/head:pull/20806 PR: https://git.openjdk.org/jdk/pull/20806 From duke at openjdk.org Mon Sep 2 08:02:31 2024 From: duke at openjdk.org (Yagmur Eren) Date: Mon, 2 Sep 2024 08:02:31 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v4] In-Reply-To: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: <8N4huxorU372tiK-_qaUzjKxr-Oo3V14tvnwoKe_g5M=.c4ba8d78-05c6-40e8-b82c-8e8a5b56c44d@github.com> > Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. 
See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: Update full name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20715/files - new: https://git.openjdk.org/jdk/pull/20715/files/192eaaae..0a05f15e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20715&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20715&range=02-03 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20715.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20715/head:pull/20715 PR: https://git.openjdk.org/jdk/pull/20715 From jsjolen at openjdk.org Mon Sep 2 08:23:19 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 2 Sep 2024 08:23:19 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena In-Reply-To: References: Message-ID: <25DOZYX1NntjE9rbq-GZv3nTJ8a3e754Vj-OhhHrgQQ=.7686431c-06fe-4aeb-9bd5-beb59cb9073e@github.com> On Thu, 29 Aug 2024 15:07:46 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. > > The fix includes: > > - Adding assertions to check for potential overflow. > - Reordering some operations to guard against overflow. LGTM ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20774#pullrequestreview-2274989449 From jsjolen at openjdk.org Mon Sep 2 08:23:19 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 2 Sep 2024 08:23:19 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena In-Reply-To: References: Message-ID: On Sat, 31 Aug 2024 04:49:57 GMT, Thomas Stuefe wrote: > If the aim is to increase security, would it not make more sense to test against hardcoded "reasonable max" values? 
Anything larger than a few MB is likely to be an error anyway, or? This is a build-time only feature, as it's run on `*.ad` files, so I think this should count as trusted input. The assertions in the PR just let us bail early rather than discovering the overflow later on through a crash (or not at all). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20774#issuecomment-2324111151 From luhenry at openjdk.org Mon Sep 2 08:40:21 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 2 Sep 2024 08:40:21 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 03:09:16 GMT, Fei Yang wrote: >>> Hello Hamlin, I recall you had a licheepi board. It would be nice if you can try to measure rvv performance gain with this https://github.com/syntacore/syntaj21/tree/rvv0.7.1 >>> >>> This PR showed it's not always easy to win perf just by using rvv - #17413 >>> >>> I understand it might not be possible, but would be nice to give it a try (I can share hsdis with support for 0.7.1 if needed) >> >> Had an internal discussion about your suggestion, seems 0.7.1 is not incompatible with 1.0/2.0, and for this simple intrinsic, we think a better path is to have it first, then re-visit it when we have real hardware to measure the performance later. > > @Hamlin-Li: Thanks for the quick update. Considering that saving/restoring FRM could be expensive, I do wonder if we could gather some performance numbers before we go. I see people are now testing on RVV-1.0 hardware [1] and I am also trying to get one (AFAIK, more powerful RVV-1.0 hardware is also coming later this year, SG2044, SG2380, etc.). Also from the discussion on [2], I see there are other approaches available that do not flip the FP rounding mode. But I am not sure if they make sense for our case or work better without actual testing. 
> > [1] https://github.com/openjdk/jdk/pull/18382#issuecomment-2045145255 > [2] https://github.com/openjdk/jdk/pull/8204 @RealFYang @turbanoff could we please have another review? Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17745#issuecomment-2324153121 From duke at openjdk.org Mon Sep 2 09:24:57 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 2 Sep 2024 09:24:57 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v2] In-Reply-To: References: Message-ID: > I found that C1 sometimes misses the type profile, and I wrote a test to demonstrate it. > It happens under these conditions: > 1. It's a virtual call and the callee is a final method, so C1 treats it as statically bound. > 2. The interpreter never touches the call site, so it does not add a type profile. > 3. In the C1 compilation, the callee is inlined based on CHA. > 4. In the C2 compilation, the CHA assumption is broken, but the type profile is missing, so C2 cannot inline it. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Modify test case to use whitebox api ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20786/files - new: https://git.openjdk.org/jdk/pull/20786/files/1455feb0..d1c0594a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=00-01 Stats: 121 lines in 2 files changed: 44 ins; 44 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/20786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20786/head:pull/20786 PR: https://git.openjdk.org/jdk/pull/20786 From duke at openjdk.org Mon Sep 2 09:24:57 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 2 Sep 2024 09:24:57 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 22:28:45 GMT, Vladimir Ivanov wrote: >> kuaiwei has updated the pull request incrementally with one 
additional commit since the last revision: >> >> Modify test case to use whitebox api > > test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 43: > >> 41: public class TypeProfileFinalMethod { >> 42: public static void main(String[] args) throws Exception { >> 43: if (args.length == 1 && args[0].equals("Run")) { > > Instead of a check at runtime, you can introduce a separate class which drives test logic. Take a look at `compiler/jsr292/MHInlineTest.java` for an example (or grep for `class Launcher` under `test/hotspot/jtreg`). Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1740596957 From duke at openjdk.org Mon Sep 2 09:36:53 2024 From: duke at openjdk.org (Casper Norrbin) Date: Mon, 2 Sep 2024 09:36:53 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: References: Message-ID: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> > Hi everyone, > > This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. > > The fix includes: > > - Adding assertions to check for potential overflow. > - Reordering some operations to guard against overflow. 
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: arena realloc overflow check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20774/files - new: https://git.openjdk.org/jdk/pull/20774/files/c394d0cc..cf0b4348 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20774/head:pull/20774 PR: https://git.openjdk.org/jdk/pull/20774 From duke at openjdk.org Mon Sep 2 09:45:18 2024 From: duke at openjdk.org (Casper Norrbin) Date: Mon, 2 Sep 2024 09:45:18 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: References: Message-ID: <71Hix9ePWLMjEkBo66Cwb4fn5Qw_ibaXVjt1DUUHZ30=.3843d399-0f1c-42ad-9a66-2c30db461ad4@github.com> On Fri, 30 Aug 2024 23:35:41 GMT, Dean Long wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> arena realloc overflow check > > src/hotspot/share/adlc/adlArena.cpp line 154: > >> 152: if( (c_old+old_size == _hwm) && // Adjusting recent thing >> 153: ((size_t)(_max-c_old) >= new_size) ) { // Still fits where it sits, safe from overflow >> 154: > > This code appears to be a copy of Arena::Arealloc, so we should probably fix both at the same time. Good catch! Fixed in cf0b434. 
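The reordering in the quoted `adlArena.cpp` check (comparing `_max-c_old` against the new size instead of adding the size to the pointer) can be illustrated with a small Java sketch. This is only an analogy under stated assumptions: the class and method names are hypothetical, `long` values stand in for the C++ pointers, and `Long.compareUnsigned` stands in for the `size_t` comparison. It is not the HotSpot code.

```java
// Hypothetical illustration of an overflow-safe "still fits in place" check.
final class InPlaceFitCheck {
    // Naive form: base + newSize can wrap around for a huge newSize,
    // so the comparison may succeed when it should fail.
    static boolean fitsNaive(long base, long max, long newSize) {
        return base + newSize <= max;
    }

    // Reordered form: assuming 0 <= base <= max, max - base cannot overflow,
    // and the unsigned comparison rejects an oversized request.
    static boolean fitsSafe(long base, long max, long newSize) {
        return Long.compareUnsigned(max - base, newSize) >= 0;
    }

    public static void main(String[] args) {
        long base = 1000, max = 2000;
        System.out.println(fitsNaive(base, max, Long.MAX_VALUE)); // true (wrapped sum)
        System.out.println(fitsSafe(base, max, Long.MAX_VALUE));  // false
        System.out.println(fitsSafe(base, max, 500));             // true
    }
}
```

The design point is the same as in the review: the subtraction is bounded by the chunk size, while the addition depends on a caller-supplied value.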
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1740625142 From duke at openjdk.org Mon Sep 2 11:10:35 2024 From: duke at openjdk.org (Yagmur Eren) Date: Mon, 2 Sep 2024 11:10:35 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5] In-Reply-To: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: > Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: adding NOT_DEBUG_RETURN instead of DEBUG_ONLY for Compile::verify_start ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20715/files - new: https://git.openjdk.org/jdk/pull/20715/files/0a05f15e..66e23d6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20715&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20715&range=03-04 Stats: 6 lines in 3 files changed: 1 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20715.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20715/head:pull/20715 PR: https://git.openjdk.org/jdk/pull/20715 From duke at openjdk.org Mon Sep 2 11:15:19 2024 From: duke at openjdk.org (Yagmur Eren) Date: Mon, 2 Sep 2024 11:15:19 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: <2pFdztomWU60_fYA092wBtpPyGjwuprzlzC1nj8xyMk=.2ae0fc34-b055-48ae-8320-7d014a148064@github.com> On Mon, 2 Sep 2024 11:10:35 GMT, Yagmur Eren wrote: >> Compile::init_start method contained only an assertion. 
To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > adding NOT_DEBUG_RETURN instead of DEBUG_ONLY for Compile::verify_start Thanks a lot for the review and suggestions @dean-long and @chhagedorn! I believe I can integrate now if it looks good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20715#issuecomment-2324469670 From duke at openjdk.org Mon Sep 2 11:15:21 2024 From: duke at openjdk.org (Yagmur Eren) Date: Mon, 2 Sep 2024 11:15:21 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v3] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> <4tc9doDDziDT16yv9jghJoWtiPO3AEWOuO0wfPk1QGs=.9ff1f3c3-82e0-4ab1-8437-bc12ea34820d@github.com> Message-ID: <-rEJbe8llUiFKGdUvt2JIHb82IXofl-PGivU7szfkIg=.782db8b2-4ff2-4a4f-a715-0c2ec3e1930b@github.com> On Sat, 31 Aug 2024 00:01:47 GMT, Dean Long wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> remove method header > > src/hotspot/share/opto/generateOptoStub.cpp line 264: > >> 262: returnadr()); >> 263: root()->add_req(_gvn.transform(to_exc)); // bind to root to keep live >> 264: DEBUG_ONLY(C->verify_start(start);) > > This looks fine, but instead of marking every call site with DEBUG_ONLY, how about adding NOT_DEBUG_RETURN to the declaration of verify_start(), so it is a no-op in non-debug builds? For an example, see check_no_dead_use(). Thanks a lot for feedback @dean-long! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20715#discussion_r1740737874 From dfenacci at openjdk.org Mon Sep 2 11:31:24 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 2 Sep 2024 11:31:24 GMT Subject: RFR: 8333891: Method excluded with directive is not compiled after removal of directive In-Reply-To: References: <2xstE3V0PD8FGcijx_THSX1YgIJ7fZLponoL7b96TiY=.04ecae5f-9e3a-4c26-9893-72822f31c753@github.com> Message-ID: <21YY4Zhbx9XINm-d4yNhn_VU1ZKSHhtelyM6lTBLRIc=.834e0225-fbaf-4dc6-82af-4092a158316c@github.com> On Mon, 17 Jun 2024 18:00:54 GMT, Evgeny Astigeevich wrote: >> Test `runtime/BootstrapMethod/BSMCalledTwice.java` might have failed on Windows x64 because of the change. > >> Test `runtime/BootstrapMethod/BSMCalledTwice.java` might have failed on Windows x64 because of the change. > > I managed to reproduce the failure. The test fails because of my change. > There is a data race: > > Thread1: cleaning_flag ... setting_flag ... assert > Thread2: cleaning_flag ... setting_flag ... assert > > > `Thread2` can clean the flag between `Thread1` setting the flag and checking the assert. @eastig are you still working on this? Do you want to reopen it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19637#issuecomment-2324509876 From stuefe at openjdk.org Mon Sep 2 11:57:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 2 Sep 2024 11:57:19 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> References: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> Message-ID: On Mon, 2 Sep 2024 09:36:53 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. 
>> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > arena realloc overflow check src/hotspot/share/memory/arena.cpp line 339: > 337: // See if we can resize in-place > 338: if( (c_old+old_size == _hwm) && // Adjusting recent thing > 339: ((size_t)(_max-c_old) >= corrected_new_size) ) { // Still fits where it sits, safe from overflow This change is correct, but it hides an important finding behind a reshuffling of parameters that someone else may innocently reshape later. It also makes the code less readable. Can we use something like saturated_add()? I would also add an explicit assert for a reasonable max size. Arena allocations should be small. Nobody should hand in sizes larger than a few MB, so an assert that traps on size >= 2^31 (2 GB) would make sense. Anything as large as that is almost certainly an error we should trap on. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1740791483 From jbhateja at openjdk.org Mon Sep 2 12:20:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Sep 2024 12:20:59 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: Message-ID: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable only to integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. 
> > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolved ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/c42b4afa..767aeef3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=03-04 Stats: 249 lines in 9 files changed: 75 ins; 67 del; 107 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Mon Sep 2 12:21:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Sep 2024 12:21:00 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 22:17:55 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/assembler_x86.cpp line 10229: > >> 10227: InstructionMark im(this); >> 10228: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); >> 10229: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); > > vex_w could be false here. The encoding specification says the W bit is ignored, so there is no functional issue; I will make it false to comply with our convention. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6698: > >> 6696: // Unsigned values ranges comprise of only +ve numbers, thus there exist only an upper bound saturation. 
>> 6697: // overflow = ((UMAX - MAX(SRC1 & SRC2)) >> 31 == 1 >> 6698: // Res = Signed Add INP1, INP2 > > The >>> 31 is not coded so comment could be improved to match the code. > Comment has SRC1/INP1 term mixed. > Also, could overflow not be implemented based on much simpler Java scalar algo: > Overflow = Res This is much straight forward, also evex supports unsigned comparison. The Java scalar algorithm was empirically shown to hold; I also verified it with the Alive2 solver, which proved its semantic equivalence to the vector implementation based on HD section 2-13. Here is the link to the Alive2 solver, which operates on LLVM IR inputs. [https://alive2.llvm.org/ce/z/XDQ7dY](https://alive2.llvm.org/ce/z/XDQ7dY) > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6749: > >> 6747: vpor(xtmp2, xtmp2, src2, vlen_enc); >> 6748: // Compute mask for muxing T1 with T3 using SRC1. >> 6749: vpsign_extend_dq(etype, xtmp4, src1, vlen_enc); > > I don't think we need to do the sign extension. The blend instruction uses most significant bit to do the blend. The original vector has double / quad word lanes that are blended using a byte-level mask; sign extension ensures that the sign bit is propagated to the MSB of each constituent byte mask of the corresponding double / quad word source lane. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6939: > >> 6937: >> 6938: // Compose saturating min/max vector using first input polarity mask. >> 6939: vpsign_extend_dq(etype, xtmp4, src1, vlen_enc); > > Sign extend to lower bits not needed as blend uses msbit only. The original vector has double / quad word lanes that are blended using a byte-level mask; sign extension ensures that the sign bit is propagated to the MSB of each constituent byte mask of the corresponding double / quad word source lane. 
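For the unsigned saturating addition discussed in the `c2_MacroAssembler_x86.cpp` line 6698 comment above, a scalar version can be sketched as follows. The overflow expression in the review comment is truncated in this archive, so this sketch uses the standard test "the wrapped sum is unsigned-less-than an operand"; that is an assumption and not necessarily the formulation used in the PR. The class and method names are hypothetical.

```java
// Hypothetical scalar sketch of unsigned saturating int addition.
final class UnsignedSaturatingAdd {
    static int addSaturatingUnsigned(int a, int b) {
        int res = a + b; // wrapping two's-complement add
        // Unsigned overflow occurred iff the wrapped sum is
        // unsigned-smaller than one of the operands.
        boolean overflow = Integer.compareUnsigned(res, a) < 0;
        return overflow ? -1 : res; // -1 is 0xFFFFFFFF, the unsigned maximum
    }

    public static void main(String[] args) {
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(1, 2)));
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(-1, 1)));
    }
}
```

Only an upper bound needs handling here, which matches the quoted source comment that unsigned ranges saturate only at the top.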
> src/hotspot/cpu/x86/x86.ad line 1953: > >> 1951: if (UseAVX < 1 || size_in_bits < 128 || (size_in_bits == 512 && !VM_Version::supports_avx512bw())) { >> 1952: return false; >> 1953: } > > UseAVX < 1 could be written as UseAVX == 0. Could we not do register version for size_in_bit < 128? I get your point, but the constraints ensure that we only address cases with vector size >= 128 bits in this patch. > src/hotspot/cpu/x86/x86.ad line 10635: > >> 10633: %} >> 10634: >> 10635: instruct saturating_unsigned_add_reg_avx(vec dst, vec src1, vec src2, vec xtmp1, vec xtmp2, vec xtmp3, vec xtmp4) > > Should the temp here and all the places related to !avx512vl() be legVec instead of vec? The predicate already has an AVX512VL check, and so do the dynamic register classes associated with its operands. > src/hotspot/cpu/x86/x86.ad line 10656: > >> 10654: match(Set dst (SaturatingSubVI src1 src2)); >> 10655: match(Set dst (SaturatingSubVL src1 src2)); >> 10656: effect(TEMP ktmp); > > This needs TEMP dst as well. There is no use of either of the source operands after assignment to dst in the macro assembly routine. > src/java.base/share/classes/java/lang/Byte.java line 647: > >> 645: */ >> 646: public static byte subSaturating(byte a, byte b) { >> 647: byte res = (byte)(a - b); > > Could we not do subSaturating as an int operation on similar lines as addSaturating? Yes, the core libs also have the {add/subtract}Exact APIs, which throw ArithmeticException instead of saturating on overflow / underflow. I am streamlining the overflow checks for saturating long operations along the same lines to address Joe's concerns about new constants.
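To make the reviewer's suggestion concrete, here is a minimal Java sketch of byte saturation done as an int operation. It assumes nothing about the final shape of the patch: the class name is hypothetical, and the method names merely mirror the ones quoted in the diff. Because `byte` operands are promoted to `int`, the intermediate difference or sum cannot overflow and only needs to be clamped.

```java
public class ByteSaturation {
    // Illustrative only; not the code from the pull request.
    static byte subSaturating(byte a, byte b) {
        int res = a - b; // promoted to int: range is [-255, 255], no overflow possible
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, res));
    }

    static byte addSaturating(byte a, byte b) {
        int res = a + b; // promoted to int: range is [-256, 254], no overflow possible
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, res));
    }

    public static void main(String[] args) {
        System.out.println(subSaturating((byte) -128, (byte) 1)); // clamps at Byte.MIN_VALUE
        System.out.println(addSaturating((byte) 127, (byte) 1));  // clamps at Byte.MAX_VALUE
    }
}
```

The same trick does not carry over to `long`, where no wider primitive type is available; that case needs an explicit overflow test, which is presumably why the long variants are discussed separately in the thread.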
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740820063 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740819360 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740823487 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740823011 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740819742 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740819629 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740822270 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740821085 From jbhateja at openjdk.org Mon Sep 2 12:21:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Sep 2024 12:21:00 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 16:05:45 GMT, Sandhya Viswanathan wrote: >> Wonder if it would have been simpler if we added unsigned vector operators like Op_SaturatingUnsignedAddVB etc. We are not adding unsigned data types to Java, only supporting unsigned (saturating) operations on existing signed integral types. > > If the aim is to reduce the number of nodes, we could merge the Op_SaturatingAddVB, Op_SaturatingAddVS, Op_SaturatingAddVI, and Op_SaturatingAddVL into one Op_SaturatingAddV. Likewise for unsigned saturating add into Op_SaturatingUnsignedAddV. Hey @sviswa7, our concern was around value ranges of new unsigned scalar type, which as mentioned will be addressed when I support intrinsification of new core lib APIs and associated range constraining / folding optimization in a follow up patch. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740817837 From varadam at openjdk.org Mon Sep 2 13:04:20 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 2 Sep 2024 13:04:20 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v6] In-Reply-To: <0XReE7gZIQm55hR-SDnLB5664zfafkhOyivXTmGEIFk=.be01447c-7667-45f5-b1c4-08688b75ce4e@github.com> References: <0XReE7gZIQm55hR-SDnLB5664zfafkhOyivXTmGEIFk=.be01447c-7667-45f5-b1c4-08688b75ce4e@github.com> Message-ID: On Thu, 29 Aug 2024 07:49:49 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianness of system. >> We can adapt the existing code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - header file change > - remove frame_resize LGTM! tier1 testing done on linux-ppc64le with both release and fastdebug, no related failures. Thank you ------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2324710147 From duke at openjdk.org Mon Sep 2 13:20:24 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 2 Sep 2024 13:20:24 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
> > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2324742994 From fgao at openjdk.org Mon Sep 2 13:40:18 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 2 Sep 2024 13:40:18 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 03:03:53 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix mismatch issue in ad m4 file LGTM! Thanks for the update. ------------- Marked as reviewed by fgao (Committer). PR Review: https://git.openjdk.org/jdk/pull/20724#pullrequestreview-2275690197 From adinn at openjdk.org Mon Sep 2 13:45:20 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 2 Sep 2024 13:45:20 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 03:03:53 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix mismatch issue in ad m4 file still good ------------- Marked as reviewed by adinn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20724#pullrequestreview-2275700348 From mdoerr at openjdk.org Mon Sep 2 13:54:21 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Sep 2024 13:54:21 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v6] In-Reply-To: <0XReE7gZIQm55hR-SDnLB5664zfafkhOyivXTmGEIFk=.be01447c-7667-45f5-b1c4-08688b75ce4e@github.com> References: <0XReE7gZIQm55hR-SDnLB5664zfafkhOyivXTmGEIFk=.be01447c-7667-45f5-b1c4-08688b75ce4e@github.com> Message-ID: On Thu, 29 Aug 2024 07:49:49 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianness of system. >> We can adapt the existing code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - header file change > - remove frame_resize I had already tested on linux-ppc64le. What about AIX?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2324811030 From duke at openjdk.org Mon Sep 2 15:00:19 2024 From: duke at openjdk.org (Casper Norrbin) Date: Mon, 2 Sep 2024 15:00:19 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: References: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> Message-ID: On Mon, 2 Sep 2024 11:54:40 GMT, Thomas Stuefe wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> arena realloc overflow check > > src/hotspot/share/memory/arena.cpp line 339: > >> 337: // See if we can resize in-place >> 338: if( (c_old+old_size == _hwm) && // Adjusting recent thing >> 339: ((size_t)(_max-c_old) >= corrected_new_size) ) { // Still fits where it sits, safe from overflow > > This change is correct, but it hides an important finding behind a reshuffling of parameters that someone else may innocently reshape later. It also makes the code less readable. Can we use something like saturated_add()? > > I would also add an explicit assert for a reasonable max size. Arena allocations should be small. Nobody should hand in sizes larger than a few MB, so asserting for size >= 2^31 (2g) would make sense. Anything as large as that is almost certainly an error we should trap on. I would support this. May be possible to just use `saturated_add()`. Would be limited to `max_jint`, but with an assert for sizes above 2^31, that shouldn't be an issue. Will look into it. 
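The overflow concern behind the reordered comparison can be sketched as follows. This is only an illustration under stated assumptions: addresses and sizes are modelled as non-negative Java `long` values, whereas the HotSpot code works with `char*` pointers and `size_t`; the class, method, and parameter names are hypothetical. The point is that adding a huge requested size to the current position can wrap around and falsely appear to fit, while comparing the remaining space against the requested size cannot overflow as long as the position does not exceed the limit.

```java
public class ArenaFitCheck {
    // Naive form: cOld + newSize may wrap to a negative value for a huge newSize.
    static boolean fitsNaive(long cOld, long max, long newSize) {
        return cOld + newSize <= max;
    }

    // Safe form: with 0 <= cOld <= max, the subtraction cannot overflow.
    static boolean fitsSafe(long cOld, long max, long newSize) {
        return max - cOld >= newSize;
    }

    public static void main(String[] args) {
        long cOld = 1000, max = 2000;
        long huge = Long.MAX_VALUE;
        System.out.println(fitsNaive(cOld, max, huge)); // wrongly reports a fit after wrap-around
        System.out.println(fitsSafe(cOld, max, huge));  // correctly rejects the request
    }
}
```

An explicit upper bound on the requested size, as suggested in the review, would catch such requests even earlier and document the intent more clearly than the operand order alone.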
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1741037610 From duke at openjdk.org Mon Sep 2 15:18:21 2024 From: duke at openjdk.org (duke) Date: Mon, 2 Sep 2024 15:18:21 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 03:03:53 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix mismatch issue in ad m4 file @JoshuaZhuwj Your change (at version d19108585ebb2b849229c2bf11d0ea6d6860a56e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20724#issuecomment-2324967336 From jzhu at openjdk.org Mon Sep 2 15:18:20 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Mon, 2 Sep 2024 15:18:20 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 15:52:16 GMT, Andrew Dinn wrote: >> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation failure with --disable-precompiled-headers > > Ok, that sounds like it is sufficient. Thank you for the reviews! 
@adinn @fg1417 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20724#issuecomment-2324965015 From jzhu at openjdk.org Mon Sep 2 15:40:29 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Mon, 2 Sep 2024 15:40:29 GMT Subject: Integrated: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 09:28:52 GMT, Joshua Zhu wrote: > Please review this minor enhancement that skips verify_sve_vector_length after native calls. > It works on SVE micro-architecture that only supports 128-bit vector length. This pull request has now been integrated. Changeset: 0e6bb514 Author: Joshua Zhu Committer: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/0e6bb514c8ec7c4a7100fe06eaa9e954a74fda30 Stats: 60 lines in 7 files changed: 33 ins; 14 del; 13 mod 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only Reviewed-by: adinn, fgao ------------- PR: https://git.openjdk.org/jdk/pull/20724 From kbarrett at openjdk.org Mon Sep 2 17:47:18 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Sep 2024 17:47:18 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: References: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> Message-ID: <8Nhef8XklXkIyYGkJbI06nW6rQNCiVjtmHEYJ_QQYBE=.d54c2a46-b722-4ad9-8780-c274b14bb94f@github.com> On Mon, 2 Sep 2024 14:58:04 GMT, Casper Norrbin wrote: >> src/hotspot/share/memory/arena.cpp line 339: >> >>> 337: // See if we can resize in-place >>> 338: if( (c_old+old_size == _hwm) && // Adjusting recent thing >>> 339: ((size_t)(_max-c_old) >= corrected_new_size) ) { // Still fits where it sits, safe from overflow >> >> This change is correct, but it hides an important finding behind a reshuffling of parameters that someone else may innocently reshape later. It also makes the code less readable. 
Can we use something like saturated_add()? >> >> I would also add an explicit assert for a reasonable max size. Arena allocations should be small. Nobody should hand in sizes larger than a few MB, so asserting for size >= 2^31 (2g) would make sense. Anything as large as that is almost certainly an error we should trap on. > > I would support this. > > May be possible to just use `saturated_add()`. Would be limited to `max_jint`, but with an assert for sizes above 2^31, that shouldn't be an issue. Will look into it. Please, not `max_jint`. This limit isn't tied to Java integer semantics. `MAX_INT` might be appropriate. (Yes, I know they likely have the same value.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1741152312 From eastigeevich at openjdk.org Mon Sep 2 20:42:26 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 2 Sep 2024 20:42:26 GMT Subject: RFR: 8333891: Method excluded with directive is not compiled after removal of directive In-Reply-To: <21YY4Zhbx9XINm-d4yNhn_VU1ZKSHhtelyM6lTBLRIc=.834e0225-fbaf-4dc6-82af-4092a158316c@github.com> References: <2xstE3V0PD8FGcijx_THSX1YgIJ7fZLponoL7b96TiY=.04ecae5f-9e3a-4c26-9893-72822f31c753@github.com> <21YY4Zhbx9XINm-d4yNhn_VU1ZKSHhtelyM6lTBLRIc=.834e0225-fbaf-4dc6-82af-4092a158316c@github.com> Message-ID: On Mon, 2 Sep 2024 11:29:05 GMT, Damon Fenacci wrote: >>> Test `runtime/BootstrapMethod/BSMCalledTwice.java` might have failed on Windows x64 because of the change. >> >> I managed to reproduce the failure. The test fails because of my change. >> There is a data race: >> >> Thread1: cleaning_flag ... setting_flag ... assert >> Thread2: cleaning_flag ... setting_flag ... assert >> >> >> `Thread2` can clean the flag between `Thread1` setting the flag and checking the assert. > > @eastig are you still working on this? Do you want to reopen it? Hi @dafedafe, IMO as the fix is not simple it might not be worth to merge. The fix uses method flags. 
I have not seen compiler directives used a lot, especially the case of adding a directive and then removing it. This could be a waste of method flags. This PR got linked to the JBS issue. If more users complain about the issue, we can reconsider the fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19637#issuecomment-2325283583 From mdoerr at openjdk.org Mon Sep 2 20:44:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Sep 2024 20:44:23 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v6] In-Reply-To: <0XReE7gZIQm55hR-SDnLB5664zfafkhOyivXTmGEIFk=.be01447c-7667-45f5-b1c4-08688b75ce4e@github.com> References: <0XReE7gZIQm55hR-SDnLB5664zfafkhOyivXTmGEIFk=.be01447c-7667-45f5-b1c4-08688b75ce4e@github.com> Message-ID: On Thu, 29 Aug 2024 07:49:49 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianness of system. >> We can adapt the existing code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - header file change > - remove frame_resize Changes requested by mdoerr (Reviewer). src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 368: > 366: address call_c(Register function_descriptor); > 367: address call_c(address function_entry, relocInfo::relocType rt = relocInfo::none) { > 368: return call_c((FunctionDescriptor*)function_entry, rt); This breaks ABIv1. Please use a cast to `const FunctionDescriptor*` and move your new function below `address call_c(const FunctionDescriptor* function_descriptor, relocInfo::relocType rt);`.
------------- PR Review: https://git.openjdk.org/jdk/pull/19947#pullrequestreview-2276140748 PR Review Comment: https://git.openjdk.org/jdk/pull/19947#discussion_r1741239572 From duke at openjdk.org Tue Sep 3 06:30:00 2024 From: duke at openjdk.org (kuaiwei) Date: Tue, 3 Sep 2024 06:30:00 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: References: Message-ID: <5CB0aexpZn0lL8DvUFlOoNhU223th2guno8mMWuNl-w=.03067572-03dc-45a7-ba07-f489c43fc4d0@github.com> > I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. > It will happen with these conditions > 1 It's a virtual call and the callee is a final method. So c1 will think it's static bound. > 2 The interpreter never touch the callsite, so interpreter does not add type profile. > 3 In c1 compilation, it will be inlined based on CHA. > 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Simplify should_profile_receiver_type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20786/files - new: https://git.openjdk.org/jdk/pull/20786/files/d1c0594a..fc421c9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20786/head:pull/20786 PR: https://git.openjdk.org/jdk/pull/20786 From roland at openjdk.org Tue Sep 3 07:01:26 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Sep 2024 07:01:26 GMT Subject: RFR: 8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop() [v2] In-Reply-To: <8QYwwkPgR9grcCttNV0KHliNjH9kmBGUt7kc2b5wPW0=.b10e5195-eb8e-4148-960d-670554b57ace@github.com> References: 
<1w8L64U3JeY8qKskSNm4OlTablOucHHThfW-9nakz4E=.e52b0193-9048-4382-8242-f5296483898a@github.com> <6kgDCB1rxZyn1JEX-8hvKyOQE07oMs8_kr7Cjbix3Gg=.61a77a4f-394c-48d6-a482-fe73f3314f5b@github.com> <8QYwwkPgR9grcCttNV0KHliNjH9kmBGUt7kc2b5wPW0=.b10e5195-eb8e-4148-960d-670554b57ace@github.com> Message-ID: On Thu, 22 Aug 2024 16:23:42 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > I am not comfortable with big regression on MacOSX aarch64 even if you can't reproduce it locally. We need to rerun that testing to make sure it is random as you said. > > Please, merge latest JDK. @vnkozlov @chhagedorn thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19831#issuecomment-2325741902 From roland at openjdk.org Tue Sep 3 07:01:26 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Sep 2024 07:01:26 GMT Subject: Integrated: 8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop() In-Reply-To: <1w8L64U3JeY8qKskSNm4OlTablOucHHThfW-9nakz4E=.e52b0193-9048-4382-8242-f5296483898a@github.com> References: <1w8L64U3JeY8qKskSNm4OlTablOucHHThfW-9nakz4E=.e52b0193-9048-4382-8242-f5296483898a@github.com> Message-ID: On Fri, 21 Jun 2024 14:34:09 GMT, Roland Westrelin wrote: > I propose removing `PhaseIdealLoop::cast_incr_before_loop()` and the > `CastII` nodes that it adds at counted loop heads. > > They were added to prevent nodes to float above the zero trip guard > when the backedge of a counted loop is removed. In particular, when a > range check is hoisted by predication, pre/main/post loops are created > and if one of the main or post loops lose its backedge, an array load > that's control dependent on a predicate above the pre loop could float > above the zero trip guard of the main or post loop. That can no longer > happen AFAICT with changes related to assert predicates. 
The array > load is now updated to have a control dependency that's below the zero > trip guard. > > The reason I'm revisiting this is that I noticed that > `PhaseIdealLoop::cast_incr_before_loop()` has a bug. When it adds the > `CastII`, it looks for the loop phi and picks input 1 of the phi it > finds as input to the `CastII`. To find the loop phi, it starts from > the loop increment and looks for a use that's a phi and has the loop > head as control. It never checks that the phi it finds is the loop > phi. There can be more than one phi as uses of the increment at the > loop head and it can pick the wrong one. I tried to write a test case > where this would cause a bug but couldn't actually find any use for > the `CastII` anymore. > > In my testing, the only issue when the `CastII` nodes are not added is that > some IR tests for vectorization fail: > > compiler/vectorization/TestPopulateIndex.java > compiler/vectorization/runner/ArrayShiftOpTest.java > compiler/vectorization/runner/LoopArrayIndexComputeTest.java > > because removing the `CastII` causes split if to occur with some nodes > that take the loop phi as input. That then causes pattern matching > during superword to break. I added logic to prevent split if for those > cases. This pull request has now been integrated.
Changeset: 3a88fd43 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/3a88fd437dfb218df5d3338c8ee7d70416839cf8 Stats: 60 lines in 3 files changed: 27 ins; 28 del; 5 mod 8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop() Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19831 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: - Increase test coverage of new-object stores with different type information - Refactor the two post-barrier removal cases into a single expression - Remove unnecessary early null-based post-barrier elision - Make store capturability test G1-specific and more precise ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/4ee450ad..1ea2862f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11-12 Stats: 88 lines in 5 files changed: 66 ins; 7 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> On Fri, 30 Aug 2024 13:49:10 GMT, Roberto Castañeda Lozano wrote: > Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question. @kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2325782979 From rcastanedalo at openjdk.org Tue Sep 3 07:26:01 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:01 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 13:40:24 GMT, Roberto Castañeda Lozano wrote: > A perhaps more principled solution might be extending store-capturing analysis to reject stores with late-expanded barriers. I will give it a try. This option proved to be infeasible because other GCs (ZGC) rely on store capturing for barrier elision. Furthermore, this would prevent eliding G1 barriers that are found to be elidable only after the program is simplified by C2's intermediate optimizations, even if `ReduceInitialCardMarks` is enabled (I found a few such cases, e.g. where range check elimination is the enabling simplification). Instead, I have opted to remove the `ReduceInitialCardMarks` condition from `StoreNode::Ideal` and introduce a GC-specific test to determine whether a store can be captured and used for object initialization (commit 6b9954979). For G1, this is true iff the store does not have any barrier or it does have barriers but `ReduceInitialCardMarks` is enabled. For all other GCs the test is always true, which preserves the original mainline behavior. To summarize, this option makes the logic clearer, improves analysis precision, and isolates the changes to G1.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741554994 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Fri, 30 Aug 2024 08:23:44 GMT, Roberto Castañeda Lozano wrote: > I will study if the check in get_store_barrier is superseded by that in refine_barrier_by_new_val_type. If I can convince myself that this is the case I will consider removing the former. This was indeed the case, so I have removed the compile-time null check from `G1BarrierSetC2::get_store_barrier` (commit deac05d7) and simplified the code around it (commit 6f4027bf). I also added a few extra test cases to exercise stores on newly-allocated objects with different nullness information (commit 1ea2862f). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741555725 From chagedorn at openjdk.org Tue Sep 3 07:39:49 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Sep 2024 07:39:49 GMT Subject: RFR: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() Message-ID: The computation of `final_correction` in `PhaseIdealLoop::is_counted_loop()` could overflow which is UB: https://github.com/openjdk/jdk/blob/dc4fd896289db1d2f6f7bbf5795fec533448a48c/src/hotspot/share/opto/loopnode.cpp#L1958-L1967 `canonicalized_correction` equals `max_int - 1` if stride is `max_int`. `limit_correction` is at most `max_int - 1` in that case. Adding both together will overflow. I don't think that any compiler would wrongly optimize this and we have not observed any issues with that. But we should still fix this UB.
The fix I propose is to simply bail out with very large positive and negative strides such that we avoid an over- or underflow with the existing logic (see added comments for how the upper bound for the stride is determined). These large strides should be very uncommon in practice and even if we encounter these, the loop would only run for a few iterations. So, a bailout seems fine. This bailout has the additional benefit that we avoid other possibly unknown issues or issues in the future with counted loops having large edge-case strides like `min_int`. Thanks, Christian ------------- Commit messages: - 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() Changes: https://git.openjdk.org/jdk/pull/20828/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20828&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323688 Stats: 21 lines in 1 file changed: 20 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20828.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20828/head:pull/20828 PR: https://git.openjdk.org/jdk/pull/20828 From dfenacci at openjdk.org Tue Sep 3 07:42:25 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 3 Sep 2024 07:42:25 GMT Subject: RFR: 8333891: Method excluded with directive is not compiled after removal of directive In-Reply-To: References: <2xstE3V0PD8FGcijx_THSX1YgIJ7fZLponoL7b96TiY=.04ecae5f-9e3a-4c26-9893-72822f31c753@github.com> <21YY4Zhbx9XINm-d4yNhn_VU1ZKSHhtelyM6lTBLRIc=.834e0225-fbaf-4dc6-82af-4092a158316c@github.com> Message-ID: On Mon, 2 Sep 2024 20:39:43 GMT, Evgeny Astigeevich wrote: >> @eastig are you still working on this? Do you want to reopen it? > > Hi @dafedafe, > IMO as the fix is not simple it might not be worth to merge. The fix uses method flags. > I have not seen compiler directives are used a lot, especially the case: add a directive and remove the directive. > This can be waste of method flags. > This PR got linked to the JBS issue. 
When more users complain of the issue, we can reconsider the fix. You're right: the use-case seems quite a peculiar one. Thanks anyway @eastig. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19637#issuecomment-2325812710 From chagedorn at openjdk.org Tue Sep 3 08:01:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Sep 2024 08:01:21 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 15:58:30 GMT, Roland Westrelin wrote: > The crash occurs because a `Store` is sunk out of a loop that's an > inner loop of an infinite loop. The infinite loop was just found to be > infinite in the current round of loop opts. When that happens the > infinite loop is not properly attached to the rest of the loop tree. As > a consequence, the `IdealLoopTree` instance for the infinite loop and > its children are only partially initialized (`_nest` is not set) and > the structure is an inconsistent state. > > When the `Store` is sunk it's reported as belonging to a loop but the > `IdealLoopTree` for that loop is only half populated. As a consequence > a call to `is_dominator` for that loop hits an inconsistency, returns > an incorrect result and the assert fires. > > A possible fix would be a point fix that skips that optimization for a > loop that's part of an infinite loop nest. But given basic methods of > loop opts can't be trusted to work in the infinite loop nest, I > suppose similar issues can surface elsewhere. > > It's not the first time, we have issues with an infinite loop that's > not properly attached to the loop tree the first time it is > encountered (a NeverBranch is then added and on the next loop passes, > the infinite loop is properly attached to the loop tree). For instance > on a loop opts round, C2 can see that it has no loops and on the next > that it has some. 
> > I propose fixing this by properly attaching the infinite loop to the > loop tree when it's first discovered. A comment in the code seems to > hint that it requires going over the graph again after the > `NeverBranch` is added but I don't think that's case. > > I changed the assert in `loopnode.cpp` because it was there to work > around the inconsistency I mentioned above (no loop in a round, some > loops on the next one). > > The change in `parse1.cpp` fixes an issue I ran into when testing the > fix. The existing logic doesn't properly detect an exception backedge. > > I added the test case from 8336478 to this. The problem there is that > an infinite loop contains a long counted loop. The long counted loop > is transformed into a loop nest which is a 2 step process that > requires 2 rounds of loop opts. But c2 finds an infinite loop in the > middle of the process which causes it to see no more loops and to not > attempt another round of loop opts. The assert fires because it finds > a long counted loop nest that's half transformed. The change I propose > here fixes this too. If we go with this fix, I'll close 8336478 as > duplicate of this one. Looks reasonable to me. src/hotspot/share/opto/loopnode.cpp line 5528: > 5526: // move to outer most loop with same header > 5527: l = m_loop; > 5528: for (;;) { Might be cleaner: Suggestion: while (true) { src/hotspot/share/opto/loopnode.cpp line 5538: > 5536: sort(_ltree_root, l); > 5537: // fix child link from parent > 5538: IdealLoopTree *p = l->_parent; Suggestion: IdealLoopTree* p = l->_parent; ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20797#pullrequestreview-2276646203 PR Review Comment: https://git.openjdk.org/jdk/pull/20797#discussion_r1741581086 PR Review Comment: https://git.openjdk.org/jdk/pull/20797#discussion_r1741582258 From fgao at openjdk.org Tue Sep 3 08:31:21 2024 From: fgao at openjdk.org (Fei Gao) Date: Tue, 3 Sep 2024 08:31:21 GMT Subject: RFR: 8336464: C2: Force CastX2P to be a two-address instruction In-Reply-To: <9HI9_zwkUQqTJDp-WhIpJzyRM2bUYXeR2NokG09u1yI=.ebe9f7ad-c7ea-42d1-b2bb-77a1689390f2@github.com> References: <9HI9_zwkUQqTJDp-WhIpJzyRM2bUYXeR2NokG09u1yI=.ebe9f7ad-c7ea-42d1-b2bb-77a1689390f2@github.com> Message-ID: On Fri, 12 Jul 2024 13:59:23 GMT, Fei Gao wrote: > This patch forces `CastX2P` to be a two-address instruction, so that C2 could allocate the same register for `dst` and `src`. Then we can remove the instruction completely in the assembly. > > The motivation comes from some cast operations like `castPP`. The difference for ADLC between `castPP` and `CastX2P` lies in that `CastX2P` always has different types for `dst` and `src`. We can force ADLC to generate an extra `two_adr()` for `CastX2P` like it does automatically for `castPP`, which could tell register allocator that the instruction needs the same register for `dst` and `src`. > > However, sometimes, RA and GCM in C2 can't work as we expected. > > For example, we have Assembly on the existing code: > > ldp x10, x11, [x17,#136] > add x10, x10, x15 > add x11, x11, x10 > ldr x12, [x17,#152] > str x16, [x10] > add x10, x12, x15 > str x16, [x11] > str x16, [x10] > > > After applying the patch independently, the assembly is: > > ldr x10, [x16,#136] <--- 1 > add x10, x10, x15 > ldr x11, [x16,#144] <--- 2 > mov x13, x10 <--- 3 > str x17, [x13] > ldr x12, [x16,#152] > add x10, x11, x10 > str x17, [x10] > add x10, x12, x15 > str x17, [x10] > > > C2 generates a totally extra `mov`, see 3, and we even lost the chance to merge load pair, see 1 and 2. That's terrible. 
> > Although this scenario would disappear after combining with https://github.com/openjdk/jdk/pull/20157, I'm still not sure if this patch is worthwhile. Thanks for all your review and comments! I'll come back when I find a better way. Now, I'd like to convert to draft :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20159#issuecomment-2325907854 From thartmann at openjdk.org Tue Sep 3 09:04:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 09:04:19 GMT Subject: RFR: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() In-Reply-To: References: Message-ID: <1Q98-omNh7hM6C8jXhr2IFSPHnonfRG3Of1yThReCgk=.a9a3b5fa-ae26-4b69-b409-4e8fa08be0d8@github.com> On Tue, 3 Sep 2024 07:35:10 GMT, Christian Hagedorn wrote: > The computation of `final_correction` in `PhaseIdealLoop::is_counted_loop()` could overflow which is UB: > > https://github.com/openjdk/jdk/blob/dc4fd896289db1d2f6f7bbf5795fec533448a48c/src/hotspot/share/opto/loopnode.cpp#L1958-L1967 > > `canonicalized_correction` equals `max_int - 1` if stride is `max_int`. `limit_correction` is at most `max_int - 1` in that case. Adding both together will overflow. I don't think that any compiler would wrongly optimize this and we have not observed any issues with that. But we should still fix this UB. > > The fix I propose is to simply bail out with very large positive and negative strides such that we avoid an over- or underflow with the existing logic (see added comments for how the upper bound for the stride is determined). These large strides should be very uncommon in practice and even if we encounter these, the loop would only run for a few iterations. So, a bailout seems fine. This bailout has the additional benefit that we avoid other possibly unknown issues or issues in the future with counted loops having large edge-case strides like `min_int`. > > Thanks, > Christian Looks good to me otherwise. 
src/hotspot/share/opto/loopnode.cpp line 1974: > 1972: > 1973: // Check (vi) and bail out if the stride is too big. > 1974: if (stride_con == min_signed_integer(iv_bt) || ABS(stride_con) > max_signed_integer(iv_bt) / 2) { Suggestion: if (stride_con == min_signed_integer(iv_bt) || (ABS(stride_con) > max_signed_integer(iv_bt) / 2)) { ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20828#pullrequestreview-2276828546 PR Review Comment: https://git.openjdk.org/jdk/pull/20828#discussion_r1741693684 From thartmann at openjdk.org Tue Sep 3 09:10:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 09:10:19 GMT Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 18:01:16 GMT, Matias Saavedra Silva wrote: > The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests. test/hotspot/jtreg/runtime/interpreter/LastJsrTest.java line 27: > 25: * @test > 26: * @bug 8335664 > 27: * @bug 8338924 Drive-by comment: You can use `@bug 8335664 8338924` instead. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20732#discussion_r1741708114 From chagedorn at openjdk.org Tue Sep 3 09:30:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Sep 2024 09:30:32 GMT Subject: RFR: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: Message-ID: > The computation of `final_correction` in `PhaseIdealLoop::is_counted_loop()` could overflow which is UB: > > https://github.com/openjdk/jdk/blob/dc4fd896289db1d2f6f7bbf5795fec533448a48c/src/hotspot/share/opto/loopnode.cpp#L1958-L1967 > > `canonicalized_correction` equals `max_int - 1` if stride is `max_int`. `limit_correction` is at most `max_int - 1` in that case. Adding both together will overflow. I don't think that any compiler would wrongly optimize this and we have not observed any issues with that. But we should still fix this UB. > > The fix I propose is to simply bail out with very large positive and negative strides such that we avoid an over- or underflow with the existing logic (see added comments for how the upper bound for the stride is determined). These large strides should be very uncommon in practice and even if we encounter these, the loop would only run for a few iterations. So, a bailout seems fine. This bailout has the additional benefit that we avoid other possibly unknown issues or issues in the future with counted loops having large edge-case strides like `min_int`. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20828/files - new: https://git.openjdk.org/jdk/pull/20828/files/f7ddf302..c8fa1491 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20828&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20828&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20828.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20828/head:pull/20828 PR: https://git.openjdk.org/jdk/pull/20828 From chagedorn at openjdk.org Tue Sep 3 09:30:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Sep 2024 09:30:32 GMT Subject: RFR: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: <1Q98-omNh7hM6C8jXhr2IFSPHnonfRG3Of1yThReCgk=.a9a3b5fa-ae26-4b69-b409-4e8fa08be0d8@github.com> References: <1Q98-omNh7hM6C8jXhr2IFSPHnonfRG3Of1yThReCgk=.a9a3b5fa-ae26-4b69-b409-4e8fa08be0d8@github.com> Message-ID: On Tue, 3 Sep 2024 08:58:13 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/opto/loopnode.cpp line 1974: > >> 1972: >> 1973: // Check (vi) and bail out if the stride is too big. >> 1974: if (stride_con == min_signed_integer(iv_bt) || ABS(stride_con) > max_signed_integer(iv_bt) / 2) { > > Suggestion: > > if (stride_con == min_signed_integer(iv_bt) || (ABS(stride_con) > max_signed_integer(iv_bt) / 2)) { Thanks for the review, good point! 
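The bailout condition quoted above can be sketched outside HotSpot as follows. This is only an illustration under stated assumptions: `stride_too_large` is a hypothetical name, and the template parameter stands in for the induction-variable type that `iv_bt` selects in the real code. The minimum value is tested first because negating it is not representable, so the absolute value is only computed for the remaining strides.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the reviewed condition:
//   stride == min || (|stride| > max / 2)
// T models the induction-variable type (32-bit or 64-bit signed).
template <typename T>
constexpr bool stride_too_large(T stride) {
  constexpr T min_v = std::numeric_limits<T>::min();
  constexpr T max_v = std::numeric_limits<T>::max();
  if (stride == min_v) {
    return true;  // -min_v is not representable, so handle it separately
  }
  const T abs_stride = stride < 0 ? static_cast<T>(-stride) : stride;
  return abs_stride > max_v / 2;
}
```

For a 32-bit induction variable this accepts strides up to 1073741823 in magnitude and rejects anything larger.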
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20828#discussion_r1741735035 From thartmann at openjdk.org Tue Sep 3 09:31:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 09:31:28 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v10] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:12:36 GMT, Damon Fenacci wrote: >> # Issue >> >> The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). >> The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well. >> >> # Causes >> >> There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. >> For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. >> For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. >> >> # Solution >> >> Instead of avoiding the crash it makes more sense to increase the minimum code cache size by adding the size of the minimal code cache needed for C1 and C2 to `CodeCacheMinimumUseSpace`. > > Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 29 commits: > > - JDK-8326615: don't remove empty line between includes > - Merge tag 'jdk-24+13' into JDK-8326615 > > Added tag jdk-24+13 for changeset ff59532d > - JDK-8326615: add compiler present macros to includes > - JDK-8326615: update copyright year > - JDK-8326615: fix min code cache calculation > - JDK-8326615: remove empty line from problemlist > - Merge tag 'jdk-24+7' into JDK-8326615 > > Added tag jdk-24+7 for changeset 21a6cf84 > - JDK-8326615: calculate minimum code cache size based on initial compiler buffer sizes > - JDK-8326615 add forgotten problemlisted configuration after revert > - JDK-8326615 add forgotten problemlisted test after revert > - ... and 19 more: https://git.openjdk.org/jdk/compare/ff59532d...e7d977e2 That looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19280#pullrequestreview-2276900604 From tholenstein at openjdk.org Tue Sep 3 09:34:24 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 3 Sep 2024 09:34:24 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 12:34:57 GMT, Roland Westrelin wrote: > Is igvn run between incremental inlining and the crash? Or is that all part of a single incremental inlining sequence? No, IGVN is not run. Yes, it is all part of a single incremental inlining sequence > In `LibraryCallKit::make_unsafe_address`, `base` is the `CheckCastPP`. What I don't quite understand is how we can get `top` out of `basic_plus_adr` if the `base` input is a `CheckCastPP`. 
`base` is ` 147 CheckCastPP === 136 71 [[ 150 149 ]] #java/lang/Object * (speculative=byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact * (inline_depth=2)) Oop:java/lang/Object * (speculative=byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact * (inline_depth=2)) !jvms: Test::helperSmall @ bci:11 (line 23) Test::accessSmallArray @ bci:7 (line 29) Test::test2 @ bci:2 (line 38) ` before https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 and `1 Con === 0 [[ ]] #top` after. Then `base` is top when we call `basic_plus_adr(base, offset)` right after. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2326049334 From dfenacci at openjdk.org Tue Sep 3 09:36:21 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 3 Sep 2024 09:36:21 GMT Subject: RFR: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v5] In-Reply-To: <3R0sCk6svcTKQN308AI4JVcWRvIsgBUBAGfP385o0KM=.f08f8688-2528-4a1d-8df0-36d0e90aeba0@github.com> References: <3R0sCk6svcTKQN308AI4JVcWRvIsgBUBAGfP385o0KM=.f08f8688-2528-4a1d-8df0-36d0e90aeba0@github.com> Message-ID: On Wed, 24 Jul 2024 19:21:58 GMT, Vladimir Kozlov wrote: >> Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/gc/x/c1/xBarrierSetC1.cpp >> >> Co-authored-by: Tobias Hartmann > > Based on that we need to scale CodeCacheMinimumUseSpace based on number of C1 compiler threads to take into account buffer size: `NMethodSizeLimit` (or more precise `Compiler::code_buffer_size()`). And may be similar for C2 even so its buffer is not permanent. 
Size for C2 is difficult to determine because it is calculated based on compilation information. But may be we can use the same as for C1 as rough estimation. Thanks @vnkozlov @TobiHartmann for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19280#issuecomment-2326055186 From thartmann at openjdk.org Tue Sep 3 09:42:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 09:42:21 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: On Mon, 2 Sep 2024 11:10:35 GMT, Yagmur Eren wrote: >> Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > adding NOT_DEBUG_RETURN instead of DEBUG_ONLY for Compile::verify_start src/hotspot/share/opto/compile.cpp line 1109: > 1107: > 1108: #ifdef ASSERT > 1109: // Install the StartNode on this compile object. I think this comment needs to be updated. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20715#discussion_r1741754226 From thartmann at openjdk.org Tue Sep 3 09:43:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 09:43:19 GMT Subject: RFR: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: Message-ID: <43fkNszYuHVqm-l_Vbndpek8wfe6ATRsp68JDIHkdn8=.ca48bf16-395d-486e-835b-e2c89e13d96f@github.com> On Tue, 3 Sep 2024 09:30:32 GMT, Christian Hagedorn wrote: >> The computation of `final_correction` in `PhaseIdealLoop::is_counted_loop()` could overflow which is UB: >> >> https://github.com/openjdk/jdk/blob/dc4fd896289db1d2f6f7bbf5795fec533448a48c/src/hotspot/share/opto/loopnode.cpp#L1958-L1967 >> >> `canonicalized_correction` equals `max_int - 1` if stride is `max_int`. `limit_correction` is at most `max_int - 1` in that case. Adding both together will overflow. I don't think that any compiler would wrongly optimize this and we have not observed any issues with that. But we should still fix this UB. >> >> The fix I propose is to simply bail out with very large positive and negative strides such that we avoid an over- or underflow with the existing logic (see added comments for how the upper bound for the stride is determined). These large strides should be very uncommon in practice and even if we encounter these, the loop would only run for a few iterations. So, a bailout seems fine. This bailout has the additional benefit that we avoid other possibly unknown issues or issues in the future with counted loops having large edge-case strides like `min_int`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Tobias Hartmann Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20828#pullrequestreview-2276927883 From dfenacci at openjdk.org Tue Sep 3 09:48:25 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 3 Sep 2024 09:48:25 GMT Subject: Integrated: 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: On Fri, 17 May 2024 09:37:01 GMT, Damon Fenacci wrote: > # Issue > > The test `compiler/startup/StartupOutput.java` fails intermittently due to a crash after correctly printing the error `Initial size of CodeCache is too small` (the test limits the code cache using `-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). > The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well. > > # Causes > > There is one occurrence during C1 initialization and one during C2 initialization where a call to `RuntimeStub::new_runtime_stub` can fail fatally if there is not enough space left. > For C1: `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`. > For C2: `C2Compiler::initialize` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`. > > # Solution > > Instead of avoiding the crash it makes more sense to increase the minimum code cache size by adding the size of the minimal code cache needed for C1 and C2 to `CodeCacheMinimumUseSpace`. This pull request has now been integrated. 
Changeset: 633fad8e Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/633fad8e53109bef52190494a8b171035229d2ac Stats: 36 lines in 7 files changed: 26 ins; 3 del; 7 mod 8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19280 From thartmann at openjdk.org Tue Sep 3 10:04:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 10:04:22 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop In-Reply-To: References: Message-ID: <54FDaMydeIPt3NE3lp3psLfTqvxxNFZPdGWf1V4a-q8=.2c512a4b-a029-457a-acae-b90136ff90d9@github.com> On Fri, 30 Aug 2024 15:58:30 GMT, Roland Westrelin wrote: > The crash occurs because a `Store` is sunk out of a loop that's an > inner loop of an infinite loop. The infinite loop was just found to be > infinite in the current round of loop opts. When that happens the > infinite loop is not properly attached to the rest of the loop tree. As > a consequence, the `IdealLoopTree` instance for the infinite loop and > its children are only partially initialized (`_nest` is not set) and > the structure is an inconsistent state. > > When the `Store` is sunk it's reported as belonging to a loop but the > `IdealLoopTree` for that loop is only half populated. As a consequence > a call to `is_dominator` for that loop hits an inconsistency, returns > an incorrect result and the assert fires. > > A possible fix would be a point fix that skips that optimization for a > loop that's part of an infinite loop nest. But given basic methods of > loop opts can't be trusted to work in the infinite loop nest, I > suppose similar issues can surface elsewhere. 
> > It's not the first time, we have issues with an infinite loop that's > not properly attached to the loop tree the first time it is > encountered (a NeverBranch is then added and on the next loop passes, > the infinite loop is properly attached to the loop tree). For instance > on a loop opts round, C2 can see that it has no loops and on the next > that it has some. > > I propose fixing this by properly attaching the infinite loop to the > loop tree when it's first discovered. A comment in the code seems to > hint that it requires going over the graph again after the > `NeverBranch` is added but I don't think that's case. > > I changed the assert in `loopnode.cpp` because it was there to work > around the inconsistency I mentioned above (no loop in a round, some > loops on the next one). > > The change in `parse1.cpp` fixes an issue I ran into when testing the > fix. The existing logic doesn't properly detect an exception backedge. > > I added the test case from 8336478 to this. The problem there is that > an infinite loop contains a long counted loop. The long counted loop > is transformed into a loop nest which is a 2 step process that > requires 2 rounds of loop opts. But c2 finds an infinite loop in the > middle of the process which causes it to see no more loops and to not > attempt another round of loop opts. The assert fires because it finds > a long counted loop nest that's half transformed. The change I propose > here fixes this too. If we go with this fix, I'll close 8336478 as > duplicate of this one. Thanks for the detailed summary. The fix looks reasonable to me. src/hotspot/share/opto/loopnode.cpp line 4591: > 4589: // Verify that the has_loops() flag set at parse time is consistent with the just built loop tree. When the back edge > 4590: // is an exception edge, parsing doesn't set has_loops(). 
> 4591: assert(_ltree_root->_child == nullptr || C->has_loops() || C->has_exception_backedge(), "parsing found no loops but there are some"); `PhaseIdealLoop::only_has_infinite_loops` is dead now and should be removed. src/hotspot/share/opto/loopnode.hpp line 1081: > 1079: // Place 'n' in some loop nest, where 'n' is a CFG node > 1080: void build_loop_tree(); > 1081: int build_loop_tree_impl(Node *n, int pre_order); Suggestion: int build_loop_tree_impl(Node* n, int pre_order); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20797#pullrequestreview-2276967473 PR Review Comment: https://git.openjdk.org/jdk/pull/20797#discussion_r1741779954 PR Review Comment: https://git.openjdk.org/jdk/pull/20797#discussion_r1741782737 From kbarrett at openjdk.org Tue Sep 3 10:06:22 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Sep 2024 10:06:22 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> References: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> Message-ID: On Mon, 2 Sep 2024 09:36:53 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > arena realloc overflow check Changes requested by kbarrett (Reviewer). 
src/hotspot/share/adlc/adlArena.cpp line 154: > 152: if( (c_old+old_size == _hwm) && // Adjusting recent thing > 153: ((size_t)(_max-c_old) >= new_size) ) { // Still fits where it sits, safe from overflow > 154: It appears that this change isn't worrying about bad `old_ptr` or `old_size` arguments, which is fine. But the code can be further improved by replacing lines 144-157 with something like // Reallocating the most recent allocation? if ((c_old + old_size) == _hwm) { assert(_chunk->bottom() <= c_old, "invariant"); // Reallocate in place if it fits. This also handles shrinking. if (pointer_delta(_max, c_old) >= new_size) { _hwm = c_old + new_size; return c_old; } } Of course, in adlc you can't use HotSpot's pointer_delta utility, so there you'll need to use something like what's in the PR for that calculation. Any check for an "unreasonable" size should happen in Amalloc, not here. ------------- PR Review: https://git.openjdk.org/jdk/pull/20774#pullrequestreview-2276975396 PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1741784871 From chagedorn at openjdk.org Tue Sep 3 11:48:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Sep 2024 11:48:30 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name Message-ID: This patch adds the method name to the incremental inlining step dumps in IGV which improves debugging issues involving incremental inlining: static void test() { method1(); method2(); method3(); } static void method1() {} static void method2() {} static void method3() {} Run with `-XX:+AlwaysIncrementalInline` and IGV print level >=3: Before patch: ![image](https://github.com/user-attachments/assets/a3a1ab32-e7b3-4ccb-8ab2-a75d2b5b6912) After patch: ![image](https://github.com/user-attachments/assets/8100a2fe-1670-4687-b8b8-c8053fbaa7d7) The patch just prints the method name if we call `print_method()` with `n` being a call node which, AFAICT, only happens for the incremental 
inlining step. However, even if we call it with another phase at some point, I don't think it hurts to also dump the method name there. #### Testing - Manually verifying change in IGV - Building IGV which runs its unit tests - Sanity run with a hello world program with `-Xcomp -XX:+AlwaysIncrementalInline -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` Thanks, Christian ------------- Commit messages: - 8338971: IGV: Add incrementally inlined method name to phase name Changes: https://git.openjdk.org/jdk/pull/20834/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20834&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338971 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20834/head:pull/20834 PR: https://git.openjdk.org/jdk/pull/20834 From mdoerr at openjdk.org Tue Sep 3 12:06:27 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 12:06:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 07:26:00 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: > > - Increase test coverage of new-object stores with different type information > - Refactor the two post-barrier removal cases into a single expression > - Remove unnecessary early null-based post-barrier elision > - Make store capturability test G1-specific and more precise src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646: > 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr) > 645: %{ > 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0); Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157 Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time. Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741937425 From mdoerr at openjdk.org Tue Sep 3 12:15:27 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 12:15:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> Message-ID: On Thu, 29 Aug 2024 08:37:24 GMT, Albert Mingkun Yang wrote: >> Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code). > > I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.) > > If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable? @albertnetymk: FYI: The basic idea was to make compressed Oops optimizations easier. It allows using shorter decoding sequences and removing redundant null checks in the fast path. 
I've implemented it on PPC64: https://github.com/TheRealMDoerr/jdk/blob/ed9c0232f53a15d768804348e1d8a111fed9a19e/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L471 But, I'm ok with postponing it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741950634 From epeter at openjdk.org Tue Sep 3 12:15:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:15:31 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >>
>> Benchmark                                      (size)   Mode  Cnt      Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector      1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector      2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector       1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector       2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector      1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector      2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector     1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector     2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector         1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector         2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector          1024  thrpt    2  10756.804          ops/ms
>> S...
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding descriptive comments Ok, I left a few more comments. Generally, this looks like a nice feature, thanks for implementing it @jatin-bhateja! A few issues with code style (camelCase vs snake_case). I'm also wondering about good naming. Why did we/you choose "select" for this? Why not "shuffle"? Does "select" not often get used as synonym of "blend", which has different semantics? Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in `RearrangeNode::Ideal`. It looks a little "hacky", especially in conjunction with the `vector_indexes_needs_massaging` method. Can you give a clear definition of the semantics of `RearrangeNode` and `vector_indexes_needs_massaging`, please? I also added some control questions for testing.
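For readers following the naming and testing questions, the two-vector `selectFrom` semantics stated in the quoted PR description (indices in `[0, 2*VLEN)` select from `v1` for the lower half of the range and from `v2` for the upper half; anything else is an error) can be modeled in scalar code roughly as below. This is only a sketch under stated assumptions: the function name `select_from_two_vectors`, the use of `std::vector<int>`, and the exception types are all hypothetical, and it is a C++ model of the described behavior, not the Java implementation from the PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical scalar model of the two-vector selectFrom semantics quoted in
// this thread. Names and types are illustrative and not taken from the PR.
std::vector<int> select_from_two_vectors(const std::vector<int>& index,
                                         const std::vector<int>& v1,
                                         const std::vector<int>& v2) {
  const std::size_t vlen = index.size();
  if (v1.size() != vlen || v2.size() != vlen) {
    throw std::invalid_argument("all three vectors must have the same length");
  }
  std::vector<int> result(vlen);
  for (std::size_t i = 0; i < vlen; ++i) {
    const int idx = index[i];
    // Valid two-vector index range is [0, 2*VLEN); this covers both the
    // negative and the positive out-of-bounds cases.
    if (idx < 0 || static_cast<std::size_t>(idx) >= 2 * vlen) {
      throw std::out_of_range("index outside [0, 2*VLEN)");
    }
    const std::size_t u = static_cast<std::size_t>(idx);
    // Lower half of the range reads from v1, upper half from v2.
    result[i] = (u < vlen) ? v1[u] : v2[u - vlen];
  }
  return result;
}
```

A model like this makes the control questions concrete: a test generator that never produces indices below 0 or at or above `2*VLEN` would not exercise the error path at all.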
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6446: > 6444: } > 6445: > 6446: void C2_MacroAssembler::select_from_two_vector_evex(BasicType elem_bt, XMMRegister dst, XMMRegister src1, I also wonder if you could use the plural in these cases? You are selecting from two vectors, with the plural "s". Of course it is a bit annoying if you would have to name the IR node `SelectFromTwoVectors`, because we usually name the vector nodes `...Vector`, without the plural "s". src/hotspot/share/opto/library_call.cpp line 749: > 747: return inline_vector_compress_expand(); > 748: case vmIntrinsics::_VectorSelectFromTwoVectorOp: > 749: return inline_vector_select_from_two_vectors(); Interesting, here you use the correct plural "vectors". src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: > 542: byte[] vpayload1 = ((ByteVector)v1).vec(); > 543: byte[] vpayload2 = ((ByteVector)v2).vec(); > 544: byte[] vpayload3 = ((ByteVector)v3).vec(); Is there a reason you are not using more descriptive names here instead of `vpayload1`? I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2595: > 2593: @ForceInline > 2594: final ByteVector selectFromTemplate(ByteVector v1, ByteVector v2) { > 2595: int twovectorlen = length() * 2; `twovectorlen` -> `twoVectorLen` I think in Java we are supposed to use camelCase src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: > 2768: > 2769: /** > 2770: * Rearranges the lane elements of two vectors, selecting lanes I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? 
test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 324: > 322: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 323: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 324: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx])); I thought general Java style is camelCase? Is that not followed in the VectorAPI code? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: > 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). > 1047: toArray(Object[][]::new); > 1048: } Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: > 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); > 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); > 5812: idxv.selectFrom(av, bv).intoArray(r, i); Would this test catch a bug where the backend would generate vectors that are too long or too short? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2276944129 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741766060 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741773766 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741914524 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741911809 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741919025 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741920940 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741947885 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741949290 From epeter at openjdk.org Tue Sep 3 12:15:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:15:32 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:02:46 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/cpu/x86/x86.ad line 10490: > >> 10488: >> 10489: >> 10490: instruct selectFromTwoVec_evex(vec dst, vec src1, vec src2) > > You could rename `dst` -> `mask_and_dst`. That would maybe help the reader to more quickly know that it is an input-mask and output-dst. 
Also, for consistency, I would write out the name `selectFromTwoVector(s)_evex` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741772354 From mdoerr at openjdk.org Tue Sep 3 12:20:25 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 12:20:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Tue, 3 Sep 2024 07:22:32 GMT, Roberto Castañeda Lozano wrote: >>> I've only looked at the changes in gc directories (shared and cpu-specific). >> >> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question. > >> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question. > > @kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review. @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2326378191 From duke at openjdk.org Tue Sep 3 12:42:26 2024 From: duke at openjdk.org (Yagmur Eren) Date: Tue, 3 Sep 2024 12:42:26 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: On Tue, 3 Sep 2024 09:39:36 GMT, Tobias Hartmann wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> adding NOT_DEBUG_RETURN instead of DEBUG_ONLY for Compile::verify_start > > src/hotspot/share/opto/compile.cpp line 1109: > >> 1107: >> 1108: #ifdef ASSERT >> 1109: // Install the StartNode on this compile object. > > I think this comment needs to be updated. I believe the following may sound good briefly?: "Verify that the current StartNode is valid." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20715#discussion_r1741992128 From duke at openjdk.org Tue Sep 3 12:45:19 2024 From: duke at openjdk.org (Yagmur Eren) Date: Tue, 3 Sep 2024 12:45:19 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5] In-Reply-To: <2pFdztomWU60_fYA092wBtpPyGjwuprzlzC1nj8xyMk=.2ae0fc34-b055-48ae-8320-7d014a148064@github.com> References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> <2pFdztomWU60_fYA092wBtpPyGjwuprzlzC1nj8xyMk=.2ae0fc34-b055-48ae-8320-7d014a148064@github.com> Message-ID: <7UpM8j5ekKQZlNCAblwAZpAJZbBbCOW4InrH9VpS8fc=.03400512-dd6f-4533-a6f6-30ea0712fcf3@github.com> On Mon, 2 Sep 2024 11:12:33 GMT, Yagmur Eren wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> adding NOT_DEBUG_RETURN instead of DEBUG_ONLY for Compile::verify_start > > Thanks a lot for the review and suggestions @dean-long and @chhagedorn! 
I believe I can integrate now if it looks good! > Hi @nelanbu, I don't think this is correct. In `Compile::start()`, we have the following code: > > https://github.com/openjdk/jdk/blob/b8e8e965e541881605f9dbcd4d9871d4952b9232/src/hotspot/share/opto/compile.cpp#L1121-L1131 > > It asserts that `failing()` is false. Therefore, `init_start()` bails out before checking the assert with `start()` which you now no longer do with your refactoring. > > What you could do instead: > > * Simplify the code in `init_start()` to and add an assertion message: > > ``` > assert(failing() || s == start(), "should be StartNode"); > ``` > > * Change `init_start_node()` into a more meaningful name like `verify_start()`, as we are not actually initializing anything but rather sanity checking the start node. > * Guard the method with `DEBUG_ONLY/ifdef ASSERT` since it's only calling an assert in debug VM and nothing in product VM. Is `failing()` true if it fails as the name suggests? If so, then I guess it should be `!failing()` within `assert`, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20715#issuecomment-2326427885 From epeter at openjdk.org Tue Sep 3 12:54:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:54:24 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. 
>> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved Ok. This is a huge change. And you do not just introduce changes to the VectorAPI and add Vector instructions. But you also add the scalar instructions. Can you split this into at least 2 PR's that are smaller please? - Scalar saturating instructions: they could even be made available to the user via `Integer.saturatingAdd` etc. Would that not be desired? 
- Vector saturating instructions I'm afraid that now you are not using the scalar ops individually at all, and they are only used as fallback when the vector-api code is not intrinsified. But how can we test this properly? I'm just not very happy having to review 9K+ PR's ? src/hotspot/cpu/x86/assembler_x86.cpp line 560: > 558: } > 559: > 560: bool Assembler::needs_evex(XMMRegister reg1, XMMRegister reg2, XMMRegister reg3) { This is an ASSERT / DEBUG only method, correct? Do you want to `#ifdef ASSERT` it accordingly? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 914: > 912: case T_SHORT: vpminuw(dst, src1, src2, vlen_enc); break; > 913: case T_INT: vpminud(dst, src1, src2, vlen_enc); break; > 914: case T_LONG: evpminuq(dst, k0, src1, src2, false, vlen_enc); break; Can you explain to me what the `k0` is and where it comes from? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 929: > 927: } > 928: > 929: void C2_MacroAssembler::vpuminmaxq(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2, XMMRegister xtmp1, XMMRegister xtmp2, int vlen_enc) { Either wrap all inputs or none ;) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4705: > 4703: default: > 4704: fatal("Unsupported operation %s", NodeClassNames[ideal_opc]); > 4705: break; Did you mean to explicitly mention these cases as unsupported? If yes, please add a comment to the code why. src/hotspot/cpu/x86/x86.ad line 6527: > 6525: %} > 6526: ins_pipe( pipe_slow ); > 6527: %} Should change the `uminmax_reg` to indicate that it is a `vector` operation? The `format` already says `vector_uminmax_reg`... Because what if we one day want to use the name `uminmax_reg` for a scalar operation? src/hotspot/share/opto/addnode.hpp line 194: > 192: class SaturatingAddINode : public Node { > 193: public: > 194: SaturatingAddINode(Node* in1, Node* in2) : Node(in1,in2) {} Suggestion: SaturatingAddINode(Node* in1, Node* in2) : Node(in1, in2) {} In other places below as well. 
src/hotspot/share/opto/addnode.hpp line 198: > 196: virtual const Type* bottom_type() const { return TypeInt::INT; } > 197: virtual uint ideal_reg() const { return Op_RegI; } > 198: }; Are these not supposed to inherit from the `AddNode`, and then override the corresponding methods? Or are you making them separate for a good reason? src/hotspot/share/opto/addnode.hpp line 462: > 460: //------------------------------UMaxINode--------------------------------------- > 461: // Maximum of 2 unsigned integers. > 462: class UMaxLNode : public Node { Here you comment it with `UMaxINode`, but below it is the `UMaxLNode`. The `-------xyz------` comments are really useless. But the semantics description is useful (though you again say integer instead of long here...). src/hotspot/share/opto/matcher.hpp line 380: > 378: static BasicType vector_element_basic_type(const MachNode* use, const MachOper* opnd); > 379: static const Type* vector_element_type(const Node* n); > 380: static const Type* vector_element_type(const MachNode* use, const MachOper* opnd); You should probably create your own section for this, since this is not about the **basic** type. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2277262281 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741956515 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741964463 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741961089 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741971197 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741976975 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741990855 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741984047 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741987722 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741997411 From chagedorn at openjdk.org Tue Sep 3 13:06:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Sep 2024 13:06:23 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5] In-Reply-To: <7UpM8j5ekKQZlNCAblwAZpAJZbBbCOW4InrH9VpS8fc=.03400512-dd6f-4533-a6f6-30ea0712fcf3@github.com> References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> <2pFdztomWU60_fYA092wBtpPyGjwuprzlzC1nj8xyMk=.2ae0fc34-b055-48ae-8320-7d014a148064@github.com> <7UpM8j5ekKQZlNCAblwAZpAJZbBbCOW4InrH9VpS8fc=.03400512-dd6f-4533-a6f6-30ea0712fcf3@github.com> Message-ID: On Tue, 3 Sep 2024 12:42:23 GMT, Yagmur Eren wrote: > > Hi @nelanbu, I don't think this is correct. In `Compile::start()`, we have the following code: > > https://github.com/openjdk/jdk/blob/b8e8e965e541881605f9dbcd4d9871d4952b9232/src/hotspot/share/opto/compile.cpp#L1121-L1131 > > > > It asserts that `failing()` is false. Therefore, `init_start()` bails out before checking the assert with `start()` which you now no longer do with your refactoring. 
> > What you could do instead: > > > > * Simplify the code in `init_start()` to and add an assertion message: > > > > ``` > > assert(failing() || s == start(), "should be StartNode"); > > ``` > > > > * Change `init_start_node()` into a more meaningful name like `verify_start()`, as we are not actually initializing anything but rather sanity checking the start node. > > * Guard the method with `DEBUG_ONLY/ifdef ASSERT` since it's only calling an assert in debug VM and nothing in product VM. > > Is `failing()` true if it fails as the name suggests? If so, then I guess it should be `!failing()` within `assert`, right? No, the way the assert works is that if we fail (i.e. `failing()` is true), we do not actually want to check `s == start()` because `start()` requires `failing()` evaluating to false. So, whenever `failing()` is true, the first part of the assert makes the assertion true and we stop evaluating. But usually it is false, so we continue evaluating the second part. We also use this trick with `||`-ing conditions at other places, for example here: https://github.com/openjdk/jdk/blob/e0c46d589b12aa644e12e4a4c9e84e035f7cf98d/src/hotspot/share/opto/callnode.cpp#L1291 Whenever `n` is null, the first part of the assert is true and makes the entire assert true. Only if `n` is non-null, we will evaluate the second and interesting part of the assert.
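To make the short-circuit argument concrete, here is a minimal standalone sketch. It assumes a hypothetical `MiniCompile` struct that only mimics the relationship described above (`start()` may be called only while `failing()` is false); it is not HotSpot's `Compile` class, and the integer `_start` merely stands in for the StartNode.

```cpp
#include <cassert>

// Hypothetical stand-in for a compile object; not HotSpot's Compile class.
struct MiniCompile {
  bool _failed = false;
  int  _start  = 42;  // stands in for the StartNode

  bool failing() const { return _failed; }

  // Like the start() discussed above: only legal while not failing.
  int start() const {
    assert(!failing() && "must not have pending failure");
    return _start;
  }

  // Mirrors the shape of `failing() || s == start()`: when failing() is true,
  // `||` short-circuits and start() is never called, so its own assert
  // cannot fire.
  bool verify_start(int s) const {
    return failing() || s == start();
  }
};
```

With this shape, a failing compilation passes the check trivially, while a healthy one is actually compared against `start()`. Writing `!failing() && s == start()` instead would make the check fail for every bailed-out compilation.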
------------- PR Comment: https://git.openjdk.org/jdk/pull/20715#issuecomment-2326473113 From epeter at openjdk.org Tue Sep 3 13:06:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 13:06:27 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. 
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved Ok, I left a few more comments. I think this PR could definitely be split. It would make it more reviewable for me. src/hotspot/share/opto/vectornode.hpp line 148: > 146: > 147: //===========================Vector=ALU=Operations============================= > 148: class SaturatingVectorNode : public VectorNode { Semantics description of Saturation would be appreciated :) src/hotspot/share/opto/vectornode.hpp line 634: > 632: virtual int Opcode() const; > 633: }; > 634: This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? src/hotspot/share/prims/vectorSupport.hpp line 129: > 127: VECTOR_OP_SUSUB = 122, > 128: VECTOR_OP_UMIN = 123, > 129: VECTOR_OP_UMAX = 124, Please keep the alignment consistent. src/java.base/share/classes/java/lang/Integer.java line 1994: > 1992: * @return the greater of {@code a} and {@code b} > 1993: * @see java.util.function.BinaryOperator > 1994: * @since 1.8 Is this a copy error or did this already exist since `1.8`? src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 395: > 393: > 394: /* ============================================================================ */ > 395: These comment lines seem redundant... test/jdk/jdk/incubator/vector/gen-template.sh line 317: > 315: function gen_saturating_binary_op { > 316: echo "Generating binary op $1 ($2)..."
> 317: # gen_op_tmpl $binary_scalar "$@" Is this commented on purpose? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2277361678 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742016482 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742019985 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742021810 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742024534 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742026062 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742028394 From epeter at openjdk.org Tue Sep 3 13:12:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 13:12:22 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. 
>> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved I really like the additions here. More scalar ops and vector ops are fantastic! But I'd like you to split it into scalar and vector changes. Because on both sides we'll have to do some review work to get it all right. You did in fact add `java/lang` methods. I think you need to add tests for all of those. As well. That's going to be even more code to review. 
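As a point of reference for the scalar half of such a split, the saturating semantics described in the quoted PR text (results capped at the bounds of the result type instead of wrapping around) can be sketched as follows. This is a hypothetical C++ model for 32-bit values, not the Java code from the PR; the function names are invented for illustration and do not correspond to any existing or proposed JDK method.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar models of saturating operations; the names are
// illustrative only and are not taken from the PR.

// Signed saturating add: compute in 64 bits, then clamp to the int32 range.
int32_t saturating_add_i32(int32_t a, int32_t b) {
  const int64_t wide = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  if (wide > std::numeric_limits<int32_t>::max()) {
    return std::numeric_limits<int32_t>::max();
  }
  if (wide < std::numeric_limits<int32_t>::min()) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(wide);
}

// Unsigned saturating add: unsigned arithmetic wraps modulo 2^32, so a sum
// smaller than an operand signals overflow.
uint32_t saturating_add_u32(uint32_t a, uint32_t b) {
  const uint32_t sum = a + b;
  return (sum < a) ? std::numeric_limits<uint32_t>::max() : sum;
}

// Unsigned saturating subtract: clamp at zero instead of wrapping.
uint32_t saturating_sub_u32(uint32_t a, uint32_t b) {
  return (a < b) ? 0u : a - b;
}
```

Boundary inputs such as `INT32_MAX + 1`, `INT32_MIN + (-1)` and `0u - 1u` are the cases that distinguish saturating from wrapping behavior, so scalar tests for new `java/lang` methods would presumably need to cover them.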
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2326480778 PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2326486187 From rcastanedalo at openjdk.org Tue Sep 3 13:30:23 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 3 Sep 2024 13:30:23 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 11:43:57 GMT, Christian Hagedorn wrote: > This patch adds the method name to the incremental inlining step dumps in IGV which improves debugging issues involving incremental inlining: > > > static void test() { > method1(); > method2(); > method3(); > } > > static void method1() {} > static void method2() {} > static void method3() {} > > Run with `-XX:+AlwaysIncrementalInline` and IGV print level >=3: > > Before patch: > ![image](https://github.com/user-attachments/assets/a3a1ab32-e7b3-4ccb-8ab2-a75d2b5b6912) > > After patch: > ![image](https://github.com/user-attachments/assets/8100a2fe-1670-4687-b8b8-c8053fbaa7d7) > > The patch just prints the method name if we call `print_method()` with `n` being a call node which, AFAICT, only happens for the incremental inlining step. However, even if we call it with another phase at some point, I don't think it hurts to also dump the method name there. > > #### Testing > - Manually verifying change in IGV > - Building IGV which runs its unit tests > - Sanity run with a hello world program with `-Xcomp -XX:+AlwaysIncrementalInline -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` > > Thanks, > Christian Looks good otherwise! 
src/hotspot/share/opto/compile.cpp line 5206:

> 5204: call->method()->print_short_name(&ss);
> 5205: }
> 5206: }

Suggestion: using `call->_name` instead of `call->method()->print_short_name()` is slightly simpler and more general (should be equivalent for incremental inlining, but will also print stub names, "uncommon trap", etc. when dumping `PHASE_AFTER_ITER_GVN_STEP` graphs on call nodes).

Suggestion:

    ss.print(": %d %s", n->_idx, NodeClassNames[n->Opcode()]);
    if (n->is_Call()) {
      CallNode* call = n->as_Call();
      if (call->_name != nullptr) {
        ss.print(" - %s", call->_name);
      }
    }

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20834#pullrequestreview-2277443616
PR Review Comment: https://git.openjdk.org/jdk/pull/20834#discussion_r1742063501

From thartmann at openjdk.org Tue Sep 3 13:30:25 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 3 Sep 2024 13:30:25 GMT
Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5]
In-Reply-To:
References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com>
Message-ID: <0fiyr9zaMIkQlHd3cf5-9_V6iecso-JCavR3lpObMk4=.37f7ce9b-0727-4fd1-9d23-147952dbb3b2@github.com>

On Tue, 3 Sep 2024 12:39:52 GMT, Yagmur Eren wrote:

>> src/hotspot/share/opto/compile.cpp line 1109:
>>
>>> 1107:
>>> 1108: #ifdef ASSERT
>>> 1109: // Install the StartNode on this compile object.
>>
>> I think this comment needs to be updated.
>
> I believe the following may sound good briefly?: "Verify that the current StartNode is valid."

Sounds good to me.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20715#discussion_r1742065594 From duke at openjdk.org Tue Sep 3 13:46:57 2024 From: duke at openjdk.org (Casper Norrbin) Date: Tue, 3 Sep 2024 13:46:57 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v3] In-Reply-To: References: Message-ID: > Hi everyone, > > This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. > > The fix includes: > > - Adding assertions to check for potential overflow. > - Reordering some operations to guard against overflow. Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: saturated pointer adds + size asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20774/files - new: https://git.openjdk.org/jdk/pull/20774/files/cf0b4348..b30f188c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=01-02 Stats: 36 lines in 4 files changed: 20 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/20774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20774/head:pull/20774 PR: https://git.openjdk.org/jdk/pull/20774 From duke at openjdk.org Tue Sep 3 13:49:51 2024 From: duke at openjdk.org (Yagmur Eren) Date: Tue, 3 Sep 2024 13:49:51 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v6] In-Reply-To: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: > Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. 
See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: Update Compile::verify_init comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20715/files - new: https://git.openjdk.org/jdk/pull/20715/files/66e23d6f..05b94113 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20715&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20715&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20715.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20715/head:pull/20715 PR: https://git.openjdk.org/jdk/pull/20715 From jvernee at openjdk.org Tue Sep 3 13:49:57 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 3 Sep 2024 13:49:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded Message-ID: As discussed in the JBS issue: FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. 
The holder class can then be unloaded, resulting in a crash. The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance.
Performance numbers

x64:

before:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op

after:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op

aarch64:

before:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op

after:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. Testing: tier 1-4 ------------- Commit messages: - Add s390 changes - Merge branch 'master' into LoadVMTraget - Don't save/restore LR/CR + resolve_jobject on s390 - eyeball other platforms - call stub from upcall stub - reinit_heap_base - eyeball other platforms - Only test on Linux/AArch64 - aarch64 impl - load vmentry in on_entry using special stub - ... and 8 more: https://git.openjdk.org/jdk/compare/0e6bb514...8dcb14ff Changes: https://git.openjdk.org/jdk/pull/20479/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337753 Stats: 334 lines in 23 files changed: 257 ins; 26 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/20479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20479/head:pull/20479 PR: https://git.openjdk.org/jdk/pull/20479 From amitkumar at openjdk.org Tue Sep 3 13:49:57 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Sep 2024 13:49:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: <6CeWpXBf60zq-H8SCQRoCP76TfwVHgSbxPG1RFR7E_8=.34c40871-4ae8-40a6-bc6f-94533c485903@github.com> On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
>
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 RuntimeAddress will not work for s390x. @JornVernee would you please apply these changes. Thanks @TheRealMDoerr for pointing it out. I will create a JBS issue for implementing `resolve_global_jobject`. diff --git a/src/hotspot/cpu/s390/upcallLinker_s390.cpp b/src/hotspot/cpu/s390/upcallLinker_s390.cpp index 1b07522858f..8baad40a519 100644 --- a/src/hotspot/cpu/s390/upcallLinker_s390.cpp +++ b/src/hotspot/cpu/s390/upcallLinker_s390.cpp @@ -216,10 +216,11 @@ address UpcallLinker::make_upcall_stub(jobject receiver, Symbol* signature, arg_shuffle.generate(_masm, shuffle_reg, abi._shadow_space_bytes, frame::z_jit_out_preserve_size); __ block_comment("} argument_shuffle"); - __ block_comment("load target {"); + __ block_comment("load_target {"); __ load_const_optimized(Z_ARG1, (intptr_t)receiver); - __ call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // load taget Method* into Z_method - __ block_comment("} load target"); + __ load_const_optimized(call_target_address, StubRoutines::upcall_stub_load_target()); + __ call(call_target_address); // load taget Method* into Z_method + __ block_comment("} load_target"); __ z_lg(call_target_address, Address(Z_method, in_bytes(Method::from_compiled_offset()))); __ call(call_target_address); ------------- Changes requested by amitkumar (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2276394431 From jvernee at openjdk.org Tue Sep 3 13:49:57 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 3 Sep 2024 13:49:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: <9_oOgD6Q5GisXSkq98pAsMwDZAyHEfAWLf5IUFWKIks=.cf7ac1ce-62b4-4c1e-8b0c-0f3ff06c9618@github.com> On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
>
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
>
> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4.
>
> Testing: tier 1-4

I had to adjust the stub size on x64 when running on fastdebug/shenandoah. That may also be needed on other platforms (except for aarch64), but I can't test there. To find the right stub size, just increase `upcall_stub_code_base_size` to a high number (e.g. 2048), and then run this program:

    import java.lang.foreign.*;
    import java.lang.invoke.*;

    public class Main {
        public static void main(String[] args) throws Throwable {
            try (Arena arena = Arena.ofConfined()) {
                MemorySegment stub = Linker.nativeLinker().upcallStub(
                    MethodHandles.empty(MethodType.methodType(void.class)),
                    FunctionDescriptor.ofVoid(),
                    arena);
            }
        }
    }

    $ javac -d classes Main.java
    $ java -cp classes -XX:+UseShenandoahGC -XX:+LogCompilation Main
    $ grep upcall_stub -A 1 hotspot_pid*

Then set `upcall_stub_code_base_size` to be bigger than `size`.

Using a static stub is still a bit slower, but much more in line with the performance of inline loads:

Current:

Benchmark                  Mode  Cnt    Score   Error  Units
QSort.panama_upcall_qsort  avgt   30  608.953 ± 3.047  ns/op

Fully C++:

Benchmark                  Mode  Cnt    Score   Error  Units
QSort.panama_upcall_qsort  avgt   30  725.142 ± 2.718  ns/op  ~19% slower

Static stub:

Benchmark                  Mode  Cnt    Score   Error  Units
QSort.panama_upcall_qsort  avgt   30  627.661 ± 2.099  ns/op  ~3% slower

I think that's a good compromise. (Although I wish the C++ code was just as fast, as it's much nicer)

> Some of the DecoratorSet should be applicable and improve performance.

I gave `AS_NO_KEEPALIVE` a try. I'm not sure if that's correct, but it didn't really change performance. I'm not sure what other decorator would apply. I was looking at `ACCESS_READ`, but it seems that it cannot be used for these loads.
I've added the implementations with the stubs. I had to eyeball s390, ppc, and riscv, so testing is still needed on those platforms (GHA will do the cross builds).

I also spent quite a while messing with the test. This test is quite unstable, because it creates a new thread pool for each test case, and then calls `ExecutorService::shutdownNow`, which doesn't allow submitted tasks to finish. It was running out of memory on some of the mac machines we test on in CI. I've made several attempts to make it more stable, but all of those resulted in the issue no longer being reproduced. For now I've restricted the test to linux/aarch64, since that's where we see the issue, and it seems stable enough to pass every time, at least in our CI. If it causes issues though, I think we might have to just drop the test, or maybe mark it as `/manual`, so it doesn't run in CI.

I'll be away until September. Will pick this back up then.

With the latest version, we are slightly faster than the baseline:

baseline:

Benchmark                  Mode  Cnt    Score   Error  Units
QSort.panama_upcall_qsort  avgt   30  635.047 ± 2.181  ns/op

Patched:

Benchmark                  Mode  Cnt    Score   Error  Units
QSort.panama_upcall_qsort  avgt   30  625.385 ± 2.442  ns/op

@TheRealMDoerr Thanks for all the suggestions!

src/hotspot/cpu/x86/upcallLinker_x86_64.cpp line 311:

> 309: Address(rbx, java_lang_invoke_ResolvedMethodName::vmtarget_offset()),
> 310: noreg, noreg);
> 311: __ movptr(Address(r15_thread, JavaThread::callee_target_offset()), rbx); // just in case callee is deoptimized

FWIW, I tried to move this code to `UpcallLinker::on_entry`, but there is a speed cost to that (of around 10ns on my machine). Doing that results in several out-of-line calls to `AccessInternal::RuntimeDispatch<...>::_load_at_func` to load the values. It seems like a tradeoff between speed and space (in the code cache). I went with speed here.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2273455863 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2277864015 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2278175462 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2278557487 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2326568409 PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1706122881 From mdoerr at openjdk.org Tue Sep 3 13:49:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 13:49:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. 
> > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
>
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 Can't we do these nasty loads in C++ code and use `set_vm_result_2` in `UpcallLinker::on_entry`? The GC barriers can generate excessive amounts of code with some GCs. I guess that upcalls are less performance critical, so I'd prefer the other solution. Maybe the C++ code can get optimized better, too. Some of the `DecoratorSet` should be applicable and improve performance. If that doesn't help enough, maybe we should implement a dedicated static stub? There's no need to have the code replicated in each upcall stub. Also note that each `load_heap_oop` may save and restore registers which is actually only needed once. Regarding PPC64, I think that we could avoid PRESERVATION_FRAME_LR_GP_FP_REGS if we rearrange it such that the `load_heap_oop` is done at a place where the volatile regs are not live. But seems like this optimization is not available for other platforms. Some performance related remarks: - You could use `resolve_global_jobject` which is shorter and faster and exists on most platforms. - Using `vm_result_2` is no longer needed. The Method* can be directly passed in the method reg (or loaded from `callee_target`). - If you call the stub from assembler instead of from C++ you should be able to save some extra stuff like the frame. I'll check the PPC64 code later. @offamitkumar: Can you take a look at the s390 code, please? The cross build has failed. For the future: You may want to implement `resolve_global_jobject` which is shorter and faster and available on the other platforms. 
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4778: > 4776: StubCodeMark mark(this, "StubRoutines", "upcall_stub_load_target"); > 4777: address start = __ pc(); > 4778: __ save_LR_CR(R0); I think `save_LR_CR` and `restore_LR_CR` should get removed, too. CR is not live and LR is preserved everywhere below. But, I'll check this later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2275524582 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2275707529 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2278240985 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2325103457 PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1713844568 From jvernee at openjdk.org Tue Sep 3 13:49:57 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 3 Sep 2024 13:49:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 10:51:59 GMT, Martin Doerr wrote: > Can't we do these nasty loads in C++ code and use set_vm_result_2 in UpcallLinker::on_entry? That's what I tried. I got a ~20% hit to execution time. > I guess that upcalls are less performance critical Why so? They are certainly much more rare than downcalls, but when they _are_ used, I think we'd like them to be fast. > Maybe the C++ code can get optimized better, too. I [tried](https://github.com/openjdk/jdk/commit/a2614ab77ef0ed493a819b970b31b939126c3da5) optimizing things by moving the accessors to `javaClasses.inline.hpp`, that helped the generated code a bit, but it didn't really improve speed. I think the problem is that we don't know at C++ compile time which barrier we need to use, since the GC is selected at runtime, while we do know when generating the stub. So, if we use C++, there will always be an out-of-line dispatch to the `_load_at` function for the particular GC. > Some of the DecoratorSet should be applicable and improve performance. 
If that doesn't help enough, maybe we should implement a dedicated static stub? There's no need to have the code replicated in each upcall stub. That's a good idea. If we can make that work, I'm all for it. P.S. giving that a try now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2275658037 PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2275861261 From mdoerr at openjdk.org Tue Sep 3 13:49:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 13:49:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: <9_oOgD6Q5GisXSkq98pAsMwDZAyHEfAWLf5IUFWKIks=.cf7ac1ce-62b4-4c1e-8b0c-0f3ff06c9618@github.com> References: <9_oOgD6Q5GisXSkq98pAsMwDZAyHEfAWLf5IUFWKIks=.cf7ac1ce-62b4-4c1e-8b0c-0f3ff06c9618@github.com> Message-ID: On Fri, 9 Aug 2024 12:44:45 GMT, Jorn Vernee wrote: > I think that's a good compromise. (Although I wish the C++ code was just as fast, as it's much nicer) Agreed. > I'm not sure what other decorator would apply. I think `ON_STRONG_OOP_REF` and `IS_NOT_NULL` could also be used. But, I guess that wouldn't make it fast enough. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2278057468 From jvernee at openjdk.org Tue Sep 3 13:49:57 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 3 Sep 2024 13:49:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: <6CeWpXBf60zq-H8SCQRoCP76TfwVHgSbxPG1RFR7E_8=.34c40871-4ae8-40a6-bc6f-94533c485903@github.com> References: <6CeWpXBf60zq-H8SCQRoCP76TfwVHgSbxPG1RFR7E_8=.34c40871-4ae8-40a6-bc6f-94533c485903@github.com> Message-ID: <0WNgksf6ekZFo6urALPuINQ05farx7pAzkmw2sZqAB0=.a2e37bb0-04bd-4ac5-ab5a-9101813a2981@github.com> On Tue, 3 Sep 2024 04:52:08 GMT, Amit Kumar wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. 
This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>>
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > RuntimeAddress will not work for s390x. @JornVernee would you please apply these changes. > > Thanks @TheRealMDoerr for pointing it out. I will create a JBS issue for implementing `resolve_global_jobject`. > > > diff --git a/src/hotspot/cpu/s390/upcallLinker_s390.cpp b/src/hotspot/cpu/s390/upcallLinker_s390.cpp > index 1b07522858f..8baad40a519 100644 > --- a/src/hotspot/cpu/s390/upcallLinker_s390.cpp > +++ b/src/hotspot/cpu/s390/upcallLinker_s390.cpp > @@ -216,10 +216,11 @@ address UpcallLinker::make_upcall_stub(jobject receiver, Symbol* signature, > arg_shuffle.generate(_masm, shuffle_reg, abi._shadow_space_bytes, frame::z_jit_out_preserve_size); > __ block_comment("} argument_shuffle"); > > - __ block_comment("load target {"); > + __ block_comment("load_target {"); > __ load_const_optimized(Z_ARG1, (intptr_t)receiver); > - __ call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // load taget Method* into Z_method > - __ block_comment("} load target"); > + __ load_const_optimized(call_target_address, StubRoutines::upcall_stub_load_target()); > + __ call(call_target_address); // load taget Method* into Z_method > + __ block_comment("} load_target"); > > __ z_lg(call_target_address, Address(Z_method, in_bytes(Method::from_compiled_offset()))); > __ call(call_target_address); @offamitkumar thanks, I've added those changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2326422537 From duke at openjdk.org Tue Sep 3 13:58:20 2024 From: duke at openjdk.org (Casper Norrbin) Date: Tue, 3 Sep 2024 13:58:20 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: References: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> Message-ID: On Mon, 2 Sep 2024 11:54:40 GMT, Thomas Stuefe wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> arena realloc overflow check > > src/hotspot/share/memory/arena.cpp line 339: > >> 337: // See if we can resize in-place >> 338: if( (c_old+old_size == _hwm) && // Adjusting recent thing >> 339: ((size_t)(_max-c_old) >= corrected_new_size) ) { // Still fits where it sits, safe from overflow > > This change is correct, but it hides an important finding behind a reshuffling of parameters that someone else may innocently reshape later. It also makes the code less readable. Can we use something like saturated_add()? > > I would also add an explicit assert for a reasonable max size. Arena allocations should be small. Nobody should hand in sizes larger than a few MB, so asserting for size >= 2^31 (2g) would make sense. Anything as large as that is almost certainly an error we should trap on. @tstuefe I've now added asserts and changed this check to use saturated adds. The already existing `saturated_add()` only works on `int`s and `uint`s, so I made a version working on offsetting pointers. We still have to check for overflow (if the pointer addition was saturated), but the code becomes more readable. 
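To make the idea of a saturated pointer add concrete, here is a small standalone sketch. It is not the code from the PR: the helper names `saturated_ptr_add` and `fits_in_place` are hypothetical, and the sketch assumes a flat address space in which `size_t` and `uintptr_t` have the same width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the PR's implementation): offset a pointer by
// `bytes`, clamping at the top of the address space instead of wrapping.
inline char* saturated_ptr_add(char* p, size_t bytes) {
  const uintptr_t base = reinterpret_cast<uintptr_t>(p);
  const uintptr_t room = UINTPTR_MAX - base;  // largest offset that cannot wrap
  const uintptr_t sum  = (bytes > room) ? UINTPTR_MAX : base + bytes;
  return reinterpret_cast<char*>(sum);
}

// Hypothetical caller: does a block of `new_size` bytes starting at `c_old`
// still end at or before `max`? As noted in the message above, the saturated
// result still has to be checked, so a clamped sum is treated as "does not fit".
inline bool fits_in_place(char* c_old, size_t new_size, char* max) {
  const uintptr_t end = reinterpret_cast<uintptr_t>(saturated_ptr_add(c_old, new_size));
  return end != UINTPTR_MAX && end <= reinterpret_cast<uintptr_t>(max);
}
```

Compared with rearranging the comparison into a subtraction, this keeps the "does the end of the block fit below the limit" reading while making the overflow handling explicit at the one place where it can occur.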
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1742111702 From duke at openjdk.org Tue Sep 3 14:04:21 2024 From: duke at openjdk.org (Casper Norrbin) Date: Tue, 3 Sep 2024 14:04:21 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: References: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> Message-ID: On Tue, 3 Sep 2024 10:01:12 GMT, Kim Barrett wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> arena realloc overflow check > > src/hotspot/share/adlc/adlArena.cpp line 154: > >> 152: if( (c_old+old_size == _hwm) && // Adjusting recent thing >> 153: ((size_t)(_max-c_old) >= new_size) ) { // Still fits where it sits, safe from overflow >> 154: > > It appears that this change isn't worrying about bad `old_ptr` or `old_size` > arguments, which is fine. But the code can be further improved by replacing > lines 144-157 with something like > > // Reallocating the most recent allocation? > if ((c_old + old_size) == _hwm) { > assert(_chunk->bottom() <= c_old, "invariant"); > // Reallocate in place if it fits. This also handles shrinking. > if (pointer_delta(_max, c_old) >= new_size) { > _hwm = c_old + new_size; > return c_old; > } > } > > Of course, in adlc you can't use HotSpot's pointer_delta utility, so there > you'll need to use something like what's in the PR for that calculation. > > Any check for an "unreasonable" size should happen in Amalloc, not here. I believe this would miss the case where we shrink an allocation in place and we are not at the high water mark, where `new_size <= old_size`, but where `c_old + old_size) == _hwm` does not hold. 
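The two cases being discussed (growing the most recent allocation in place, and shrinking an allocation that is not at the high water mark) can be illustrated with a toy arena. This is a sketch under assumptions, not the HotSpot or adlc code: `ToyArena` and `try_realloc_in_place` are hypothetical names, and only `_hwm`, `_max`, `c_old`, `old_size` and `new_size` mirror the quoted snippets.

```cpp
#include <cassert>
#include <cstddef>

// Toy single-chunk arena, for illustration only.
struct ToyArena {
  char  buf[64];
  char* _hwm = buf;                // high water mark: next free byte
  char* _max = buf + sizeof(buf);  // one past the end of the chunk

  char* alloc(size_t size) {
    if (static_cast<size_t>(_max - _hwm) < size) return nullptr;
    char* p = _hwm;
    _hwm += size;
    return p;
  }

  // Returns c_old if the block can be resized in place, nullptr if the
  // caller would have to allocate a new block and copy.
  char* try_realloc_in_place(char* c_old, size_t old_size, size_t new_size) {
    if (new_size <= old_size) {
      // Shrinking always works in place; space is only given back when the
      // block is the most recent allocation.
      if (c_old + old_size == _hwm) _hwm = c_old + new_size;
      return c_old;
    }
    // Growing: only the most recent allocation can grow, and the size check
    // is written as a subtraction so that c_old + new_size is never formed.
    if (c_old + old_size == _hwm &&
        static_cast<size_t>(_max - c_old) >= new_size) {
      _hwm = c_old + new_size;
      return c_old;
    }
    return nullptr;
  }
};
```

In this sketch, shrinking the first of two allocations returns the same pointer and leaves `_hwm` untouched, which is the case the reply above is concerned about.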
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1742120796 From jvernee at openjdk.org Tue Sep 3 14:20:22 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 3 Sep 2024 14:20:22 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers > x64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op > > > aarch64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op > >
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 src/hotspot/share/runtime/frame.cpp line 1132: > 1130: } > 1131: _cb->as_upcall_stub()->oops_do(f, *this); > 1132: } This was previously handled by overriding `preserve_callee_argument_oops` in `UpcallStub` as `ShouldNotReachHere`. That function was removed though. We should really have a check like this, since it helps rule out missed handling of the receiver handle which can cause GC issues, so I've added this here. I can move it to a separate PR as well, if preferred. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1742153421 From epeter at openjdk.org Tue Sep 3 14:26:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 14:26:31 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 16:30:26 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. 
I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - fix compile errors > - Merge branch 'master' into unsignedbounds > - add comments > - Merge branch 'master' into unsignedbounds > - fix release build > - add comments, group arguments to reduce C-style reference passing arguments > - fix tests, add verify > - add unit tests > - fix template parameter > - refactor > - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a This is great work! I'm struggling with the bit-manipulation, a lot of it looks a little magical to me. Some more comments or examples in the code would help me review. This is just a first pass, I'd have to go over this in more detail another time. src/hotspot/share/opto/rangeinference.cpp line 54: > 52: static AdjustResult> > 53: adjust_bounds_from_bits(const RangeInt& bounds, const KnownBits& bits) { > 54: static_assert(std::is_unsigned::value, ""); Suggestion: static_assert(std::is_unsigned::value, "only unsigned juint and julong allowed"); src/hotspot/share/opto/rangeinference.cpp line 56: > 54: static_assert(std::is_unsigned::value, ""); > 55: > 56: auto adjust_lo = [](T lo, const KnownBits& bits) { This does not even capture anything. Why not make it its own dedicated method with a nice name? I guess this could also be a member method of `KnownBits`. src/hotspot/share/opto/rangeinference.cpp line 62: > 60: if (zero_violation == one_violation) { > 61: return lo; > 62: } I need more explanation here. src/hotspot/share/opto/rangeinference.cpp line 94: > 92: return {true, false, {}}; > 93: } > 94: T new_hi = ~adjust_lo(~bounds._hi, {bits._ones, bits._zeros}); Wow, that looks like some magic. Can you please explain this?
src/hotspot/share/opto/rangeinference.cpp line 109: > 107: static AdjustResult> > 108: adjust_bits_from_bounds(const KnownBits& bits, const RangeInt& bounds) { > 109: static_assert(std::is_unsigned::value, ""); Again: could be a member function of `KnownBits`, where unsigned could be verified already... src/hotspot/share/opto/rangeinference.cpp line 113: > 111: T match_mask = mismatch == 0 ? std::numeric_limits::max() > 112: : ~(std::numeric_limits::max() >> count_leading_zeros(mismatch)); > 113: T new_zeros = bits._zeros | (match_mask &~ bounds._lo); Suggestion: T new_zeros = bits._zeros | (match_mask & ~bounds._lo); I think this was a typo? src/hotspot/share/opto/rangeinference.cpp line 143: > 141: } > 142: > 143: // Tighten all constraints of a TypeIntPrototype to its canonical form. Oh I like this definition of "canonical form". I think you should rename `normalize_constraints` -> `canonicalize`. Otherwise, you have two words, and you would need a definition for both "normalized" and "canonical". src/hotspot/share/opto/rangeinference.cpp line 152: > 150: TypeIntPrototype::normalize_constraints() const { > 151: static_assert(std::is_signed::value, ""); > 152: static_assert(std::is_unsigned::value, ""); I wonder if you should then generally use `T` for signed and unsigned, `S` for signed, and `U` for unsigned? src/hotspot/share/opto/rangeinference.cpp line 153: > 151: static_assert(std::is_signed::value, ""); > 152: static_assert(std::is_unsigned::value, ""); > 153: static_assert(sizeof(T) == sizeof(U), ""); Honestly, should this check not be done by `TypeIntPrototype` already? Or are you putting it here to make the pre-conditions explicit? src/hotspot/share/opto/rangeinference.cpp line 160: > 158: urange._lo > urange._hi || > 159: (_bits._zeros & _bits._ones) != 0) { > 160: return {false, {}}; We are returning a `Pair` here. I don't know what the `bool` stands for. 
Can you add some description at the top of the method, or make an explicit type instead of a `Pair` that makes it intuitive what we are returning here? src/hotspot/share/opto/rangeinference.cpp line 164: > 162: > 163: if (T(urange._lo) > T(urange._hi)) { > 164: if (T(urange._hi) < srange._lo) { These would be much more intuitive to read if `T -> S`! src/hotspot/share/opto/rangeinference.cpp line 434: > 432: #ifndef PRODUCT > 433: template > 434: static const char* intnamenear(T origin, const char* xname, char* buf, size_t buf_size, T n) { Suggestion: static const char* int_name_near(T origin, const char* xname, char* buf, size_t buf_size, T n) { simpler to read src/hotspot/share/opto/rangeinference.cpp line 444: > 442: return nullptr; > 443: } > 444: os::snprintf_checked(buf, buf_size, "%s+" INT32_FORMAT, xname, jint(n - origin)); Was there an explicit choice to use buffers, instead of `outputStream`? src/hotspot/share/opto/rangeinference.hpp line 44: > 42: T _zeros; > 43: T _ones; > 44: }; Are the `KnownBits` always unsigned? Why not name the type `U` and verify that it is unsigned? You could also have member-functions on it then, like the bounds adjustment. And then you can either have the type free (i.e. allow signed type), or require the same `U` type on the input ranges or values. 
src/hotspot/share/opto/type.cpp line 488: > 486: TypeInt::UBYTE = TypeInt::make(0, 255, WidenMin)->is_int(); // Unsigned Bytes > 487: TypeInt::CHAR = TypeInt::make(0,65535, WidenMin)->is_int(); // Java chars > 488: TypeInt::SHORT = TypeInt::make(-32768,32767, WidenMin)->is_int(); // Java shorts Suggestion: TypeInt::CHAR = TypeInt::make(0, 65535, WidenMin)->is_int(); // Java chars TypeInt::SHORT = TypeInt::make(-32768, 32767, WidenMin)->is_int(); // Java shorts Might as well fix the spacing now src/hotspot/share/opto/type.cpp line 500: > 498: assert( TypeInt::CC_EQ == TypeInt::ZERO, "types must match for CmpL to work" ); > 499: assert( TypeInt::CC_GE == TypeInt::BOOL, "types must match for CmpL to work" ); > 500: assert( (juint)(TypeInt::CC->_hi - TypeInt::CC->_lo) <= SMALLINT, "CC is truly small"); Why did you remove this check? src/hotspot/share/opto/type.hpp line 604: > 602: const jint _lo, _hi; // Lower bound, upper bound in the signed domain > 603: const juint _ulo, _uhi; // Lower bound, upper bound in the unsigned domain > 604: const juint _zeros, _ones; // Bits that are known to be 0 or 1 It would have been nice to have some sort of `Range` and `KnownBits` class... not sure how feasible this is, especially since `_lo` and `_hi` are already used all over the place... src/hotspot/share/opto/type.hpp line 606: > 604: const juint _zeros, _ones; // Bits that are known to be 0 or 1 > 605: > 606: static const TypeInt* cast(const Type* t) { return t->is_int(); } Is this even used? src/hotspot/share/opto/type.hpp line 617: > 615: bool is_con(jint i) const { return is_con() && _lo == i; } > 616: jint get_con() const { assert(is_con(), ""); return _lo; } > 617: // Check if a TypeInt is a subset of this TypeInt (i.e. all elements of the Suggestion: // Check if a jint or TypeInt is a subset of this TypeInt (i.e. all elements of the ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2277424756 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742093319 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742099992 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742108156 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742112203 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742114496 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742128592 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742137154 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742131718 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742139791 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742143974 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742145788 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742154568 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742158039 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742103749 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742077724 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742078511 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742057104 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742060868 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742065136 From epeter at openjdk.org Tue Sep 3 14:26:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 14:26:32 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:57:45 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > src/hotspot/share/opto/rangeinference.cpp line 109: > >> 107: static AdjustResult> >> 108: adjust_bits_from_bounds(const KnownBits& bits, const RangeInt& bounds) { >> 109: static_assert(std::is_unsigned::value, ""); > > Again: could be a member function of `KnownBits`, where unsigned could be verified already... At any rate, I would name `T -> U` > src/hotspot/share/opto/type.hpp line 604: > >> 602: const jint _lo, _hi; // Lower bound, upper bound in the signed domain >> 603: const juint _ulo, _uhi; // Lower bound, upper bound in the unsigned domain >> 604: const juint _zeros, _ones; // Bits that are known to be 0 or 1 > > It would have been nice to have some sort of `Range` and `KnownBits` class... not sure how feasible this is, especially since `_lo` and `_hi` are already used all over the place... Can you explain the semantics of the combination of these? Each of these defines a subset of the whole int-range. Is the resulting type the intersection of all of these three? 
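As a reading aid for the question above: the PR description quoted in this thread states that a value `x` belongs to the type when `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`, which is the intersection of the three constraint kinds. A minimal sketch of that predicate follows; the struct `IntConstraints` and the function `contains` are hypothetical names used for illustration and are not part of HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the six fields quoted from type.hpp.
struct IntConstraints {
  int32_t  lo, hi;       // bounds in the signed domain
  uint32_t ulo, uhi;     // bounds in the unsigned domain
  uint32_t zeros, ones;  // bits known to be 0 / known to be 1
};

// x is in the set only if it satisfies all three kinds of constraints.
inline bool contains(const IntConstraints& t, int32_t x) {
  const uint32_t u = static_cast<uint32_t>(x);
  return x >= t.lo && x <= t.hi &&   // signed range
         u >= t.ulo && u <= t.uhi && // unsigned range
         (u & t.zeros) == 0 &&       // no known-zero bit is set
         (~u & t.ones) == 0;         // every known-one bit is set
}
```

For example, with signed bounds [-8, 100] and unsigned bounds [0, 100], the value -4 is rejected: it passes the signed check but, reinterpreted as unsigned, lies far above 100.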
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742115104 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742074617 From epeter at openjdk.org Tue Sep 3 14:26:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 14:26:33 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v2] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 09:23:05 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix tests, add verify > > src/hotspot/share/opto/type.hpp line 558: > >> 556: >> 557: // Use to compute join of 2 sets >> 558: const bool _dual; > > I think you need to add some comments, explaining why this is here Thanks for adding the comment! I think comments should not refer to "before this change", rather they should describe the way the code works now. I would also recommend renaming it to `_is_dual`. Because with `_dual`, I would expect to get some kind of dual, not a bool. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742052610 From epeter at openjdk.org Tue Sep 3 14:32:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 14:32:26 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 16:30:26 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. 
an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - fix compile errors > - Merge branch 'master' into unsignedbounds > - add comments > - Merge branch 'master' into unsignedbounds > - fix release build > - add comments, group arguments to reduce C-style reference passing arguments > - fix tests, add verify > - add unit tests > - fix template parameter > - refactor > - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a What are your thoughts on how we are going to debug-print the new int-type? We probably do not always want to print the KnownBits, in general that is way too verbose. But in some alignment example, it would be nice to know that it is the `int/long` range, but the lowest 3 bits are always zero, hence 8 byte aligned. test/hotspot/gtest/opto/test_rangeinference.cpp line 148: > 146: test_normalize_constraints_random(); > 147: test_normalize_constraints_random(); > 148: } I would appreciate it if there were some explicit examples with explicit result verification. Just to make sure the methods are not systematically wrong in some silly way. 
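One possible shape for such an explicit example, based on the `match_mask` computation quoted earlier in this review, is sketched below. It rests on assumptions: `mismatch` is taken to be `ulo ^ uhi` (the review only shows how it is used), the type is fixed to 32 bits, and `KnownBits32`, `count_leading_zeros32` and `bits_from_unsigned_bounds` are hypothetical names, not the PR's API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct KnownBits32 {
  uint32_t zeros;  // bits known to be 0
  uint32_t ones;   // bits known to be 1
};

// Portable leading-zero count; the argument must be non-zero.
inline int count_leading_zeros32(uint32_t v) {
  int n = 0;
  while ((v & 0x80000000u) == 0) {
    v <<= 1;
    ++n;
  }
  return n;
}

// Every value in [ulo, uhi] shares the leading bits on which ulo and uhi
// agree, so those bits are known. Assumes ulo <= uhi.
inline KnownBits32 bits_from_unsigned_bounds(uint32_t ulo, uint32_t uhi) {
  const uint32_t all = std::numeric_limits<uint32_t>::max();
  const uint32_t mismatch = ulo ^ uhi;  // assumption: first differing bit matters
  const uint32_t match_mask =
      (mismatch == 0) ? all : ~(all >> count_leading_zeros32(mismatch));
  return {match_mask & ~ulo, match_mask & ulo};
}
```

For the range [0, 3] from the PR description this yields `zeros == 0xFFFFFFFC` and `ones == 0`, that is, all bits but the last two are known to be unset.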
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2326671282 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742176314 From epeter at openjdk.org Tue Sep 3 14:36:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 14:36:22 GMT Subject: RFR: 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long [v4] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:58:01 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes `cmovL_imm_01*` instructions matching against an integer immediate 1 instead of a long immediate 1. I noticed while looking at the backend implementation of CMove that the rules specify `immI_1` instead of `immL1`, which means that the instructions can't be matched and instead falls through to the base case. I added a small benchmark and got these results: >> >> Baseline Patch Improvement >> Benchmark Mode Cnt Score Error Units Score Error Units >> BasicRules.cmovL_imm_01 avgt 15 259.073 ± 5.806 ns/op 231.108 ± 2.730 ns/op (+ 11.41%) >> >> Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Move architecture checks into IR I'm now good with it, but I'm away from my work laptop and cannot run our testing... @chhagedorn @TobiHartmann Can one of you please run testing for this before it is integrated? ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20275#pullrequestreview-2277643723 From epeter at openjdk.org Tue Sep 3 14:42:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 14:42:20 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v3] In-Reply-To: References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> <6lNmXhkDKMNu7r8yYrBYs4JUJQ-drbj9Gj1_EaAqQj0=.c699170b-0551-4c04-8322-0182cf9b5866@github.com> Message-ID: On Fri, 23 Aug 2024 17:45:44 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Check IR before macro expansion > > I agree, I think your math is correct. How will this mesh with the changes that @merykitty is doing with the KnownBits, in the work of https://github.com/openjdk/jdk/pull/17508 etc? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2326699785 From epeter at openjdk.org Tue Sep 3 14:42:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 14:42:20 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v3] In-Reply-To: <6lNmXhkDKMNu7r8yYrBYs4JUJQ-drbj9Gj1_EaAqQj0=.c699170b-0551-4c04-8322-0182cf9b5866@github.com> References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> <6lNmXhkDKMNu7r8yYrBYs4JUJQ-drbj9Gj1_EaAqQj0=.c699170b-0551-4c04-8322-0182cf9b5866@github.com> Message-ID: <7Xsp23wQ0rySCMNFIlWLkdfpgH2m2dNIKADLwFe9J5E=.7d9279d6-71e6-4bda-9ccd-1c0024ca205c@github.com> On Wed, 7 Aug 2024 01:20:09 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. 
Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) >> >> This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Check IR before macro expansion I think there are still quite a few unaddressed review comments above, so I'll hold off with reviewing myself. Feel free to ping me if you need another review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2326704510 From coleenp at openjdk.org Tue Sep 3 15:00:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 15:00:19 GMT Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 23:51:42 GMT, Dean Long wrote: >> test/hotspot/jtreg/runtime/interpreter/LastJsrTest.java line 39: >> >>> 37: public class LastJsrTest { >>> 38: public static void main(String[] args) { >>> 39: for (int i = 0; i < 1000; ++i) { >> >> Don't you need 10,000 in your loop to trigger compilation? > > Yes for C2, but this is enough for C1, the only compiler that needs this fix. I wanted to make sure C1 compilation was triggered by default without -Xcomp. Testing tiers that use -Xcomp will make sure it passes with C2. Ok. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20732#discussion_r1742223811 From matsaave at openjdk.org Tue Sep 3 15:03:56 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 3 Sep 2024 15:03:56 GMT Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 [v2] In-Reply-To: References: Message-ID: > The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Vladimir and Tobias comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20732/files - new: https://git.openjdk.org/jdk/pull/20732/files/511d5207..e3241704 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20732&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20732&range=00-01 Stats: 11 lines in 2 files changed: 0 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20732.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20732/head:pull/20732 PR: https://git.openjdk.org/jdk/pull/20732 From mcimadamore at openjdk.org Tue Sep 3 15:54:21 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 3 Sep 2024 15:54:21 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 12:06:18 GMT, Jorn Vernee wrote: > I guess that upcalls are less performance critical Think of something like `qsort`, or OpenGL calling the "repaint" function, or, with something like `jextract` , calling the clang cursor visitor. 
While using upcalls might be rare, the kind of use cases where you need upcalls typically fall into the bucket where the same upcall is used a gazillion times within a certain downcall. Then, it becomes performance critical. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2326871347 From sroy at openjdk.org Tue Sep 3 15:59:54 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 3 Sep 2024 15:59:54 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v7] In-Reply-To: References: Message-ID: > JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) > C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. > Also, call_c is adapted as per endianess of system. > We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - cast - casting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19947/files - new: https://git.openjdk.org/jdk/pull/19947/files/7c7de0ec..1db20f59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=05-06 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19947/head:pull/19947 PR: https://git.openjdk.org/jdk/pull/19947 From kvn at openjdk.org Tue Sep 3 16:02:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 16:02:19 GMT Subject: RFR: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 09:30:32 GMT, Christian Hagedorn wrote: >> The computation of `final_correction` in `PhaseIdealLoop::is_counted_loop()` could overflow which is UB: >>
>> https://github.com/openjdk/jdk/blob/dc4fd896289db1d2f6f7bbf5795fec533448a48c/src/hotspot/share/opto/loopnode.cpp#L1958-L1967 >> >> `canonicalized_correction` equals `max_int - 1` if stride is `max_int`. `limit_correction` is at most `max_int - 1` in that case. Adding both together will overflow. I don't think that any compiler would wrongly optimize this and we have not observed any issues with that. But we should still fix this UB. >> >> The fix I propose is to simply bail out with very large positive and negative strides such that we avoid an over- or underflow with the existing logic (see added comments for how the upper bound for the stride is determined). These large strides should be very uncommon in practice and even if we encounter these, the loop would only run for a few iterations. So, a bailout seems fine. This bailout has the additional benefit that we avoid other possibly unknown issues or issues in the future with counted loops having large edge-case strides like `min_int`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Tobias Hartmann Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20828#pullrequestreview-2277872619 From sroy at openjdk.org Tue Sep 3 16:05:56 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 3 Sep 2024 16:05:56 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v8] In-Reply-To: References: Message-ID: > JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) > C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. > Also, call_c is adapted as per endianess of system. > We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19947/files - new: https://git.openjdk.org/jdk/pull/19947/files/1db20f59..73eead7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=06-07 Stats: 5 lines in 1 file changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19947/head:pull/19947 PR: https://git.openjdk.org/jdk/pull/19947 From mdoerr at openjdk.org Tue Sep 3 16:05:56 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 16:05:56 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v8] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 16:03:12 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianess of system. >> We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > cast src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 367: > 365: // calling conventions. Updates and returns _last_calls_return_pc. > 366: address call_c(Register function_descriptor); > 367: // For tail calls: only branch, don't link, so callee returns to caller of this function. Please still repair the indentation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19947#discussion_r1742322001 From mdoerr at openjdk.org Tue Sep 3 16:05:56 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 16:05:56 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 15:59:54 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianess of system. >> We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - cast > - casting src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 371: > 369: return call_c((const FunctionDescriptor*)function_entry, rt); > 370: } > 371: // For tail calls: only branch, don't link, so callee returns to caller of this function. This looks messed up! I requested to move it down, not do duplicate any function declaration. Also, the comment indentation is bad. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19947#discussion_r1742320366 From sroy at openjdk.org Tue Sep 3 16:13:57 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 3 Sep 2024 16:13:57 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v9] In-Reply-To: References: Message-ID: > JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) > C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. > Also, call_c is adapted as per endianess of system. > We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: indents ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19947/files - new: https://git.openjdk.org/jdk/pull/19947/files/73eead7f..b63c9591 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19947/head:pull/19947 PR: https://git.openjdk.org/jdk/pull/19947 From mdoerr at openjdk.org Tue Sep 3 16:14:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 16:14:23 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. 
The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
>
> x64:
>
> before:
>
>     Benchmark             Mode  Cnt   Score   Error  Units
>     Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
>     Benchmark             Mode  Cnt   Score   Error  Units
>     Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
>     Benchmark             Mode  Cnt   Score   Error  Units
>     Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
>     Benchmark             Mode  Cnt   Score   Error  Units
>     Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4598: > 4596: address start = __ pc(); > 4597: > 4598: __ resolve_global_jobject(R3_ARG1, R22_tmp2, R23_tmp3, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31 The comment "// kills R31" is not true. Please remove it. Can you also improve the indentation in the succeeding lines, please? Otherwise, the PPC64 code looks good and the test/jdk/java/foreign tests are passing (also with ZGC). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1742333288 From mdoerr at openjdk.org Tue Sep 3 16:16:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 16:16:23 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v9] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 16:13:57 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianess of system. >> We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > indents LGTM. We should retest it on AIX. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19947#pullrequestreview-2277903386 From kvn at openjdk.org Tue Sep 3 17:11:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 17:11:20 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 11:43:57 GMT, Christian Hagedorn wrote: > This patch adds the method name to the incremental inlining step dumps in IGV which improves debugging issues involving incremental inlining: > > > static void test() { > method1(); > method2(); > method3(); > } > > static void method1() {} > static void method2() {} > static void method3() {} > > Run with `-XX:+AlwaysIncrementalInline` and IGV print level >=3: > > Before patch: > ![image](https://github.com/user-attachments/assets/a3a1ab32-e7b3-4ccb-8ab2-a75d2b5b6912) > > After patch: > ![image](https://github.com/user-attachments/assets/8100a2fe-1670-4687-b8b8-c8053fbaa7d7) > > The patch just prints the method name if we call `print_method()` with `n` being a call node which, AFAICT, only happens for the incremental inlining step. However, even if we call it with another phase at some point, I don't think it hurts to also dump the method name there. > > #### Testing > - Manually verifying change in IGV > - Building IGV which runs its unit tests > - Sanity run with a hello world program with `-Xcomp -XX:+AlwaysIncrementalInline -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20834#pullrequestreview-2278016686 From kvn at openjdk.org Tue Sep 3 17:45:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 17:45:20 GMT Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 15:03:56 GMT, Matias Saavedra Silva wrote: >> The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Vladimir and Tobias comments Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20732#pullrequestreview-2278083249 From kvn at openjdk.org Tue Sep 3 18:14:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 18:14:30 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v4] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 14:32:25 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks.
I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update after Roberto's comments and suggestions I agree that we should use your approach instead of a big static size. As we discussed in our previous meeting, AArch64 has a very small register mask: only 10 words. Can you check whether that is enough, or whether we should increase its static size? That could be a separate RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2327135602 From jkarthikeyan at openjdk.org Tue Sep 3 18:20:20 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 3 Sep 2024 18:20:20 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v3] In-Reply-To: <7Xsp23wQ0rySCMNFIlWLkdfpgH2m2dNIKADLwFe9J5E=.7d9279d6-71e6-4bda-9ccd-1c0024ca205c@github.com> References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> <6lNmXhkDKMNu7r8yYrBYs4JUJQ-drbj9Gj1_EaAqQj0=.c699170b-0551-4c04-8322-0182cf9b5866@github.com> <7Xsp23wQ0rySCMNFIlWLkdfpgH2m2dNIKADLwFe9J5E=.7d9279d6-71e6-4bda-9ccd-1c0024ca205c@github.com> Message-ID: <3DRXdiYsJvE-9u-pmYkW4ZTSsR1QwDJfCFLmLZo1mYk=.0007ce94-e939-49b3-be94-e39d4b3a7d34@github.com> On Tue, 3 Sep 2024 14:39:45 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Check IR before macro expansion > > I think there are still quite a few unaddressed review comments above, so I'll hold off with reviewing myself. Feel free to ping me if you need another review. @eme64 I was thinking that this PR and the associated IR tests could serve as a baseline for future work in this area, such as #17508, since it just uses the signed bounds we already have to do its analysis.
My goal was to make the changes as simple as possible, so that we can see some benefits short term while we work on the long-term solution for better type inference. Originally I just had the `[0..C]` case since I saw that it showed up a lot in real-world code, but I kept finding generalizations and eventually it snowballed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2327144025 From qamai at openjdk.org Tue Sep 3 20:34:37 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:34:37 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v8] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440, node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot.
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/d5ad9f1a..2c3807bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=06-07 Stats: 369 lines in 5 files changed: 91 ins; 35 del; 243 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Tue Sep 3 20:34:43 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:34:43 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:49:05 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > src/hotspot/share/opto/rangeinference.cpp line 56: > >> 54: static_assert(std::is_unsigned::value, ""); >> 55: >> 56: auto adjust_lo = [](T lo, const KnownBits& bits) { > > This does not even capture anything. Why not make it its own dedicated method with a nice name? I guess this could also be a member method of `KnownBits`. 
Since it is only used here I think it would be more sensible to make it a local lambda to lower the visibility, the resulting function is not too large, too. > src/hotspot/share/opto/type.hpp line 606: > >> 604: const juint _zeros, _ones; // Bits that are known to be 0 or 1 >> 605: >> 606: static const TypeInt* cast(const Type* t) { return t->is_int(); } > > Is this even used? I have removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742651668 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742648644 From qamai at openjdk.org Tue Sep 3 20:34:44 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:34:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v8] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:35:54 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> address reviews > > src/hotspot/share/opto/type.cpp line 500: > >> 498: assert( TypeInt::CC_EQ == TypeInt::ZERO, "types must match for CmpL to work" ); >> 499: assert( TypeInt::CC_GE == TypeInt::BOOL, "types must match for CmpL to work" ); >> 500: assert( (juint)(TypeInt::CC->_hi - TypeInt::CC->_lo) <= SMALLINT, "CC is truly small"); > > Why did you remove this check? Because I saw that `SMALLINT` is only used here so I moved it to `rangeinference.cpp`, the assert is also pretty unnecessary, too. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742649934 From qamai at openjdk.org Tue Sep 3 20:34:44 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:34:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:33:21 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/type.hpp line 604: >> >>> 602: const jint _lo, _hi; // Lower bound, upper bound in the signed domain >>> 603: const juint _ulo, _uhi; // Lower bound, upper bound in the unsigned domain >>> 604: const juint _zeros, _ones; // Bits that are known to be 0 or 1 >> >> It would have been nice to have some sort of `Range` and `KnownBits` class... not sure how feasible this is, especially since `_lo` and `_hi` are already used all over the place... > > Can you explain the semantics of the combination of these? Each of these defines a subset of the whole int-range. Is the resulting type the intersection of all of these three? That's what I thought too, but considering they are constants I think exposing them directly is fine. I have added explanation regarding the meaning of these constraints. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742648488 From qamai at openjdk.org Tue Sep 3 20:39:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:39:41 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. 
an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: move static_asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/2c3807bd..ae473850 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=07-08 Stats: 14 lines in 2 files changed: 6 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Tue Sep 3 20:39:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:39:41 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:58:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.cpp line 109: >> >>> 107: static AdjustResult> >>> 108: adjust_bits_from_bounds(const KnownBits& bits, const RangeInt& bounds) { >>> 109: static_assert(std::is_unsigned::value, ""); >> >> Again: could be a member function of `KnownBits`, where unsigned could be verified already... > > At any rate, I would name `T -> U` Yes I have renamed all signed types to `S` and unsigned types to `U`. 
Regarding making it a member of `KnownBits`, making it a `static` function has the advantage of visibility to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742658774 From qamai at openjdk.org Tue Sep 3 20:39:46 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:39:46 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:51:17 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > src/hotspot/share/opto/rangeinference.hpp line 44: > >> 42: T _zeros; >> 43: T _ones; >> 44: }; > > Are the `KnownBits` always unsigned? Why not name the type `U` and verify that it is unsigned? > > You could also have member-functions on it then, like the bounds adjustment. And then you can either have the type free (i.e. allow signed type), or require the same `U` type on the input ranges or values. Yes you are right, for some reasons I thought that `static_assert` needs to live inside functions??? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742656775 From qamai at openjdk.org Tue Sep 3 20:43:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:43:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:56:24 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > src/hotspot/share/opto/rangeinference.cpp line 94: > >> 92: return {true, false, {}}; >> 93: } >> 94: T new_hi = ~adjust_lo(~bounds._hi, {bits._ones, bits._zeros}); > > Wow, that looks like some magic. Can you please explain this? `~` is a strictly decreasing function in the unsigned integer domain, so we just apply a bitwise negation, compute the adjustment there, and switch back. > src/hotspot/share/opto/rangeinference.cpp line 444: > >> 442: return nullptr; >> 443: } >> 444: os::snprintf_checked(buf, buf_size, "%s+" INT32_FORMAT, xname, jint(n - origin)); > > Was there an explicit choice to use buffers, instead of `outputStream`? I'm not sure, it is just the currently used method.
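
The complement trick explained above can be illustrated with a small, self-contained sketch. This is not the code from `rangeinference.cpp`: the 8-bit `KnownBits8` struct and the brute-force `adjust_lo` are hypothetical stand-ins chosen for readability, and only the shape of `adjust_hi` mirrors the quoted `~adjust_lo(~bounds._hi, {bits._ones, bits._zeros})` line. The key observation is that `x` satisfies `(zeros, ones)` exactly when `~x` satisfies `(ones, zeros)`, and that `x <= hi` exactly when `~x >= ~hi` for unsigned values.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical 8-bit stand-in for the known-bits pair discussed in the review:
// bits set in `zeros` must be 0 in the value, bits set in `ones` must be 1.
struct KnownBits8 {
  uint8_t zeros;
  uint8_t ones;
};

// True if x is compatible with the known bits.
static bool satisfies(uint8_t x, KnownBits8 b) {
  return (x & b.zeros) == 0 && (static_cast<uint8_t>(~x) & b.ones) == 0;
}

// Smallest value >= lo that satisfies the known bits, if any.
// Brute force on purpose: this is an illustration, not the real algorithm.
static std::optional<uint8_t> adjust_lo(uint8_t lo, KnownBits8 b) {
  for (unsigned v = lo; v <= 0xFF; ++v) {
    if (satisfies(static_cast<uint8_t>(v), b)) {
      return static_cast<uint8_t>(v);
    }
  }
  return std::nullopt;
}

// Largest value <= hi that satisfies the known bits, if any.
// Negate the bound, swap the roles of zeros and ones, reuse adjust_lo,
// then negate the result back.
static std::optional<uint8_t> adjust_hi(uint8_t hi, KnownBits8 b) {
  std::optional<uint8_t> r =
      adjust_lo(static_cast<uint8_t>(~hi), KnownBits8{b.ones, b.zeros});
  if (!r.has_value()) {
    return std::nullopt;
  }
  return static_cast<uint8_t>(~*r);
}
```

For instance, with the lowest three bits known to be zero, an upper bound of 15 tightens to 8, the largest multiple of 8 that does not exceed it.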
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742662458 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742663315 From qamai at openjdk.org Tue Sep 3 20:48:29 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:48:29 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 14:05:36 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > src/hotspot/share/opto/rangeinference.cpp line 113: > >> 111: T match_mask = mismatch == 0 ? std::numeric_limits<T>::max() >> 112: : ~(std::numeric_limits<T>::max() >> count_leading_zeros(mismatch)); >> 113: T new_zeros = bits._zeros | (match_mask &~ bounds._lo); > > Suggestion: > > T new_zeros = bits._zeros | (match_mask & ~bounds._lo); > > I think this was a typo? It looks more like an `and not` to me :) However, if you prefer the `~` to stick to `bounds._lo` I would make that change. > test/hotspot/gtest/opto/test_rangeinference.cpp line 148: > >> 146: test_normalize_constraints_random(); >> 147: test_normalize_constraints_random(); >> 148: } > > I would appreciate it if there were some explicit examples with explicit result verification. Just to make sure the methods are not systematically wrong in some silly way. My idea is that it is what `test_normalize_constraints_simple` would do, but I think adding some more explicit cases would help, too.
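
As a rough illustration of the kind of explicit case being asked for, the `match_mask` computation quoted above can be sketched in isolation. Everything here is an assumption made for the example: the `Bits32` struct, the portable `clz32` helper and the function name `bits_from_bounds` are hypothetical, the sketch ignores any previously known bits, and the `ones` half is written by analogy with the quoted `new_zeros` line rather than copied from the patch. The idea is that every value in `[ulo, uhi]` shares the leading bits on which `ulo` and `uhi` agree, so those bits are known.

```cpp
#include <cstdint>

// Hypothetical 32-bit known-bits pair: bits set in `zeros` are known to be 0,
// bits set in `ones` are known to be 1.
struct Bits32 {
  uint32_t zeros;
  uint32_t ones;
};

// Portable count of leading zero bits; returns 32 for x == 0.
static int clz32(uint32_t x) {
  int n = 0;
  for (uint32_t m = 0x80000000u; m != 0 && (x & m) == 0; m >>= 1) {
    ++n;
  }
  return n;
}

// Derive the bits implied by the unsigned range [ulo, uhi] alone.
static Bits32 bits_from_bounds(uint32_t ulo, uint32_t uhi) {
  uint32_t mismatch = ulo ^ uhi;
  // Mask covering the leading bits on which ulo and uhi agree.
  uint32_t match_mask =
      mismatch == 0 ? UINT32_MAX : ~(UINT32_MAX >> clz32(mismatch));
  // Within the common prefix, a 0 bit in ulo is a known zero and a 1 bit
  // in ulo is a known one.
  return Bits32{match_mask & ~ulo, match_mask & ulo};
}
```

With the range `[0, 3]` from the PR description, this yields `zeros == 0xFFFFFFFC` and `ones == 0`, that is, all bits but the last two are known to be unset.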
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742670148 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742668030 From qamai at openjdk.org Tue Sep 3 20:52:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Sep 2024 20:52:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 14:27:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > What are your thoughts on how we are going to debug-print the new int-type? We probably do not always want to print the KnownBits, in general that is way too verbose. But in some alignment example, it would be nice to know that it is the `int/long` range, but the lowest 3 bits are always zero, hence 8 byte aligned. @eme64 Thanks a lot for your reviews, I think I have addressed all of them, I will add some more explicit tests tomorrow. Regarding the `dump`, I made a new method `dump_verbose` which would dump the bit information of the type. What do you think? Also, the test failures are due to the `int:>=0` is now dumped differently (`int:0..max_int ^ 0..max_int`). Do you think it would be more suitable to make the type dumping more clever or to modify existing tests? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2327412583 From mdoerr at openjdk.org Tue Sep 3 22:13:22 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 22:13:22 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v9] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 16:13:57 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianess of system. >> We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > indents You effectively changed the type from `relocInfo::runtime_call_type` to `relocInfo::none` in c1_LIRAssembler_ppc.cpp. This is causing problems with ABIv1. The VM seems to work when switching off C1 on AIX. So, the other files should be ok. ------------- Changes requested by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19947#pullrequestreview-2278607902 From sviswanathan at openjdk.org Tue Sep 3 22:17:34 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:17:34 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . 
SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved Thanks for considering the review comments. I have some minor follow ups. src/hotspot/cpu/x86/assembler_x86.cpp line 8470: > 8468: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8469: assert(vector_len == AVX_128bit ? 
VM_Version::supports_avx() : > 8470: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); avx512bw check here seems wrong. src/hotspot/cpu/x86/assembler_x86.cpp line 8479: > 8477: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { > 8478: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : > 8479: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); avx512bw check here seems wrong. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 945: > 943: } else { > 944: vpblendvb(dst, src2, src1, xtmp1, vlen_enc); > 945: } The comment needs to move inside if and else block as the code in these blocks is reverse of each other. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6691: > 6689: // Res = INP1 - INP2 (non-commutative and non-associative) > 6690: // Res = Mask ? Zero : Res > 6691: evmasked_op(etype == T_INT ? Op_SubVI : Op_SubVL, etype, ktmp, dst, src1, src2, false, vlen_enc, false); Do the comments need update here? e.g. 6688 is setting mask bits to true for src2 6713: int vlen_enc) { > 6714: // Unsigned values ranges comprise of only +ve numbers, thus there exist only an upper bound saturation. 
> 6715: // overflow_mask = (SRC1 + SRC2) 1985: public static long addSaturating(long a, long b) { > 1986: long res = a + b; > 1987: // HD 2-12 Overflow iff both arguments have the opposite sign of the result HD -> Hacker's Delight src/java.base/share/classes/java/lang/Long.java line 2008: > 2006: public static long subSaturating(long a, long b) { > 2007: long res = a - b; > 2008: // HD 2-12 Overflow iff the arguments have different signs and HD -> Hacker's Delight ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2277917757 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742347879 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742348218 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742725069 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742733746 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742751114 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742452009 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742452802 From sdohrmann at openjdk.org Tue Sep 3 22:22:56 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Tue, 3 Sep 2024 22:22:56 GMT Subject: RFR: 8329035: New Data Destination instructions support [v2] In-Reply-To: References: Message-ID: > Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. > > The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. > > Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. 
Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. > > All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: function name changes based on review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20698/files - new: https://git.openjdk.org/jdk/pull/20698/files/d9f63772..9aea8bbb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=00-01 Stats: 186 lines in 2 files changed: 0 ins; 1 del; 185 mod Patch: https://git.openjdk.org/jdk/pull/20698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20698/head:pull/20698 PR: https://git.openjdk.org/jdk/pull/20698 From sdohrmann at openjdk.org Tue Sep 3 22:22:57 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Tue, 3 Sep 2024 22:22:57 GMT Subject: RFR: 8329035: New Data Destination instructions support [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 15:22:09 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> function name changes based on review comments > > src/hotspot/cpu/x86/assembler_x86.cpp line 1579: > >> 1577: >> 1578: void Assembler::eaddl(Register dst, Register src1, Register src2, bool no_flags) { >> 1579: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > Should we not auto demote these instruction to use legacy MAP0 encoding, if dst and src1 / src2 are same and does not belong to EGPR set? > We do REX to VEX promotion and EVEX to VEX demotions at assembler level if the required criteria is met. 
The thinking is that the user would only use these extended functions if they want either NDD or NF semantics, or both. If they want only NF they can use the same register for dst / src and set no_flags to true; this case requires evex encoding. In other cases, they can call the non-"e" function to get prefix-optimized use of the larger register set. > src/hotspot/cpu/x86/assembler_x86.cpp line 2647: > >> 2645: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 2646: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); >> 2647: vex_prefix_ndd(src, dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > > Suggestion: > > eevex_prefix_ndd(src, dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); Done. > src/hotspot/cpu/x86/assembler_x86.hpp line 794: > >> 792: bool eevex_x, int nds_enc, VexSimdPrefix pre, VexOpcode opc, bool no_flags = false); >> 793: >> 794: void vex_prefix_ndd(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false) { > > Suggestion: > > void eevx_prefix_ndd(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false) { > > > NDD is only supported with 4 byte extended evex encoding. Done. Assume "evex_prefix_ndd" was intended. > src/hotspot/cpu/x86/assembler_x86.hpp line 798: > >> 796: } >> 797: >> 798: void vex_prefix_nf(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false) { > > Same as above. Done. 
> src/hotspot/cpu/x86/assembler_x86.hpp line 809: > >> 807: InstructionAttr *attributes, bool src_is_gpr = false, bool nds_is_ndd = false, bool force_evex = false, bool no_flags = false); >> 808: >> 809: int vex_prefix_and_encode_ndd(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, > > Suggestion: > > int vex_prefix_and_encode_ndd(int ndd_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, The second argument (nds_enc) carries the ndd encoding. I kept the original parameter names since they are used in the implementation, which now serves both the ndd and nds features. The added nds_is_ndd flag distinguishes the two uses. The two encoding functions (vex_prefix and vex_prefix_and_encode) were fairly complex to start with. Changes to them to get ndd and nf features are pretty simple, so reusing them and keeping the existing parameter names in the implementation seemed a good idea. Using these same parameter names in the header took away the need to translate names if going between the header and impl. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1742768595 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1742767859 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1742767513 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1742767666 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1742769333 From sviswanathan at openjdk.org Tue Sep 3 22:23:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:23:21 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 12:17:08 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 10656: >> >>> 10654: match(Set dst (SaturatingSubVI src1 src2)); >>> 10655: match(Set dst (SaturatingSubVL src1 src2)); >>> 10656: effect(TEMP ktmp); >> >> This needs TEMP dst as well. > > There is no use of either of the source operands after assignment to dst in the macro assembly routine. Sorry, I meant this for another instruct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742767372 From sviswanathan at openjdk.org Tue Sep 3 22:23:22 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:23:22 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . 
SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved src/hotspot/cpu/x86/x86.ad line 10684: > 10682: match(Set dst (SaturatingSubVI src1 src2)); > 10683: match(Set dst (SaturatingSubVL src1 src2)); > 10684: effect(TEMP xtmp1, TEMP xtmp2); Here we need TEMP dst in effect, the saturating_unsigned_sub_dq_avx defines and uses dst across xtmp1. 
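For readers following the saturating semantics quoted above, a minimal scalar sketch in Java may help. It is not taken from the patch: the class and method names are hypothetical, and only the signed overflow tests follow the Hacker's Delight 2-12 idiom cited elsewhere in this review. Unsigned values are carried in ordinary `long` variables and compared with `Long.compareUnsigned`.

```java
// Illustrative sketch only; names are hypothetical and this is not the code under review.
final class SaturatingLongSketch {
    private SaturatingLongSketch() {}

    // Signed saturating add: clamp to Long.MIN_VALUE / Long.MAX_VALUE instead of wrapping.
    static long addSaturating(long a, long b) {
        long res = a + b;
        // Overflow iff both operands have the opposite sign of the result.
        if (((a ^ res) & (b ^ res)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return res;
    }

    // Signed saturating subtract.
    static long subSaturating(long a, long b) {
        long res = a - b;
        // Overflow iff the operands differ in sign and the result's sign differs from a's.
        if (((a ^ b) & (a ^ res)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return res;
    }

    // Unsigned saturating add: only an upper bound exists (all bits set, i.e. -1L).
    static long addSaturatingUnsigned(long a, long b) {
        long res = a + b;
        // A wrapped unsigned sum is smaller than either operand.
        return Long.compareUnsigned(res, a) < 0 ? -1L : res;
    }

    // Unsigned saturating subtract: only a lower bound exists (zero).
    static long subSaturatingUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) < 0 ? 0L : a - b;
    }

    // Unsigned max and min on values stored in signed longs.
    static long maxUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static long minUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

The unsigned variants need no sign analysis because an unsigned range has a single bound to hit in each direction, which matches the remark in the quoted assembler comments that unsigned addition can only saturate at the upper bound.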
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742768246 From sviswanathan at openjdk.org Tue Sep 3 22:28:20 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:28:20 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 12:15:10 GMT, Jatin Bhateja wrote: >> If the aim is to reduce the number of nodes, we could merge the Op_SaturatingAddVB, Op_SaturatingAddVS, Op_SaturatingAddVI, and Op_SaturatingAddVL into one Op_SaturatingAddV. Likewise for unsigned saturating add into Op_SaturatingUnsignedAddV. > > Hey @sviswa7, our concern was around value ranges of new unsigned scalar type, which as mentioned will be addressed when I support intrinsification of new core lib APIs and associated range constraining / folding optimization in a follow up patch. Reiterating, we are not adding unsigned scalar types with this patch, we are only supporting unsigned (saturating) operations on existing signed integral types. So in my thoughts, we could avoid this change as I mentioned above, but I will leave this one to other reviewers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742772702 From kvn at openjdk.org Tue Sep 3 22:50:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 22:50:22 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: Message-ID: <2QYSxZwhAnaZ9z2kbrnk8ipDDS0YuqgodHbhHoZrjPY=.c837b6dc-66fd-4a64-a58a-327e7e858ffc@github.com> On Tue, 3 Sep 2024 20:39:41 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
>> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > move static_asserts Passing by comment. > Do you think it would be more suitable to make the type dumping more clever or to modify existing tests? I am fine with modifying tests when we agree on output format. Can you use `int:[0..max_int, 0..max_int]` instead of `^`, which I associate with an operator? For me it would be easier to interpret it as a `[_lo, _hi]` range of values. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2327570677 From kvn at openjdk.org Tue Sep 3 22:50:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 22:50:23 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 20:40:05 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 94: >> >>> 92: return {true, false, {}}; >>> 93: } >>> 94: T new_hi = ~adjust_lo(~bounds._hi, {bits._ones, bits._zeros}); >> >> Wow, that looks like some magic ? Can you please explain this? 
> > So `~` is a strictly decreasing function in the unsigned integer domain, so we just do a bitwise negation, compute the adjustment there and switch back. This deserves a comment. When the next person needs to touch this code, it will look like "magic" and be confusing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742784955 From kvn at openjdk.org Tue Sep 3 22:59:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 22:59:21 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 20:49:40 GMT, Quan Anh Mai wrote: >> What are your thoughts on how we are going to debug-print the new int-type? We probably do not always want to print the KnownBits, in general that is way too verbose. But in some alignment example, it would be nice to know that it is the `int/long` range, but the lowest 3 bits are always zero, hence 8 byte aligned. > > @eme64 Thanks a lot for your reviews, I think I have addressed all of them, I will add some more explicit tests tomorrow. Regarding the `dump`, I made a new method `dump_verbose` which would dump the bit information of the type. What do you think? Also, the test failures are due to the `int:>=0` is now dumped differently (`int:0..max_int ^ 0..max_int`). Do you think it would be more suitable to make the type dumping more clever or to modify existing tests? @merykitty Can you add a JMH benchmark that shows the benefits of these changes? Does this change affect C2 compilation speed? It seems we may spend more time in `Value()` methods and other places where we create a new type. 
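Since the `~adjust_lo(~bounds._hi, {bits._ones, bits._zeros})` idiom was called "magic" earlier in this thread, a small sketch may make the reasoning concrete. It assumes an 8-bit unsigned toy domain and a brute-force search, neither of which reflects the real `rangeinference.cpp` code, and all names below are hypothetical. The point is only the duality: complementing reverses unsigned order and swaps the roles of known-zero and known-one bits, so the largest matching value at or below `hi` is the complement of the smallest matching value at or above `~hi`.

```java
import java.util.OptionalInt;

// Illustrative sketch only; an 8-bit toy model, not the HotSpot implementation.
final class KnownBitsSketch {
    static final int MASK = 0xFF; // values are treated as unsigned 8-bit numbers

    private KnownBitsSketch() {}

    // True if x has every bit in 'zeros' clear and every bit in 'ones' set.
    static boolean matches(int x, int zeros, int ones) {
        return (x & zeros) == 0 && (x & ones) == ones;
    }

    // Smallest x in [lo, 255] that satisfies the known bits, or empty if none exists.
    static OptionalInt adjustLo(int lo, int zeros, int ones) {
        for (int x = lo; x <= MASK; x++) {
            if (matches(x, zeros, ones)) {
                return OptionalInt.of(x);
            }
        }
        return OptionalInt.empty();
    }

    // Largest x in [0, hi] that satisfies the known bits, computed through the
    // complement: negate hi, swap zeros and ones, reuse adjustLo, negate back.
    static OptionalInt adjustHi(int hi, int zeros, int ones) {
        OptionalInt r = adjustLo(~hi & MASK, ones, zeros);
        return r.isPresent() ? OptionalInt.of(~r.getAsInt() & MASK) : OptionalInt.empty();
    }

    // Direct downward search, used only to cross-check adjustHi.
    static OptionalInt adjustHiReference(int hi, int zeros, int ones) {
        for (int x = hi; x >= 0; x--) {
            if (matches(x, zeros, ones)) {
                return OptionalInt.of(x);
            }
        }
        return OptionalInt.empty();
    }
}
```

For example, with `hi = 22`, bit 0 known to be zero and bit 3 known to be one, both `adjustHi` and the reference search give 14. A comment along these lines next to the real code would probably address the readability concern.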
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2327580747 From kvn at openjdk.org Tue Sep 3 23:05:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 23:05:22 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: Message-ID: <9IWQrneQV6EbR8d23nDgX8Nas6S-1RL4Jo5BR7UjZ0I=.2b6ddf93-484c-4f84-aa5e-728e01af6348@github.com> On Tue, 3 Sep 2024 20:39:41 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > move static_asserts Please, fix GHA builds and testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2327594984 From kvn at openjdk.org Tue Sep 3 23:05:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 23:05:23 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 20:31:50 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 56: >> >>> 54: static_assert(std::is_unsigned::value, ""); >>> 55: >>> 56: auto adjust_lo = [](T lo, const KnownBits& bits) { >> >> This does not even capture anything. Why not make it its own dedicated method with a nice name? I guess this could also be a member method of `KnownBits`. > > Since it is only used here I think it would be more sensible to make it a local lambda to lower the visibility, the resulting function is not too large, too. This frightens me ... 40 lines of code is not small. What benefit does it have over a normal private method? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1742793453 From dholmes at openjdk.org Wed Sep 4 01:34:26 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 4 Sep 2024 01:34:26 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 src/hotspot/share/prims/upcallLinker.cpp line 142: > 140: Handle exception_h(Thread::current(), exception); > 141: java_lang_Throwable::print_stack_trace(exception_h, tty); > 142: ShouldNotReachHere(); How does `print_stack_trace` not return here? test/jdk/java/foreign/TestUpcallStress.java line 27: > 25: * @test > 26: * @requires jdk.foreign.linker != "FALLBACK" > 27: * @requires os.arch == "aarch64" & os.name == "Linux" Only for Linux-aarch64 ?? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1742870369 PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1742871513 From dlong at openjdk.org Wed Sep 4 01:53:22 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 4 Sep 2024 01:53:22 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: References: Message-ID: On Sat, 31 Aug 2024 00:17:03 GMT, Vladimir Kozlov wrote: >> Yes, exactly. > > I am not sure how Dean's proposal will help. I agree with Vladimir's suggestion - C1 should not optimize call sites in Level 3 compilation. I was just trying to think how to preserve the original intent of this code, which seemed to be "skip profiling if the profiling info is not needed", but getting it right seems complicated, so I'm OK with always doing it. > C1 should not optimize call sites in Level 3 compilation. You mean don't use can_be_statically_bound() and other checks to devirtualize virtual calls? I believe C1 does that at all compilation levels. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1742883560 From thartmann at openjdk.org Wed Sep 4 06:01:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 4 Sep 2024 06:01:25 GMT Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 [v2] In-Reply-To: References: Message-ID: <1PclKNj5vooKpK3xF2xAie2HpwQdkJe-035P4pVvK48=.3b3f5459-9d18-45e0-a7a4-7f4e81382d66@github.com> On Tue, 3 Sep 2024 15:03:56 GMT, Matias Saavedra Silva wrote: >> The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Vladimir and Tobias comments Looks good to me otherwise. src/hotspot/share/c1/c1_GraphBuilder.cpp line 1393: > 1391: // has already been activated. Watch for this case and bail out. > 1392: if (next_bci() >= method()->code_size()) { > 1393: // This can happen if the subroutine does not terminate with a ret, Indentation is incorrect here (should be 2 whitespace instead of 4). ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20732#pullrequestreview-2279064608 PR Review Comment: https://git.openjdk.org/jdk/pull/20732#discussion_r1743116302 From chagedorn at openjdk.org Wed Sep 4 06:03:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Sep 2024 06:03:54 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name [v2] In-Reply-To: References: Message-ID: <2t0mNjColaQYUsTG3gwkuMSgxUKicD2ySXoeFAvmMJM=.7ad261a2-dfe4-4038-a8ab-2ce439637392@github.com> > This patch adds the method name to the incremental inlining step dumps in IGV which improves debugging issues involving incremental inlining: > > > static void test() { > method1(); > method2(); > method3(); > } > > static void method1() {} > static void method2() {} > static void method3() {} > > Run with `-XX:+AlwaysIncrementalInline` and IGV print level >=3: > > Before patch: > ![image](https://github.com/user-attachments/assets/a3a1ab32-e7b3-4ccb-8ab2-a75d2b5b6912) > > After patch: > ![image](https://github.com/user-attachments/assets/8100a2fe-1670-4687-b8b8-c8053fbaa7d7) > > The patch just prints the method name if we call `print_method()` with `n` being a call node which, AFAICT, only happens for the incremental inlining step. However, even if we call it with another phase at some point, I don't think it hurts to also dump the method name there. 
> > #### Testing > - Manually verifying change in IGV > - Building IGV which runs its unit tests > - Sanity run with a hello world program with `-Xcomp -XX:+AlwaysIncrementalInline -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20834/files - new: https://git.openjdk.org/jdk/pull/20834/files/24cefa31..dc0f25a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20834&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20834&range=00-01 Stats: 11 lines in 1 file changed: 6 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20834/head:pull/20834 PR: https://git.openjdk.org/jdk/pull/20834 From chagedorn at openjdk.org Wed Sep 4 06:03:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Sep 2024 06:03:54 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:26:32 GMT, Roberto Castañeda Lozano wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion > > src/hotspot/share/opto/compile.cpp line 5206: > >> 5204: call->method()->print_short_name(&ss); >> 5205: } >> 5206: } > > Suggestion: using `call->_name` instead of `call->method()->print_short_name()` is slightly simpler and more general (should be equivalent for incremental inlining, but will also print stub names, "uncommon trap", etc. when dumping `PHASE_AFTER_ITER_GVN_STEP` graphs on call nodes). 
> Suggestion: > > ss.print(": %d %s", n->_idx, NodeClassNames[n->Opcode()]); > if (n->is_Call()) { > CallNode* call = n->as_Call(); > if (call->_name != nullptr) { > ss.print(" - %s", call->_name); > } > } Thanks for the suggestion. I think `_name` is only set if there is no attached method. So, we could combine your suggestion with what I've had. What do you think? (see pushed update) This gives us: ![image](https://github.com/user-attachments/assets/33aad438-432a-4252-8e5b-c75220491bf5) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20834#discussion_r1743117370 From chagedorn at openjdk.org Wed Sep 4 06:04:26 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Sep 2024 06:04:26 GMT Subject: RFR: 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long [v4] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 14:58:01 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes `cmovL_imm_01*` instructions matching against an integer immediate 1 instead of a long immediate 1. I noticed while looking at the backend implementation of CMove that the rules specify `immI_1` instead of `immL1`, which means that the instructions can't be matched and instead falls through to the base case. I added a small benchmark and got these results: >> >> Baseline Patch Improvement >> Benchmark Mode Cnt Score Error Units Score Error Units >> BasicRules.cmovL_imm_01 avgt 15 259.073 ± 5.806 ns/op 231.108 ± 2.730 ns/op (+ 11.41%) >> >> Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Move architecture checks into IR Testing looked good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20275#pullrequestreview-2279068538 From chagedorn at openjdk.org Wed Sep 4 06:11:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Sep 2024 06:11:30 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v3] In-Reply-To: <6lNmXhkDKMNu7r8yYrBYs4JUJQ-drbj9Gj1_EaAqQj0=.c699170b-0551-4c04-8322-0182cf9b5866@github.com> References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> <6lNmXhkDKMNu7r8yYrBYs4JUJQ-drbj9Gj1_EaAqQj0=.c699170b-0551-4c04-8322-0182cf9b5866@github.com> Message-ID: <-t28h66vJlrX5ieI15WmqUqGGMORqCjwJRStbCvqzEk=.680fdd08-0406-41df-b927-fcac04f2b14e@github.com> On Wed, 7 Aug 2024 01:20:09 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) >> >> This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Check IR before macro expansion I agree that we should keep this RFE simple. 
And we are just using thing that we already have. So, we could just go with the optimizations that you currently have (if you like to apply the few simple improvement suggestions, you can already do that) and follow up with future RFEs to cover more cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2327996554 From fyang at openjdk.org Wed Sep 4 06:17:19 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 4 Sep 2024 06:17:19 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 src/hotspot/cpu/riscv/upcallLinker_riscv.cpp line 264: > 262: > 263: __ block_comment("{ load target "); > 264: __ movptr(j_rarg0, (intptr_t) receiver); Hi @JornVernee , Could you please apply following small add-on change for linux-riscv64? As I witnessed build warning with GCC-13. Otherwise, builds fine and the newly-added test/jdk/java/foreign/TestUpcallStress.java is passing. diff --git a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp index 5c45a679316..55160be99d0 100644 --- a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp +++ b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp @@ -261,7 +261,7 @@ address UpcallLinker::make_upcall_stub(jobject receiver, Symbol* signature, __ block_comment("} argument shuffle"); __ block_comment("{ load target "); - __ movptr(j_rarg0, (intptr_t) receiver); + __ movptr(j_rarg0, (address) receiver); __ far_call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // loads Method* into xmethod __ block_comment("} load target "); diff --git a/test/jdk/java/foreign/TestUpcallStress.java b/test/jdk/java/foreign/TestUpcallStress.java index 3b9b1d4b207..40607746856 100644 --- a/test/jdk/java/foreign/TestUpcallStress.java +++ b/test/jdk/java/foreign/TestUpcallStress.java @@ -24,7 +24,7 @@ /* * @test * @requires jdk.foreign.linker != "FALLBACK" - * @requires os.arch == "aarch64" & os.name == "Linux" + * @requires (os.arch == "aarch64" | os.arch=="riscv64") & os.name == "Linux" * @requires os.maxMemory > 4G * @modules java.base/jdk.internal.foreign * @build NativeTestHelper CallGeneratorHelper TestUpcallBase ------------- PR Review Comment: 
https://git.openjdk.org/jdk/pull/20479#discussion_r1743130094 From rcastanedalo at openjdk.org Wed Sep 4 06:48:18 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 4 Sep 2024 06:48:18 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name [v2] In-Reply-To: <2t0mNjColaQYUsTG3gwkuMSgxUKicD2ySXoeFAvmMJM=.7ad261a2-dfe4-4038-a8ab-2ce439637392@github.com> References: <2t0mNjColaQYUsTG3gwkuMSgxUKicD2ySXoeFAvmMJM=.7ad261a2-dfe4-4038-a8ab-2ce439637392@github.com> Message-ID: On Wed, 4 Sep 2024 06:03:54 GMT, Christian Hagedorn wrote: >> This patch adds the method name to the incremental inlining step dumps in IGV which improves debugging issues involving incremental inlining: >> >> >> static void test() { >> method1(); >> method2(); >> method3(); >> } >> >> static void method1() {} >> static void method2() {} >> static void method3() {} >> >> Run with `-XX:+AlwaysIncrementalInline` and IGV print level >=3: >> >> Before patch: >> ![image](https://github.com/user-attachments/assets/a3a1ab32-e7b3-4ccb-8ab2-a75d2b5b6912) >> >> After patch: >> ![image](https://github.com/user-attachments/assets/8100a2fe-1670-4687-b8b8-c8053fbaa7d7) >> >> The patch just prints the method name if we call `print_method()` with `n` being a call node which, AFAICT, only happens for the incremental inlining step. However, even if we call it with another phase at some point, I don't think it hurts to also dump the method name there. >> >> #### Testing >> - Manually verifying change in IGV >> - Building IGV which runs its unit tests >> - Sanity run with a hello world program with `-Xcomp -XX:+AlwaysIncrementalInline -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > suggestion Marked as reviewed by rcastanedalo (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20834#pullrequestreview-2279135884 From rcastanedalo at openjdk.org Wed Sep 4 06:48:19 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 4 Sep 2024 06:48:19 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 05:59:58 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/compile.cpp line 5206: >> >>> 5204: call->method()->print_short_name(&ss); >>> 5205: } >>> 5206: } >> >> Suggestion: using `call->_name` instead of `call->method()->print_short_name()` is slightly simpler and more general (should be equivalent for incremental inlining, but will also print stub names, "uncommon trap", etc. when dumping `PHASE_AFTER_ITER_GVN_STEP` graphs on call nodes). >> Suggestion: >> >> ss.print(": %d %s", n->_idx, NodeClassNames[n->Opcode()]); >> if (n->is_Call()) { >> CallNode* call = n->as_Call(); >> if (call->_name != nullptr) { >> ss.print(" - %s", call->_name); >> } >> } > > Thanks for the suggestion. I think `_name` is only set if there is no attached method. So, we could combine your suggestion with what I've had. What do you think? (see pushed update) > > This gives us: > > ![image](https://github.com/user-attachments/assets/33aad438-432a-4252-8e5b-c75220491bf5) You are right, that looks good, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20834#discussion_r1743160978 From amitkumar at openjdk.org Wed Sep 4 07:10:37 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Sep 2024 07:10:37 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' Message-ID: The error mentioned in the JBS issue is seen on x86_64 as well as on s390x during the build, with `--enable-ubsan` configuration. 
I have added `-1` to the enum to fix this issue for now, as mentioned by @MBaesken. But removing the assert itself is also a possible solution, as mentioned on the JBS issue.

So I will be happy to follow the reviews/suggestions if this is not a good fix.

-------------

Commit messages:
 - make ubsan happy

Changes: https://git.openjdk.org/jdk/pull/20847/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20847&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332461
  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20847.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20847/head:pull/20847

PR: https://git.openjdk.org/jdk/pull/20847

From sroy at openjdk.org  Wed Sep  4 08:08:21 2024
From: sroy at openjdk.org (Suchismith Roy)
Date: Wed, 4 Sep 2024 08:08:21 GMT
Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v9]
In-Reply-To: 
References: 
Message-ID: <-G4IVzLHJKyQ8chFJn8FeSBXGNQCB5CebXlIOGJn5Go=.6a8a2692-7d0d-410e-a775-70cc43a84491@github.com>

On Tue, 3 Sep 2024 22:10:30 GMT, Martin Doerr wrote:

> You effectively changed the type from `relocInfo::runtime_call_type` to `relocInfo::none` in c1_LIRAssembler_ppc.cpp. This is causing problems with ABIv1. The VM seems to work when switching off C1 on AIX. So, the other files should be ok.

So do we then make the calls in c1_LIRAssembler using `__ call_c(copyfunc_addr, relocInfo::runtime_call_type);`?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2328190806 From chagedorn at openjdk.org Wed Sep 4 08:15:19 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Sep 2024 08:15:19 GMT Subject: RFR: 8338971: IGV: Add incrementally inlined method name to phase name [v2] In-Reply-To: References: <2t0mNjColaQYUsTG3gwkuMSgxUKicD2ySXoeFAvmMJM=.7ad261a2-dfe4-4038-a8ab-2ce439637392@github.com> Message-ID: On Wed, 4 Sep 2024 06:46:04 GMT, Roberto Casta?eda Lozano wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion > > Marked as reviewed by rcastanedalo (Reviewer). Thanks @robcasloz and @vnkozlov for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20834#issuecomment-2328204581 From epeter at openjdk.org Wed Sep 4 08:21:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Sep 2024 08:21:26 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Tue, 3 Sep 2024 16:23:56 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolved > > src/hotspot/cpu/x86/assembler_x86.cpp line 8470: > >> 8468: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { >> 8469: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : >> 8470: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); > > avx512bw check here seems wrong. If this is indeed wrong, then we are missing tests, and you should add some more. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 8479: > >> 8477: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { >> 8478: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : >> 8479: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); > > avx512bw check here seems wrong. If this is indeed wrong, then we are missing tests, and you should add some more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1743283892 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1743284116 From epeter at openjdk.org Wed Sep 4 08:30:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Sep 2024 08:30:22 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 22:46:07 GMT, Vladimir Kozlov wrote: >> So `~` is a strictly decreasing function in the unsigned integer domain, so we just do a bitwise negation, compute the adjustment there and switch back. > > This deserves comment. When next person need to touch this code it will be "magic" and confusing. I agree with @vnkozlov . > So ~ is a strictly decreasing function in the unsigned integer domain, so we just do a bitwise negation, compute the adjustment there and switch back. And this explanation does not help me, probably I'm just missing some definitions and how they relate to each other... comments in the code are crucial. 
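For readers following the request for a code comment, here is a minimal sketch of what the quoted explanation seems to say. It is not the code from the PR: `Bits8`, `satisfies`, `min_not_less`, `max_not_greater` and `negation_reverses_order` are hypothetical names, the width is reduced to 8 bits so everything can be brute-forced, and the constraint `(x & zeros) == 0 && (~x & ones) == 0` is taken from the PR description. Because `~x` equals `MAX - x` on unsigned values, `x <= hi` holds exactly when `~x >= ~hi`, and `x` satisfies `(zeros, ones)` exactly when `~x` satisfies `(ones, zeros)`. So the upper-bound search can presumably reuse the lower-bound search on negated inputs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for the PR's known-bits constraint:
// x is allowed iff (x & zeros) == 0 and (~x & ones) == 0.
struct Bits8 {
  uint8_t zeros;
  uint8_t ones;
};

inline bool satisfies(uint8_t x, Bits8 b) {
  return (x & b.zeros) == 0 && (static_cast<uint8_t>(~x) & b.ones) == 0;
}

// Brute force: smallest allowed value that is >= lo. Returns false if none exists.
inline bool min_not_less(uint8_t lo, Bits8 b, uint8_t& out) {
  for (unsigned v = lo; v <= 0xFFu; v++) {
    if (satisfies(static_cast<uint8_t>(v), b)) {
      out = static_cast<uint8_t>(v);
      return true;
    }
  }
  return false;
}

// Largest allowed value that is <= hi, obtained by negating hi, reusing
// min_not_less with zeros and ones swapped, and negating the result back.
inline bool max_not_greater(uint8_t hi, Bits8 b, uint8_t& out) {
  Bits8 flipped{b.ones, b.zeros};
  uint8_t negated;
  if (!min_not_less(static_cast<uint8_t>(~hi), flipped, negated)) {
    return false;
  }
  out = static_cast<uint8_t>(~negated);
  return true;
}

// Exhaustively checks that a < b implies ~a > ~b for all 8-bit unsigned values.
inline bool negation_reverses_order() {
  for (unsigned a = 0; a < 0xFFu; a++) {
    for (unsigned b = a + 1; b <= 0xFFu; b++) {
      if (!(static_cast<uint8_t>(~a) > static_cast<uint8_t>(~b))) {
        return false;
      }
    }
  }
  return true;
}
```

With `zeros = 0x03` (4-aligned values only) and `hi = 9`, the negated search starts at `~9 = 246`, finds `247` as the first value whose two low bits are set, and negating back gives `8`.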
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743292167

From epeter at openjdk.org  Wed Sep  4 08:30:22 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 4 Sep 2024 08:30:22 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 3 Sep 2024 20:37:09 GMT, Quan Anh Mai wrote:

>> At any rate, I would name `T -> U`
>
> Yes I have renamed all signed types to `S` and unsigned types to `U`. Regarding making it a member of `KnownBits`, making it a `static` function has the advantage of visibility to me.

What do you mean by "advantage of visibility"?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743293382

From epeter at openjdk.org  Wed Sep  4 08:30:24 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 4 Sep 2024 08:30:24 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 3 Sep 2024 20:45:58 GMT, Quan Anh Mai wrote:

>> src/hotspot/share/opto/rangeinference.cpp line 113:
>>
>>> 111: T match_mask = mismatch == 0 ? std::numeric_limits<T>::max()
>>> 112: : ~(std::numeric_limits<T>::max() >> count_leading_zeros(mismatch));
>>> 113: T new_zeros = bits._zeros | (match_mask &~ bounds._lo);
>>
>> Suggestion:
>>
>> T new_zeros = bits._zeros | (match_mask & ~bounds._lo);
>>
>> I think this was a typo?
>
> It looks more like an `and not` to me :) However, if you prefer the `~` to stick to `bounds._lo` I would make that change.

I was confused by it, and wondered if a `&~` operator exists in cpp... But this is only a weak preference on my side.
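On the `&~` question: C++ has no such operator. The compiler reads `a &~ b` as binary `&` followed by unary `~`, so it means the same as `a & ~b`. The small sketch below (function names are made up for illustration) checks this at compile time.

```cpp
#include <cassert>
#include <cstdint>

// "a &~ b" is lexed as the two tokens '&' and '~', i.e. a & (~b).
constexpr uint32_t and_not_spaced(uint32_t a, uint32_t b) { return a &~ b; }

// The spelling proposed in the review suggestion.
constexpr uint32_t and_not_explicit(uint32_t a, uint32_t b) { return a & ~b; }

static_assert(and_not_spaced(0b1111u, 0b0101u) == 0b1010u,
              "clears the bits of a that are set in b");
static_assert(and_not_spaced(0xF0F0u, 0x00FFu) == and_not_explicit(0xF0F0u, 0x00FFu),
              "both spellings compute the same value");
```

Both spellings compile to the same thing, so the choice is purely about readability.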
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743297392 From chagedorn at openjdk.org Wed Sep 4 08:57:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Sep 2024 08:57:20 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v6] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: On Tue, 3 Sep 2024 13:49:51 GMT, Yagmur Eren wrote: >> Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > Update Compile::verify_init comment Update looks good ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20715#pullrequestreview-2279424400 From dlunden at openjdk.org Wed Sep 4 09:04:28 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 4 Sep 2024 09:04:28 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v4] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 18:12:04 GMT, Vladimir Kozlov wrote: > As we discussed on our previous meeting Aarch64 has very small registers mask - only 10 words. Can you look if that enough or we should increase static size of it? It could be separate RFE. I do have looking at this in my to-do list (as a separate RFE). I'm not sure it is an issue though: the calculation of `RM_SIZE` first ensures that it covers all registers, and then adds three words to cover arguments, locks, and some other things. If it is only 10 words in total on aarch64, it should be because we simply do not have as many registers that we need to refer to. 
I do not recall from our discussion, is there some particular case where `RM_SIZE` on aarch64 is an issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2328307218 From rcastanedalo at openjdk.org Wed Sep 4 09:06:45 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 4 Sep 2024 09:06:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v14] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: 8334111: Implementation of Late Barrier Expansion for G1: ppc port ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/1ea2862f..ed9c0232 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12-13 Stats: 1036 lines in 5 files changed: 947 ins; 64 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From stefank at openjdk.org Wed Sep 4 09:09:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 4 Sep 2024 09:09:19 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 07:04:36 GMT, Amit Kumar wrote: > The error mentioned in the JBS issue is seen on x86_64 as well as on s390x during the build, with `--enable-ubsan` configuration. > > I have added `-1` to enum to fix this issue for now as mentioned by @MBaesken. But removing the assert itself is also a possible solution, mentioned on the JBS issue. > > So I will happy to follow the reviews/suggestion if this is not a good fix. Shouldn't this new enum value also be used in the place that sets DepType to -1?: _type = (DepType)(end_marker-1); // defeat "already at end" assert (An alternative could be to find another way to defeat the "already at end" assert, but I guess that's out-of-scope for this PR) ------------- Changes requested by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20847#pullrequestreview-2279453632 From rcastanedalo at openjdk.org Wed Sep 4 09:10:27 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 4 Sep 2024 09:10:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Tue, 3 Sep 2024 12:17:58 GMT, Martin Doerr wrote: > I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e Do you prefer integrating it soon? That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2328319555 From dlong at openjdk.org Wed Sep 4 09:15:20 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 4 Sep 2024 09:15:20 GMT Subject: RFR: 8338407: Support grouping several of existing regs into a new one In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 15:11:32 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add `group` support to operand? > > ### Some background about this pr > > In some platforms, there is some concept like a group of registers, for example on riscv there is vector group, which is a group of other single vectors. For example, m2 could be v2+v3, or v4+v5, m4 could be v4+v5+v6+v7, or v8+v9+v10+v11. > And, it's helpful to represent these vector group explicitly, otherwise it's tedious and error-prone. 
For example, in existing code, there's some like below: > > instruct vstring_compareUL(iRegP_R11 str1, iRegI_R12 cnt1, iRegP_R13 str2, iRegI_R14 cnt2, > iRegI_R10 result, vReg_V4 v4, vReg_V5 v5, vReg_V6 v6, vReg_V7 v7, > vReg_V8 v8, vReg_V9 v9, vReg_V10 v10, vReg_V11 v11, > iRegP_R28 tmp1, iRegL_R29 tmp2) > // ... > effect(KILL tmp1, KILL tmp2, USE_KILL str1, USE_KILL str2, USE_KILL cnt1, USE_KILL cnt2, > TEMP v4, TEMP v5, TEMP v6, TEMP v7, TEMP v8, TEMP v9, TEMP v10, TEMP v11); > // ... > __ string_compare_v($str1$$Register, $str2$$Register, > $cnt1$$Register, $cnt2$$Register, $result$$Register, > $tmp1$$Register, $tmp2$$Register, > StrIntrinsicNode::UL); > > The potential problems of the above code are that we need to > 1. write v4~v11 explicitly in its `instruct` and its `effect`, it's tedious; > 2. vector group are represented implicitly, which is not clear and error-prone; > 3. in its encoding `string_compare_v`, we need to specify m4, and v4/v8 explicitly. > 4. if some day we need to adjust from m4 to m2 or m8, it's really tedious and error-prone to make that change in both ad file and macro assembler files. > > > ### This PR > > The proposed solution is to represent a group of vector registers with a real vector group, e.g. `vReg_V4 v4, vReg_V5 v5, vReg_V6 v6, vReg_V7 v7` with `vReg_V4M4 v4m4`, `TEMP v4, TEMP v5, TEMP v6, TEMP v7` with `TEMP v4m4` and in `string_compare_v` implementation, we could query the length of of vector group (i.e. m4 in this case) and set its vtype automatically. > This solution solve the above listed issues, especially the last issue, that means in the future if we need to adjust m4 to m2 or m8, we only need to change the code in ad file and the change is simpler, and no change in string_compare_v is needed. > > ### What it looks like > > For more usage details, please please check [here](https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:explicit-v-reg-gro... OK, I think I see the problem that approach. 
I believe the reason vecD on arm32 works is because there is a VecD ideal type that has the correct size. Using `match(VecA)` will give the wrong size for groups like vReg_V2_m2. We would need maybe a new syntax, like `match(VecA[2])` or `match(VecA[4])`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20775#issuecomment-2328332600 From mdoerr at openjdk.org Wed Sep 4 09:26:30 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Sep 2024 09:26:30 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v9] In-Reply-To: <-G4IVzLHJKyQ8chFJn8FeSBXGNQCB5CebXlIOGJn5Go=.6a8a2692-7d0d-410e-a775-70cc43a84491@github.com> References: <-G4IVzLHJKyQ8chFJn8FeSBXGNQCB5CebXlIOGJn5Go=.6a8a2692-7d0d-410e-a775-70cc43a84491@github.com> Message-ID: On Wed, 4 Sep 2024 08:06:09 GMT, Suchismith Roy wrote: > > You effectively changed the type from `relocInfo::runtime_call_type` to `relocInfo::none` in c1_LIRAssembler_ppc.cpp. This is causing problems with ABIv1. The VM seems to work when switching off C1 on AIX. So, the other files should be ok. > > So do we then call in c1_LIRA using __ call_c(copyfunc_addr, relocInfo::runtime_call_type); ? Yes, for all 4 `call_c` in c1_LIRAssembler_ppc.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2328356346 From amitkumar at openjdk.org Wed Sep 4 09:49:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Sep 2024 09:49:49 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' [v2] In-Reply-To: References: Message-ID: > The error mentioned in the JBS issue is seen on x86_64 as well as on s390x during the build, with `--enable-ubsan` configuration. > > I have added `-1` to enum to fix this issue for now as mentioned by @MBaesken. But removing the assert itself is also a possible solution, mentioned on the JBS issue. 
> > So I will happy to follow the reviews/suggestion if this is not a good fix. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: initialise with undefined_dependency ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20847/files - new: https://git.openjdk.org/jdk/pull/20847/files/1a945549..c9dea975 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20847&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20847&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20847/head:pull/20847 PR: https://git.openjdk.org/jdk/pull/20847 From amitkumar at openjdk.org Wed Sep 4 09:49:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Sep 2024 09:49:49 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 09:06:24 GMT, Stefan Karlsson wrote: >Shouldn't this new enum value also be used in the place that sets DepType to -1?: Done, Thanks for the suggestion & Please have a look again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20847#issuecomment-2328396937 From mdoerr at openjdk.org Wed Sep 4 10:03:21 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Sep 2024 10:03:21 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 src/hotspot/cpu/s390/stubGenerator_s390.cpp line 3062: > 3060: StubCodeMark mark(this, "StubRoutines", "upcall_stub_load_target"); > 3061: address start = __ pc(); > 3062: __ save_return_pc(); @offamitkumar: Is saving and restoring the return_pc needed? Isn't in preserved by load_heap_oop? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743436851 From epeter at openjdk.org Wed Sep 4 10:07:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Sep 2024 10:07:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 20:39:41 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > move static_asserts I got myself quite confused reading the code now. I have to admit I did not sleep much the last 2 nights, so I hope it does not just all come down to that... ? But one general comment: @merykitty I think you are very very strong in math. I wasn't bad at it either, and love understanding bit manipulations. But I'm definitely struggling hard, and do not have the time to reverse-engineer the code, or write proofs myself. And I would really like to see more proofs. So this will take quite a bit of effort to make this code more readable. I mean we are going to spend a lot of time in the future with this optimization. There will be countless follow-up bugs with them. So we need to have really high-quality code that many people can fairly easily understand ;) Please just assume that your reviewers are novices in all the tricks you are applying, and lead us through everything step-by-step, starting with clear definitions. All of that said: I really admire your work, and am excited where this is going :) src/hotspot/share/opto/compile.cpp line 4485: > 4483: index_max = sizetype->_hi - 1; > 4484: } > 4485: const TypeInt* iidxtype = TypeInt::make(0, index_max, Type::WidenMax)->is_int(); Can you explain this change, and the new condition `sizetype->_hi > 0`? src/hotspot/share/opto/compile.hpp line 928: > 926: // Workhorse function to sort out the blocked Node_Notes array: > 927: Node_Notes* locate_node_notes(GrowableArray* arr, > 928: int idx, bool can_grow = false); What was wrong with the `inline`? src/hotspot/share/opto/rangeinference.cpp line 46: > 44: RangeInt _bounds; > 45: KnownBits _bits; > 46: }; Could there be a `static_assert` for the `U` unsigned type? A quick comment about the semantics of these classes would be appreciated. 
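For readers following the question about semantics, here is a minimal sketch of how the constraints read if the formula in the PR description is taken literally: a value belongs to the type only if it satisfies the signed bounds, the unsigned bounds, and the bit constraints all at once. The type names `KnownBits32` and `Constraints32` and the helpers `contains` and `bits_contradict` are hypothetical, fixed to 32 bits for brevity; they are not the PR's actual classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit stand-in for the PR's KnownBits<U>.
struct KnownBits32 {
  uint32_t zeros; // a set bit means: this bit of x must be 0
  uint32_t ones;  // a set bit means: this bit of x must be 1
};

// Hypothetical bundle of all constraints from the PR description.
struct Constraints32 {
  int32_t lo, hi;    // signed bounds
  uint32_t ulo, uhi; // unsigned bounds
  KnownBits32 bits;  // known-bit constraints
};

// True if x satisfies every constraint, i.e. x lies in the intersection
// of the signed range, the unsigned range, and the bit pattern set.
inline bool contains(const Constraints32& c, int32_t x) {
  uint32_t ux = static_cast<uint32_t>(x);
  return x >= c.lo && x <= c.hi &&
         ux >= c.ulo && ux <= c.uhi &&
         (ux & c.bits.zeros) == 0 &&
         (~ux & c.bits.ones) == 0;
}

// A bit required to be both 0 and 1 can never be satisfied,
// so the set of values is empty.
inline bool bits_contradict(const KnownBits32& b) {
  return (b.zeros & b.ones) != 0;
}
```

Under this reading, `zeros = 0x1, ones = 0` with bounds `[0, 10]` describes the even numbers from 0 to 10, and a bit that is set in neither mask is simply unknown.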
src/hotspot/share/opto/rangeinference.cpp line 50: > 48: // Try to tighten the bound constraints from the known bit information > 49: // E.g: if lo = 0 but the lowest bit is always 1 then we can tighten > 50: // lo = 1 Add an example like this: lo = 2, hi = 9 zeros = 1111 ones = 1100 -> 4-aligned 0 1 2 3 4 5 6 7 8 9 10 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 bits: ok . . . ok . . . ok . . bounds: lo hi adjust: --------> lo hi <--- src/hotspot/share/opto/rangeinference.cpp line 55: > 53: adjust_bounds_from_bits(const RangeInt& bounds, const KnownBits& bits) { > 54: // Find the minimum value that is not less than lo and satisfies bits > 55: auto adjust_lo = [](U lo, const KnownBits& bits) { I'm not ok with a lambda that is larger than the body of its enclosing function. We also have no benefit here from using a lambda, other than hiding it. But we could better model this with a class that has public and private methods. src/hotspot/share/opto/rangeinference.cpp line 64: > 62: assert(zero_violation == 0, ""); > 63: return lo; > 64: } It would be nice if you state a definition `violation` in words in a comment. lo = 1100 zeros = 1111 ones = 1111 zv = 1100 ov = 0011 -> I struggle to see what exactly is violated here... -> The "violations" are non-zero, so I'd assume the lo is outside what the bits allow... but that is not true. src/hotspot/share/opto/rangeinference.cpp line 108: > 106: if (new_hi > bounds._hi) { > 107: return {true, false, {}}; > 108: } I need a proof that this is ok. src/hotspot/share/opto/rangeinference.cpp line 118: > 116: // extracting the common prefix of lo and hi and combining with the current > 117: // bit constraints > 118: // E.g: if lo = 0 and hi = 10, then all but the lowest 4 bits must be 0 Please add some ASCII art like I proposed above. 
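The "common prefix" idea quoted above can be sketched as follows. This is an illustration under assumptions, not the code under review: `KnownBits32` and `bits_from_bounds` are hypothetical names, the width is fixed to 32 bits, and the bounds are treated as unsigned with `ulo <= uhi`. Every value in `[ulo, uhi]` shares the leading bits on which `ulo` and `uhi` agree, so those bits become known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit stand-in for the PR's KnownBits<U>:
// set bit in zeros -> bit must be 0, set bit in ones -> bit must be 1.
struct KnownBits32 {
  uint32_t zeros;
  uint32_t ones;
};

// Derive known bits from unsigned bounds, assuming ulo <= uhi.
inline KnownBits32 bits_from_bounds(uint32_t ulo, uint32_t uhi) {
  // Bits where the two bounds differ.
  uint32_t mask = ulo ^ uhi;
  // Smear the highest differing bit downward, so mask covers that bit
  // and every bit below it (those bits may vary within the range).
  mask |= mask >> 1;
  mask |= mask >> 2;
  mask |= mask >> 4;
  mask |= mask >> 8;
  mask |= mask >> 16;
  // Everything above the highest differing bit is the common prefix.
  uint32_t prefix = ~mask;
  return {~ulo & prefix, ulo & prefix};
}
```

For `ulo = 0` and `uhi = 10` (binary `1010`), the bounds first differ at bit 3, so the lowest 4 bits stay unknown and all higher bits are known to be 0, which matches the example in the quoted comment.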
src/hotspot/share/opto/rangeinference.cpp line 131: > 129: bool progress = (new_zeros != bits._zeros) || (new_ones != bits._ones); > 130: bool present = ((new_zeros & new_ones) == 0); > 131: return {progress, present, {new_zeros, new_ones}}; Too dense for me to read, after 2 min I gave up. I need comments ;) src/hotspot/share/opto/rangeinference.cpp line 136: > 134: // Try to tighten both the bounds and the bits at the same time > 135: // Iteratively tighten 1 using the other until no progress is made. > 136: // This function converges because bit constraints converge fast. You could say that each iteration constrains at least one new bit, and we have only 32 or 64 bits in total. That is correct, right? src/hotspot/share/opto/rangeinference.cpp line 139: > 137: template > 138: static SimpleCanonicalResult > 139: normalize_constraints_simple(const RangeInt& bounds, const KnownBits& bits) { What is the difference between "canonical" and "normalized"? src/hotspot/share/opto/rangeinference.cpp line 169: > 167: if (srange._lo > srange._hi || > 168: urange._lo > urange._hi || > 169: (_bits._zeros & _bits._ones) != 0) { Am I misunderstanding `zeros` and `ones`? I thought that if `zeros[i] = ones[i] = 1`, then it can be either a 0 or one there, and so if you AND and get a 1, then you know you have multiple options. Or are the bits all inverted, so that `zeros=0000, ones=1111` means we can only have zeros and no ones? Oh dear am I confused now ? src/hotspot/share/opto/rangeinference.cpp line 170: > 168: urange._lo > urange._hi || > 169: (_bits._zeros & _bits._ones) != 0) { > 170: return {false, {}}; Why not make a special function `CanonicalizedTypeIntPrototype::make_empty()`? Would be more readable here. src/hotspot/share/opto/rangeinference.hpp line 46: > 44: U _zeros; > 45: U _ones; > 46: }; Can you please give a definition / specification of what set bits mean, together with a few examples? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2279379523 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743309996 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743310916 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743317541 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743369065 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743408727 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743351860 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743406337 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743409735 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743414178 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743418278 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743416368 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743431677 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743427354 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743433348 From epeter at openjdk.org Wed Sep 4 10:07:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Sep 2024 10:07:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 08:41:07 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> move static_asserts > > src/hotspot/share/opto/rangeinference.cpp line 46: > >> 44: RangeInt _bounds; >> 45: KnownBits _bits; >> 46: }; > > Could there be a `static_assert` for the `U` unsigned type? > > A quick comment about the semantics of these classes would be appreciated. I think you could generally add `static_asserts` everywhere you use `U` or `S` types. 
> src/hotspot/share/opto/rangeinference.cpp line 64: > >> 62: assert(zero_violation == 0, ""); >> 63: return lo; >> 64: } > > It would be nice if you state a definition of `violation` in words in a comment. > > > lo = 1100 > zeros = 1111 > ones = 1111 > zv = 1100 > ov = 0011 > -> I struggle to see what exactly is violated here... > -> The "violations" are non-zero, so I'd assume the lo is outside what the bits allow... but that is not true. `one_violation`: bit that could be one, but is zero `zero_violation`: bit that could be zero, but is one Hmm. Do we assume that `KnownBits` is "sane", i.e. that we never have `zeros[i] = 0 = ones[i]`? Because then we are basically asserting that `ones = ~zeros = 0`, right? And so the bits only allow a single bit pattern. I'm probably just very confused and need some more comments here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743319474 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743400519 From epeter at openjdk.org Wed Sep 4 10:07:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Sep 2024 10:07:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 08:42:26 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.cpp line 46: >> >>> 44: RangeInt _bounds; >>> 45: KnownBits _bits; >>> 46: }; >> >> Could there be a `static_assert` for the `U` unsigned type? >> >> A quick comment about the semantics of these classes would be appreciated. > > I think you could generally add `static_asserts` everywhere you use `U` or `S` types. Yes, I definitely need some explanation about what `present`, `progress`, `data` mean in these classes.
It is not directly clear from the use-site to me, and I don't want to reverse-engineer too much as a reader ;) >> src/hotspot/share/opto/rangeinference.cpp line 64: >> >>> 62: assert(zero_violation == 0, ""); >>> 63: return lo; >>> 64: } >> >> It would be nice if you state a definition `violation` in words in a comment. >> >> >> lo = 1100 >> zeros = 1111 >> ones = 1111 >> zv = 1100 >> ov = 0011 >> -> I struggle to see what exactly is violated here... >> -> The "violations" are non-zero, so I'd assume the lo is outside what the bits allow... but that is not true. > > `one_violation`: bit that could be one, but is zero > `zero_violation`: bit that could be zero, but is one > Hmm. Do we assume that `KnownBits` is "sane", i.e. that we never have `zeros[i] = 0 = ones[i]`? > Because then we are basically asserting that `ones = ~zeros = 0`, right? And so the bits only allow a single bit pattern. > > I'm probably just very confused and need some more comments here. I won't read more down before I understand this first part. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743405601 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743403130 From epeter at openjdk.org Wed Sep 4 10:07:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Sep 2024 10:07:29 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 20:29:06 GMT, Quan Anh Mai wrote: >> Can you explain the semantics of the combination of these? Each of these defines a subset of the whole int-range. Is the resulting type the intersection of all of these three? > > That's what I thought too, but considering they are constants I think exposing them directly is fine. I have added explanation regarding the meaning of these constraints. About the bit constraints: can you please specify what the set bits mean for `zeros` and `ones`, possibly with some examples?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1743435181 From sroy at openjdk.org Wed Sep 4 11:37:54 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 4 Sep 2024 11:37:54 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: References: Message-ID: > JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) > C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. > Also, call_c is adapted as per endianess of system. > We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: - LIRA assembler call_c - LIRA assembler call_c - LIRA assembler call_c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19947/files - new: https://git.openjdk.org/jdk/pull/19947/files/b63c9591..b02b1982 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19947&range=08-09 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19947/head:pull/19947 PR: https://git.openjdk.org/jdk/pull/19947 From jvernee at openjdk.org Wed Sep 4 11:41:24 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 4 Sep 2024 11:41:24 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: <5rZOqWZ627bnEJ66wW7Qqye-ZYeAjsK1083MG5GELLk=.c5c144f0-a1a0-483c-ac21-3a948502c3be@github.com> On Wed, 4 Sep 2024 01:29:22 GMT, David Holmes wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. 
This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > src/hotspot/share/prims/upcallLinker.cpp line 142: > >> 140: Handle exception_h(Thread::current(), exception); >> 141: java_lang_Throwable::print_stack_trace(exception_h, tty); >> 142: ShouldNotReachHere(); > > How does `print_stack_trace` not return here? It does return. `ShouldNotReachHere` is used to crash the VM. > test/jdk/java/foreign/TestUpcallStress.java line 27: > >> 25: * @test >> 26: * @requires jdk.foreign.linker != "FALLBACK" >> 27: * @requires os.arch == "aarch64" & os.name == "Linux" > > Only for Linux-aarch64 ?? Yes. The test is very unstable, and the issue is only reproducible on Linux/aarch64 any way. See https://github.com/openjdk/jdk/pull/20479#issuecomment-2278175462 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743609798 PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743607061 From jvernee at openjdk.org Wed Sep 4 11:58:20 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 4 Sep 2024 11:58:20 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: <5rZOqWZ627bnEJ66wW7Qqye-ZYeAjsK1083MG5GELLk=.c5c144f0-a1a0-483c-ac21-3a948502c3be@github.com> References: <5rZOqWZ627bnEJ66wW7Qqye-ZYeAjsK1083MG5GELLk=.c5c144f0-a1a0-483c-ac21-3a948502c3be@github.com> Message-ID: On Wed, 4 Sep 2024 11:39:10 GMT, Jorn Vernee wrote: >> src/hotspot/share/prims/upcallLinker.cpp line 142: >> >>> 140: Handle exception_h(Thread::current(), exception); >>> 141: java_lang_Throwable::print_stack_trace(exception_h, tty); >>> 142: ShouldNotReachHere(); >> >> How does `print_stack_trace` not return here? > > It does return. 
`ShouldNotReachHere` is used to crash the VM. `fatal()` might be better here. I could change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743638097 From jvernee at openjdk.org Wed Sep 4 12:01:24 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 4 Sep 2024 12:01:24 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 06:14:57 GMT, Fei Yang wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > src/hotspot/cpu/riscv/upcallLinker_riscv.cpp line 264: > >> 262: >> 263: __ block_comment("{ load target "); >> 264: __ movptr(j_rarg0, (intptr_t) receiver); > > Hi @JornVernee , Could you please apply following small add-on change for linux-riscv64? As I witnessed build warning with GCC-13. Otherwise, builds fine and the newly-added test/jdk/java/foreign/TestUpcallStress.java is passing. (PS: jdk_foreign tests are passing too) > > > diff --git a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp > index 5c45a679316..55160be99d0 100644 > --- a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp > +++ b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp > @@ -261,7 +261,7 @@ address UpcallLinker::make_upcall_stub(jobject receiver, Symbol* signature, > __ block_comment("} argument shuffle"); > > __ block_comment("{ load target "); > - __ movptr(j_rarg0, (intptr_t) receiver); > + __ movptr(j_rarg0, (address) receiver); > __ far_call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // loads Method* into xmethod > __ block_comment("} load target "); > > diff --git a/test/jdk/java/foreign/TestUpcallStress.java b/test/jdk/java/foreign/TestUpcallStress.java > index 3b9b1d4b207..40607746856 100644 > --- a/test/jdk/java/foreign/TestUpcallStress.java > +++ b/test/jdk/java/foreign/TestUpcallStress.java > @@ -24,7 +24,7 @@ > /* > * @test > * @requires jdk.foreign.linker != "FALLBACK" > - * @requires os.arch == "aarch64" & os.name == "Linux" > + * @requires (os.arch == "aarch64" | os.arch=="riscv64") & os.name == "Linux" > * @requires os.maxMemory > 4G > * @modules java.base/jdk.internal.foreign > * @build 
NativeTestHelper CallGeneratorHelper TestUpcallBase Were you able to reproduce the original issue on RISC-V? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743643186 From mdoerr at openjdk.org Wed Sep 4 12:12:21 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Sep 2024 12:12:21 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: References: Message-ID: <_imlBKIPw74SoiZujwy0uWDZcbwEDOC_VQUYDhiXoYI=.b0793b06-f7e3-48c5-b81e-df4fc77d1ab6@github.com> On Wed, 4 Sep 2024 11:37:54 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianess of system. >> We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: > > - LIRA assembler call_c > - LIRA assembler call_c > - LIRA assembler call_c LGTM. Hopefully the tests will pass on AIX. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19947#pullrequestreview-2279962886 From fyang at openjdk.org Wed Sep 4 12:50:20 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 4 Sep 2024 12:50:20 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 11:58:49 GMT, Jorn Vernee wrote: > Were you able to reproduce the original issue on RISC-V? Yeah. I can reproduce similar crash on linux-riscv64 platform running this new test as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743722070 From jvernee at openjdk.org Wed Sep 4 12:51:47 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 4 Sep 2024 12:51:47 GMT Subject: RFR: 8338123: Linker crash when building a downcall handle with many arguments in x64 Message-ID: - Adjust downcall stub sizes based on latest version. (per method described in https://github.com/openjdk/jdk/pull/12908) - Beef up test for large stubs to also cover this particular case. ------------- Commit messages: - use junit - make test more rebust - add test - adjust downcall stub sizes Changes: https://git.openjdk.org/jdk/pull/20842/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20842&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338123 Stats: 31 lines in 2 files changed: 20 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20842/head:pull/20842 PR: https://git.openjdk.org/jdk/pull/20842 From jvernee at openjdk.org Wed Sep 4 13:11:57 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 4 Sep 2024 13:11:57 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v2] In-Reply-To: References: Message-ID: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Adjust ppc & RISC-V code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20479/files - new: https://git.openjdk.org/jdk/pull/20479/files/8dcb14ff..c478c08f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20479/head:pull/20479 PR: https://git.openjdk.org/jdk/pull/20479 From jvernee at openjdk.org Wed Sep 4 13:14:55 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 4 Sep 2024 13:14:55 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v3] In-Reply-To: References: Message-ID: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: add RISC-V as target platform ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20479/files - new: https://git.openjdk.org/jdk/pull/20479/files/c478c08f..1558ad9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20479/head:pull/20479 PR: https://git.openjdk.org/jdk/pull/20479 From jvernee at openjdk.org Wed Sep 4 13:14:55 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 4 Sep 2024 13:14:55 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v3] In-Reply-To: References: Message-ID: <7UkbLuak7QTMvkgLpAVpzBro8RX-WxPkA5bAS7pvTu4=.1df9e375-0025-45be-a7c1-52ec41ccf8c4@github.com> On Wed, 4 Sep 2024 12:46:21 GMT, Fei Yang wrote: >> Were you able to reproduce the original issue on RISC-V? > >> Were you able to reproduce the original issue on RISC-V? > > Yeah. I can reproduce similar crash on linux-riscv64 platform running this new test as well. 
Ok, I'll add riscv as one of the target platforms then ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743766814 From mcimadamore at openjdk.org Wed Sep 4 13:15:19 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 4 Sep 2024 13:15:19 GMT Subject: RFR: 8338123: Linker crash when building a downcall handle with many arguments in x64 In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 17:52:35 GMT, Jorn Vernee wrote: > - Adjust downcall stub sizes based on latest version. (per method described in https://github.com/openjdk/jdk/pull/12908) > - Beef up test for large stubs to also cover this particular case. Marked as reviewed by mcimadamore (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20842#pullrequestreview-2280166021 From stefank at openjdk.org Wed Sep 4 13:18:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 4 Sep 2024 13:18:19 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 09:49:49 GMT, Amit Kumar wrote: >> The error mentioned in the JBS issue is seen on x86_64 as well as on s390x during the build, with `--enable-ubsan` configuration. >> >> I have added `-1` to enum to fix this issue for now as mentioned by @MBaesken. But removing the assert itself is also a possible solution, mentioned on the JBS issue. >> >> So I will happy to follow the reviews/suggestion if this is not a good fix. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > initialise with undefined_dependency Looks good from my POV, but I'd like to see one of the compiler devs to properly Review this. ------------- Marked as reviewed by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20847#pullrequestreview-2280176679 From mdoerr at openjdk.org Wed Sep 4 13:43:26 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Sep 2024 13:43:26 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v3] In-Reply-To: References: Message-ID: <4htYDaThzZbOHDP1tq8hF60018HTq7Br-K9akXe8fwM=.0e17d325-896a-4f8d-aebe-3a3d91ae77b1@github.com> On Wed, 4 Sep 2024 13:14:55 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add RISC-V as target platform PPC64 code looks good. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2280263851 From jkarthikeyan at openjdk.org Wed Sep 4 13:47:25 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Sep 2024 13:47:25 GMT Subject: RFR: 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long [v4] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 06:01:50 GMT, Christian Hagedorn wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Move architecture checks into IR > > Testing looked good! Thank you for the testing @chhagedorn, and thanks everyone for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20275#issuecomment-2329107464 From jkarthikeyan at openjdk.org Wed Sep 4 13:47:27 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Sep 2024 13:47:27 GMT Subject: Integrated: 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long In-Reply-To: References: Message-ID: On Mon, 22 Jul 2024 03:33:23 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch fixes `cmovL_imm_01*` instructions matching against an integer immediate 1 instead of a long immediate 1. I noticed while looking at the backend implementation of CMove that the rules specify `immI_1` instead of `immL1`, which means that the instructions can't be matched and instead falls through to the base case. 
I added a small benchmark and got these results:
>
>                                        Baseline                  Patch            Improvement
> Benchmark                Mode  Cnt    Score   Error  Units    Score   Error  Units
> BasicRules.cmovL_imm_01  avgt   15  259.073 ± 5.806  ns/op  231.108 ± 2.730  ns/op  (+ 11.41%)
>
> Thoughts and reviews would be appreciated!

This pull request has now been integrated.

Changeset: 6f8714ee
Author:    Jasmine Karthikeyan
URL:       https://git.openjdk.org/jdk/commit/6f8714ee197eb48923209299fd842f6757f0a945
Stats:     109 lines in 4 files changed: 105 ins; 0 del; 4 mod

8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long

Reviewed-by: epeter, chagedorn, shade, qamai, jbhateja

-------------

PR: https://git.openjdk.org/jdk/pull/20275

From matsaave at openjdk.org Wed Sep 4 15:07:38 2024
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Wed, 4 Sep 2024 15:07:38 GMT
Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 [v3]
In-Reply-To:
References:
Message-ID: <76lwqzv73-iutWjMq2VNSh-DGc4-dnwY0DXoKCJ5R7Q=.83cdbc75-dc58-49ec-a541-c2ec191506b1@github.com>

> The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests.
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:

  Fixed indentation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20732/files
  - new: https://git.openjdk.org/jdk/pull/20732/files/e3241704..0a4d7606

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20732&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20732&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20732.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20732/head:pull/20732

PR: https://git.openjdk.org/jdk/pull/20732

From mli at openjdk.org Wed Sep 4 15:19:24 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 4 Sep 2024 15:19:24 GMT
Subject: RFR: 8338407: Support grouping several of existing regs into a new one
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 09:13:11 GMT, Dean Long wrote:

> OK, I think I see the problem with that approach. I believe the reason vecD on arm32 works is because there is a VecD ideal type that has the correct size. Using `match(VecA)` will give the wrong size for groups like vReg_V2_m2.

Yes.

> We would need maybe a new syntax, like `match(VecA[2])` or `match(VecA[4])`.

We could have something like `match(VecA[2])`.
In this PR, I'm implementing something similar; it will look like this:

    operand vReg_V2M2()
    %{
      group(vReg_V2, vReg_V3)
    %}

and

    operand vReg_V8M4()
    %{
      group(vReg_V8, vReg_V9, vReg_V10, vReg_V11)
    %}

For a more detailed demo of its usage, please have a look at this [merged pr](https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:explicit-v-reg-group-v3) (a merge of this PR and a RISC-V demo usage; I say `demo` because if this PR is accepted, there is more code in RISC-V to be changed, i.e. simplified).

The good part of this `group operand` implementation is that it does not change anything inside chaitin reg allocation, and it does not change anything for any existing `operand`.
A `group operand` is just a "grouping" of other operands. A `group operand` itself does not mean anything to the underlying code (or, put differently, the underlying code, especially chaitin, has no knowledge of this `group operand`); a `group operand` is just "ungrouped" when parsing the `instruct`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20775#issuecomment-2329351107

From kvn at openjdk.org Wed Sep 4 16:12:22 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 4 Sep 2024 16:12:22 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 09:01:40 GMT, Daniel Lundén wrote:

> is there some particular case where RM_SIZE on aarch64 is an issue?

Both `register_aarch64.hpp` and `register_x86.hpp` (64-bits) specify `number_of_registers = 32`. So why is `RM_SIZE` different?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2329471593

From kvn at openjdk.org Wed Sep 4 16:22:20 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 4 Sep 2024 16:22:20 GMT
Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 [v3]
In-Reply-To: <76lwqzv73-iutWjMq2VNSh-DGc4-dnwY0DXoKCJ5R7Q=.83cdbc75-dc58-49ec-a541-c2ec191506b1@github.com>
References: <76lwqzv73-iutWjMq2VNSh-DGc4-dnwY0DXoKCJ5R7Q=.83cdbc75-dc58-49ec-a541-c2ec191506b1@github.com>
Message-ID:

On Wed, 4 Sep 2024 15:07:38 GMT, Matias Saavedra Silva wrote:

>> The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixed indentation

Update is good.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20732#pullrequestreview-2280700242

From dlunden at openjdk.org Wed Sep 4 16:26:22 2024
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 4 Sep 2024 16:26:22 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 16:09:42 GMT, Vladimir Kozlov wrote:

> Both register_aarch64.hpp and register_x86.hpp (64-bits) specify number_of_registers = 32. So why is RM_SIZE different?

The `RM_SIZE` calculation is based on `RegisterForm::_reg_ctr` which (I think) is incremented during parsing of the ad-files. As far as I can tell, `number_of_registers` does not influence this calculation. I can investigate more later on (focusing on updates for this PR now).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2329499966

From amitkumar at openjdk.org Wed Sep 4 16:49:26 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 4 Sep 2024 16:49:26 GMT
Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 10:00:47 GMT, Martin Doerr wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   add RISC-V as target platform
>
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 3062:
>
>> 3060:     StubCodeMark mark(this, "StubRoutines", "upcall_stub_load_target");
>> 3061:     address start = __ pc();
>> 3062:     __ save_return_pc();
>
> @offamitkumar: Is saving and restoring the return_pc needed? Isn't it preserved by load_heap_oop?

I looked into it, but couldn't find out. But I removed the `save_return_pc` & `restore_return_pc` and everything seems fine. So maybe we can remove it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1744116982 From amitkumar at openjdk.org Wed Sep 4 16:49:27 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Sep 2024 16:49:27 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v3] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 16:45:04 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 3062: >> >>> 3060: StubCodeMark mark(this, "StubRoutines", "upcall_stub_load_target"); >>> 3061: address start = __ pc(); >>> 3062: __ save_return_pc(); >> >> @offamitkumar: Is saving and restoring the return_pc needed? Isn't in preserved by load_heap_oop? > > I looked into it, but couldn't find out. But I remove the `save_return_pc` & `restore_return_pc` and everything seems fine. So maybe we can remove it. Tier1 test are fine with/without "saving & restoring" return_pc; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1744117640 From vlivanov at openjdk.org Wed Sep 4 17:05:19 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 4 Sep 2024 17:05:19 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 01:50:54 GMT, Dean Long wrote: >> I am not sure how Dean proposal will help. I agree with Vladimir's suggestion - C1 should not optimize call sites in Level 3 compilation. > > I was just trying to think how to preserve the original intend of this code, which seemed to be "skip profiling if the profiling info is not needed", but getting it right seems complicated, so I'm OK with always doing it. > >> C1 should not optimize call sites in Level 3 compilation. > > You mean don't use can_be_statically_bound() and other checks to devirtualize virtual calls? I believe C1 does that at all compilation levels. 
It does look attractive to align the logic with C2 usage of type profiles (avoid profiling when C2 doesn't consume the data). But I feel more comfortable unifying different modes of profiling at the expense of some micro-optimization opportunities. In other words, if the interpreter collects some bit of data, I'd prefer to see C1 doing the same (and vice versa).

I took a look at the interpreter code (in `TemplateTable::invokevirtual_helper()`) and it makes the decision at runtime based on the `is_vfinal` flag on `ResolvedMethodEntry`. The flag is set in `ConstantPoolCache::set_direct_or_vtable_call()` and covers both private and final methods. Moreover, receiver profiling is not performed on `invokeinterface` of private methods, which is not taken into account by `should_profile_receiver_type()` now.

It looks tempting to replicate what the interpreter does (inspect the `vfinal` flag on the resolved method), but C1 has to gracefully work with not-yet-resolved call sites. So, either a recompilation or a runtime check is needed to align the behavior with the interpreter.

I haven't looked into the details, but performing profiling in C1 when the rest of the JVM doesn't expect that makes me a bit nervous. Smells like a possible source of profile data corruption.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1744135759

From vlivanov at openjdk.org Wed Sep 4 17:05:20 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 4 Sep 2024 17:05:20 GMT
Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3]
In-Reply-To:
References: <-b1JTFIJbwhOHz4a5fasgVmF-aeaOuUuD-UWvNr_XSs=.1e9fa592-c984-40e7-b317-21a072e583b9@github.com>
Message-ID:

On Mon, 2 Sep 2024 02:35:43 GMT, kuaiwei wrote:

>> test/hotspot/jtreg/compiler/cha/cha_control.txt line 1:
>>
>>> 1: [
>>
>> Currently, the prevalent way to specify compiler directives is through WhiteBox API at runtime (through `WhiteBox.addCompilerDirective(String directive)`).
Please, follow the same pattern here. I find it more convenient to reason about test logic when all the pieces are present in a single place. > > Thanks for your suggestion. I will change the test case. Thanks, the test case looks good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1744138546 From mdoerr at openjdk.org Wed Sep 4 17:07:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Sep 2024 17:07:20 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v3] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 16:45:38 GMT, Amit Kumar wrote: >> I looked into it, but couldn't find out. But I remove the `save_return_pc` & `restore_return_pc` and everything seems fine. So maybe we can remove it. > > Tier1 test are fine with/without "saving & restoring" return_pc; I found it: https://github.com/openjdk/jdk/blob/433f6d8a0643b59663bf76c0f3a2af27a6cc56b7/src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp#L238 Called here: https://github.com/openjdk/jdk/blob/433f6d8a0643b59663bf76c0f3a2af27a6cc56b7/src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp#L115 Other GCs with load barriers are not implemented, so the save&restore code is redundant. The stub is frameless and only needs the save&restore code when calling C. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1744140160 From vlivanov at openjdk.org Wed Sep 4 17:26:23 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 4 Sep 2024 17:26:23 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: References: Message-ID: <97WNhEuU6bfx4gw4qgg8mCeUIOobL5WFxq35I8Bb56o=.7cc209f5-1872-43d0-8bc5-182787d2f557@github.com> On Thu, 4 Jul 2024 12:17:51 GMT, Tobias Holenstein wrote: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. 
> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix > the fix is to bailed out in this case > https://github.com/openjdk/jdk/blob/3d5d51e228c19a... Thanks for the clarifications, Toby. I reconsidered my conclusion about root cause. I agree that redundant `CheckCastPP` causes problems here, but what surprises me is that `null_check_oop` successfully detects that `base == NULL` while `LibraryCallKit::classify_unsafe_addr()` has a hard time doing the same. IMO the discrepancy is the source of the problem here. Can you share more details why it happens? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2329610838 From thartmann at openjdk.org Wed Sep 4 17:28:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 4 Sep 2024 17:28:28 GMT Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 [v3] In-Reply-To: <76lwqzv73-iutWjMq2VNSh-DGc4-dnwY0DXoKCJ5R7Q=.83cdbc75-dc58-49ec-a541-c2ec191506b1@github.com> References: <76lwqzv73-iutWjMq2VNSh-DGc4-dnwY0DXoKCJ5R7Q=.83cdbc75-dc58-49ec-a541-c2ec191506b1@github.com> Message-ID: <8r8na8AGd_GKLNk_tCEQt-jt_XpElU695JzxwYMgboo=.c7d61a78-3f95-47c8-a282-6a1e1398c581@github.com> On Wed, 4 Sep 2024 15:07:38 GMT, Matias Saavedra Silva wrote: >> The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed indentation Still good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20732#pullrequestreview-2280842248 From matsaave at openjdk.org Wed Sep 4 17:28:28 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 4 Sep 2024 17:28:28 GMT Subject: RFR: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 [v3] In-Reply-To: <8r8na8AGd_GKLNk_tCEQt-jt_XpElU695JzxwYMgboo=.c7d61a78-3f95-47c8-a282-6a1e1398c581@github.com> References: <76lwqzv73-iutWjMq2VNSh-DGc4-dnwY0DXoKCJ5R7Q=.83cdbc75-dc58-49ec-a541-c2ec191506b1@github.com> <8r8na8AGd_GKLNk_tCEQt-jt_XpElU695JzxwYMgboo=.c7d61a78-3f95-47c8-a282-6a1e1398c581@github.com> Message-ID: On Wed, 4 Sep 2024 17:22:48 GMT, Tobias Hartmann wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed indentation > > Still good. Thanks for the reviews @TobiHartmann and @vnkozlov! Also thank you for the assistance @dean-long! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20732#issuecomment-2329611678 From matsaave at openjdk.org Wed Sep 4 17:28:29 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 4 Sep 2024 17:28:29 GMT Subject: Integrated: 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 18:01:16 GMT, Matias Saavedra Silva wrote: > The change in [JDK-8335664](https://bugs.openjdk.org/browse/JDK-8335664) caused a crash when initializing basic blocks with `-Xcomp`. This change introduces a check to see if JSR is the last bytecode in its method so that expected behavior matches the previous patch. Verified with tier 1-6 tests. This pull request has now been integrated. 
Changeset: 1353601d
Author:    Matias Saavedra Silva
URL:       https://git.openjdk.org/jdk/commit/1353601dcc8f9ec3e12dea21dc61b3585a154b13
Stats:     23 lines in 4 files changed: 16 ins; 2 del; 5 mod

8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5

Co-authored-by: Dean Long
Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/20732

From vlivanov at openjdk.org Wed Sep 4 18:46:20 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 4 Sep 2024 18:46:20 GMT
Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access
In-Reply-To: <97WNhEuU6bfx4gw4qgg8mCeUIOobL5WFxq35I8Bb56o=.7cc209f5-1872-43d0-8bc5-182787d2f557@github.com>
References: <97WNhEuU6bfx4gw4qgg8mCeUIOobL5WFxq35I8Bb56o=.7cc209f5-1872-43d0-8bc5-182787d2f557@github.com>
Message-ID:

On Wed, 4 Sep 2024 17:23:54 GMT, Vladimir Ivanov wrote:

> I reconsidered my conclusion about root cause.

I'd like to clarify one point here: I still think speculative types may disturb exact checks against preallocated type constants [1] [2]. As an example, `TypePtr::cleanup_speculative()` [3] has to align `inline_depth()` in order to make the comparison with `NULL_PTR` work accurately. It doesn't look like `NULL_PTR` is affected (since `TypePtr::cleanup_speculative()` unconditionally drops the speculative part for it), but `Type*::BOTTOM`/`Type*::NOTNULL` et al. seem susceptible to the problem.

@rwestrel, what do you think?

[1]
> IMO the root problem is in LibraryCallKit::classify_unsafe_addr() where base_type == TypePtr::NULL_PTR doesn't hold in presence of speculative part

[2]
> Initially, I had only == TypePtr::NULL_PTR comparisons in mind, but on a second thought all comparisons with preallocated type constants are susceptible to false negatives in presence of speculative part.
[3]

    const Type* TypePtr::cleanup_speculative() const {
      if (speculative() == nullptr) {
        return this;
      }
      const Type* no_spec = remove_speculative();
      // If this is NULL_PTR then we don't need the speculative type
      // (with_inline_depth in case the current type inline depth is
      // InlineDepthTop)
      if (no_spec == NULL_PTR->with_inline_depth(inline_depth())) {
        return no_spec;
      }
      if (above_centerline(speculative()->ptr())) {
        return no_spec;
      }
      const TypeOopPtr* spec_oopptr = speculative()->isa_oopptr();
      // If the speculative may be null and is an inexact klass then it
      // doesn't help
      if (speculative() != TypePtr::NULL_PTR && speculative()->maybe_null() &&
          (spec_oopptr == nullptr || !spec_oopptr->klass_is_exact())) {
        return no_spec;
      }
      return this;
    }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2329745385

From qamai at openjdk.org Wed Sep 4 19:10:37 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 4 Sep 2024 19:10:37 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v10]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>
> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance.
>
> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>
> Please kindly review, thanks a lot.
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add more comments, group KnownBits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/ae473850..2bf545fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=08-09 Stats: 274 lines in 4 files changed: 167 ins; 50 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Wed Sep 4 19:18:26 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 4 Sep 2024 19:18:26 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 08:36:01 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> move static_asserts > > src/hotspot/share/opto/compile.cpp line 4485: > >> 4483: index_max = sizetype->_hi - 1; >> 4484: } >> 4485: const TypeInt* iidxtype = TypeInt::make(0, index_max, Type::WidenMax)->is_int(); > > Can you explain this change, and the new condition `sizetype->_hi > 0`? If `sizetype->_hi == 0`, this is a load from an empty array and is dead anyway. `sizetype->_hi > 0` ensures that we will have a sane index. > src/hotspot/share/opto/compile.hpp line 928: > >> 926: // Workhorse function to sort out the blocked Node_Notes array: >> 927: Node_Notes* locate_node_notes(GrowableArray* arr, >> 928: int idx, bool can_grow = false); > > What was wrong with the `inline`? I'm not sure but it fails to compile with `inline`, which makes sense because normally an inline function needs its definition at the same place. 
> src/hotspot/share/opto/rangeinference.cpp line 50:
>
>> 48: // Try to tighten the bound constraints from the known bit information
>> 49: // E.g: if lo = 0 but the lowest bit is always 1 then we can tighten
>> 50: // lo = 1
>
> Add an example like this:
>
> lo = 2, hi = 9
> zeros = 1111
> ones  = 1100
> -> 4-aligned
>
>            0    1    2    3    4    5    6    7    8    9   10
>         0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010
> bits:     ok    .    .    .   ok    .    .    .   ok    .    .
> bounds:             lo                                 hi
> adjust:             --------> lo                  hi <---

Thanks, that is a great example.

> src/hotspot/share/opto/rangeinference.cpp line 55:
>
>> 53: adjust_bounds_from_bits(const RangeInt& bounds, const KnownBits& bits) {
>> 54:   // Find the minimum value that is not less than lo and satisfies bits
>> 55:   auto adjust_lo = [](U lo, const KnownBits& bits) {
>
> I'm not ok with a lambda that is larger than the body of its enclosing function. We also have no benefit here from using a lambda, other than hiding it. But we could better model this with a class that has public and private methods.

Sure, I have moved it to a `static` method.

> src/hotspot/share/opto/rangeinference.cpp line 108:
>
>> 106:   if (new_hi > bounds._hi) {
>> 107:     return {true, false, {}};
>> 108:   }
>
> I need a proof that this is ok.

Please let me know if the comment does not persuade you.

> src/hotspot/share/opto/rangeinference.cpp line 139:
>
>> 137: template
>> 138: static SimpleCanonicalResult
>> 139: normalize_constraints_simple(const RangeInt& bounds, const KnownBits& bits) {
>
> What is the difference between "canonical" and "normalized"?

I forgot this one, fixed now.
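
[Archive note] The bounds-tightening example quoted above can be sketched as a small brute-force check. This is only an illustration, not the code from the PR: the `Bits` struct and the `satisfies` and `tighten_bounds` functions are hypothetical names, the search is linear where the PR presumably uses something faster, and it assumes the semantics stated in the PR description, namely that a value x must satisfy `(x & zeros) == 0 && (~x & ones) == 0`. Under that convention, "4-aligned" is encoded as `zeros = 0x3`, `ones = 0x0`; the quoted example's `zeros`/`ones` values appear to use a different convention.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical stand-in for the PR's KnownBits; illustration only.
struct Bits {
  uint32_t zeros;  // a bit set here must be 0 in every value
  uint32_t ones;   // a bit set here must be 1 in every value
};

// True if x satisfies (x & zeros) == 0 && (~x & ones) == 0.
static bool satisfies(uint32_t x, Bits bits) {
  return (x & bits.zeros) == 0 && (~x & bits.ones) == 0;
}

// Brute force: returns the tightest unsigned [lo, hi] that still contains
// every value of the input range satisfying bits, or nullopt if none does.
static std::optional<std::pair<uint32_t, uint32_t>>
tighten_bounds(uint32_t lo, uint32_t hi, Bits bits) {
  assert((bits.zeros & bits.ones) == 0 && "contradictory bit constraints");
  assert(lo <= hi);
  // Walk lo upwards to the first value that satisfies the bits.
  uint32_t new_lo = lo;
  while (!satisfies(new_lo, bits)) {
    if (new_lo == hi) {
      return std::nullopt;  // no value in [lo, hi] satisfies the bits
    }
    new_lo++;
  }
  // Walk hi downwards; this stops at new_lo at the latest.
  uint32_t new_hi = hi;
  while (!satisfies(new_hi, bits)) {
    new_hi--;
  }
  return std::make_pair(new_lo, new_hi);
}
```

With `lo = 2`, `hi = 9` and `Bits{0x3, 0x0}`, `tighten_bounds` yields the range [4, 8], matching the `adjust` row of the diagram. With `lo = 0`, `hi = 3` and `Bits{0x0, 0x1}` (lowest bit always 1) it yields [1, 3], which is the case described in the source comment at line 49.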
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744292281
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744293044
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744295532
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744297138
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744296535
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744297476

From qamai at openjdk.org Wed Sep 4 19:18:27 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 4 Sep 2024 19:18:27 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 09:40:10 GMT, Emanuel Peter wrote:

>> I think you could generally add `static_asserts` everywhere you use `U` or `S` types.
>
> Yes, I definitely need some explanation about what `present`, `progress`, `data` mean in these classes. It is not directly clear from the use-site to me, and I don't want to reverse-engineer too much as a reader ;)

Done; I hope it is clearer now.

>> `one_violation`: bit that could be one, but is zero
>> `zero_violation`: bit that could be zero, but is one
>> Hmm. Do we assume that `KnownBits` is "sane", i.e. that we never have `zeros[i] = 0 = ones[i]`?
>> Because then we are basically asserting that `ones = ~zeros = 0`, right? And so the bits only allow a single bit pattern.
>>
>> I'm probably just very confused and need some more comments here.
>
> I won't read more down before I understand this first part.

Yes, it seems you have misunderstood `zeros` and `ones`. This prompted me to group them into `KnownBits`, and I have added an explanation of their semantics. In short, for each position where the corresponding bit in `zeros` is set, the corresponding bit in the value must be unset; similarly, where a bit in `ones` is set, the corresponding bit in the value must be set.
As a result, sane bit constraints require `(zeros & ones) == 0`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744293432
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744295301

From qamai at openjdk.org Wed Sep 4 19:25:40 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 4 Sep 2024 19:25:40 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v11]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>
> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance.
>
> This is extracted from #15440; node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>
> Please kindly review, thanks a lot.
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/2bf545fb..4f4a6bea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=09-10 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Wed Sep 4 19:50:01 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 4 Sep 2024 19:50:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/4f4a6bea..8d14f8ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=10-11 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Wed Sep 4 19:50:06 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 4 Sep 2024 19:50:06 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 14:27:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > What are your thoughts on how we are going to debug-print the new int-type? We probably do not always want to print the KnownBits, in general that is way too verbose. But in some alignment example, it would be nice to know that it is the `int/long` range, but the lowest 3 bits are always zero, hence 8 byte aligned. 
@eme64 Thanks for your patience, I believe the sole reason I can navigate this algorithm is that I am the author, so all the logic seems so natural to me =D Please let me know if there is any place you are still confused. @vnkozlov Rethinking about the format, since I decided to drop the bit information in normal dumping, it seems natural to make the dumper more aware of different bound values. Regarding JMH benchmark, this patch is likely to not provide any benefit directly, as no node has taken advantage of additional information. The main benefit comes from arithmetic nodes taking advantage of additional information to enhance their analysis as well as simplify current ones. I do not have concrete results regarding compilation time, I have only run a few simple comparisons with `-XX:+CITime -Xcomp` and it seems that the compilation time does not change significantly. Thanks very much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2329841995 From mdoerr at openjdk.org Wed Sep 4 20:11:24 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Sep 2024 20:11:24 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v6] In-Reply-To: References: <0XReE7gZIQm55hR-SDnLB5664zfafkhOyivXTmGEIFk=.be01447c-7667-45f5-b1c4-08688b75ce4e@github.com> Message-ID: On Mon, 2 Sep 2024 13:01:44 GMT, Varada M wrote: >> Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: >> >> - header file change >> - remove frame_resize > > LGTM! > tier1 testing done on linux-ppc64le with both release and fastdebug, no related failures. > Thank you @varada1110: Can you use "Approve" instead of "Comment", please? Otherwise, your review doesn't get recorded properly. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2329881060 From kvn at openjdk.org Wed Sep 4 21:51:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Sep 2024 21:51:43 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 09:49:49 GMT, Amit Kumar wrote: >> The error mentioned in the JBS issue is seen on x86_64 as well as on s390x during the build, with `--enable-ubsan` configuration. >> >> I have added `-1` to enum to fix this issue for now as mentioned by @MBaesken. But removing the assert itself is also a possible solution, mentioned on the JBS issue. >> >> So I will happy to follow the reviews/suggestion if this is not a good fix. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > initialise with undefined_dependency Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20847#pullrequestreview-2281458123 From kvn at openjdk.org Wed Sep 4 22:16:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Sep 2024 22:16:54 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 19:50:01 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440; node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix build My main complaint now is your spread of `is_int()`, `is_long()`, `is_integer()` all over the C2 code. Why not call them inside `make()`? src/hotspot/share/opto/type.hpp line 29: > 27: > 28: #include "opto/adlcVMDeps.hpp" > 29: #include "opto/compile.hpp" Please don't include `compile.hpp` here - it could cause cyclic dependencies, if not now then later. If you need something from it, put it into `type.cpp`. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2281497013 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1744566752 From sviswanathan at openjdk.org Wed Sep 4 23:53:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 4 Sep 2024 23:53:57 GMT Subject: RFR: 8329035: New Data Destination instructions support [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 22:22:56 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD.
These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > function name changes based on review comments src/hotspot/cpu/x86/assembler_x86.cpp line 1361: > 1359: InstructionMark im(this); > 1360: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 1361: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); The input_size_in_bits could be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 1403: > 1401: InstructionMark im(this); > 1402: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 1403: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); The input_size_in_bits could be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 1475: > 1473: void Assembler::eaddb(Register dst, Register src, int imm8, bool no_flags) { > 1474: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 1475: // (void) evex_prefix_and_encode_ndd(src->encoding(), dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes); Looks like the commented line is left over. 
src/hotspot/cpu/x86/assembler_x86.cpp line 1780:

> 1778: void Assembler::eandw(Register dst, Register src1, Register src2, bool no_flags) {
> 1779:   InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 1780:   (void) evex_prefix_and_encode_ndd(src1->encoding(), dst->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags);

This should be VEX_SIMD_66 instead of VEX_SIMD_NONE.

src/hotspot/cpu/x86/assembler_x86.cpp line 1793:

> 1791:   InstructionMark im(this);
> 1792:   InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 1793:   attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit);

input_size_in_bits should be EVEX_32bit here.

src/hotspot/cpu/x86/assembler_x86.cpp line 1819:

> 1817:   InstructionMark im(this);
> 1818:   InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 1819:   attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit);

input_size_in_bits should be EVEX_32bit.

src/hotspot/cpu/x86/assembler_x86.cpp line 1835:

> 1833:   InstructionMark im(this);
> 1834:   InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 1835:   attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit);

input_size_in_bits should be EVEX_32bit.

src/hotspot/cpu/x86/assembler_x86.cpp line 1837:

> 1835:   attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit);
> 1836:   evex_prefix_ndd(src2, dst->encoding(), src1->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags);
> 1837:   emit_operand(src1, src2, 0);

emit_int8(0x23) is missing before the call to emit_operand().
src/hotspot/cpu/x86/assembler_x86.cpp line 2015: > 2013: void Assembler::ecmovl(Condition cc, Register dst, Register src1, Address src2) { > 2014: InstructionMark im(this); > 2015: NOT_LP64(guarantee(VM_Version::supports_cmov(), "illegal instruction")); This assert could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 2642: > 2640: > 2641: void Assembler::edecl(Register dst, Address src, bool no_flags) { > 2642: // Don't use it directly. Use MacroAssembler::decrement() instead. This comment could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 2721: > 2719: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 2720: int encode = evex_prefix_and_encode_nf(0, 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > 2721: emit_int16((unsigned char)0xA7, (0xE8 | encode)); Should this be 0xF7? src/hotspot/cpu/x86/assembler_x86.cpp line 2988: > 2986: > 2987: void Assembler::elzcntl(Register dst, Register src, bool no_flags) { > 2988: assert(VM_Version::supports_lzcnt(), "encoding is treated as BSR"); This assert could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 3004: > 3002: > 3003: void Assembler::elzcntl(Register dst, Address src, bool no_flags) { > 3004: assert(VM_Version::supports_lzcnt(), "encoding is treated as BSR"); This assert could be removed. 
src/hotspot/cpu/x86/assembler_x86.cpp line 12751: > 12749: } > 12750: if (nds_is_ndd) attributes->set_extended_context(); > 12751: bool is_extended = adr.base_needs_rex2() || adr.index_needs_rex2() || nds_enc >= 16 || xreg_enc >= 16 || nds_is_ndd || force_evex; If is_evex_instruction() is set for ndd and nf already at calling place as in my previous review comments, then is_extended could remain as before: bool is_extended = adr.base_needs_rex2() || adr.index_needs_rex2() || nds_enc >= 16 || xreg_enc >= 16; src/hotspot/cpu/x86/assembler_x86.cpp line 12841: > 12839: > 12840: clear_managed(); > 12841: if ((UseAVX > 2 && !attributes->is_legacy_mode()) || nds_is_ndd || force_evex) If is_evex_instruction() is set for ndd and nf already at calling place as in my previous review comments, then this if could remain as before: if (UseAVX > 2 && !attributes->is_legacy_mode()) src/hotspot/cpu/x86/assembler_x86.hpp line 796: > 794: void evex_prefix_ndd(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false) { > 795: vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ true , /* force_evex */ true, no_flags); > 796: } The additional parameter force_evex could be removed and the above could be encoded as: attributes.set_is_evex_instruction(); vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ true , no_flags); src/hotspot/cpu/x86/assembler_x86.hpp line 800: > 798: void evex_prefix_nf(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false) { > 799: vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ false , /* force_evex */ true, no_flags); > 800: } The additional parameter force_evex could be removed and the above could be encoded as: attributes.set_is_evex_instruction(); vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ false, no_flags); 
src/hotspot/cpu/x86/assembler_x86.hpp line 811: > 809: int evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, > 810: InstructionAttr *attributes, bool no_flags = false) { > 811: return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ true , /* force_evex */ true, no_flags); The additional parameter force_evex could be removed and the above could be encoded as: attributes.set_is_evex_instruction(); return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ true, no_flags); src/hotspot/cpu/x86/assembler_x86.hpp line 816: > 814: int evex_prefix_and_encode_nf(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, > 815: InstructionAttr *attributes, bool no_flags = false) { > 816: return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, /* force_evex */ true, no_flags); The additional parameter force_evex could be removed and the above could be encoded as: attributes.set_is_evex_instruction(); return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, no_flags); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744565879 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744593576 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744605475 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744611664 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744612351 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744613073 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744613754 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744614108 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20698#discussion_r1744628195 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744616282 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744618760 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744627345 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744627491 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744241014 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744251493 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744234434 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744234962 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744237546 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744238289 From sviswanathan at openjdk.org Wed Sep 4 23:59:52 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 4 Sep 2024 23:59:52 GMT Subject: RFR: 8329035: New Data Destination instructions support [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 22:22:56 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. 
>> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > function name changes based on review comments src/hotspot/cpu/x86/assembler_x86.cpp line 1461: > 1459: > 1460: void Assembler::eaddb(Register dst, Address src1, Register src2, bool no_flags) { > 1461: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); InstructionMark im(this) is missing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1744632967 From dlong at openjdk.org Thu Sep 5 03:02:58 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 03:02:58 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 09:49:49 GMT, Amit Kumar wrote: >> The error mentioned in the JBS issue is seen on x86_64 as well as on s390x during the build, with `--enable-ubsan` configuration. >> >> I have added `-1` to enum to fix this issue for now as mentioned by @MBaesken. But removing the assert itself is also a possible solution, mentioned on the JBS issue. >> >> So I will happy to follow the reviews/suggestion if this is not a good fix. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > initialise with undefined_dependency Marked as reviewed by dlong (Reviewer). src/hotspot/share/code/dependencies.hpp line 107: > 105: enum DepType { > 106: // _type is initially set to -1, to prevent "already at end" assert > 107: undefined_dependency = -1, Preserving the existing value seems fine, though it appears any value >= TYPE_LIMIT would work just as well. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20847#pullrequestreview-2281759910 PR Review Comment: https://git.openjdk.org/jdk/pull/20847#discussion_r1744738207 From dlong at openjdk.org Thu Sep 5 03:25:49 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 03:25:49 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: References: Message-ID: <4bvmYFtueyFdZVH_j_a-f4CkW_soyG7OQscLj4J_UBA=.ddfd19ee-dbbe-430d-b955-58550109da46@github.com> On Wed, 4 Sep 2024 16:59:50 GMT, Vladimir Ivanov wrote: >> I was just trying to think how to preserve the original intent of this code, which seemed to be "skip profiling if the profiling info is not needed", but getting it right seems complicated, so I'm OK with always doing it. >> >>> C1 should not optimize call sites in Level 3 compilation. >> >> You mean don't use can_be_statically_bound() and other checks to devirtualize virtual calls? I believe C1 does that at all compilation levels. > > It does look attractive to align the logic with C2 usage of type profiles (avoid profiling when C2 doesn't consume the data). But I feel more comfortable unifying different modes of profiling at the expense of some micro-optimization opportunities. In other words, if the interpreter collects some bit of data, I'd prefer to see C1 doing the same (and vice versa). > > I took a look at the interpreter code (in `TemplateTable::invokevirtual_helper()`) and it makes the decision at runtime based on the `is_vfinal` flag on `ResolvedMethodEntry`. The flag is set in `ConstantPoolCache::set_direct_or_vtable_call()` and covers both private and final methods. Moreover, receiver profiling is not performed on `invokeinterface` of private methods, which is not taken into account by `should_profile_receiver_type()` now. > > It looks tempting to replicate what the interpreter does (inspect the `vfinal` flag on the resolved method), but C1 has to gracefully work with not-yet-resolved call sites.
So, either a recompilation or a runtime check is needed to align the behavior with interpreter. > > I haven't looked into the details, but performing profiling in C1 when the rest of the JVM doesn't expect that makes me a bit nervous. Smells like a possible source of profile data corruption. What profiling can be done seems to be decided by MethodData::compute_data_size()/MethodData::initialize_data(), which uses profile_arguments_for_invoke() and profile_return_for_invoke(). At runtime, I believe profiling is restricted by what it finds in the MethodData. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1744750197 From kbarrett at openjdk.org Thu Sep 5 06:44:51 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 5 Sep 2024 06:44:51 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v3] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:46:57 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > saturated pointer adds + size asserts I've only commented on the adlc changes, but I think all of those comments also apply to Arena. src/hotspot/share/adlc/adlArena.cpp line 156: > 154: if (new_size <= old_size) { // Shrink in-place > 155: if (c_old + old_size == _hwm) // Attempt to free the excess bytes > 156: _hwm = c_old + new_size; // Adjust hwm This `if` is missing braces around the consequent. If fixing the whitespace, the missing braces should also be added. src/hotspot/share/adlc/adlArena.hpp line 107: > 105: // Fast allocate in the arena. 
Common case is: pointer test + increment. > 106: void* Amalloc(size_t x) { > 107: assert(x <= (size_t)1 << 31, "unreasonable arena allocation size"); This seems mostly unrelated to the other overflow avoidance checking, and I think doesn't belong in this PR. For one thing, this might not be the right mechanism. Perhaps an Arena should have a configurable max allocation size? And shouldn't the max allocation be applied to Arealloc too? ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20774#pullrequestreview-2281964035 PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1744889478 PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1744886811 From kbarrett at openjdk.org Thu Sep 5 06:44:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 5 Sep 2024 06:44:52 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v2] In-Reply-To: References: <2S1gKrLw3byu7APaERxXdcscYjhlYt3Edv-TMRgkqso=.35838990-c7cd-4f27-b2d5-500a3a0f375c@github.com> Message-ID: On Tue, 3 Sep 2024 14:01:25 GMT, Casper Norrbin wrote: >> src/hotspot/share/adlc/adlArena.cpp line 154: >> >>> 152: if( (c_old+old_size == _hwm) && // Adjusting recent thing >>> 153: ((size_t)(_max-c_old) >= new_size) ) { // Still fits where it sits, safe from overflow >>> 154: >> >> It appears that this change isn't worrying about bad `old_ptr` or `old_size` >> arguments, which is fine. But the code can be further improved by replacing >> lines 144-157 with something like >> >> // Reallocating the most recent allocation? >> if ((c_old + old_size) == _hwm) { >> assert(_chunk->bottom() <= c_old, "invariant"); >> // Reallocate in place if it fits. This also handles shrinking. 
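
The kind of reordering this PR is about can be illustrated with a toy arena. This is a minimal sketch under stated assumptions, not the adlc or HotSpot `Arena` code: `MiniArena` and `try_realloc_in_place` are hypothetical names, and only the `_hwm`/`_max` fields mirror names that appear in the quoted diff. The point is that "does it still fit" is answered by comparing the remaining space `_max - c_old` against `new_size`, instead of forming `c_old + new_size`, which could wrap for an unrealistic `new_size`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical miniature arena holding a single chunk.
struct MiniArena {
  char* _hwm;  // high water mark: first free byte in the chunk
  char* _max;  // one past the last byte of the chunk

  // Try to resize the allocation [c_old, c_old + old_size) without moving it.
  // Returns c_old on success, or nullptr if the caller has to allocate a new
  // block and copy.
  char* try_realloc_in_place(char* c_old, size_t old_size, size_t new_size) {
    if (c_old + old_size == _hwm) {
      // Most recent allocation: grow or shrink in place if it fits.
      // Compare against the remaining space rather than computing
      // c_old + new_size, so a huge new_size cannot overflow the pointer.
      if (static_cast<size_t>(_max - c_old) >= new_size) {
        _hwm = c_old + new_size;
        return c_old;
      }
    } else if (new_size <= old_size) {
      // Shrinking an older allocation: nothing to reclaim, keep it where it is.
      return c_old;
    }
    return nullptr;
  }
};
```

A caller would fall back to a fresh allocation plus copy whenever this returns `nullptr`; whether an "unreasonable size" assert belongs in the allocation path is the separate question raised in the review.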
>> if (pointer_delta(_max, c_old) >= new_size) { >> _hwm = c_old + new_size; >> return c_old; >> } >> } >> >> Of course, in adlc you can't use HotSpot's pointer_delta utility, so there >> you'll need to use something like what's in the PR for that calculation. >> >> Any check for an "unreasonable" size should happen in Amalloc, not here. > > I believe this would miss the case where we shrink an allocation in place and we are not at the high water mark, where `new_size <= old_size`, but where `c_old + old_size) == _hwm` does not hold. Oops, I managed to drop some of the code when pasting it into the github comment. There should be an `else if` clause: if ((c_old + old_size) == _hwm) { ... } else if (new_size <= old_size) { return c_old; } I like this version better than the earlier version because it has only one copy of the _hwm handling. It trades an additional pointer-delta in the shrink-last-allocation case for avoiding a compare-and-branch in the grow-last-allocation case, which I think is a good tradeoff, though likely pretty minor. I do not like the new saturated_add version. I think that just makes things (slightly) slower vs the pointer_delta compare, with no other benefit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1744877424 From amitkumar at openjdk.org Thu Sep 5 07:03:54 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 5 Sep 2024 07:03:54 GMT Subject: Integrated: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 07:04:36 GMT, Amit Kumar wrote: > The error mentioned in the JBS issue is seen on x86_64 as well as on s390x during the build, with `--enable-ubsan` configuration. > > I have added `-1` to enum to fix this issue for now as mentioned by @MBaesken. But removing the assert itself is also a possible solution, mentioned on the JBS issue. 
> > So I will happy to follow the reviews/suggestion if this is not a good fix. This pull request has now been integrated. Changeset: 28de44da Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/28de44da71871bec7648f01a4df2faee43fa43b6 Stats: 5 lines in 2 files changed: 3 ins; 0 del; 2 mod 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' Reviewed-by: stefank, kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/20847 From amitkumar at openjdk.org Thu Sep 5 07:03:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 5 Sep 2024 07:03:53 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 13:15:56 GMT, Stefan Karlsson wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> initialise with undefined_dependency > > Looks good from my POV, but I'd like to see one of the compiler devs to properly Review this. I tested the builds and fastdebug & release builds are fine, so let's integrate. 
Thanks @stefank @vnkozlov @dean-long for the suggestion & approval :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20847#issuecomment-2330754951 From stefank at openjdk.org Thu Sep 5 07:11:55 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 5 Sep 2024 07:11:55 GMT Subject: RFR: 8332461: ubsan : dependencies.cpp:906:3: runtime error: load of value 4294967295, which is not a valid value for type 'DepType' [v2] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 02:59:06 GMT, Dean Long wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> initialise with undefined_dependency > > src/hotspot/share/code/dependencies.hpp line 107: > >> 105: enum DepType { >> 106: // _type is initially set to -1, to prevent "already at end" assert >> 107: undefined_dependency = -1, > > Preserving the existing value seems fine, though it appears any value >= TYPE_LIMIT would work just as well. FWIW, I was also entertaining the idea of a solution like that with the hope that it would fit better with the usage in the iterators. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20847#discussion_r1744921838 From jbhateja at openjdk.org Thu Sep 5 07:45:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 07:45:17 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v6] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating-point types, as per the IEEE 754 spec, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java-side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. 
> > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorportated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/767aeef3..bec0f449 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=04-05 Stats: 1979 lines in 59 files changed: 670 ins; 809 del; 500 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Sep 5 07:45:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 07:45:17 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> On Tue, 3 Sep 2024 13:09:13 GMT, Emanuel Peter wrote: > You did in fact add `java/lang` methods. I think you need to add tests for all of those. As well. That's going to be even more code to review. Hi @eme64 , As Paul suggested in offline mail chain, lets restrict the changes with this patch to only VectorAPI. Going forward we may need to add special Unsigned value classes wrapping around equivalent sized integers. For the time being moving all the helper APIs int VectorMathUtils.java, these automatically gets exercised by the fallback implementation and we already have tests for next APIs. 
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 914: > >> 912: case T_SHORT: vpminuw(dst, src1, src2, vlen_enc); break; >> 913: case T_INT: vpminud(dst, src1, src2, vlen_enc); break; >> 914: case T_LONG: evpminuq(dst, k0, src1, src2, false, vlen_enc); break; > > Can you explain to me what the `k0` is and where it comes from? k0 is an implicit mask register which signifies all true mask. Its not allocatable. Long min / max instructions are only available on AVX512 targets. > src/hotspot/share/opto/addnode.hpp line 194: > >> 192: class SaturatingAddINode : public Node { >> 193: public: >> 194: SaturatingAddINode(Node* in1, Node* in2) : Node(in1,in2) {} > > Suggestion: > > SaturatingAddINode(Node* in1, Node* in2) : Node(in1, in2) {} > > In other places below as well. Not applicable now. > src/hotspot/share/opto/addnode.hpp line 198: > >> 196: virtual const Type* bottom_type() const { return TypeInt::INT; } >> 197: virtual uint ideal_reg() const { return Op_RegI; } >> 198: }; > > Are these not supposed to inherit from the `AddNode`, and then override the corresponding methods? Or are you making them separate for a good reason? As per offline discussion with Paul, we are planning to restrict this patch to only Vector API, please refer to my earlier comments, https://github.com/openjdk/jdk/pull/20507#discussion_r1718044262 To reduce the noise I am keeping only required Vector IR nodes and planning to support scalar saturated operations in subsequent patch. > src/hotspot/share/opto/addnode.hpp line 462: > >> 460: //------------------------------UMaxINode--------------------------------------- >> 461: // Maximum of 2 unsigned integers. >> 462: class UMaxLNode : public Node { > > Here you comment it with `UMaxINode`, but below it is the `UMaxLNode`. The `-------xyz------` comments are really useless. But the semantics description is useful (though you again say integer instead of long here...). Not applicable now. 
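As an aside on the unsigned `UMAX` / `UMIN` semantics discussed above, here is a minimal scalar sketch in Java. It is not taken from the patch: the class and method names (`UnsignedMinMaxSketch`, `umin`, `umax`) are hypothetical, and the only library calls assumed are the standard `Integer.compareUnsigned` and `Long.compareUnsigned`.

```java
// Hypothetical illustration only; not the code under review in PR 20507.
final class UnsignedMinMaxSketch {
    // Unsigned minimum of two ints: compare the bit patterns as unsigned values.
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    // Unsigned maximum of two ints.
    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Unsigned minimum of two longs.
    static long umin(long a, long b) {
        return Long.compareUnsigned(a, b) <= 0 ? a : b;
    }

    // Unsigned maximum of two longs.
    static long umax(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // -1 is 0xFFFFFFFF, the largest unsigned int, so it wins the unsigned max.
        System.out.println(Integer.toUnsignedString(umax(-1, 1))); // prints 4294967295
        System.out.println(umin(-1, 1));                           // prints 1
    }
}
```

The point of the sketch is only that a signed comparison would pick the opposite operand whenever exactly one input has its top bit set.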
> src/hotspot/share/opto/vectornode.hpp line 634: > >> 632: virtual int Opcode() const; >> 633: }; >> 634: > > This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? Not applicable now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2330830123 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744971176 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970961 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744971087 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744971023 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970833 From jbhateja at openjdk.org Thu Sep 5 07:45:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 07:45:18 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: <73QhgX2mQ9TBRQSq57MyimsFExG0tOKv6_id6EuCV_c=.03442b40-a24d-4623-8f1e-6050087c0e0d@github.com> On Tue, 3 Sep 2024 22:18:20 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolved > > src/hotspot/cpu/x86/x86.ad line 10684: > >> 10682: match(Set dst (SaturatingSubVI src1 src2)); >> 10683: match(Set dst (SaturatingSubVL src1 src2)); >> 10684: effect(TEMP xtmp1, TEMP xtmp2); > > Here we need TEMP dst in effect, the saturating_unsigned_sub_dq_avx defines and uses dst across xtmp1. Thanks, yes live range of MachNode corresponding to TEMP ends at its consumer instruction, they never make their way into liveout set of its block or survive beyond consumer, but back to back updates to DST and TMP may corrupt DST if both are assigned same registers by allocator. 
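The saturating semantics described in the PR summary (the result is clamped to the bounds of the type instead of wrapping) can be sketched in scalar Java as follows. This is a hedged illustration, not the patch's code: the class name `SaturatingSketch` and the unsigned helper names are hypothetical, and the clamping values are my reading of "capped by lower and upper bounds". The signed case uses the well-known Hacker's Delight overflow test (both arguments have the opposite sign of the wrapped result).

```java
// Hypothetical illustration only; not the code under review in PR 20507.
final class SaturatingSketch {
    // Signed saturating add. Overflow happened iff both arguments have the
    // opposite sign of the wrapped result (Hacker's Delight, section 2-12).
    static long addSaturating(long a, long b) {
        long res = a + b;
        if (((a ^ res) & (b ^ res)) < 0) {
            // A negative wrapped result means two positives overflowed upwards.
            return res < 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
        }
        return res;
    }

    // Unsigned saturating add: the wrapped sum is smaller than an operand
    // (as unsigned) exactly when a carry out of bit 31 occurred.
    static int addSaturatingUnsigned(int a, int b) {
        int res = a + b;
        return Integer.compareUnsigned(res, a) < 0 ? -1 : res; // -1 == 0xFFFFFFFF
    }

    // Unsigned saturating subtract: clamp to 0 when b is larger than a.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(Long.MAX_VALUE, 1L) == Long.MAX_VALUE); // prints true
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(-1, 5))); // prints 4294967295
        System.out.println(subSaturatingUnsigned(3, 5)); // prints 0
    }
}
```

A vector fallback could apply such a scalar helper lane by lane; whether the patch does exactly that is not shown in this thread.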
> src/java.base/share/classes/java/lang/Long.java line 1987: > >> 1985: public static long addSaturating(long a, long b) { >> 1986: long res = a + b; >> 1987: // HD 2-12 Overflow iff both arguments have the opposite sign of the result > > HD -> Hacker's Delight Thanks for elaborating, I replicated this logic from https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L930 Wanted to comply with rest of the codes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970392 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970574 From jbhateja at openjdk.org Thu Sep 5 08:34:36 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 08:34:36 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. 
> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Some cleanups. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/bec0f449..7164783e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=05-06 Stats: 17 lines in 7 files changed: 2 ins; 10 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From mdoerr at openjdk.org Thu Sep 5 09:28:54 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 5 Sep 2024 09:28:54 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 11:37:54 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianness of system. >> We can adapt the existing code to handle the endianness check in one place and not have to repeatedly check at multiple places to make calls to call_c. 
> > Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: > > - LIRA assembler call_c > - LIRA assembler call_c > - LIRA assembler call_c Test results look good this time. I think it's good to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2331038868 From sroy at openjdk.org Thu Sep 5 09:58:58 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 5 Sep 2024 09:58:58 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: References: Message-ID: <6TRW5GNn_Oik_QjWYxEp_GpM51EXXauSwxc1fnFAh34=.c8a31fd2-1eea-49cf-9fa4-786865d4dfd3@github.com> On Thu, 5 Sep 2024 09:26:30 GMT, Martin Doerr wrote: > Test results look good this time. I think it's good to go. Hi Martin, you mean the results for AIX? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2331105119 From rcastanedalo at openjdk.org Thu Sep 5 10:05:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Sep 2024 10:05:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary g1LoadXVolatile instructions in aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/ed9c0232..9821e795 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13-14 Stats: 71 lines in 2 files changed: 4 ins; 51 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Thu Sep 5 10:09:55 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Sep 2024 10:09:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 12:04:09 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: >> >> - Increase test coverage of new-object stores with different type information >> - Refactor the two post-barrier removal cases into a single expression >> - Remove unnecessary early null-based post-barrier elision >> - Make store capturability test G1-specific and more precise > > src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646: > >> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr) >> 645: %{ >> 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0); > > Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157 > Hence, `needs_acquiring_load(n)` 
and `n->as_Load()->barrier_data() != 0` are never true at the same time. > Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64. Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745185394 From mdoerr at openjdk.org Thu Sep 5 10:33:53 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 5 Sep 2024 10:33:53 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: <6TRW5GNn_Oik_QjWYxEp_GpM51EXXauSwxc1fnFAh34=.c8a31fd2-1eea-49cf-9fa4-786865d4dfd3@github.com> References: <6TRW5GNn_Oik_QjWYxEp_GpM51EXXauSwxc1fnFAh34=.c8a31fd2-1eea-49cf-9fa4-786865d4dfd3@github.com> Message-ID: On Thu, 5 Sep 2024 09:56:10 GMT, Suchismith Roy wrote: > > Test results look good this time. I think it's good to go. > > Hi Martin, you mean the results for AIX ? Yes. I've also retested on linux ppc64le. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2331176037 From varadam at openjdk.org Thu Sep 5 10:40:53 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 5 Sep 2024 10:40:53 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 11:37:54 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per endianess of system. 
>> We can adapt the existing code to handle the endianness check in one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: > > - LIRA assembler call_c > - LIRA assembler call_c > - LIRA assembler call_c LGTM! Thank you, Martin, for testing. ------------- Marked as reviewed by varadam (Committer). PR Review: https://git.openjdk.org/jdk/pull/19947#pullrequestreview-2282535182 From mdoerr at openjdk.org Thu Sep 5 10:45:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 5 Sep 2024 10:45:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:07:14 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646: >> >>> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr) >>> 645: %{ >>> 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0); >> >> Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157 >> Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time. >> Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64. > > Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds. Correct. Only the error message may not be so nice ("bad AD file"). 
PPC64 still has `g1LoadP_acq` and `g1LoadN_acq` which could also be replaced by a comment. But it's not important. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745230285 From chagedorn at openjdk.org Thu Sep 5 10:55:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 5 Sep 2024 10:55:53 GMT Subject: Integrated: 8338971: IGV: Add incrementally inlined method name to phase name In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 11:43:57 GMT, Christian Hagedorn wrote: > This patch adds the method name to the incremental inlining step dumps in IGV which improves debugging issues involving incremental inlining: > > > static void test() { > method1(); > method2(); > method3(); > } > > static void method1() {} > static void method2() {} > static void method3() {} > > Run with `-XX:+AlwaysIncrementalInline` and IGV print level >=3: > > Before patch: > ![image](https://github.com/user-attachments/assets/a3a1ab32-e7b3-4ccb-8ab2-a75d2b5b6912) > > After patch: > ![image](https://github.com/user-attachments/assets/8100a2fe-1670-4687-b8b8-c8053fbaa7d7) > > The patch just prints the method name if we call `print_method()` with `n` being a call node which, AFAICT, only happens for the incremental inlining step. However, even if we call it with another phase at some point, I don't think it hurts to also dump the method name there. > > #### Testing > - Manually verifying change in IGV > - Building IGV which runs its unit tests > - Sanity run with a hello world program with `-Xcomp -XX:+AlwaysIncrementalInline -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 340e131d Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/340e131d616bd81ccd0bdc3817aead0284014cac Stats: 14 lines in 1 file changed: 13 ins; 0 del; 1 mod 8338971: IGV: Add incrementally inlined method name to phase name Reviewed-by: rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20834 From dholmes at openjdk.org Thu Sep 5 12:20:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 12:20:52 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v3] In-Reply-To: References: <5rZOqWZ627bnEJ66wW7Qqye-ZYeAjsK1083MG5GELLk=.c5c144f0-a1a0-483c-ac21-3a948502c3be@github.com> Message-ID: On Wed, 4 Sep 2024 11:55:50 GMT, Jorn Vernee wrote: >> It does return. `ShouldNotReachHere` is used to crash the VM. > > `fatal()` might be better here. I could change it. Yes please do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1745356400 From roland at openjdk.org Thu Sep 5 12:38:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Sep 2024 12:38:52 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 12:34:57 GMT, Roland Westrelin wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. 
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type::OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix >> the fix is to bail out in ... > > Thanks for the extra details. > Is igvn run between incremental inlining and the crash? Or is that all part of a single incremental inlining sequence? > In `LibraryCallKit::make_unsafe_address`, `base` is the `CheckCastPP`. What I don't quite understand is how we can get `top` out of `basic_plus_adr` if the `base` input is a `CheckCastPP`. > It doesn't look like `NULL_PTR` is affected (since `TypePtr::cleanup_speculative()` unconditionally drops the speculative part for it), but `Type*::BOTTOM`/`Type*::NOTNULL` et al. seem susceptible to the problem. @rwestrel, what do you think? Something like: TypePtr::NOTNULL == ptr would indeed be a problem. 
So it would make sense to go over the uses of `Type*::BOTTOM`/`Type*::NOTNULL` and check they are not tested with pointer equality. Is that what you're suggesting, Vladimir? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2331470042 From roland at openjdk.org Thu Sep 5 12:57:28 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Sep 2024 12:57:28 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop [v2] In-Reply-To: References: Message-ID: > The crash occurs because a `Store` is sunk out of a loop that's an > inner loop of an infinite loop. The infinite loop was just found to be > infinite in the current round of loop opts. When that happens the > infinite loop is not properly attached to the rest of the loop tree. As > a consequence, the `IdealLoopTree` instance for the infinite loop and > its children are only partially initialized (`_nest` is not set) and > the structure is in an inconsistent state. > > When the `Store` is sunk it's reported as belonging to a loop but the > `IdealLoopTree` for that loop is only half populated. As a consequence > a call to `is_dominator` for that loop hits an inconsistency, returns > an incorrect result and the assert fires. > > A possible fix would be a point fix that skips that optimization for a > loop that's part of an infinite loop nest. But given basic methods of > loop opts can't be trusted to work in the infinite loop nest, I > suppose similar issues can surface elsewhere. > > It's not the first time we have issues with an infinite loop that's > not properly attached to the loop tree the first time it is > encountered (a NeverBranch is then added and on the next loop passes, > the infinite loop is properly attached to the loop tree). For instance > on a loop opts round, C2 can see that it has no loops and on the next > that it has some. 
> > I propose fixing this by properly attaching the infinite loop to the > loop tree when it's first discovered. A comment in the code seems to > hint that it requires going over the graph again after the > `NeverBranch` is added but I don't think that's case. > > I changed the assert in `loopnode.cpp` because it was there to work > around the inconsistency I mentioned above (no loop in a round, some > loops on the next one). > > The change in `parse1.cpp` fixes an issue I ran into when testing the > fix. The existing logic doesn't properly detect an exception backedge. > > I added the test case from 8336478 to this. The problem there is that > an infinite loop contains a long counted loop. The long counted loop > is transformed into a loop nest which is a 2 step process that > requires 2 rounds of loop opts. But c2 finds an infinite loop in the > middle of the process which causes it to see no more loops and to not > attempt another round of loop opts. The assert fires because it finds > a long counted loop nest that's half transformed. The change I propose > here fixes this too. If we go with this fix, I'll close 8336478 as > duplicate of this one. 
Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20797/files - new: https://git.openjdk.org/jdk/pull/20797/files/1be3580b..ceb241a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20797&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20797&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20797.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20797/head:pull/20797 PR: https://git.openjdk.org/jdk/pull/20797 From roland at openjdk.org Thu Sep 5 13:21:19 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Sep 2024 13:21:19 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop [v3] In-Reply-To: References: Message-ID: > The crash occurs because a `Store` is sunk out of a loop that's an > inner loop of an infinite loop. The infinite loop was just found to be > infinite in the current round of loop opts. When that happens the > infinite loop is not properly attached to the rest of the loop tree. As > a consequence, the `IdealLoopTree` instance for the infinite loop and > its children are only partially initialized (`_nest` is not set) and > the structure is an inconsistent state. > > When the `Store` is sunk it's reported as belonging to a loop but the > `IdealLoopTree` for that loop is only half populated. As a consequence > a call to `is_dominator` for that loop hits an inconsistency, returns > an incorrect result and the assert fires. 
> > A possible fix would be a point fix that skips that optimization for a > loop that's part of an infinite loop nest. But given basic methods of > loop opts can't be trusted to work in the infinite loop nest, I > suppose similar issues can surface elsewhere. > > It's not the first time, we have issues with an infinite loop that's > not properly attached to the loop tree the first time it is > encountered (a NeverBranch is then added and on the next loop passes, > the infinite loop is properly attached to the loop tree). For instance > on a loop opts round, C2 can see that it has no loops and on the next > that it has some. > > I propose fixing this by properly attaching the infinite loop to the > loop tree when it's first discovered. A comment in the code seems to > hint that it requires going over the graph again after the > `NeverBranch` is added but I don't think that's case. > > I changed the assert in `loopnode.cpp` because it was there to work > around the inconsistency I mentioned above (no loop in a round, some > loops on the next one). > > The change in `parse1.cpp` fixes an issue I ran into when testing the > fix. The existing logic doesn't properly detect an exception backedge. > > I added the test case from 8336478 to this. The problem there is that > an infinite loop contains a long counted loop. The long counted loop > is transformed into a loop nest which is a 2 step process that > requires 2 rounds of loop opts. But c2 finds an infinite loop in the > middle of the process which causes it to see no more loops and to not > attempt another round of loop opts. The assert fires because it finds > a long counted loop nest that's half transformed. The change I propose > here fixes this too. If we go with this fix, I'll close 8336478 as > duplicate of this one. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: - remove useless PhaseIdealLoop::only_has_infinite_loops() - Merge branch 'master' into JDK-8338100 - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - comment - test fix - remove verification code - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20797/files - new: https://git.openjdk.org/jdk/pull/20797/files/ceb241a8..a1bfc79e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20797&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20797&range=01-02 Stats: 11045 lines in 484 files changed: 6572 ins; 1805 del; 2668 mod Patch: https://git.openjdk.org/jdk/pull/20797.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20797/head:pull/20797 PR: https://git.openjdk.org/jdk/pull/20797 From roland at openjdk.org Thu Sep 5 13:21:19 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Sep 2024 13:21:19 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop [v3] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 07:58:25 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: >> >> - remove useless PhaseIdealLoop::only_has_infinite_loops() >> - Merge branch 'master' into JDK-8338100 >> - Update src/hotspot/share/opto/loopnode.hpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - comment >> - test fix >> - remove verification code >> - test & fix > > Looks reasonable to me. @chhagedorn @TobiHartmann thanks for the reviews. I took care of the comments in the new commits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20797#issuecomment-2331662706 From thartmann at openjdk.org Thu Sep 5 13:36:54 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 5 Sep 2024 13:36:54 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop [v3] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 13:21:19 GMT, Roland Westrelin wrote: >> The crash occurs because a `Store` is sunk out of a loop that's an >> inner loop of an infinite loop. The infinite loop was just found to be >> infinite in the current round of loop opts. When that happens the >> infinite loop is not properly attached to the rest of the loop tree. As >> a consequence, the `IdealLoopTree` instance for the infinite loop and >> its children are only partially initialized (`_nest` is not set) and >> the structure is an inconsistent state. >> >> When the `Store` is sunk it's reported as belonging to a loop but the >> `IdealLoopTree` for that loop is only half populated. As a consequence >> a call to `is_dominator` for that loop hits an inconsistency, returns >> an incorrect result and the assert fires. >> >> A possible fix would be a point fix that skips that optimization for a >> loop that's part of an infinite loop nest. 
But given basic methods of >> loop opts can't be trusted to work in the infinite loop nest, I >> suppose similar issues can surface elsewhere. >> >> It's not the first time, we have issues with an infinite loop that's >> not properly attached to the loop tree the first time it is >> encountered (a NeverBranch is then added and on the next loop passes, >> the infinite loop is properly attached to the loop tree). For instance >> on a loop opts round, C2 can see that it has no loops and on the next >> that it has some. >> >> I propose fixing this by properly attaching the infinite loop to the >> loop tree when it's first discovered. A comment in the code seems to >> hint that it requires going over the graph again after the >> `NeverBranch` is added but I don't think that's case. >> >> I changed the assert in `loopnode.cpp` because it was there to work >> around the inconsistency I mentioned above (no loop in a round, some >> loops on the next one). >> >> The change in `parse1.cpp` fixes an issue I ran into when testing the >> fix. The existing logic doesn't properly detect an exception backedge. >> >> I added the test case from 8336478 to this. The problem there is that >> an infinite loop contains a long counted loop. The long counted loop >> is transformed into a loop nest which is a 2 step process that >> requires 2 rounds of loop opts. But c2 finds an infinite loop in the >> middle of the process which causes it to see no more loops and to not >> attempt another round of loop opts. The assert fires because it finds >> a long counted loop nest that's half transformed. The change I propose >> here fixes this too. If we go with this fix, I'll c... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: > > - remove useless PhaseIdealLoop::only_has_infinite_loops() > - Merge branch 'master' into JDK-8338100 > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - comment > - test fix > - remove verification code > - test & fix Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20797#pullrequestreview-2283085094 From chagedorn at openjdk.org Thu Sep 5 13:50:51 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 5 Sep 2024 13:50:51 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop [v3] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 13:21:19 GMT, Roland Westrelin wrote: >> The crash occurs because a `Store` is sunk out of a loop that's an >> inner loop of an infinite loop. The infinite loop was just found to be >> infinite in the current round of loop opts. When that happens the >> infinite loop is not properly attached to the rest of the loop tree. As >> a consequence, the `IdealLoopTree` instance for the infinite loop and >> its children are only partially initialized (`_nest` is not set) and >> the structure is an inconsistent state. >> >> When the `Store` is sunk it's reported as belonging to a loop but the >> `IdealLoopTree` for that loop is only half populated. As a consequence >> a call to `is_dominator` for that loop hits an inconsistency, returns >> an incorrect result and the assert fires. >> >> A possible fix would be a point fix that skips that optimization for a >> loop that's part of an infinite loop nest. 
But given basic methods of >> loop opts can't be trusted to work in the infinite loop nest, I >> suppose similar issues can surface elsewhere. >> >> It's not the first time, we have issues with an infinite loop that's >> not properly attached to the loop tree the first time it is >> encountered (a NeverBranch is then added and on the next loop passes, >> the infinite loop is properly attached to the loop tree). For instance >> on a loop opts round, C2 can see that it has no loops and on the next >> that it has some. >> >> I propose fixing this by properly attaching the infinite loop to the >> loop tree when it's first discovered. A comment in the code seems to >> hint that it requires going over the graph again after the >> `NeverBranch` is added but I don't think that's case. >> >> I changed the assert in `loopnode.cpp` because it was there to work >> around the inconsistency I mentioned above (no loop in a round, some >> loops on the next one). >> >> The change in `parse1.cpp` fixes an issue I ran into when testing the >> fix. The existing logic doesn't properly detect an exception backedge. >> >> I added the test case from 8336478 to this. The problem there is that >> an infinite loop contains a long counted loop. The long counted loop >> is transformed into a loop nest which is a 2 step process that >> requires 2 rounds of loop opts. But c2 finds an infinite loop in the >> middle of the process which causes it to see no more loops and to not >> attempt another round of loop opts. The assert fires because it finds >> a long counted loop nest that's half transformed. The change I propose >> here fixes this too. If we go with this fix, I'll c... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: > > - remove useless PhaseIdealLoop::only_has_infinite_loops() > - Merge branch 'master' into JDK-8338100 > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - comment > - test fix > - remove verification code > - test & fix Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20797#pullrequestreview-2283122717 From tholenstein at openjdk.org Thu Sep 5 13:57:52 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 5 Sep 2024 13:57:52 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: <97WNhEuU6bfx4gw4qgg8mCeUIOobL5WFxq35I8Bb56o=.7cc209f5-1872-43d0-8bc5-182787d2f557@github.com> References: <97WNhEuU6bfx4gw4qgg8mCeUIOobL5WFxq35I8Bb56o=.7cc209f5-1872-43d0-8bc5-182787d2f557@github.com> Message-ID: On Wed, 4 Sep 2024 17:23:54 GMT, Vladimir Ivanov wrote: > Thanks for the clarifications, Toby. I reconsidered my conclusion about root cause. I agree that redundant `CheckCastPP` causes problems here, but what surprises me is that `null_check_oop` successfully detects that `base == NULL` while `LibraryCallKit::classify_unsafe_addr()` has a hard time doing the same. IMO the discrepancy is the source of the problem here. Can you share more details why it happens? `classify_unsafe_addr` relies on the type information provided by `_gvn.type(base).` If this type information is speculative or imprecise, the function might misclassify the address. This is the case here since `147 CheckCastPP` isn't equal to `TypePtr::NULL_PTR` in `classify_unsafe_addr`. `null_check_oop` uses more explicit checks to determine if a value is null and handles them by inserting traps if necessary. 
It uses `null_check_common` to perform the actual null check. `null_check_common` insert __chk__ = `150 CmpP`. `chk = _gvn.transform(chk);` then determined it to be null. https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/graphKit.cpp#L1316 150 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2331745915 From tholenstein at openjdk.org Thu Sep 5 13:57:53 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 5 Sep 2024 13:57:53 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: References: Message-ID: On Thu, 4 Jul 2024 12:17:51 GMT, Tobias Holenstein wrote: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix > the fix is to bailed out in this case > https://github.com/openjdk/jdk/blob/3d5d51e228c19a... 
I still propose to fix `LibraryCallKit::classify_unsafe_addr` by changing https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2044 to } else if (_gvn.type(base->uncast()) == TypePtr::NULL_PTR) { ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2331751173 From roland at openjdk.org Thu Sep 5 14:13:58 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Sep 2024 14:13:58 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 09:31:18 GMT, Tobias Holenstein wrote: > `base` is `147 CheckCastPP === 136 71 [[ 150 149 ]] #java/lang/Object * (speculative=byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact * (inline_depth=2)) Oop:java/lang/Object * (speculative=byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact * (inline_depth=2)) !jvms: Test::helperSmall @ bci:11 (line 23) Test::accessSmallArray @ bci:7 (line 29) Test::test2 @ bci:2 (line 38)` before > > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > > and `1 Con === 0 [[ ]] #top` after. > Then `base` is top when we call `basic_plus_adr(base, offset)` right after. I think Vladimir's question is: how can `null_check_oop()` return `top`? AFAIU, it creates a `CastPP` with 147 as input and that `CastPP` is transformed to top. How does that happen? What are the steps in the call to `_gvn.transform( cast );` that lead to a result of `top`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2331789415 From epeter at openjdk.org Thu Sep 5 14:28:58 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:28:58 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: <_aU7H3hhe2lA5cwndBuEFJ8U2rahyKwm60xqXaIANTQ=.1f027324-95d1-4539-b094-7ac04608fe59@github.com> On Thu, 5 Sep 2024 08:34:36 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. 
>> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Some cleanups. No time to review now. But the title only talks about saturating vector operations. UMin/ UMax is not really a saturating operation, right? Preferably, move it to a separate PR, or at least change the title, please :) Just note on the length of this PR: people are not really excited to review 9k lines at once. I personally spend quite a bit of effort splitting things into smaller units, so that I get things reviewed quicker, and so that I make the life of the reviewer easier. It would be nice if you could split things into smaller units, I think in the end you would get more reviews quicker, and the result would be of higher quality. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2331828999 From duke at openjdk.org Thu Sep 5 14:31:28 2024 From: duke at openjdk.org (Casper Norrbin) Date: Thu, 5 Sep 2024 14:31:28 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v4] In-Reply-To: References: Message-ID: > Hi everyone, > > This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. > > The fix includes: > > - Adding assertions to check for potential overflow. > - Reordering some operations to guard against overflow. 
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20774/files - new: https://git.openjdk.org/jdk/pull/20774/files/b30f188c..aec249fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=02-03 Stats: 54 lines in 4 files changed: 10 ins; 29 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20774/head:pull/20774 PR: https://git.openjdk.org/jdk/pull/20774 From duke at openjdk.org Thu Sep 5 14:31:28 2024 From: duke at openjdk.org (Casper Norrbin) Date: Thu, 5 Sep 2024 14:31:28 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v3] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 13:46:57 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > saturated pointer adds + size asserts Here are my thoughts after looking through this again and reading your comments. I agree that blindly reshuffling parameters to prevent overflow may not be the best solution. I think using something like `pointer_delta` strikes a good balance. Using a saturated add feels to me like checking for overflow, just with extra steps. I also believe that asserting for "reasonable" arena allocation sizes may be out of scope of this PR. The initial purpose was to fix potential overflow issues in the adlc arena, and has already expanded into the regular arena.
These asserts also cause test failures, so even more changes would be required if added. It may be better to create a separate issue for generally improving arena allocation safety. With the overflow checks in place, Kim's refactor is a bit cleaner and more readable than the modified original, so I've opted to implement that. Please let me know any further comments or suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20774#issuecomment-2331836199 From sroy at openjdk.org Thu Sep 5 14:33:53 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 5 Sep 2024 14:33:53 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 11:37:54 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per the endianness of the system. >> We can adapt the existing code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: > > - LIRA assembler call_c > - LIRA assembler call_c > - LIRA assembler call_c Thank you all for the review and tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2331841158 From duke at openjdk.org Thu Sep 5 14:33:53 2024 From: duke at openjdk.org (duke) Date: Thu, 5 Sep 2024 14:33:53 GMT Subject: RFR: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize [v10] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 11:37:54 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) >> C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. >> Also, call_c is adapted as per the endianness of the system.
>> We can adapt the existing code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. > > Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: > > - LIRA assembler call_c > - LIRA assembler call_c > - LIRA assembler call_c @suchismith1993 Your change (at version b02b198226fc0f683648b3c91dba683cdf6e4c20) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19947#issuecomment-2331845806 From epeter at openjdk.org Thu Sep 5 14:46:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:46:57 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 20:44:28 GMT, Quan Anh Mai wrote: >> test/hotspot/gtest/opto/test_rangeinference.cpp line 148: >> >>> 146: test_normalize_constraints_random(); >>> 147: test_normalize_constraints_random(); >>> 148: } >> >> I would appreciate it if there were some explicit examples with explicit result verification. Just to make sure the methods are not systematically wrong in some silly way. > > My idea is that it is what `test_normalize_constraints_simple` would do, but I think adding some more explicit cases would help, too. Maybe you should just grep over the whole diff and look for `normal` to catch any remaining cases where it should be named `canonical` ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745685025 From epeter at openjdk.org Thu Sep 5 14:46:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:46:56 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 19:50:01 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong.
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix build Just a very quick scan this time. src/hotspot/share/opto/rangeinference.hpp line 46: > 44: * Bits that are known to be 0 or 1. A value v satisfies this constraint iff > 45: * (v & zeros) == 0 && (~v & ones) == 0. I.e, all bits that is set in zeros > 46: * must be unset in v, and all bits that is set in ones must be set in v. That is quite counter-intuitive. Is there a good reason for this? I would have expected that `zero[i] = 1` would mean that a zero is allowed, and `ones[i] = 1` that a one is allowed. Basically, when I see `zero[i]` I expect it to be a boolean that answers me this question: "can it be a zero?". But you are telling me I'm supposed to ask "Must it not be a zero"? You are telling me that `zero[i] = 1` and `ones[i] = 0` means that it must be a `1`. I know that changing it now would be a lot of effort. But the risk of being unintuitive is that even less people can quickly fix bugs in this code. @vnkozlov what do you think about this? 
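To make the constraint semantics discussed above concrete, here is a minimal Java sketch of the membership test as the PR description states it: a value must satisfy the signed bounds, the unsigned bounds, and the known-bits masks, where a set bit in `zeros` means the corresponding bit of the value must be 0 and a set bit in `ones` means it must be 1. The class and method names (`IntConstraintSketch`, `contains`) are hypothetical and are not part of the patch; the actual implementation lives in C++ in `rangeinference.hpp` and may differ.

```java
// Hypothetical illustration only; not the code from the pull request.
final class IntConstraintSketch {
    // Returns true iff x satisfies all three groups of constraints:
    //   lo <= x <= hi (signed), ulo <= x <= uhi (unsigned),
    //   (x & zeros) == 0 and (~x & ones) == 0 (known bits).
    static boolean contains(int x, int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        boolean signedOk = x >= lo && x <= hi;
        boolean unsignedOk = Integer.compareUnsigned(x, ulo) >= 0
                && Integer.compareUnsigned(x, uhi) <= 0;
        // Every bit set in zeros must be unset in x; every bit set in ones must be set in x.
        boolean bitsOk = (x & zeros) == 0 && (~x & ones) == 0;
        return signedOk && unsignedOk && bitsOk;
    }

    public static void main(String[] args) {
        // The example from the description: [0, 3] in both domains,
        // with all bits except the last two known to be zero.
        System.out.println(contains(2, 0, 3, 0, 3, ~3, 0));
        System.out.println(contains(4, 0, 3, 0, 3, ~3, 0));
        // Known-bits only: bit 2 must be set (ones = 4), everything else unconstrained.
        System.out.println(contains(6, Integer.MIN_VALUE, Integer.MAX_VALUE, 0, -1, 0, 4));
        System.out.println(contains(2, Integer.MIN_VALUE, Integer.MAX_VALUE, 0, -1, 0, 4));
    }
}
```

Under this reading, `zeros` and `ones` are "must be" masks rather than "may be" masks, which is the convention the review comment calls counter-intuitive.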
src/hotspot/share/opto/type.hpp line 564: > 562: // Dual sets are only used to compute the join of 2 sets, and not used > 563: // outside. > 564: const bool _is_dual; Do you have a clear definition of what a dual is somewhere? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2283269454 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745673844 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745669003 From epeter at openjdk.org Thu Sep 5 14:47:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:47:55 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 08:34:36 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. 
>> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Some cleanups. Just a few quick remarks. src/hotspot/share/opto/vectornode.hpp line 188: > 186: }; > 187: > 188: Spurious newline, please remove. Suggestion: src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 78: > 76: * @since 24 > 77: */ > 78: public static long addSaturating(long a, long b) { Are these public methods any Java dev could use? If so: do we have tests for them? ------------- Changes requested by epeter (Reviewer).
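For readers unfamiliar with the semantics under review, the following Java sketch shows what saturating and unsigned scalar operations on `long` could look like: results are clamped to the bounds of the type instead of wrapping around. Only the signature `addSaturating(long a, long b)` appears in the quoted diff; the method bodies and the other helper names here (`addSaturatingUnsigned`, `subSaturatingUnsigned`, `maxUnsigned`) are assumptions for illustration and are not taken from the pull request.

```java
// Hypothetical illustration only; not the code from the pull request.
final class SaturatingSketch {
    // Signed saturating add: clamp to Long.MIN_VALUE / Long.MAX_VALUE on overflow.
    static long addSaturating(long a, long b) {
        long r = a + b;
        // Signed overflow occurred iff both operands have a sign different from the result.
        if (((a ^ r) & (b ^ r)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    // Unsigned saturating add: clamp to 2^64 - 1 (all bits set) on carry-out.
    static long addSaturatingUnsigned(long a, long b) {
        long r = a + b;
        // A wrapped unsigned sum is smaller (unsigned) than either operand.
        return Long.compareUnsigned(r, a) < 0 ? -1L : r;
    }

    // Unsigned saturating subtract: clamp to 0 when b exceeds a (unsigned).
    static long subSaturatingUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) < 0 ? 0L : a - b;
    }

    // Unsigned max: compares operands as unsigned values; no clamping is involved.
    static long maxUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }
}
```

Note that `maxUnsigned` never clamps anything, which is consistent with the earlier remark in this thread that UMIN/UMAX are not really saturating operations.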
PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2283248252 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1745655940 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1745665600 From epeter at openjdk.org Thu Sep 5 14:47:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:47:56 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> Message-ID: On Thu, 5 Sep 2024 07:42:26 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.hpp line 634: >> >>> 632: virtual int Opcode() const; >>> 633: }; >>> 634: >> >> This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? > > Not applicable now. What is not applicable? Do you actually need this node for the saturating operations? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1745661230 From fjiang at openjdk.org Thu Sep 5 14:56:02 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 5 Sep 2024 14:56:02 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e >> Do you prefer integrating it soon?
Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further. > >> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e > Do you prefer integrating it soon? > > That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation. Hi @robcasloz, here is the implementation for RISC-V: https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6 We are still testing the latest changes, results will be updated later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2331932063 From qamai at openjdk.org Thu Sep 5 15:41:30 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 15:41:30 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
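The constraint set from the PR description (signed bounds, unsigned bounds, and known bits) can be written down as a membership predicate. The sketch below is only an illustration under stated assumptions: `IntConstraints` and `agree_on_range` are hypothetical names, not HotSpot's `TypeInt`, and the normalized values for the `[0, 3]` example are worked out by hand rather than produced by the PR's normalization code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the constraints carried by a TypeInt; not HotSpot code.
struct IntConstraints {
  int32_t  lo, hi;    // signed bounds:   lo s<= x s<= hi
  uint32_t ulo, uhi;  // unsigned bounds: ulo u<= x u<= uhi
  uint32_t zeros;     // bits set here must be 0 in x
  uint32_t ones;      // bits set here must be 1 in x

  // x is in the set iff it satisfies every constraint simultaneously.
  bool contains(int32_t x) const {
    const uint32_t ux = static_cast<uint32_t>(x);
    return x >= lo && x <= hi &&
           ux >= ulo && ux <= uhi &&
           (ux & zeros) == 0 &&
           (~ux & ones) == 0;
  }
};

// Checks that two constraint sets accept exactly the same values in [from, to].
// Assumes to < INT32_MAX so that the loop counter does not overflow.
inline bool agree_on_range(const IntConstraints& a, const IntConstraints& b,
                           int32_t from, int32_t to) {
  for (int32_t x = from; x <= to; ++x) {
    if (a.contains(x) != b.contains(x)) return false;
  }
  return true;
}
```

For the example in the description, `IntConstraints{0, 3, 0u, UINT32_MAX, 0u, 0u}` (only the signed bounds given) and `IntConstraints{0, 3, 0u, 3u, ~3u, 0u}` (all constraints tightened) describe the same set, which is why normalizing before construction loses nothing.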
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - rename tests - more explanation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/8d14f8ee..5990628a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=11-12 Stats: 82 lines in 2 files changed: 49 ins; 1 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu Sep 5 15:41:30 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 15:41:30 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 14:38:20 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix build > > src/hotspot/share/opto/rangeinference.hpp line 46: > >> 44: * Bits that are known to be 0 or 1. A value v satisfies this constraint iff >> 45: * (v & zeros) == 0 && (~v & ones) == 0. I.e, all bits that is set in zeros >> 46: * must be unset in v, and all bits that is set in ones must be set in v. > > That is quite counter-intuitive. Is there a good reason for this? > I would have expected that `zero[i] = 1` would mean that a zero is allowed, and `ones[i] = 1` that a one is allowed. > > Basically, when I see `zero[i]` I expect it to be a boolean that answers me this question: "can it be a zero?". But you are telling me I'm supposed to ask "Must it not be a zero"? > > You are telling me that `zero[i] = 1` and `ones[i] = 0` means that it must be a `1`. > > I know that changing it now would be a lot of effort. 
But the risk of being unintuitive is that even fewer people can quickly fix bugs in this code. > > @vnkozlov what do you think about this? You are a bit confused: `zeros` and `ones` answer the question "must this bit be 0 (or 1)?". This means that `zero[i] = 1` says the bit must be a `0`. > src/hotspot/share/opto/type.hpp line 564: > >> 562: // Dual sets are only used to compute the join of 2 sets, and not used >> 563: // outside. >> 564: const bool _is_dual; > > Do you have a clear definition of what a dual is somewhere? Tbh I'm not entirely sure what purpose "dual" serves other than to compute the join of 2 sets. Looking at the code suggests that a `Type` is somehow a symmetric space with all normal types on one side and the "dual" types on the other. This really confuses me and I cannot figure out what exactly a dual type represents. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745773937 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745777943 From qamai at openjdk.org Thu Sep 5 15:41:30 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 15:41:30 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 22:06:26 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix build > > src/hotspot/share/opto/type.hpp line 29: > >> 27: >> 28: #include "opto/adlcVMDeps.hpp" >> 29: #include "opto/compile.hpp" > > Please, don't include `compile.hpp` here - it could cause cyclic dependencies, if not now then later. > If you need something from it put it into `type.cpp` It is useless, we use `Compile::current()` inside `type.hpp`, which means that `compile.hpp` must be included before `type.hpp` anyway.
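The `zeros`/`ones` convention clarified in the reply above (a set bit means "must be", not "may be") can be made concrete with a small sketch. `KnownBits` and its member names are hypothetical and exist only for this illustration; they are not taken from `rangeinference.hpp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of the zeros/ones convention; not HotSpot code.
struct KnownBits {
  uint32_t zeros;  // bit i set => bit i of the value must be 0
  uint32_t ones;   // bit i set => bit i of the value must be 1

  // A bit cannot be required to be both 0 and 1.
  bool is_consistent() const { return (zeros & ones) == 0; }

  bool must_be_zero(int i) const { return ((zeros >> i) & 1u) != 0; }
  bool must_be_one(int i)  const { return ((ones >> i) & 1u) != 0; }
  // A bit that is clear in both masks carries no information.
  bool is_unknown(int i)   const { return !must_be_zero(i) && !must_be_one(i); }

  // Same predicate as in the quoted comment: (v & zeros) == 0 && (~v & ones) == 0.
  bool is_satisfied_by(uint32_t v) const {
    return (v & zeros) == 0 && (~v & ones) == 0;
  }
};
```

With this reading, `KnownBits{0x7u, 0u}` says the three lowest bits are known to be 0 and everything else is unknown, so the value is a multiple of 8.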
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745779199 From qamai at openjdk.org Thu Sep 5 15:45:02 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 15:45:02 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 14:27:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > What are your thoughts on how we are going to debug-print the new int-type? We probably do not always want to print the KnownBits, in general that is way too verbose. But in some alignment example, it would be nice to know that it is the `int/long` range, but the lowest 3 bits are always zero, hence 8 byte aligned. @eme64 I have just found out a proof for the `adjust_bounds_from_bits`. It seems more rigorous and easier to understand. @vnkozlov It is because the result of a `make` can be not a `TypeInt` but an empty type. It would be possible to have a `TypeInt` instance representing the empty set. However, we expose `_lo` and `_hi`, and accessing them of an empty set seems to be a nonsensical operation and potentially dangerous. As a result, I think it is safer we return `Type::TOP` for the empty set. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332074013 From sroy at openjdk.org Thu Sep 5 15:47:57 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 5 Sep 2024 15:47:57 GMT Subject: Integrated: JDK-8332423 : [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 16:44:12 GMT, Suchismith Roy wrote: > JBS Issue : [JDK-8332423](https://bugs.openjdk.org/browse/JDK-8332423) > C1_MacroAssembler::call_c_with_frame_resize is only used with frame_resize == 0. > Also, call_c is adapted as per endianess of system. > We can adapt the exisiting code to handle the endianness check at one place and not have to repeatedly check at multiple places to make calls to call_c. This pull request has now been integrated. Changeset: b895d7cf Author: Suchismith Roy Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/b895d7cf9fe0370a919e7092e40ac3458d91e95e Stats: 62 lines in 10 files changed: 3 ins; 48 del; 11 mod 8332423: [PPC64] Remove C1_MacroAssembler::call_c_with_frame_resize Reviewed-by: mdoerr, varadam ------------- PR: https://git.openjdk.org/jdk/pull/19947 From roland at openjdk.org Thu Sep 5 15:53:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Sep 2024 15:53:59 GMT Subject: RFR: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop [v3] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 13:34:42 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: >> >> - remove useless PhaseIdealLoop::only_has_infinite_loops() >> - Merge branch 'master' into JDK-8338100 >> - Update src/hotspot/share/opto/loopnode.hpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - comment >> - test fix >> - remove verification code >> - test & fix > > Looks good. Thanks for the re-reviews @TobiHartmann @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/20797#issuecomment-2332091249 From roland at openjdk.org Thu Sep 5 15:54:01 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Sep 2024 15:54:01 GMT Subject: Integrated: 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop In-Reply-To: References: Message-ID: <-gpM6F8mMAwSCVYFurcsPCD8_tLN6e9zmbJotifGpv8=.a6234a76-3975-4b28-ae17-547a6aaef3a7@github.com> On Fri, 30 Aug 2024 15:58:30 GMT, Roland Westrelin wrote: > The crash occurs because a `Store` is sunk out of a loop that's an > inner loop of an infinite loop. The infinite loop was just found to be > infinite in the current round of loop opts. When that happens the > infinite loop is not properly attached to the rest of the loop tree. As > a consequence, the `IdealLoopTree` instance for the infinite loop and > its children are only partially initialized (`_nest` is not set) and > the structure is an inconsistent state. > > When the `Store` is sunk it's reported as belonging to a loop but the > `IdealLoopTree` for that loop is only half populated. As a consequence > a call to `is_dominator` for that loop hits an inconsistency, returns > an incorrect result and the assert fires. > > A possible fix would be a point fix that skips that optimization for a > loop that's part of an infinite loop nest. 
But given basic methods of > loop opts can't be trusted to work in the infinite loop nest, I > suppose similar issues can surface elsewhere. > > It's not the first time, we have issues with an infinite loop that's > not properly attached to the loop tree the first time it is > encountered (a NeverBranch is then added and on the next loop passes, > the infinite loop is properly attached to the loop tree). For instance > on a loop opts round, C2 can see that it has no loops and on the next > that it has some. > > I propose fixing this by properly attaching the infinite loop to the > loop tree when it's first discovered. A comment in the code seems to > hint that it requires going over the graph again after the > `NeverBranch` is added but I don't think that's case. > > I changed the assert in `loopnode.cpp` because it was there to work > around the inconsistency I mentioned above (no loop in a round, some > loops on the next one). > > The change in `parse1.cpp` fixes an issue I ran into when testing the > fix. The existing logic doesn't properly detect an exception backedge. > > I added the test case from 8336478 to this. The problem there is that > an infinite loop contains a long counted loop. The long counted loop > is transformed into a loop nest which is a 2 step process that > requires 2 rounds of loop opts. But c2 finds an infinite loop in the > middle of the process which causes it to see no more loops and to not > attempt another round of loop opts. The assert fires because it finds > a long counted loop nest that's half transformed. The change I propose > here fixes this too. If we go with this fix, I'll close 8336478 as > duplicate of this one. This pull request has now been integrated. 
Changeset: e203df46 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/e203df46faf610e35e2c2510271ad68199f4fa3f Stats: 308 lines in 7 files changed: 257 ins; 32 del; 19 mod 8338100: C2: assert(!n_loop->is_member(get_loop(lca))) failed: control must not be back in the loop Co-authored-by: Emanuel Peter Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/20797 From jvernee at openjdk.org Thu Sep 5 16:04:08 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 5 Sep 2024 16:04:08 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v4] In-Reply-To: References: Message-ID: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. 
Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: use fatal() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20479/files - new: https://git.openjdk.org/jdk/pull/20479/files/1558ad9c..7d191107 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20479/head:pull/20479 PR: https://git.openjdk.org/jdk/pull/20479 From rcastanedalo at openjdk.org Thu Sep 5 16:06:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Sep 2024 16:06:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e >> Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further. > >> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e > Do you prefer integrating it soon?
> > That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation. > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later. Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332119624 From amitkumar at openjdk.org Thu Sep 5 16:08:52 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 5 Sep 2024 16:08:52 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v4] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 17:03:32 GMT, Martin Doerr wrote: >> Tier1 test are fine with/without "saving & restoring" return_pc; > > I found it: https://github.com/openjdk/jdk/blob/433f6d8a0643b59663bf76c0f3a2af27a6cc56b7/src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp#L238 > Called here: > https://github.com/openjdk/jdk/blob/433f6d8a0643b59663bf76c0f3a2af27a6cc56b7/src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp#L115 > Other GCs with load barriers are not implemented, so the save&restore code is redundant. > The stub is frameless and only needs the save&restore code when calling C. > In this case, no weak references are used, so there's no C call on s390. @JornVernee would you please remove "saving & restoring" code for the return PC as mentioned by Martin. Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1745819490 From kvn at openjdk.org Thu Sep 5 16:23:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Sep 2024 16:23:54 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: <9IWQrneQV6EbR8d23nDgX8Nas6S-1RL4Jo5BR7UjZ0I=.2b6ddf93-484c-4f84-aa5e-728e01af6348@github.com> References: <9IWQrneQV6EbR8d23nDgX8Nas6S-1RL4Jo5BR7UjZ0I=.2b6ddf93-484c-4f84-aa5e-728e01af6348@github.com> Message-ID: On Tue, 3 Sep 2024 23:02:43 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> move static_asserts > > Please, fix GHA builds and testing. > @vnkozlov It is because the result of a `make` can be not a `TypeInt` but an empty type. It would be possible to have a `TypeInt` instance representing the empty set. However, we expose `_lo` and `_hi`, and accessing them of an empty set seems to be a nonsensical operation and potentially dangerous. As a result, I think it is safer we return `Type::TOP` for the empty set. So why you can't do next?: const Type* TypeInt::make(jint lo, jint hi, int w) { const Type* t = make(TypeIntPrototype{{lo, hi}, {0, max_juint}, {0, 0}}, w); t->is_int(); return t; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332156134 From kvn at openjdk.org Thu Sep 5 16:30:56 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Sep 2024 16:30:56 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 15:41:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
>> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - rename tests > - more explanation src/hotspot/share/opto/type.cpp line 1705: > 1703: bool TypeInt::empty(void) const { > 1704: return false; > 1705: } I would like to see assert(_lo <= _hi) here since you unconditionally return `false`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745842143 From kvn at openjdk.org Thu Sep 5 16:30:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Sep 2024 16:30:57 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: <7qq0OOLuuaKq6avsgEBIq_HDcKd8Wx5afW-I8Pv_SzE=.f67f5b8b-24fd-4b27-bf1b-75a5ef03689c@github.com> On Thu, 5 Sep 2024 15:38:39 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.hpp line 29: >> >>> 27: >>> 28: #include "opto/adlcVMDeps.hpp" >>> 29: #include "opto/compile.hpp" >> >> Please, don't include `compile.hpp` here - it could be cyclic dependencies if not now but later. 
>> If you need something from it put it into `type.cpp` > > It is useless, we use `Compile::current()` inside `type.hpp`, which means that `compile.hpp` must be included before `type.hpp` anyway. `useless` -> `useful` You are right. But how it works now? Usually C++ compilers complain about missing definitions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745816730 From qamai at openjdk.org Thu Sep 5 16:47:55 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 16:47:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 16:24:17 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - rename tests >> - more explanation > > src/hotspot/share/opto/type.cpp line 1705: > >> 1703: bool TypeInt::empty(void) const { >> 1704: return false; >> 1705: } > > I would like to see assert(_lo <= _hi) here since you unconditionally return `false`. `_lo <= _hi` (and other invariants) is checked in `verify_constraints` when creating the `TypeInt` instance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745868702 From qamai at openjdk.org Thu Sep 5 16:47:55 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 16:47:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: <7qq0OOLuuaKq6avsgEBIq_HDcKd8Wx5afW-I8Pv_SzE=.f67f5b8b-24fd-4b27-bf1b-75a5ef03689c@github.com> References: <7qq0OOLuuaKq6avsgEBIq_HDcKd8Wx5afW-I8Pv_SzE=.f67f5b8b-24fd-4b27-bf1b-75a5ef03689c@github.com> Message-ID: On Thu, 5 Sep 2024 16:04:34 GMT, Vladimir Kozlov wrote: >> It is useless, we use `Compile::current()` inside `type.hpp`, which means that `compile.hpp` must be included before `type.hpp` anyway. > > `useless` -> `useful` > You are right. But how it works now? 
Usually C++ compilers complain about missing definitions. It seems that in all source files that include `type.hpp`, `compile.hpp` is also included either directly or indirectly before. So, there is no compilation error. It only arises when `rangeinference.cpp` only includes `type.hpp` and not many things else. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745867345 From qamai at openjdk.org Thu Sep 5 16:50:58 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 16:50:58 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: Message-ID: <0veewW1__tzBqLGjyZxAynKtDfryEXmHYn5n2RlOqJQ=.c1493f7f-cd1d-4f07-8379-507b9531fc5f@github.com> On Thu, 5 Sep 2024 16:45:21 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.cpp line 1705: >> >>> 1703: bool TypeInt::empty(void) const { >>> 1704: return false; >>> 1705: } >> >> I would like to see assert(_lo <= _hi) here since you unconditionally return `false`. > > `_lo <= _hi` (and other invariants) is checked in `verify_constraints` when creating the `TypeInt` instance. 
The call: https://github.com/openjdk/jdk/blob/5990628a8337b9040128857f34359b169326eb23/src/hotspot/share/opto/type.cpp#L1598 And the implementation: https://github.com/openjdk/jdk/blob/5990628a8337b9040128857f34359b169326eb23/src/hotspot/share/opto/rangeinference.cpp#L358 Notice that `this->contains(_srange._lo)` implies that `_lo <= _hi` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745872365 From qamai at openjdk.org Thu Sep 5 16:55:53 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 16:55:53 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v9] In-Reply-To: References: <9IWQrneQV6EbR8d23nDgX8Nas6S-1RL4Jo5BR7UjZ0I=.2b6ddf93-484c-4f84-aa5e-728e01af6348@github.com> Message-ID: On Thu, 5 Sep 2024 16:21:19 GMT, Vladimir Kozlov wrote: > So why you can't do next?: > > const Type* TypeInt::make(jint lo, jint hi, int w) { > return make(TypeIntPrototype{{lo, hi}, {0, max_juint}, {0, 0}}, w)->is_int(); > } I think it would make sense if at all the use sites we ensure that `lo <= hi`. This, however, is not true and there are places where we do have `lo > hi`. As a result, I think it would make sense to defer the decision to deal with that case to the callers. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332214920 From jkarthikeyan at openjdk.org Thu Sep 5 17:02:59 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 5 Sep 2024 17:02:59 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 15:41:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
>> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - rename tests > - more explanation I think in most cases we expect `TypeInteger::make()` to have well-defined inputs and thus well-defined outputs, so I also think it would be good to keep the use sites as they were before for code cleanliness. For places where TOP is allowed there could be another function, maybe `TypeInteger::try_make()`, to signal explicitly that TOP is being handled by the callee code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332224018 From kvn at openjdk.org Thu Sep 5 17:03:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Sep 2024 17:03:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: <0veewW1__tzBqLGjyZxAynKtDfryEXmHYn5n2RlOqJQ=.c1493f7f-cd1d-4f07-8379-507b9531fc5f@github.com> References: <0veewW1__tzBqLGjyZxAynKtDfryEXmHYn5n2RlOqJQ=.c1493f7f-cd1d-4f07-8379-507b9531fc5f@github.com> Message-ID: On Thu, 5 Sep 2024 16:48:21 GMT, Quan Anh Mai wrote: >> `_lo <= _hi` (and other invariants) is checked in `verify_constraints` when creating the `TypeInt` instance. 
> > The call: > > https://github.com/openjdk/jdk/blob/5990628a8337b9040128857f34359b169326eb23/src/hotspot/share/opto/type.cpp#L1598 > > And the implementation: > > https://github.com/openjdk/jdk/blob/5990628a8337b9040128857f34359b169326eb23/src/hotspot/share/opto/rangeinference.cpp#L358 > > Notice that `this->contains(_srange._lo)` implies that `_lo <= _hi` Okay but assert will guard against future incorrect changes by someone who may not familiar with verification code you pointed. Some new path in type construction can be introduced which bypath those checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745886136 From qamai at openjdk.org Thu Sep 5 17:46:15 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 17:46:15 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v14] In-Reply-To: References: Message-ID: <9Zcmr96TNLvXe7lcoKAhG4ZQNeLE7OKhvB4KeaShM-A=.4d478f01-510d-41b1-bfb5-bbaca34ec18b@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - add trivial test cases - make should return the correct type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/5990628a..089c566b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=12-13 Stats: 93 lines in 15 files changed: 18 ins; 0 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu Sep 5 17:48:56 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 17:48:56 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 16:59:02 GMT, Jasmine Karthikeyan wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - rename tests >> - more explanation > > I think in most cases we expect `TypeInteger::make()` to have well-defined inputs and thus well-defined outputs, so I also think it would be good to keep the use sites as they were before for code cleanliness. For places where TOP is allowed there could be another function, maybe `TypeInteger::try_make()`, to signal explicitly that TOP is being handled by the callee code. Thanks @jaskarth for the really great suggestions. I have made `make` return the concrete type and add an assert that the bounds are legal. 
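As an illustration of the constraint set quoted from the PR description (this is not code from the patch), membership and the [0, 3] example could be sketched roughly as below. The names `IntConstraints`, `contains`, `loose_0_to_3` and `tight_0_to_3` are hypothetical and do not correspond to HotSpot's `TypeInt`; the sketch only assumes the six constraints exactly as written in the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the six constraints; NOT HotSpot's TypeInt.
struct IntConstraints {
  int32_t  lo, hi;    // signed bounds:   x s>= lo && x s<= hi
  uint32_t ulo, uhi;  // unsigned bounds: x u>= ulo && x u<= uhi
  uint32_t zeros;     // bits known to be 0: (x & zeros) == 0
  uint32_t ones;      // bits known to be 1: (~x & ones) == 0
};

// True if x satisfies every constraint in t.
inline bool contains(const IntConstraints& t, int32_t x) {
  const uint32_t u = static_cast<uint32_t>(x);
  return x >= t.lo && x <= t.hi &&
         u >= t.ulo && u <= t.uhi &&
         (u & t.zeros) == 0 &&
         (~u & t.ones) == 0;
}

// Signed range [0, 3] with nothing else specified.
inline IntConstraints loose_0_to_3() {
  return {0, 3, 0u, UINT32_MAX, 0u, 0u};
}

// The same set with the implied constraints made explicit:
// unsigned range [0, 3] and every bit above the low two known to be zero.
inline IntConstraints tight_0_to_3() {
  return {0, 3, 0u, 3u, ~3u, 0u};
}
```

Both descriptions accept exactly the values 0 through 3; the second is simply the tightened form that the description calls normalized.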
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332311607 From qamai at openjdk.org Thu Sep 5 17:53:56 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 17:53:56 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: <0veewW1__tzBqLGjyZxAynKtDfryEXmHYn5n2RlOqJQ=.c1493f7f-cd1d-4f07-8379-507b9531fc5f@github.com> Message-ID: <6I_EScB1FGB2FQvG2FeBJ8VgLNY8ym0oprMtTG5CZSc=.8a6da5c2-608d-4d7e-82d3-3b9ae27dd1c9@github.com> On Thu, 5 Sep 2024 16:59:47 GMT, Vladimir Kozlov wrote: >> The call: >> >> https://github.com/openjdk/jdk/blob/5990628a8337b9040128857f34359b169326eb23/src/hotspot/share/opto/type.cpp#L1598 >> >> And the implementation: >> >> https://github.com/openjdk/jdk/blob/5990628a8337b9040128857f34359b169326eb23/src/hotspot/share/opto/rangeinference.cpp#L358 >> >> Notice that `this->contains(_srange._lo)` implies that `_lo <= _hi` > > Okay, but an assert will guard against future incorrect changes by someone who may not be familiar with the verification code you pointed to. Some new path in type construction could be introduced which bypasses those checks. My thought is that a `TypeInt` has many invariants and it would be really expensive to check all of them at all use sites, and it does not seem sufficient to check only one of those invariants. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1745946303 From qamai at openjdk.org Thu Sep 5 18:12:55 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Sep 2024 18:12:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
> > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/089c566b..2e3955d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=13-14 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From mdoerr at openjdk.org Thu Sep 5 18:18:56 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 5 Sep 2024 18:18:56 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary g1LoadXVolatile instructions in aarch64 I've implemented the same cleanup as on aarch64: https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2 Would be nice if you could apply it. Thanks! In case you want to merge further updates from head, I have no objections. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332365001 From kvn at openjdk.org Thu Sep 5 19:12:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Sep 2024 19:12:57 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:12:55 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset.
As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix builds This is much better now. After GHA testing finished I will submit our testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332452193 PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332454787 From dlong at openjdk.org Thu Sep 5 20:00:51 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 20:00:51 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v6] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: On Tue, 3 Sep 2024 13:49:51 GMT, Yagmur Eren wrote: >> Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > Update Compile::verify_init comment Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20715#pullrequestreview-2283968926 From duke at openjdk.org Thu Sep 5 20:18:53 2024 From: duke at openjdk.org (duke) Date: Thu, 5 Sep 2024 20:18:53 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v6] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: <2Vuk8BV9j8WWvrowOwL1GTUHdulY3AqYfQcosqeRSrE=.d9e7fe19-80e8-4449-884e-db7f23219fbc@github.com> On Tue, 3 Sep 2024 13:49:51 GMT, Yagmur Eren wrote: >> Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > Update Compile::verify_init comment @nelanbu Your change (at version 05b94113dc6664ff4ee6d9500151829eead648d4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20715#issuecomment-2332553476 From duke at openjdk.org Thu Sep 5 20:18:51 2024 From: duke at openjdk.org (Yagmur Eren) Date: Thu, 5 Sep 2024 20:18:51 GMT Subject: RFR: 8330159: [C2] Remove or clarify Compile::init_start [v5] In-Reply-To: References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> <2pFdztomWU60_fYA092wBtpPyGjwuprzlzC1nj8xyMk=.2ae0fc34-b055-48ae-8320-7d014a148064@github.com> <7UpM8j5ekKQZlNCAblwAZpAJZbBbCOW4InrH9VpS8fc=.03400512-dd6f-4533-a6f6-30ea0712fcf3@github.com> Message-ID: <6dnkyadC6Ex1mxzpr_t_wMwRFb8JQBbt510Du62y7uc=.dc3d99b4-7b90-4a24-ba81-b23fc555a3da@github.com> On Tue, 3 Sep 2024 13:03:27 GMT, Christian Hagedorn wrote: >>> Hi @nelanbu, I don't think this is correct. 
In `Compile::start()`, we have the following code: >>> >>> https://github.com/openjdk/jdk/blob/b8e8e965e541881605f9dbcd4d9871d4952b9232/src/hotspot/share/opto/compile.cpp#L1121-L1131 >>> >>> It asserts that `failing()` is false. Therefore, `init_start()` bails out before checking the assert with `start()` which you now no longer do with your refactoring. >>> >>> What you could do instead: >>> >>> * Simplify the code in `init_start()` to the following, adding an assertion message: >>> >>> ``` >>> assert(failing() || s == start(), "should be StartNode"); >>> ``` >>> >>> * Change `init_start_node()` into a more meaningful name like `verify_start()`, as we are not actually initializing anything but rather sanity checking the start node. >>> * Guard the method with `DEBUG_ONLY/ifdef ASSERT` since it's only calling an assert in debug VM and nothing in product VM. >> >> >> Is `failing()` true if it fails as the name suggests? If so, then I guess it should be `!failing()` within `assert`, right? > >> > Hi @nelanbu, I don't think this is correct. In `Compile::start()`, we have the following code: >> > https://github.com/openjdk/jdk/blob/b8e8e965e541881605f9dbcd4d9871d4952b9232/src/hotspot/share/opto/compile.cpp#L1121-L1131 >> > >> > It asserts that `failing()` is false. Therefore, `init_start()` bails out before checking the assert with `start()` which you now no longer do with your refactoring. >> > What you could do instead: >> > >> > * Simplify the code in `init_start()` to the following, adding an assertion message: >> > >> > ``` >> > assert(failing() || s == start(), "should be StartNode"); >> > ``` >> > >> > * Change `init_start_node()` into a more meaningful name like `verify_start()`, as we are not actually initializing anything but rather sanity checking the start node. >> > * Guard the method with `DEBUG_ONLY/ifdef ASSERT` since it's only calling an assert in debug VM and nothing in product VM.
>> >> Is `failing()` true if it fails as the name suggests? If so, then I guess it should be `!failing()` within `assert`, right? > > No, the way the assert works is that if we fail (i.e. `failing()` is true), we do not actually want to check `s == start()` because `start()` requires `failing()` evaluating to false. So, whenever `failing()` is true, the first part of the assert makes the assertion true and we stop evaluating. But usually it is false, so we continue evaluating the second part. We also use this trick with `||`-ing conditions at other places, for example here: > https://github.com/openjdk/jdk/blob/e0c46d589b12aa644e12e4a4c9e84e035f7cf98d/src/hotspot/share/opto/callnode.cpp#L1291 > > Whenever `n` is null, the first part of the assert is true and makes the entire assert true. Only if `n` is non-null, we will evaluate the second and interesting part of the assert. Thanks a lot for the reviews and feedback @chhagedorn, @dean-long and @TobiHartmann! Can I get a sponsor? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20715#issuecomment-2332550692 From dlong at openjdk.org Thu Sep 5 20:23:51 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 20:23:51 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v4] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 14:31:28 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. 
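The two bullets above (asserting on potential overflow, and reordering operations so the intermediate value cannot wrap) can be illustrated with a small generic sketch. This is not the code from `adlArena.cpp`; `add_overflows`, `checked_add` and `fits` are hypothetical helper names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if a + b would wrap around in size_t.
inline bool add_overflows(size_t a, size_t b) {
  return a > SIZE_MAX - b;
}

// "Adding assertions": refuse to compute a size that would wrap.
inline size_t checked_add(size_t a, size_t b) {
  assert(!add_overflows(a, b) && "size computation overflows");
  return a + b;
}

// "Reordering operations": the naive test `used + request <= capacity`
// can wrap for a huge request and wrongly succeed. Comparing the request
// against the remaining space avoids the addition entirely.
inline bool fits(size_t used, size_t capacity, size_t request) {
  assert(used <= capacity && "arena bookkeeping is inconsistent");
  return request <= capacity - used;
}
```

With `used = 10` and `capacity = 16`, a request of `SIZE_MAX` is rejected by `fits`, whereas the naive sum would wrap to 9 and pass the comparison.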
> > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > review suggestions src/hotspot/share/adlc/adlArena.cpp line 141: > 139: > 140: //------------------------------realloc---------------------------------------- > 141: size_t pointer_delta(const void *left, const void *right) { Do we want to assert left >= right here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1746114056 From duke at openjdk.org Thu Sep 5 20:45:57 2024 From: duke at openjdk.org (halkosajtarevic) Date: Thu, 5 Sep 2024 20:45:57 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary g1LoadXVolatile instructions in aarch64 Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332586175 From sviswanathan at openjdk.org Thu Sep 5 23:14:04 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 5 Sep 2024 23:14:04 GMT Subject: RFR: 8329035: New Data Destination instructions support [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 22:22:56 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > function name changes based on review comments src/hotspot/cpu/x86/assembler_x86.cpp line 4583: > 4581: InstructionMark im(this); > 4582: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 4583: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. 
src/hotspot/cpu/x86/assembler_x86.cpp line 4637: > 4635: InstructionMark im(this); > 4636: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 4637: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 6593: > 6591: > 6592: void Assembler::erolq(Register dst, Register src, bool no_flags) { > 6593: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); vex_w should be true here. src/hotspot/cpu/x86/assembler_x86.cpp line 6647: > 6645: assert(isShiftCount(imm8), "illegal shift count"); > 6646: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 6647: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 6670: > 6668: InstructionMark im(this); > 6669: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 6670: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 6686: > 6684: } > 6685: > 6686: void Assembler::esall(Register dst, Register src, int imm8, bool no_flags) { assert(isShiftCount(imm8), "illegal shift count") missing. src/hotspot/cpu/x86/assembler_x86.cpp line 6694: > 6692: emit_int24((unsigned char)0xC1, (0xF8 | encode), imm8); > 6693: } > 6694: } Should this be (0xE0 | encode)? 
src/hotspot/cpu/x86/assembler_x86.cpp line 6722: > 6720: } > 6721: > 6722: void Assembler::esarl(Register dst, Address src, int imm8, bool no_flags) { assert(isShiftCount(imm8), "illegal shift count") missing. src/hotspot/cpu/x86/assembler_x86.cpp line 6725: > 6723: InstructionMark im(this); > 6724: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 6725: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 6748: > 6746: InstructionMark im(this); > 6747: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 6748: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 6764: > 6762: } > 6763: > 6764: void Assembler::esarl(Register dst, Register src, int imm8, bool no_flags) { assert(isShiftCount(imm8), "illegal shift count") missing. src/hotspot/cpu/x86/assembler_x86.cpp line 6794: > 6792: InstructionMark im(this); > 6793: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 6794: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 6819: > 6817: void Assembler::esbbl(Register dst, Register src1, Address src2) { > 6818: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 6819: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. 
src/hotspot/cpu/x86/assembler_x86.cpp line 7546: > 7544: InstructionMark im(this); > 7545: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 7546: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 7572: > 7570: InstructionMark im(this); > 7571: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 7572: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 7600: > 7598: InstructionMark im(this); > 7599: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 7600: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); input_size_in_bits should be EVEX_32bit. src/hotspot/cpu/x86/assembler_x86.cpp line 7645: > 7643: void Assembler::exorw(Register dst, Register src1, Register src2, bool no_flags) { > 7644: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 7645: (void) evex_prefix_and_encode_ndd(src1->encoding(), dst->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); VEX_SIMD_NONE should be VEX_SIMD_66.
src/hotspot/cpu/x86/assembler_x86.cpp line 7662: > 7660: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 7661: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > 7662: evex_prefix_ndd(src2, dst->encoding(), src1->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); input_size_in_bits could be EVEX_16bit. VEX_SIMD_NONE should be VEX_SIMD_66. src/hotspot/cpu/x86/assembler_x86.cpp line 12251: > 12249: > 12250: void Assembler::edecl(Register dst, Register src, bool no_flags) { > 12251: // Don't use it directly. Use MacroAssembler::deccrementl() instead. This comment can be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 13706: > 13704: > 13705: void Assembler::eincl(Register dst, Register src, bool no_flags) { > 13706: // Don't use it directly. Use MacroAssembler::incrementl() instead. This comment could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 14788: > 14786: void Assembler::edecl(Register dst, Register src, bool no_flags) { > 14787: // Don't use it directly. Use MacroAssembler::decrementl() instead. > 14788: // Use two-byte form (one-byte form is a REX prefix in 64-bit mode) This comment could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 14803: > 14801: void Assembler::edecq(Register dst, Register src, bool no_flags) { > 14802: // Don't use it directly. Use MacroAssembler::incrementq() instead. > 14803: // Use two-byte form (one-byte from is a REX prefix in 64-bit mode) This comment could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 14817: > 14815: > 14816: void Assembler::edecq(Register dst, Address src, bool no_flags) { > 14817: // Don't use it directly. Use MacroAssembler::increment() instead. This comment could be removed. 
src/hotspot/cpu/x86/assembler_x86.cpp line 14883: > 14881: void Assembler::eimulq(Register dst, Register src, bool no_flags) { > 14882: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14883: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, no_flags); Is there a reason we are not calling the evex_prefix_and_encode_nf here? src/hotspot/cpu/x86/assembler_x86.cpp line 14900: > 14898: void Assembler::eimulq(Register src, bool no_flags) { > 14899: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14900: int encode = vex_prefix_and_encode(0, 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, no_flags); Is there a reason we are not calling the evex_prefix_and_encode_nf here? src/hotspot/cpu/x86/assembler_x86.cpp line 14921: > 14919: InstructionMark im(this); > 14920: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14921: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); input_size_in_bits could be EVEX_64bit. src/hotspot/cpu/x86/assembler_x86.cpp line 14946: > 14944: void Assembler::eimulq(Register dst, Register src, int value, bool no_flags) { > 14945: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14946: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, no_flags); Is there a reason we are not calling the evex_prefix_and_encode_nf here? 
src/hotspot/cpu/x86/assembler_x86.cpp line 14965: > 14963: InstructionMark im(this); > 14964: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14965: vex_prefix(src, 0, dst->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F, &attributes, /* nds_is_ndd */ false, no_flags); Is there a reason we are not calling the evex_prefix_nf here? src/hotspot/cpu/x86/assembler_x86.cpp line 14973: > 14971: InstructionMark im(this); > 14972: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14973: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); input_size_in_bits should be EVEX_64bit. src/hotspot/cpu/x86/assembler_x86.cpp line 14988: > 14986: void Assembler::eincl(Register dst, Register src, bool no_flags) { > 14987: // Don't use it directly. Use MacroAssembler::incrementl() instead. > 14988: // Use two-byte form (one-byte from is a REX prefix in 64-bit mode) This comment could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 15004: > 15002: void Assembler::eincq(Register dst, Register src, bool no_flags) { > 15003: // Don't use it directly. Use MacroAssembler::incrementq() instead. > 15004: // Use two-byte form (one-byte from is a REX prefix in 64-bit mode) This comment could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 15018: > 15016: > 15017: void Assembler::eincq(Register dst, Address src, bool no_flags) { > 15018: // Don't use it directly. Use MacroAssembler::incrementq() instead. This comment could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 15827: > 15825: > 15826: void Assembler::esarq(Register dst, Address src, int imm8, bool no_flags) { > 15827: InstructionMark im(this); assert(isShiftCount(imm8 >> 1), "illegal shift count") is missing. 
src/hotspot/cpu/x86/assembler_x86.cpp line 15885: > 15883: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 15884: int encode = evex_prefix_and_encode_ndd(0, dst->encoding(), src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > 15885: emit_int16((unsigned char)0xD1, (0xF8 | encode)); This should be: emit_int16((unsigned char)0xD3, (0xF8 | encode)); Shift by cl and not shift by 1. src/hotspot/cpu/x86/assembler_x86.cpp line 15920: > 15918: } > 15919: > 15920: void Assembler::esbbq(Register dst, Register src1, Address src2) { InstructionMark im(this) is missing. src/hotspot/cpu/x86/assembler_x86.cpp line 15934: > 15932: > 15933: void Assembler::esbbq(Register dst, Register src1, Register src2) { > 15934: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); vex_w should be true here? src/hotspot/cpu/x86/assembler_x86.cpp line 16039: > 16037: assert(isShiftCount(imm8 >> 1), "illegal shift count"); > 16038: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 16039: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); input_size_in_bits should be EVEX_64bit. 
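For readers following the opcode correction at line 15885 above (0xD3 rather than 0xD1): in the standard x86 shift group, 0xD1 shifts by 1, 0xD3 shifts by CL, and 0xC1 takes an imm8, while the ModRM reg field /7 selects SAR. The sketch below is independent of HotSpot's `Assembler` class; `ShiftSource`, `sar_modrm` and `sar_opcode_bytes` are hypothetical names used only for illustration, and prefix bytes are deliberately left out.

```cpp
#include <cstdint>
#include <utility>

// Opcode byte of the x86 shift group, selected by where the count comes from.
enum class ShiftSource : uint8_t { One = 0xD1, CL = 0xD3, Imm8 = 0xC1 };

// ModRM byte for a register-direct operand: mod = 11, reg = /7 (SAR),
// rm = low three bits of the register encoding. 0xF8 is 11 111 000 in binary.
inline uint8_t sar_modrm(int reg_enc) {
  return static_cast<uint8_t>(0xF8 | (reg_enc & 7));
}

// The two bytes emitted after the prefix for "sar reg, <count source>".
inline std::pair<uint8_t, uint8_t> sar_opcode_bytes(ShiftSource src, int reg_enc) {
  return {static_cast<uint8_t>(src), sar_modrm(reg_enc)};
}
```

Emitting `ShiftSource::One` where the instruction is documented as shifting by CL would assemble a valid but different instruction, which is the kind of mistake the review comment points at.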
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745963354 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745966767 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745985658 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745987552 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745988624 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745990265 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746011620 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745995095 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745995624 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1745996314 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746012273 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746013876 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746015511 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746059366 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746060252 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746108142 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746110091 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746113050 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746113669 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746115430 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746228953 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746229449 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746229826 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746237405 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20698#discussion_r1746239674 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746243869 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746248995 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746253962 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746255625 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746257746 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746259847 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746262676 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746283748 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746287569 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746288704 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746289294 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1746291123 From dlong at openjdk.org Thu Sep 5 23:45:55 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 23:45:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 15:37:45 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.hpp line 564: >> >>> 562: // Dual sets are only used to compute the join of 2 sets, and not used >>> 563: // outside. >>> 564: const bool _is_dual; >> >> Do you have a clear definition of what a dual is somewhere? > > Tbh I'm not entirely sure what does "dual" serve other than: it is to compute the join of 2 sets. Looking at the code suggests that a `Type` is somehow a symmetric space with all normal types on one side and the "dual" types on the other. This really confuses me and I cannot figure out what exactly a dual type represents. 
My understanding is "join" means "union", "meet" means "intersection", and "dual" means "complement". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1746317661 From dlong at openjdk.org Fri Sep 6 00:06:54 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 6 Sep 2024 00:06:54 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 23:42:58 GMT, Dean Long wrote: >> Tbh I'm not entirely sure what does "dual" serve other than: it is to compute the join of 2 sets. Looking at the code suggests that a `Type` is somehow a symmetric space with all normal types on one side and the "dual" types on the other. This really confuses me and I cannot figure out what exactly a dual type represents. > > My understanding is "join" means "union", "meet" means "intersection", and "dual" means "complement". "Complement" may not be quite right, or at least I don't see how it applies to TypeF and TypeD, whose xdual() functions are "self-symmetric" and do nothing except return the original type unchanged.. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1746328465 From kvn at openjdk.org Fri Sep 6 00:11:56 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 6 Sep 2024 00:11:56 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:12:55 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. 
an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix builds I submitted our testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2332927584 From dlong at openjdk.org Fri Sep 6 00:26:55 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 6 Sep 2024 00:26:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 00:03:51 GMT, Dean Long wrote: >> My understanding is "join" means "union", "meet" means "intersection", and "dual" means "complement". > > "Complement" may not be quite right, or at least I don't see how it applies to TypeF and TypeD, whose xdual() functions are "self-symmetric" and do nothing except return the original type unchanged.. 
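Returning to the constraint set quoted from the PR description earlier in this thread (signed bounds, unsigned bounds, and known bits), it can be read as a plain membership predicate. The sketch below is only an illustration of that reading: `IntConstraints`, `contains` and `zero_to_three` are hypothetical names and do not reflect how `TypeInt` is laid out in C2.

```cpp
#include <cstdint>

// Hypothetical container for the six constraints named in the PR description.
struct IntConstraints {
  int32_t lo, hi;        // signed bounds:   lo <= x <= hi (signed compare)
  uint32_t ulo, uhi;     // unsigned bounds: ulo <= x <= uhi (unsigned compare)
  uint32_t zeros, ones;  // bits known to be 0 and bits known to be 1
};

// x is in the set iff it satisfies every constraint at once.
inline bool contains(const IntConstraints& t, int32_t x) {
  const uint32_t u = static_cast<uint32_t>(x);
  return x >= t.lo && x <= t.hi &&
         u >= t.ulo && u <= t.uhi &&
         (u & t.zeros) == 0 &&   // no bit from `zeros` may be set
         (~u & t.ones) == 0;     // every bit from `ones` must be set
}

// The normalized form of the [0, 3] example from the description:
// same range in both domains, and all bits except the low two known to be 0.
inline IntConstraints zero_to_three() {
  return IntConstraints{0, 3, 0u, 3u, ~3u, 0u};
}
```

Here 2 is a member, while 4 and -1 are rejected by the bounds and by the known-zero bits alike, which is the redundancy that normalization is meant to make explicit.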
https://en.wikipedia.org/wiki/Duality_(order_theory) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1746339096 From dlong at openjdk.org Fri Sep 6 00:37:54 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 6 Sep 2024 00:37:54 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 00:24:14 GMT, Dean Long wrote: >> "Complement" may not be quite right, or at least I don't see how it applies to TypeF and TypeD, whose xdual() functions are "self-symmetric" and do nothing except return the original type unchanged.. > > https://en.wikipedia.org/wiki/Duality_(order_theory) If _lo <= x <= _hi, then I believe the dual is _hi <= x <= _lo If dual is really only needed for join, then it seems like we could remove the concept of dual and just implement join. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1746344441 From kbarrett at openjdk.org Fri Sep 6 03:35:50 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 6 Sep 2024 03:35:50 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v4] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 14:31:28 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > review suggestions Changes requested by kbarrett (Reviewer). 
src/hotspot/share/adlc/adlArena.cpp line 141: > 139: > 140: //------------------------------realloc---------------------------------------- > 141: size_t pointer_delta(const void *left, const void *right) { Function ought to be static. ------------- PR Review: https://git.openjdk.org/jdk/pull/20774#pullrequestreview-2284536292 PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1746465313 From kbarrett at openjdk.org Fri Sep 6 03:35:51 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 6 Sep 2024 03:35:51 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v4] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 20:21:20 GMT, Dean Long wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions > > src/hotspot/share/adlc/adlArena.cpp line 141: > >> 139: >> 140: //------------------------------realloc---------------------------------------- >> 141: size_t pointer_delta(const void *left, const void *right) { > > Do we want to assert left >= right here? It's currently only used in one place, where we know that's true. OTOH, an assert doesn't hurt. Instead of this helper function, we could just use`(size_t)(_max - c_old)` inline (we can be confident the difference won't exceed the `ptrdiff_t` range here), reducing the lines of code by a little bit. The benefit from the helper is having the Arealloc code (nearly? completely?) identical here and in memory/arena.cpp. I don't have a strong opinion either way. 
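To make the two options discussed above concrete, here is a minimal sketch of a file-local helper next to the inline cast. It is an illustration under assumptions, not the code in `adlArena.cpp`: the assert message and the name `inline_delta` are hypothetical, and the pointers are assumed to lie in the same chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// File-local helper (static, as requested in the review), with the
// left >= right precondition made explicit.
static size_t pointer_delta(const void* left, const void* right) {
  assert(left >= right && "left must not be below right");
  return static_cast<size_t>(reinterpret_cast<uintptr_t>(left) -
                             reinterpret_cast<uintptr_t>(right));
}

// The inline alternative: subtract char pointers within one chunk and cast.
// Hypothetical wrapper, shown only so both forms can be compared directly.
static size_t inline_delta(const char* max, const char* cur) {
  return static_cast<size_t>(max - cur);
}
```

For two pointers into the same buffer both forms yield the same byte count; the helper mainly buys a named precondition and symmetry with the other arena implementation.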
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1746467080 From jkarthikeyan at openjdk.org Fri Sep 6 04:41:24 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 6 Sep 2024 04:41:24 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v4] In-Reply-To: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> Message-ID: > Hi all, > I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) > > This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! 
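As a rough illustration of the kind of range reasoning described above for bitwise-and, the sketch below computes conservative bounds from two signed ranges. It is not the algorithm in the PR, which is not quoted in this thread; it only uses two facts that hold in two's complement: if either operand is non-negative, the result lies between 0 and that operand, and if both are negative, the result is negative and no larger than either operand. `Range`, `and_range` and `and_range_is_sound` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

struct Range { int32_t lo, hi; };  // inclusive signed bounds, lo <= hi

// Conservative bounds for x & y with x in a and y in b.
inline Range and_range(Range a, Range b) {
  const int32_t kMin = std::numeric_limits<int32_t>::min();
  const int32_t kMax = std::numeric_limits<int32_t>::max();
  if (a.lo >= 0 && b.lo >= 0) return {0, std::min(a.hi, b.hi)};    // both non-negative
  if (a.lo >= 0) return {0, a.hi};                                 // only a non-negative
  if (b.lo >= 0) return {0, b.hi};                                 // only b non-negative
  if (a.hi < 0 && b.hi < 0) return {kMin, std::min(a.hi, b.hi)};   // both negative
  return {kMin, kMax};                                             // no information
}

// Brute-force soundness check; only meant for small ranges that stay
// well away from INT32_MAX, since the loop counters would overflow there.
inline bool and_range_is_sound(Range a, Range b) {
  const Range r = and_range(a, b);
  for (int32_t x = a.lo; x <= a.hi; ++x) {
    for (int32_t y = b.lo; y <= b.hi; ++y) {
      const int32_t v = x & y;
      if (v < r.lo || v > r.hi) return false;
    }
  }
  return true;
}
```

The lower bound for two negative ranges is left at the minimum value here; tightening it is exactly the sort of refinement a real implementation would add.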
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Improve cases with two negative ranges, add more documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20066/files - new: https://git.openjdk.org/jdk/pull/20066/files/ca2db583..0d177423 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20066&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20066&range=02-03 Stats: 52 lines in 3 files changed: 37 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20066/head:pull/20066 PR: https://git.openjdk.org/jdk/pull/20066 From jkarthikeyan at openjdk.org Fri Sep 6 04:41:24 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 6 Sep 2024 04:41:24 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v3] In-Reply-To: <-t28h66vJlrX5ieI15WmqUqGGMORqCjwJRStbCvqzEk=.680fdd08-0406-41df-b927-fcac04f2b14e@github.com> References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> <6lNmXhkDKMNu7r8yYrBYs4JUJQ-drbj9Gj1_EaAqQj0=.c699170b-0551-4c04-8322-0182cf9b5866@github.com> <-t28h66vJlrX5ieI15WmqUqGGMORqCjwJRStbCvqzEk=.680fdd08-0406-41df-b927-fcac04f2b14e@github.com> Message-ID: <2Sb31jj1CP3LNng8JQeJ8l2YpeHtK7YfU_f0I3d6NNU=.fd56fcb0-97a0-4462-be32-9369c0d0f154@github.com> On Wed, 4 Sep 2024 06:09:06 GMT, Christian Hagedorn wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Check IR before macro expansion > > I agree that we should keep this RFE simple. And we are just using thing that we already have. So, we could just go with the optimizations that you currently have (if you like to apply the few simple improvement suggestions, you can already do that) and follow up with future RFEs to cover more cases. 
@chhagedorn I've pushed a new version that should address all the comments from code review. I ended up handling the case with two negative ranges as well because it didn't add too much code complexity, but I agree that we should just use the optimizations we have and not add more to this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2333218988 From jbhateja at openjdk.org Fri Sep 6 06:30:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 06:30:55 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> Message-ID: On Thu, 5 Sep 2024 14:31:39 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.hpp line 634: >> >>> 632: virtual int Opcode() const; >>> 633: }; >>> 634: >> >> This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? > > What is not applicable? Do you actually need this node for the saturating operations? It was in context of scalar IRs, as mentioned we plan to support unsigned scalar operation and its idealizations in follow up patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1746573566 From jbhateja at openjdk.org Fri Sep 6 06:43:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 06:43:31 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . 
UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. 
> > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/7164783e..195390fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=06-07 Stats: 24 lines in 2 files changed: 0 ins; 1 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Fri Sep 6 06:43:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 06:43:32 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 14:33:56 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Some cleanups. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 78: > >> 76: * @since 24 >> 77: */ >> 78: public static long addSaturating(long a, long b) { > > Are these public methods any Java dev could use? If so: do we have tests for them? Made them package private. These routines are exercised by newly added jtreg tests. 
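To spell out what "saturating" and "unsigned" mean for the scalar fallbacks mentioned above, here is a small C++ rendering of the semantics. It is a sketch only: the Java method under review is `addSaturating(long, long)`, whereas every name below is hypothetical and the code makes no claim about how `VectorMathUtils` implements it.

```cpp
#include <cstdint>
#include <limits>

// Signed saturating add: clamp to the type's bounds instead of wrapping.
inline int64_t add_saturating(int64_t a, int64_t b) {
  const int64_t kMax = std::numeric_limits<int64_t>::max();
  const int64_t kMin = std::numeric_limits<int64_t>::min();
  if (b > 0 && a > kMax - b) return kMax;  // a + b would exceed kMax
  if (b < 0 && a < kMin - b) return kMin;  // a + b would fall below kMin
  return a + b;
}

// Signed saturating subtract, checked the same way without overflowing.
inline int64_t sub_saturating(int64_t a, int64_t b) {
  const int64_t kMax = std::numeric_limits<int64_t>::max();
  const int64_t kMin = std::numeric_limits<int64_t>::min();
  if (b < 0 && a > kMax + b) return kMax;  // a - b would exceed kMax
  if (b > 0 && a < kMin + b) return kMin;  // a - b would fall below kMin
  return a - b;
}

// Unsigned saturating add: wraparound shows up as sum < a.
inline uint64_t add_saturating_unsigned(uint64_t a, uint64_t b) {
  const uint64_t sum = a + b;
  return sum < a ? std::numeric_limits<uint64_t>::max() : sum;
}

// Unsigned saturating subtract: clamp at zero.
inline uint64_t sub_saturating_unsigned(uint64_t a, uint64_t b) {
  return a < b ? 0 : a - b;
}

// Unsigned max on signed storage: compare the bit patterns as unsigned.
inline int64_t unsigned_max(int64_t a, int64_t b) {
  return static_cast<uint64_t>(a) > static_cast<uint64_t>(b) ? a : b;
}
```

Note that `unsigned_max(-1, 1)` returns -1, because the all-ones bit pattern is the largest unsigned value; this is the behavioural difference between UMAX and the ordinary signed max.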
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1746584232 From duke at openjdk.org Fri Sep 6 06:47:57 2024 From: duke at openjdk.org (Yagmur Eren) Date: Fri, 6 Sep 2024 06:47:57 GMT Subject: Integrated: 8330159: [C2] Remove or clarify Compile::init_start In-Reply-To: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> References: <7Q9VAVqBUDDnqEcCOWeRh5N-YC0dsg9lyQsOv6xbO80=.6d55dc58-0281-45fd-a92f-d0f6ddd910cc@github.com> Message-ID: On Mon, 26 Aug 2024 13:54:16 GMT, Yagmur Eren wrote: > Compile::init_start method contained only an assertion. To cleanup, this method is removed and the locations where this method is called are replaced with the corresponding assertion. See issue: [JDK-8330159](https://bugs.openjdk.org/browse/JDK-8330159) This pull request has now been integrated. Changeset: 7db4d46c Author: nelanbu Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/7db4d46c3904d1a6949f053e6fc5e971cd519088 Stats: 11 lines in 3 files changed: 1 ins; 2 del; 8 mod 8330159: [C2] Remove or clarify Compile::init_start Reviewed-by: chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/20715 From rcastanedalo at openjdk.org Fri Sep 6 08:49:35 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 6 Sep 2024 08:49:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. 
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion - Cleanup g1_ppc.ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/9821e795..22e07ef0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14-15 Stats: 40 lines in 1 file changed: 4 ins; 30 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Fri Sep 6 08:49:35 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 6 Sep 2024 08:49:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary g1LoadXVolatile instructions in aarch64 > I've implemented the same cleanup as on aarch64: [TheRealMDoerr at ad662a2](https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2) Would be nice if you could apply it. Thanks! Sure, merged now (commit 22e07ef03a). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333553391 From duke at openjdk.org Fri Sep 6 08:52:07 2024 From: duke at openjdk.org (Casper Norrbin) Date: Fri, 6 Sep 2024 08:52:07 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v5] In-Reply-To: References: Message-ID: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> > Hi everyone, > > This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. > > The fix includes: > > - Adding assertions to check for potential overflow. > - Reordering some operations to guard against overflow.
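The "reordering" item in the list above refers to a common technique, sketched below under assumptions: instead of adding the request to the current position and comparing against the limit (where the addition can wrap), compare the request against the remaining space (where the subtraction cannot wrap as long as the position does not exceed the limit). `fits_naive` and `fits_checked` are hypothetical names and addresses are modelled as plain integers; this is not the code from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Overflow-prone form: hwm + request can wrap around for a huge request,
// making an impossible allocation look like it fits.
inline bool fits_naive(uintptr_t hwm, uintptr_t max, size_t request) {
  return hwm + request <= max;
}

// Reordered form: max - hwm is the space left and cannot wrap when
// hwm <= max, so an unrealistic request is rejected as expected.
inline bool fits_checked(uintptr_t hwm, uintptr_t max, size_t request) {
  return request <= max - hwm;
}
```

With a position of 100, a limit of 200 and a request of `SIZE_MAX`, the naive form wraps and reports that the request fits, while the reordered form rejects it.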
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: assert + static pointer_delta fun ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20774/files - new: https://git.openjdk.org/jdk/pull/20774/files/aec249fa..41804e2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20774&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20774/head:pull/20774 PR: https://git.openjdk.org/jdk/pull/20774 From duke at openjdk.org Fri Sep 6 08:55:51 2024 From: duke at openjdk.org (Casper Norrbin) Date: Fri, 6 Sep 2024 08:55:51 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v4] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 03:29:43 GMT, Kim Barrett wrote: >> src/hotspot/share/adlc/adlArena.cpp line 141: >> >>> 139: >>> 140: //------------------------------realloc---------------------------------------- >>> 141: size_t pointer_delta(const void *left, const void *right) { >> >> Do we want to assert left >= right here? > > It's currently only used in one place, where we know that's true. OTOH, an assert doesn't hurt. > Instead of this helper function, we could just use`(size_t)(_max - c_old)` inline (we can be > confident the difference won't exceed the `ptrdiff_t` range here), reducing the lines of code by > a little bit. The benefit from the helper is having the Arealloc code (nearly? completely?) identical > here and in memory/arena.cpp. I don't have a strong opinion either way. The motivation was to keep the two arenas consistent. To that end, I've now added an assert and made the function static. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20774#discussion_r1746740935

From jbhateja at openjdk.org Fri Sep 6 09:08:53 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 6 Sep 2024 09:08:53 GMT
Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v2]
In-Reply-To: References: Message-ID:

On Wed, 28 Aug 2024 22:42:33 GMT, Smita Kamath wrote:

>> Hi,
>> I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you.
>>
>> Benchmark | Datasize | BaseJDK (ops/s) | Patch (ops/s) | %Gain
>> -- | -- | -- | -- | --
>> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67
>> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72
>> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73
>> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34
>> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66
>> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75
>> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42
>> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6
>> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54
>> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45
>> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39
>> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26
>> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52
>> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08
>> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94
>> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91
>> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81
>> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>
> Addressed review comments

src/hotspot/cpu/x86/assembler_x86.cpp line 1849:

> 1847:
> 1848: void Assembler::cmpb(Register dst, int imm8) {
> 1849: assert(dst->has_byte_register(), "must have byte register");

Above assertion is already part of emit_arith_b

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1735611149

From jbhateja at openjdk.org Fri Sep 6 09:08:55 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 6 Sep 2024 09:08:55 GMT
Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v3]
In-Reply-To: References: Message-ID: <8VoBChxBjiW1bW9pDuPZqpLetYklTvVTERvUUQjxlQM=.d3e29df2-8b01-4786-8648-a3fda9a4a0d4@github.com>

On Fri, 30 Aug 2024 00:07:39 GMT, Smita Kamath wrote:

>> Hi,
>> I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you.
>>
>> Benchmark | Datasize | BaseJDK (ops/s) | Patch (ops/s) | %Gain
>> -- | -- | -- | -- | --
>> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67
>> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72
>> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73
>> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34
>> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66
>> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75
>> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42
>> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6
>> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54
>> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45
>> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39
>> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26
>> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52
>> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08
>> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94
>> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91
>> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81
>> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>
> Updated copyright dates and addressed review comments

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 286:

> 284: __ push(r15);//holds number of rounds
> 285: __ push(rbx);//scratch register
> 286: #ifdef _WIN64

Should we replace these stack access with GPR to scratch register XMM and vice-versa transfers.
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3001: > 2999: if (do_reduction) { > 3000: //new reduction > 3001: __ evmovdquq(ZTMPB, ExternalAddress(ghash_polynomial_addr()), Assembler::AVX_512bit, rbx /*rscratch*/); Is this based on the aggregate reduction method? Could you please add some comments describing the reduction algorithm? src/hotspot/cpu/x86/stubGenerator_x86_64_ghash.cpp line 60: > 58: // Polynomial x^128+x^127+x^126+x^121+1 > 59: ATTRIBUTE_ALIGNED(16) static const uint64_t GHASH_POLYNOMIAL[] = { > 60: 0x0000000000000001ULL, 0xC200000000000000ULL, As per https://www.intel.com/content/dam/develop/external/us/en/documents/clmul-wp-rev-2-02-2014-04-20.pdf and https://www.intel.com/content/dam/www/public/us/en/documents/software-support/enabling-high-performance-gcm.pdf, the reduction polynomial for GHASH should be "x^128 + x^7 + x^2 + x + 1". Also, the polynomial given in the comment does not match the bit representation 1100 0010 <119 zeros> 1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1740682763 PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1746765269 PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1746631667 From jbhateja at openjdk.org Fri Sep 6 09:42:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 09:42:55 GMT Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 00:07:39 GMT, Smita Kamath wrote: >> Hi, >> I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you.
>> >> Benchmark | Datasize | BaseJDK (ops/s) | Patch(ops/s) | %Gain >> -- | -- | -- | -- | -- >> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67 >> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72 >> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73 >> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34 >> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66 >> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75 >> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42 >> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6 >> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54 >> ? | ? | ? | ? | ? >> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45 >> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39 >> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26 >> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52 >> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08 >> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94 >> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91 >> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81 >> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597 > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated copyright dates and addressed review comments Kindly extend the micro benchmark to cover small dataSizes (128, 256 and 512) and keyLength (128, 192): https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/javax/crypto/full/AESGCMBench.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/17515#issuecomment-2333672896 From rcastanedalo at openjdk.org Fri Sep 6 09:43:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 6 Sep 2024 09:43:57
GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com> On Thu, 5 Sep 2024 20:36:01 GMT, halkosajtarevic wrote: > Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards? Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example?

    (...)

    public enum Day {MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY};

    static class MyObject {
        Day day;
    }

    public static void storeEnum(MyObject o, Day d) {
        o.day = d;
    }

    (...)

    MyObject o = new MyObject();
    Day d = Day.TUESDAY;
    storeEnum(o, d);

    (...)

If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333674779 From duke at openjdk.org Fri Sep 6 10:14:59 2024 From: duke at openjdk.org (halkosajtarevic) Date: Fri, 6 Sep 2024 10:14:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 08:49:35 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion > - Cleanup g1_ppc.ad Yes exactly, that was what I meant. I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333731725 From amitkumar at openjdk.org Fri Sep 6 10:43:54 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 6 Sep 2024 10:43:54 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com> References: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com> Message-ID: <5SBKgUwrPmIXH0hA64aKRsYZiHMg0M0uh_IjFq_xdAo=.f323ec69-adf3-4722-a5cb-0c49cfb8c5b1@github.com> On Fri, 6 Sep 2024 09:40:56 GMT, Roberto Castañeda Lozano wrote: >> Sorry, one maybe dumb question, hopefully matching the context here: >> Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?
>
>> Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?
>
> Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example?
>
>
> (...)
>
> public enum Day {MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY};
>
> static class MyObject {
>     Day day;
> }
>
> public static void storeEnum(MyObject o, Day d) {
>     o.day = d;
> }
>
> (...)
>
> MyObject o = new MyObject();
> Day d = Day.TUESDAY;
> storeEnum(o, d);
>
> (...)
>
>
> If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind?

Hi @robcasloz, you can pick up s390x patch from here: https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333779374 From rcastanedalo at openjdk.org Fri Sep 6 12:07:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 6 Sep 2024 12:07:57 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 10:12:19 GMT, halkosajtarevic wrote: > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333907222 From chagedorn at openjdk.org Fri Sep 6 13:56:46 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 6 Sep 2024 13:56:46 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v4] In-Reply-To: References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> Message-ID: On Fri, 6 Sep 2024 04:41:24 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) >> >> This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Improve cases with two negative ranges, add more documentation Thanks for the update, looks good to me! I'll give this another spinning in our testing over the weekend (will only be able to report back on Tuesday since Monday is a public holiday here). ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20066#pullrequestreview-2286327449 From rcastanedalo at openjdk.org Fri Sep 6 14:15:41 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 6 Sep 2024 14:15:41 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: s390 port : late barrier expansion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/22e07ef0..6663433c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15-16 Stats: 896 lines in 8 files changed: 837 ins; 32 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Fri Sep 6 14:15:42 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 6 Sep 2024 14:15:42 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano wrote: >> Yes exactly, that was what I meant. >> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. > >> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. > > As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. > Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034) Done, thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334130205 From qamai at openjdk.org Fri Sep 6 15:24:11 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Sep 2024 15:24:11 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 00:35:19 GMT, Dean Long wrote: >> https://en.wikipedia.org/wiki/Duality_(order_theory) > > If _lo <= x <= _hi, then I believe the dual is _hi <= x <= _lo > > If dual is really only needed for join, then it seems like we could remove the concept of dual and just implement join. @dean-long Thanks, that is really helpful. IIUC, the duality here refers to the set of all `TypeInt` with a set `a` considered higher than `b` if `a` is a subset of `b`. This leads to our notion of bottom type being the universe set and top type being the empty set. It still does not make sense for the concept of a dual `TypeInt`, though, since the concept of duality applies to the set of `TypeInt`, not the `TypeInt`s themselves. > My understanding is "join" means "union", "meet" means "intersection", and "dual" means "complement". You got it backward, "join" means intersection and "meet" means union. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1747297489 From qamai at openjdk.org Fri Sep 6 15:46:06 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Sep 2024 15:46:06 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v4] In-Reply-To: References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> Message-ID: On Fri, 6 Sep 2024 04:41:24 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. 
I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) >> >> This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Improve cases with two negative ranges, add more documentation Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20066#pullrequestreview-2286605455 From kxu at openjdk.org Fri Sep 6 15:51:51 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 6 Sep 2024 15:51:51 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() Message-ID: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors the `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in the `DIVMOD` transformations. The purpose of the transformation is to combine adjacent div `/` and mod `%` operations on the same operands into a single operation when the platform supports this (e.g., x86-64). I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants, as they were previously missing.
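As an aside for readers following along, the source-level shape that this kind of transformation looks for can be sketched in Java as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, and whether C2 actually fuses the two operations depends on the platform and on how the code is compiled, which the sketch does not verify. The unsigned variant assumes that `Integer.divideUnsigned` and `Integer.remainderUnsigned` are the operations that end up as unsigned div/mod nodes.

```java
final class DivModPattern {
    // Signed quotient and remainder of the same operands, computed back to back.
    // This adjacency is the pattern a combined div/mod operation can serve.
    static int[] divMod(int a, int b) {
        int q = a / b;
        int r = a % b;
        return new int[] { q, r };
    }

    // Unsigned counterpart, using the standard library helpers.
    static int[] divModUnsigned(int a, int b) {
        int q = Integer.divideUnsigned(a, b);
        int r = Integer.remainderUnsigned(a, b);
        return new int[] { q, r };
    }

    public static void main(String[] args) {
        int[] signed = divMod(-7, 2);
        int[] unsigned = divModUnsigned(-7, 2);
        System.out.println(signed[0] + " rem " + signed[1]);
        System.out.println(Integer.toUnsignedString(unsigned[0]) + " rem " + unsigned[1]);
    }
}
```

In both methods the quotient and remainder satisfy `a == q * b + r` (in the unsigned case, with the operands read as unsigned values), which is why a single combined instruction can produce both results.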
Please let me know if these nodes were left out intentionally and if there are any other concerns. Thanks! ~This will be a draft PR before GHA tests are confirmed passing.~ ------------- Commit messages: - Merge branch 'master' into refactor-mod-cases - Add test and IRNode for signed int/long divmod - created IR tests - passing tier1 tests - refactor divmod ops to handle_div_mod_op Changes: https://git.openjdk.org/jdk/pull/20877/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332442 Stats: 262 lines in 6 files changed: 194 ins; 64 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From kbarrett at openjdk.org Fri Sep 6 17:20:06 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 6 Sep 2024 17:20:06 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v5] In-Reply-To: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> References: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> Message-ID: <7ekByVPPU1wUYWbPwF91KkrGrO80NmGLsoCie2zCAMM=.a932bbda-9e36-4cfb-a0f2-e7fb13d80ad0@github.com> On Fri, 6 Sep 2024 08:52:07 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > assert + static pointer_delta fun Looks good. ------------- Marked as reviewed by kbarrett (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20774#pullrequestreview-2286796683 From jvernee at openjdk.org Fri Sep 6 17:35:10 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 6 Sep 2024 17:35:10 GMT Subject: Integrated: 8338123: Linker crash when building a downcall handle with many arguments in x64 In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 17:52:35 GMT, Jorn Vernee wrote: > - Adjust downcall stub sizes based on latest version. (per method described in https://github.com/openjdk/jdk/pull/12908) > - Beef up test for large stubs to also cover this particular case. This pull request has now been integrated. Changeset: 8e580ec5 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/8e580ec5382af1886e1bbf2fda3bce6416ced604 Stats: 31 lines in 2 files changed: 20 ins; 0 del; 11 mod 8338123: Linker crash when building a downcall handle with many arguments in x64 Reviewed-by: mcimadamore ------------- PR: https://git.openjdk.org/jdk/pull/20842 From jvernee at openjdk.org Fri Sep 6 17:51:15 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 6 Sep 2024 17:51:15 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v5] In-Reply-To: References: Message-ID: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers > x64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op > > > aarch64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op > >
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: remove PC save/restore on s390 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20479/files - new: https://git.openjdk.org/jdk/pull/20479/files/7d191107..b3aa6b41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20479/head:pull/20479 PR: https://git.openjdk.org/jdk/pull/20479 From psandoz at openjdk.org Fri Sep 6 18:02:12 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 6 Sep 2024 18:02:12 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 06:40:18 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 78: >> >>> 76: * @since 24 >>> 77: */ >>> 78: public static long addSaturating(long a, long b) { >> >> Are these public methods any Java dev could use? If so: do we have tests for them? > > Made them package private. These routines are exercised by newly added jtreg tests. These methods need to be public, as they need to be used in any tail computation. I recommend naming the class `VectorMath`, aligning with the naming of `Math` and `StrictMath`. * The class {@code VectorMath} contains methods for performing * scalar numeric operations in support of vector numeric operations.
For each method we can reference the associated vector operator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1747520954 From vlivanov at openjdk.org Fri Sep 6 18:13:07 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 6 Sep 2024 18:13:07 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 12:35:59 GMT, Roland Westrelin wrote: > So it would make sense to go over the uses of Type*::BOTTOM/Type*::NOTNULL and check they are not tested with pointer equality. Is that one what you're suggesting, Vladimir? Thanks, Roland. Yes, that's what I had in mind and wanted to double-check with you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2334579811 From jbhateja at openjdk.org Fri Sep 6 18:13:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:34 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/8d71f175..d3ee3104 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=06-07 Stats: 115 lines in 18 files changed: 12 ins; 15 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Fri Sep 6 18:13:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:35 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:40:35 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/share/opto/vectornode.cpp line 2159: > >> 2157: >> 2158: vmask_type = TypeVect::makemask(elem_bt, num_elem); >> 2159: mask = phase->transform(new VectorMaskCastNode(mask, vmask_type)); > > I would just have two variables, and not overwrite it: `integral_vmask_type` and `vmask_type`. Maybe also `mask` could be split into two variables? I think the variable names are appropriate and in accordance with convention. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: > >> 2768: >> 2769: /** >> 2770: * Rearranges the lane elements of two vectors, selecting lanes > > I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? 
We already have another flavor of [selectFrom](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#selectFrom(jdk.incubator.vector.Vector)) which permutes a single vector; the new API extends its semantics to two-vector selection, so we kept the nomenclature consistent. > test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 324: > >> 322: boolean is_exceptional_idx = (int)order[idx] >= vector_len; >> 323: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; >> 324: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx])); > > I thought general Java style is camelCase? Is that not followed in the VectorAPI code? I agree, but this file already uses non-camelCase names (look for uses of 'vector_len'), so I am just preserving the file-level convention. > test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: > >> 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). >> 1047: toArray(Object[][]::new); >> 1048: } > > Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? The original API did throw IndexOutOfBoundsException, but we have since moved away from exception-throwing semantics to wrapping semantics. Please find details in the following comment: https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606 > test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: > >> 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); >> 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); >> 5812: idxv.selectFrom(av, bv).intoArray(r, i); > > Would this test catch a bug where the backend would generate vectors that are too long or too short? Existing VectorAPI inline expansion entry points explicitly pass the lane type and lane count as intrinsic arguments; these are used to create concrete ideal vector types.
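For readers following the thread, the two-vector selection with wrapping indices discussed above can be modelled in scalar code. This is only a sketch under stated assumptions: the class and method names (`SelectFromSketch`, `selectFromTwo`) are hypothetical, plain `int` arrays stand in for vector lanes, and wrapping is modelled with `Math.floorMod` into the range [0, 2*VLEN), which may differ in detail from what the patch actually does.

```java
import java.util.Arrays;

public class SelectFromSketch {
    // Hypothetical scalar model, not the JDK implementation.
    // a and b play the role of v1 and v2; index plays the role of "this".
    // Assumption: out-of-range indices wrap into [0, 2 * vlen).
    static int[] selectFromTwo(int[] index, int[] a, int[] b) {
        int vlen = a.length;
        int[] r = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrapped = Math.floorMod(index[i], 2 * vlen);
            // Indices below vlen pick from the first table, the rest from the second.
            r[i] = wrapped < vlen ? a[wrapped] : b[wrapped - vlen];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {10, 11, 12, 13};
        int[] b = {20, 21, 22, 23};
        int[] idx = {0, 5, 3, -1};
        System.out.println(Arrays.toString(selectFromTwo(idx, a, b)));
    }
}
```

The split between the two tables mirrors the `is_exceptional_idx` check in the quoted test code: an index at or above the vector length is reduced by the vector length and read from the second array.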
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532692 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532456 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532419 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532340 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532307 From jbhateja at openjdk.org Fri Sep 6 18:13:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:35 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:57:31 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2183: >> >>> 2181: }; >>> 2182: // Targets emulating unsupported permutation for certain vector types >>> 2183: // may need to message the indexes to match the users intent. >> >> Suggestion: >> >> // may need to massage the indexes to match the users intent. > > This optimization for now seems quite specific to your `SelectFromTwoVectorNode::Ideal` lowering code. Can this conversion not be done there already? > > What is the semantics of `VectorRearrangeNode`? Should its shuffle vector always be bytes, and we now violated that "for a quick second"? Or is it going to be generally the idea to create all sorts of shuffle types and then fix that up? But then why do we need the `vector_indexes_needs_massaging`? > > Can you help me understand the concept/strategy behind this? Ok, IIRC the variable index permutation instruction on every target expects shape conformance between the data vector and the permute index vector. Rearrange expects indices to be passed through a shuffle, and idealization routines automatically inject a VectorLoadShuffle after loading the indexes held in the shuffle's backing storage, i.e. a byte array.
In all the cases apart from byte vector permute, VectorLoadShuffle expands the index byte lanes to match the data vector lane. So we always end up emitting a lane expansion instruction before the permute instruction (scenario 1). Apart from the usual expansions, VectorLoadShuffle may also do additional magic for some targets where it may need to prune / massage the index vector if the target does not support the destination vector type (scenario 2). For our case, the new selectFrom accepts the indices through vectors, which saves redundant expansions, but to leverage existing backend support for scenario 2 we do target-specific pruning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532612 From sviswanathan at openjdk.org Fri Sep 6 18:43:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 18:43:09 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: References: Message-ID: <_DbK4ZSVvMwabc8jXhGrqJD-ox6o9Bvo9or64AKUQ4E=.8bd87542-9eed-456d-8d87-a065da637918@github.com> On Fri, 6 Sep 2024 06:43:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions I have only one comment, rest of the changes look good to me. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 33: > 31: * > 32: */ > 33: public class VectorMathUtils { Could the class also be not public as it has only package private methods now? 
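For reference, the saturating behaviour described in the quoted summary (results capped at the bounds of the type instead of wrapping) can be sketched in scalar Java. The class and method names below are hypothetical and chosen for illustration only; they are not the methods added by the patch, and only the `int` case is shown.

```java
public class SaturatingSketch {
    // Signed saturating add: compute in long, then clamp to the int range.
    static int addSaturating(int a, int b) {
        long r = (long) a + b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Signed saturating subtract: same clamping approach.
    static int subSaturating(int a, int b) {
        long r = (long) a - b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Unsigned saturating add: if the wrapped sum is below an operand
    // (compared as unsigned), the add overflowed, so cap at 0xFFFFFFFF.
    static int addSaturatingUnsigned(int a, int b) {
        int r = a + b;
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // Unsigned saturating subtract: cap at zero when a < b as unsigned values.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(Integer.MAX_VALUE, 1));
        System.out.println(subSaturating(Integer.MIN_VALUE, 1));
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(-1, 1)));
        System.out.println(subSaturatingUnsigned(1, 2));
    }
}
```

Scalar helpers of this kind are what a tail loop would call for the elements left over after the vectorized part of a computation, which is the use case mentioned in the review for keeping such methods public.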
------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2286943136 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1747565468 From sviswanathan at openjdk.org Fri Sep 6 18:47:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 18:47:12 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: <_DbK4ZSVvMwabc8jXhGrqJD-ox6o9Bvo9or64AKUQ4E=.8bd87542-9eed-456d-8d87-a065da637918@github.com> References: <_DbK4ZSVvMwabc8jXhGrqJD-ox6o9Bvo9or64AKUQ4E=.8bd87542-9eed-456d-8d87-a065da637918@github.com> Message-ID: On Fri, 6 Sep 2024 18:39:08 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review suggestions > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 33: > >> 31: * >> 32: */ >> 33: public class VectorMathUtils { > > Could the class also be not public as it has only package private methods now? Please ignore this comment as Paul suggests that the methods in this file should to be public. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1747571110 From vlivanov at openjdk.org Fri Sep 6 19:30:05 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 6 Sep 2024 19:30:05 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: <5CB0aexpZn0lL8DvUFlOoNhU223th2guno8mMWuNl-w=.03067572-03dc-45a7-ba07-f489c43fc4d0@github.com> References: <5CB0aexpZn0lL8DvUFlOoNhU223th2guno8mMWuNl-w=.03067572-03dc-45a7-ba07-f489c43fc4d0@github.com> Message-ID: On Tue, 3 Sep 2024 06:30:00 GMT, kuaiwei wrote: >> I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. >> It will happen with these conditions >> 1 It's a virtual call and the callee is a final method. So c1 will think it's static bound. 
>> 2 The interpreter never touch the callsite, so interpreter does not add type profile. >> 3 In c1 compilation, it will be inlined based on CHA. >> 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Simplify should_profile_receiver_type Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20786#pullrequestreview-2287023459 From vlivanov at openjdk.org Fri Sep 6 19:30:06 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 6 Sep 2024 19:30:06 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: <4bvmYFtueyFdZVH_j_a-f4CkW_soyG7OQscLj4J_UBA=.ddfd19ee-dbbe-430d-b955-58550109da46@github.com> References: <4bvmYFtueyFdZVH_j_a-f4CkW_soyG7OQscLj4J_UBA=.ddfd19ee-dbbe-430d-b955-58550109da46@github.com> Message-ID: On Thu, 5 Sep 2024 03:23:16 GMT, Dean Long wrote: >> It does look attractive to align the logic with C2 usage of type profiles (avoid profiling when C2 doesn't consume the data). But I feel more comfortable unifying different modes of profiling at the expense of some micro-optimization opportunities. In other words, if interpreter collects some bit of data, I'd prefer to see C1 doing the same (and vice versa). >> >> I took a look at interpreter code (in `TemplateTable::invokevirtual_helper()`) and it makes the decision at runtime based on `is_vfinal` flag on `ResolvedMethodEntry`. The flag is set in `ConstantPoolCache::set_direct_or_vtable_call()` and covers both private and final methods. Moreover, receiver profiling is not performed on `invokeinterface` of private methods which is not taken into account by `should_profile_receiver_type()` now. 
>> >> It looks tempting to replicated what interpreter does (inspect `vfinal` flag on resolved method), but C1 has to gracefully work with not-yet-resolved call sites. So, either a recompilation or a runtime check is needed to align the behavior with interpreter. >> >> I haven't looked into the details, but performing profiling in C1 when the rest of the JVM doesn't expect that makes me a bit nervous. Smells like a possible source of profile data corruption. > > What profiling can be done seems to be decided by MethodData::compute_data_size()/MethodData::initialize_data(), which uses profile_arguments_for_invoke() and profile_return_for_invoke(). At runtime, I believe profiling is restricted by what it finds in the MethodData. Thanks for additional details, Dean. Thinking more about it, it looks like the worst case scenario possible is that there'll be more data gathered than needed, but there'll be always a slot reserved for the data. IMO current fix seems like a good compromise. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1747618729 From vlivanov at openjdk.org Fri Sep 6 19:48:05 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 6 Sep 2024 19:48:05 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: References: Message-ID: <_ExKmq0qRxs73hnA6HBDJG0b_Zqi715jkZ1pIvv_JsA=.68781292-7849-4a01-87c6-86ae65470afc@github.com> On Thu, 5 Sep 2024 14:10:53 GMT, Roland Westrelin wrote: > I think Vladimir's question is: how can null_check_oop() return top? AFAIU, it creates a CastPP with 147 as input and that CastPP is transformed to top. How does that happen? What are the steps in the call to _gvn.transform( cast ); that lead to a result of top. I think that what should happen when compiler tries to cast a value to an empty type in a dead code. 
Toby's response answered my question: it's a GVN on `CmpP` which determines that both inputs are `NULL` and degenerates the check into an unconditional uncommon trap. (I believe it's `in1->eqv_uncast(in2)` in `SubNode::Value_common()` which does the job.) In such case, performing `base->uncast()` in `LibraryCallKit::classify_unsafe_addr()` seems appropriate to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2334709082 From kbarrett at openjdk.org Fri Sep 6 20:26:09 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 6 Sep 2024 20:26:09 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano wrote: > > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. > > As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. @robcasloz is correct, the GCs don't have any special knowledge about enum instances. They are ordinary objects, though probably long-lived so will eventually migrate to the old generation. Trying to do anything special with them seems very unlikely to provide a benefit worth the costs involved.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334754544 From duke at openjdk.org Fri Sep 6 20:26:10 2024 From: duke at openjdk.org (halkosajtarevic) Date: Fri, 6 Sep 2024 20:26:10 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 20:21:11 GMT, Kim Barrett wrote: > > > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. > > > > > > As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. > > @robcasloz is correct, the GCs don't have any special knowledge about enum instances. They are ordinary objects, though probably long-lived so will eventually migrate to the old generation. Trying to do anything special with them seems very unlikely to provide a benefit worth the costs involved. Thank you very much for the insights! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334756865 From sdohrmann at openjdk.org Fri Sep 6 21:44:44 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Fri, 6 Sep 2024 21:44:44 GMT Subject: RFR: 8329035: New Data Destination instructions support [v3] In-Reply-To: References: Message-ID: > Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. > > The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. 
> > Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. > > All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: refactoring and fixes based on review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20698/files - new: https://git.openjdk.org/jdk/pull/20698/files/9aea8bbb..385ea567 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=01-02 Stats: 84 lines in 2 files changed: 29 ins; 17 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/20698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20698/head:pull/20698 PR: https://git.openjdk.org/jdk/pull/20698 From sdohrmann at openjdk.org Fri Sep 6 21:44:49 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Fri, 6 Sep 2024 21:44:49 GMT Subject: RFR: 8329035: New Data Destination instructions support [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 22:05:03 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> function name changes based on review comments > > src/hotspot/cpu/x86/assembler_x86.cpp line 1361: > >> 1359: InstructionMark im(this); >> 1360: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1361: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > The input_size_in_bits could be EVEX_32bit. Thanks, done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 1403: > >> 1401: InstructionMark im(this); >> 1402: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1403: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > The input_size_in_bits could be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 1461: > >> 1459: >> 1460: void Assembler::eaddb(Register dst, Address src1, Register src2, bool no_flags) { >> 1461: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > InstructionMark im(this) is missing. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 1475: > >> 1473: void Assembler::eaddb(Register dst, Register src, int imm8, bool no_flags) { >> 1474: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1475: // (void) evex_prefix_and_encode_ndd(src->encoding(), dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes); > > Looks like the commented line is left over. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 1780: > >> 1778: void Assembler::eandw(Register dst, Register src1, Register src2, bool no_flags) { >> 1779: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1780: (void) evex_prefix_and_encode_ndd(src1->encoding(), dst->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > > This should be VEX_SIMD_66 instead of VEX_SIMD_NONE. Thanks, done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 1793: > >> 1791: InstructionMark im(this); >> 1792: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1793: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit here. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 1819: > >> 1817: InstructionMark im(this); >> 1818: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1819: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 1835: > >> 1833: InstructionMark im(this); >> 1834: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1835: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 1837: > >> 1835: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); >> 1836: evex_prefix_ndd(src2, dst->encoding(), src1->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); >> 1837: emit_operand(src1, src2, 0); > > emit_int8(0x23) is missing before call to emit_operand(). Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 2642: > >> 2640: >> 2641: void Assembler::edecl(Register dst, Address src, bool no_flags) { >> 2642: // Don't use it directly. Use MacroAssembler::decrement() instead. > > This comment could be removed. Done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 2721: > >> 2719: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 2720: int encode = evex_prefix_and_encode_nf(0, 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); >> 2721: emit_int16((unsigned char)0xA7, (0xE8 | encode)); > > Should this be 0xF7? Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 4583: > >> 4581: InstructionMark im(this); >> 4582: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 4583: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 4637: > >> 4635: InstructionMark im(this); >> 4636: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 4637: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6593: > >> 6591: >> 6592: void Assembler::erolq(Register dst, Register src, bool no_flags) { >> 6593: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > vex_w should be true here. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6647: > >> 6645: assert(isShiftCount(imm8), "illegal shift count"); >> 6646: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 6647: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 6670: > >> 6668: InstructionMark im(this); >> 6669: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 6670: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6686: > >> 6684: } >> 6685: >> 6686: void Assembler::esall(Register dst, Register src, int imm8, bool no_flags) { > > assert(isShiftCount(imm8), "illegal shift count") missing. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6694: > >> 6692: emit_int24((unsigned char)0xC1, (0xF8 | encode), imm8); >> 6693: } >> 6694: } > > Should this be (0xE0 | encode)? Yes, thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6722: > >> 6720: } >> 6721: >> 6722: void Assembler::esarl(Register dst, Address src, int imm8, bool no_flags) { > > assert(isShiftCount(imm8), "illegal shift count") missing. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6725: > >> 6723: InstructionMark im(this); >> 6724: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 6725: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6748: > >> 6746: InstructionMark im(this); >> 6747: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 6748: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 6764: > >> 6762: } >> 6763: >> 6764: void Assembler::esarl(Register dst, Register src, int imm8, bool no_flags) { > > assert(isShiftCount(imm8), "illegal shift count") missing. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6794: > >> 6792: InstructionMark im(this); >> 6793: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 6794: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6819: > >> 6817: void Assembler::esbbl(Register dst, Register src1, Address src2) { >> 6818: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 6819: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 7546: > >> 7544: InstructionMark im(this); >> 7545: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 7546: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 7572: > >> 7570: InstructionMark im(this); >> 7571: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 7572: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 7600: > >> 7598: InstructionMark im(this); >> 7599: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 7600: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > > input_size_in_bits should be EVEX_32bit. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 7645: > >> 7643: void Assembler::exorw(Register dst, Register src1, Register src2, bool no_flags) { >> 7644: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 7645: (void) evex_prefix_and_encode_ndd(src1->encoding(), dst->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > > VEX_SIMD_NONE should be VEX_SIMD_66. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 7662: > >> 7660: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 7661: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); >> 7662: evex_prefix_ndd(src2, dst->encoding(), src1->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > > input_size_in_bits could be EVEX_16bit. > VEX_SIMD_NONE should be VEX_SIMD_66. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 12251: > >> 12249: >> 12250: void Assembler::edecl(Register dst, Register src, bool no_flags) { >> 12251: // Don't use it directly. Use MacroAssembler::deccrementl() instead. > > This comment can be removed. Done.
> src/hotspot/cpu/x86/assembler_x86.cpp line 12751: > >> 12749: } >> 12750: if (nds_is_ndd) attributes->set_extended_context(); >> 12751: bool is_extended = adr.base_needs_rex2() || adr.index_needs_rex2() || nds_enc >= 16 || xreg_enc >= 16 || nds_is_ndd || force_evex; > > If is_evex_instruction() is set for ndd and nf already at calling place as in my previous review comments, then is_extended could remain as before: > bool is_extended = adr.base_needs_rex2() || adr.index_needs_rex2() || nds_enc >= 16 || xreg_enc >= 16; Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 12841: > >> 12839: >> 12840: clear_managed(); >> 12841: if ((UseAVX > 2 && !attributes->is_legacy_mode()) || nds_is_ndd || force_evex) > > If is_evex_instruction() is set for ndd and nf already at calling place as in my previous review comments, then this if could remain as before: if (UseAVX > 2 && !attributes->is_legacy_mode()) Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 13706: > >> 13704: >> 13705: void Assembler::eincl(Register dst, Register src, bool no_flags) { >> 13706: // Don't use it directly. Use MacroAssembler::incrementl() instead. > > This comment could be removed. Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 14788: > >> 14786: void Assembler::edecl(Register dst, Register src, bool no_flags) { >> 14787: // Don't use it directly. Use MacroAssembler::decrementl() instead. >> 14788: // Use two-byte form (one-byte form is a REX prefix in 64-bit mode) > > This comment could be removed. Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 14803: > >> 14801: void Assembler::edecq(Register dst, Register src, bool no_flags) { >> 14802: // Don't use it directly. Use MacroAssembler::incrementq() instead. >> 14803: // Use two-byte form (one-byte from is a REX prefix in 64-bit mode) > > This comment could be removed. Done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 14817: > >> 14815: >> 14816: void Assembler::edecq(Register dst, Address src, bool no_flags) { >> 14817: // Don't use it directly. Use MacroAssembler::increment() instead. > > This comment could be removed. Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 14883: > >> 14881: void Assembler::eimulq(Register dst, Register src, bool no_flags) { >> 14882: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 14883: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, no_flags); > > Is there a reason we are not calling the evex_prefix_and_encode_nf here? No reason. Thanks, fixed. > src/hotspot/cpu/x86/assembler_x86.cpp line 14900: > >> 14898: void Assembler::eimulq(Register src, bool no_flags) { >> 14899: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 14900: int encode = vex_prefix_and_encode(0, 0, src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, no_flags); > > Is there a reason we are not calling the evex_prefix_and_encode_nf here? No reason. Thanks, fixed. > src/hotspot/cpu/x86/assembler_x86.cpp line 15827: > >> 15825: >> 15826: void Assembler::esarq(Register dst, Address src, int imm8, bool no_flags) { >> 15827: InstructionMark im(this); > > assert(isShiftCount(imm8 >> 1), "illegal shift count") is missing. Thanks, done. 
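The shift-count assertions requested in the comments above can be illustrated with a small standalone sketch. It assumes the predicate accepts counts 0..31, which is why the 64-bit variants halve the immediate before checking it. The helper names below are hypothetical re-creations for illustration and are not taken from the HotSpot sources.

```cpp
#include <cassert>

// Hypothetical stand-in for the isShiftCount() predicate named in the
// review; assumed here to accept counts in the range 0..31.
static bool is_shift_count(int x) { return 0 <= x && x < 32; }

// 32-bit shifts: the immediate itself must be a valid count (0..31).
static bool valid_shift32(int imm8) { return is_shift_count(imm8); }

// 64-bit shifts: counts 0..63 are legal, so the immediate is halved
// before it is passed to the same 0..31 predicate.
static bool valid_shift64(int imm8) { return is_shift_count(imm8 >> 1); }
```

Under these assumptions `valid_shift32(32)` is rejected while `valid_shift64(63)` is accepted, which matches the two assertion forms quoted in the review.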
> src/hotspot/cpu/x86/assembler_x86.cpp line 15885: > >> 15883: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 15884: int encode = evex_prefix_and_encode_ndd(0, dst->encoding(), src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); >> 15885: emit_int16((unsigned char)0xD1, (0xF8 | encode)); > > This should be: > emit_int16((unsigned char)0xD3, (0xF8 | encode)); > Shift by cl and not shift by 1. Thanks, fixed. > src/hotspot/cpu/x86/assembler_x86.cpp line 15920: > >> 15918: } >> 15919: >> 15920: void Assembler::esbbq(Register dst, Register src1, Address src2) { > > InstructionMark im(this) is missing. Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 15934: > >> 15932: >> 15933: void Assembler::esbbq(Register dst, Register src1, Register src2) { >> 15934: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > > vex_w should be true here? Yes, thanks, fixed. > src/hotspot/cpu/x86/assembler_x86.cpp line 16039: > >> 16037: assert(isShiftCount(imm8 >> 1), "illegal shift count"); >> 16038: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 16039: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); > > input_size_in_bits should be EVEX_64bit. Thanks, done. 
> src/hotspot/cpu/x86/assembler_x86.hpp line 796: > >> 794: void evex_prefix_ndd(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false) { >> 795: vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ true , /* force_evex */ true, no_flags); >> 796: } > > The additional parameter force_evex could be removed and the above could be encoded as: > attributes.set_is_evex_instruction(); > vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ true , no_flags); Done. > src/hotspot/cpu/x86/assembler_x86.hpp line 800: > >> 798: void evex_prefix_nf(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false) { >> 799: vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ false , /* force_evex */ true, no_flags); >> 800: } > > The additional parameter force_evex could be removed and the above could be encoded as: > attributes.set_is_evex_instruction(); > vex_prefix(adr, ndd_enc, xreg_enc, pre, opc, attributes, /* nds_is_ndd */ false, no_flags); Done. > src/hotspot/cpu/x86/assembler_x86.hpp line 811: > >> 809: int evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, >> 810: InstructionAttr *attributes, bool no_flags = false) { >> 811: return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ true , /* force_evex */ true, no_flags); > > The additional parameter force_evex could be removed and the above could be encoded as: > attributes.set_is_evex_instruction(); > return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ true, no_flags); Done. 
> src/hotspot/cpu/x86/assembler_x86.hpp line 816: > >> 814: int evex_prefix_and_encode_nf(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, >> 815: InstructionAttr *attributes, bool no_flags = false) { >> 816: return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, /* force_evex */ true, no_flags); > > The additional parameter force_evex could be removed and the above could be encoded as: > attributes.set_is_evex_instruction(); > return vex_prefix_and_encode(dst_enc, nds_enc, src_enc, pre, opc, attributes, /* src_is_gpr */ true, /* nds_is_ndd */ false, no_flags); Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745368 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745414 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745855 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745464 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745537 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745586 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745637 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745667 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745720 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745763 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745805 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745902 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745952 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746057 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746110 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746146 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20698#discussion_r1747746228 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746523 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746279 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746371 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746421 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746584 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746627 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746668 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746694 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746761 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746864 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746917 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747746991 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747103 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745258 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745324 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747151 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747206 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747253 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747294 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747371 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747439 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747500 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747532 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20698#discussion_r1747747588 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747614 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747747648 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747744994 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745103 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745148 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1747745195 From sviswanathan at openjdk.org Fri Sep 6 21:58:16 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 21:58:16 GMT Subject: RFR: 8329035: New Data Destination instructions support [v3] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 21:44:44 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > refactoring and fixes based on review comments Thanks a lot for taking care of all the review comments. The PR looks good to me. 
------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20698#pullrequestreview-2287207333 From sviswanathan at openjdk.org Fri Sep 6 22:03:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 22:03:30 GMT Subject: RFR: 8329035: New Data Destination instructions support [v3] In-Reply-To: References: Message-ID: <-RLu8PXYhxqRgLIU7Qljx2Pg1GFuJOhgVqlgms7NK-8=.e8429a61-3439-4a6b-bee0-4c9d4cbaeab6@github.com> On Fri, 6 Sep 2024 21:44:44 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > refactoring and fixes based on review comments @vnkozlov Could you please run this through your testing? @jatin-bhateja Please let us know if the PR looks good to you. We are hoping to get this integrated by next Friday (September 13th). 
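The NDD and NF behaviour summarised in the quoted PR description can be modelled with a toy register file. This is only a sketch of the semantics as described there, not the assembler API: `MiniCpu` and `model_ndd_add32` are hypothetical names, and a single carry bit stands in for the whole flags register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: 32 general-purpose registers and one carry flag.
struct MiniCpu {
  uint32_t regs[32] = {};
  bool carry = false;
};

// Models a 32-bit add with a separate, destination-only register (NDD).
// When no_flags is true the carry flag is left untouched (NF).
inline void model_ndd_add32(MiniCpu& cpu, int dst, int src1, int src2,
                            bool no_flags) {
  uint64_t wide = uint64_t(cpu.regs[src1]) + uint64_t(cpu.regs[src2]);
  cpu.regs[dst] = uint32_t(wide);   // sources are not overwritten unless dst aliases one
  if (!no_flags) {
    cpu.carry = (wide >> 32) != 0;  // flags are updated only without NF
  }
}
```

Passing the same index for `dst` and `src1` with `no_flags` set corresponds to the case the description mentions: NF behaviour without NDD semantics.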
------------- PR Comment: https://git.openjdk.org/jdk/pull/20698#issuecomment-2334866141 From sviswanathan at openjdk.org Fri Sep 6 22:08:14 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 22:08:14 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 06:43:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback.
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions Thank you for taking care of all my comments. /Reviewers 2 ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2287215687 PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2334870617 From dlong at openjdk.org Fri Sep 6 22:31:09 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 6 Sep 2024 22:31:09 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: <5CB0aexpZn0lL8DvUFlOoNhU223th2guno8mMWuNl-w=.03067572-03dc-45a7-ba07-f489c43fc4d0@github.com> References: <5CB0aexpZn0lL8DvUFlOoNhU223th2guno8mMWuNl-w=.03067572-03dc-45a7-ba07-f489c43fc4d0@github.com> Message-ID: On Tue, 3 Sep 2024 06:30:00 GMT, kuaiwei wrote: >> I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. >> It will happen with these conditions >> 1 It's a virtual call and the callee is a final method. So c1 will think it's static bound. >> 2 The interpreter never touch the callsite, so interpreter does not add type profile. >> 3 In c1 compilation, it will be inlined based on CHA. >> 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Simplify should_profile_receiver_type Thinking about this a little more, I think the real problem is the choice of callee Method* passed to profile_call(). For the non-inlined case, cha_monomorphic_target is not used as the callee, but it is used for the holder. 
For the inlined case, we do pass cha_monomorphic_target as the callee for profile_call(). If we passed the resolved/selected Method* to try_inline() as an additional parameter to be used by profile_call(), then I believe the problem with can_be_statically_bound() goes away, and should_profile_receiver_type() will approximate what the interpreter does, which was the original intent. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20786#issuecomment-2334896273 From kvn at openjdk.org Fri Sep 6 22:35:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 6 Sep 2024 22:35:07 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v4] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 14:32:25 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update after Roberto's comments and suggestions Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2287239769 From lmesnik at openjdk.org Sat Sep 7 01:24:12 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 7 Sep 2024 01:24:12 GMT Subject: RFR: 8339366: [jittester] Make it possible to generate tests without execution In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 07:41:39 GMT, Evgeny Nikitin wrote: > This PR: > > 1. Extracts IR tree generation from execution (left in the Automatic) into a dedicated class IRTreeGenerator; > 2. Introduces a generation result record (named IRTreeGenerator.Test). The record contains main and private classes along with the random seed used for their generation; > 3. Creates CLI-wrapper classes for Java and ByteCode generators to allow generation-only execution; > 4. Add a repeating option to the configuration - to make it possible to specify several main class names. > > Sample usage: > > java -cp build/classes --add-opens java.base/java.util=ALL-UNNAMED \ > jdk.test.lib.jittester.JavaCodeGenerator \ > -k Test_0 -k Test_1 -k Test_10 Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20806#pullrequestreview-2287305082 From sviswanathan at openjdk.org Sat Sep 7 02:11:45 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 7 Sep 2024 02:11:45 GMT Subject: RFR: 8339698: x86 andw/orw/xorw encoding missing 0x66 prefix Message-ID: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> x86 andw/orw/xorw encoding is missing 0x66 prefix. This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857). 
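As background for the missing-prefix bug described above: outside 16-bit mode, the word-sized forms of AND/OR/XOR share their opcodes with the 32-bit forms and are selected by the 0x66 operand-size prefix. The sketch below shows this for the register-to-register form of AND (opcode 0x21) only. `encode_and_rr` is a hypothetical helper written for this illustration, restricted to the eight legacy registers, and is not the HotSpot `Assembler` code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoder for "AND r/m, r" (opcode 0x21) with two register
// operands numbered 0..7. Without the 0x66 prefix the CPU decodes the
// same opcode bytes as a 32-bit operation.
inline std::vector<uint8_t> encode_and_rr(int dst, int src, bool is_16bit) {
  std::vector<uint8_t> bytes;
  if (is_16bit) {
    bytes.push_back(0x66);  // operand-size override prefix selects 16 bits
  }
  bytes.push_back(0x21);    // AND r/m16/32, r16/32
  // ModRM: mod=11 (register direct), reg=src, rm=dst
  bytes.push_back(uint8_t(0xC0 | ((src & 7) << 3) | (dst & 7)));
  return bytes;
}
```

With `dst = 0` (ax/eax) and `src = 1` (cx/ecx), the 16-bit form differs from the 32-bit form only by the leading 0x66 byte, so omitting the prefix silently turns a word operation into a doubleword one.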
Best Regards, Sandhya ------------- Commit messages: - x86 andw/orw/xorw encoding missing 0x66 prefix Changes: https://git.openjdk.org/jdk/pull/20901/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20901&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339698 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20901/head:pull/20901 PR: https://git.openjdk.org/jdk/pull/20901 From kbarrett at openjdk.org Sat Sep 7 04:15:14 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Sep 2024 04:15:14 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > s390 port : late barrier expansion I've reviewed the non-compiler GC changes. I've looked over the compiler changes, but can't claim to have reviewed them. I've also reviewed the x64 changes, and looked over the aarch64 changes. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176: > 174: __ jcc(Assembler::zero, runtime); // jump to runtime if index == 0 (full buffer) > 175: // The buffer is not full, store value into it.
> 176: __ subptr(temp, wordSize); // temp := next index Instead of __ testptr(temp, temp); __ jcc(Assembler::zero, runtime); __ subptr(temp, wordSize); it seems like this might be better __ subptr(temp, wordSize); __ jcc(Assembler::below, runtime); I think the code in the PR matches what the early expansion generates, so I think a change here can be deferred to a followup. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354: > 352: __ bind(runtime); > 353: // save the live input values > 354: RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread)); I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here. Also not sure why we're saving `thread` here for 32bit platforms. Something to think about for the future. Though maybe the 32bit case will be gone by then :) src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: > 110: // The answer is that stores of different sizes can co-exist > 111: // in the same sequence of RawMem effects. We sometimes initialize > 112: // a whole 'tile' of array elements with a single jint or jlong.) I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two 32bit oops/narrowOops? But that doesn't have anything to do with jints. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2287188386 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747741376 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747824868 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747898995 From mdoerr at openjdk.org Sat Sep 7 12:40:10 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 7 Sep 2024 12:40:10 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   s390 port : late barrier expansion

I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.)
Performance also looks good on PPC64. No regression observed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2335174688 From mdoerr at openjdk.org Sat Sep 7 13:13:09 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 7 Sep 2024 13:13:09 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v5] In-Reply-To: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> References: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> Message-ID: On Fri, 6 Sep 2024 17:51:15 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>>
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > remove PC save/restore on s390 Fun facts (measured on PPC64le fastdbg build which contains some asm asserts): simple upcall stub: 520 Bytes (regardless of selected GC) upcall_stub_load_target with G1: 72 Bytes upcall_stub_load_target with ZGC: 1204 Bytes (more than 2x the upcall stub + the G1 version!) upcall_stub_load_target with Shenandoah: 1364 Bytes Great to have the upcall_stub_load_target only once :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2335182768 From kvn at openjdk.org Sat Sep 7 16:09:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 7 Sep 2024 16:09:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:12:55 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. 
>> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix builds My tier1-7 testing passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2335722395 From kvn at openjdk.org Sat Sep 7 17:29:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 7 Sep 2024 17:29:07 GMT Subject: RFR: 8339698: x86 andw/orw/xorw encoding missing 0x66 prefix In-Reply-To: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Sat, 7 Sep 2024 02:00:49 GMT, Sandhya Viswanathan wrote: > x86 andw/orw/xorw encoding is missing 0x66 prefix. This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857). > > Best Regards, > Sandhya Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20901#pullrequestreview-2288162085 From kvn at openjdk.org Sat Sep 7 17:43:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 7 Sep 2024 17:43:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 15:34:54 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.hpp line 46: >> >>> 44: * Bits that are known to be 0 or 1. A value v satisfies this constraint iff >>> 45: * (v & zeros) == 0 && (~v & ones) == 0. 
I.e, all bits that is set in zeros >>> 46: * must be unset in v, and all bits that is set in ones must be set in v. >> >> That is quite counter-intuitive. Is there a good reason for this? >> I would have expected that `zero[i] = 1` would mean that a zero is allowed, and `ones[i] = 1` that a one is allowed. >> >> Basically, when I see `zero[i]` I expect it to be a boolean that answers me this question: "can it be a zero?". But you are telling me I'm supposed to ask "Must it not be a zero"? >> >> You are telling me that `zero[i] = 1` and `ones[i] = 0` means that it must be a `1`. >> >> I know that changing it now would be a lot of effort. But the risk of being unintuitive is that even less people can quickly fix bugs in this code. >> >> @vnkozlov what do you think about this? > > You are a bit confused, the `zeros` and `ones` answer the question: Must this bit be 0 (or 1). Which means that `zero[i] = 1` means that the bit must be a `0`. The condition could be converted to `(v & ones) == ones && (v & zeros) == 0`. It is matching your statement and easy to understand. I would let C++ compiler to optimize it instead of encoding it in confusing way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1748779626 From qamai at openjdk.org Sun Sep 8 05:07:59 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 8 Sep 2024 05:07:59 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v16] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. 
an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - change (~v & ones) == 0 to (v & ones) == ones - Merge branch 'master' into unsignedbounds - fix builds - add trivial test cases - make should return the correct type - rename tests - more explanation - fix build - fix build - add more comments, group KnownBits - ... and 13 more: https://git.openjdk.org/jdk/compare/5b72bbf9...9b70213e ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=15 Stats: 1744 lines in 10 files changed: 1195 ins; 314 del; 235 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Sun Sep 8 05:08:00 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 8 Sep 2024 05:08:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: <6kDyXzIxKkea6YzufPfSHZig11Zhn0RWLyxrPzI3qdw=.a83a49fc-1396-4610-9d73-54f72b745ebb@github.com> On Sat, 7 Sep 2024 17:38:08 GMT, Vladimir Kozlov wrote: >> You are a bit confused, the `zeros` and `ones` answer the question: Must this bit be 0 (or 1). Which means that `zero[i] = 1` means that the bit must be a `0`. 
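The equivalence behind the `(~v & ones) == 0` to `(v & ones) == ones` change listed above can be checked with a small standalone C++ sketch. `KnownBits`, `satisfies_old` and `satisfies_new` are hypothetical names chosen for illustration and assume 32-bit values; they are not the declarations in `rangeinference.hpp`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the known-bits pair discussed in this thread.
struct KnownBits {
  uint32_t zeros;  // bits that must be 0 in every value of the type
  uint32_t ones;   // bits that must be 1 in every value of the type
};

// Formulation from the originally quoted comment.
constexpr bool satisfies_old(KnownBits k, uint32_t v) {
  return (v & k.zeros) == 0 && (~v & k.ones) == 0;
}

// Formulation after the change: every bit set in ones is set in v,
// and no bit set in zeros is set in v.
constexpr bool satisfies_new(KnownBits k, uint32_t v) {
  return (v & k.ones) == k.ones && (v & k.zeros) == 0;
}
```

For example, a value known to lie in [0, 3] would have `zeros = 0xFFFFFFFC` and `ones = 0`: under both formulations 3 satisfies the constraint and 4 does not.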
> > The condition could be converted to `(v & ones) == ones && (v & zeros) == 0`. > It is matching your statement and easy to understand. I would let C++ compiler to optimize it instead of encoding it in confusing way. That's a good idea, I have made that change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1749078182 From kvn at openjdk.org Sun Sep 8 17:18:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 8 Sep 2024 17:18:15 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v16] In-Reply-To: References: Message-ID: <8etnYfzY9oC0zgKA4pP19B8s6upbvPnw3Aksw2P5v5Y=.97090f8e-90c5-4ec7-abaa-ff10522de93e@github.com> On Sun, 8 Sep 2024 05:07:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 23 commits: > > - change (~v & ones) == 0 to (v & ones) == ones > - Merge branch 'master' into unsignedbounds > - fix builds > - add trivial test cases > - make should return the correct type > - rename tests > - more explanation > - fix build > - fix build > - add more comments, group KnownBits > - ... and 13 more: https://git.openjdk.org/jdk/compare/5b72bbf9...9b70213e Good. Please, also update PR's description. You need second approval. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2288574407 PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2336759364 From qamai at openjdk.org Sun Sep 8 17:29:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 8 Sep 2024 17:29:05 GMT Subject: RFR: 8339698: x86 andw/orw/xorw encoding missing 0x66 prefix In-Reply-To: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Sat, 7 Sep 2024 02:00:49 GMT, Sandhya Viswanathan wrote: > x86 andw/orw/xorw encoding is missing 0x66 prefix. This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857). > > Best Regards, > Sandhya May I ask what are these instructions used for, and can we simply replace them with the dword version? 
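For background on the fix under review, the sketch below models how the 0x66 operand-size override prefix changes the encoding of a register-register `and`. It is a simplified illustration, not the HotSpot assembler: it uses the `23 /r` opcode form, handles only registers with encodings 0 to 7 (so no REX prefix is involved), and `encode_and_rr` is a hypothetical helper that packs the instruction bytes into an integer with the first byte most significant.

```cpp
#include <cstdint>

// Packs the bytes of "and dst, src" (opcode 0x23 /r) into one integer.
// operand16 selects the 16-bit form, which needs the 0x66 prefix.
// Assumption: dst and src are legacy register encodings in the range 0..7.
constexpr uint32_t encode_and_rr(bool operand16, uint32_t dst, uint32_t src) {
  return (operand16 ? 0x660000u : 0u)   // operand-size override prefix
       | (0x23u << 8)                   // opcode: AND r, r/m
       | (0xC0u | (dst << 3) | src);    // ModRM: mod=11, reg=dst, rm=src
}
```

With `dst` = 0 (`ax`/`eax`) and `src` = 3 (`bx`/`ebx`), the 16-bit form comes out as `66 23 C3` and the 32-bit form as `23 C3`. Without the prefix the processor performs the 32-bit operation, which also modifies the upper half of the destination register.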
------------- PR Comment: https://git.openjdk.org/jdk/pull/20901#issuecomment-2336762556 From jbhateja at openjdk.org Sun Sep 8 22:08:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 8 Sep 2024 22:08:04 GMT Subject: RFR: 8339698: x86 andw/orw/xorw encoding missing 0x66 prefix In-Reply-To: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Sat, 7 Sep 2024 02:00:49 GMT, Sandhya Viswanathan wrote: > x86 andw/orw/xorw encoding is missing 0x66 prefix. This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857). > > Best Regards, > Sandhya Marked as reviewed by jbhateja (Reviewer). src/hotspot/cpu/x86/assembler_x86.cpp line 1636: > 1634: > 1635: void Assembler::andw(Register dst, Register src) { > 1636: emit_int8(0x66); This is operand-size override prefix which modifies the default operand size to 16bits. ------------- PR Review: https://git.openjdk.org/jdk/pull/20901#pullrequestreview-2288611187 PR Review Comment: https://git.openjdk.org/jdk/pull/20901#discussion_r1749373098 From qamai at openjdk.org Mon Sep 9 02:10:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Sep 2024 02:10:04 GMT Subject: RFR: 8339698: x86 andw/orw/xorw encoding missing 0x66 prefix In-Reply-To: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: <2ud6eed6OSlFuc1nAF_X1qSdNTW0Hr8TwFgnmv3ouHQ=.b50f0fcd-a15d-4b42-ad9f-14bc5def6404@github.com> On Sat, 7 Sep 2024 02:00:49 GMT, Sandhya Viswanathan wrote: > x86 andw/orw/xorw encoding is missing 0x66 prefix. 
This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857).
>
> Best Regards,
> Sandhya

I mean, what do we use these instructions for? If they are not needed, I think we can remove them instead.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20901#issuecomment-2336973929

From fjiang at openjdk.org Mon Sep 9 06:09:12 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Mon, 9 Sep 2024 06:09:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]
In-Reply-To:
References:
Message-ID: <2Iqb8t5nI61Zq22PafvY9QUUw_9OZ7oHygSdOY6QCX8=.f1338ef5-d646-45aa-bcb6-54f0dd13bc87@github.com>

On Fri, 6 Sep 2024 14:02:58 GMT, Roberto Castañeda Lozano wrote:

>>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
>>
>> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.
>
>> Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034)
>
> Done, thanks!

> > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later.
>
> Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset.

Tier1-3 & hotspot:tier4 test results are clean on the linux-riscv64 platform.
No performance regression observed. (Applied on JDK head).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337203016

From duke at openjdk.org Mon Sep 9 07:07:09 2024
From: duke at openjdk.org (kuaiwei)
Date: Mon, 9 Sep 2024 07:07:09 GMT
Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3]
In-Reply-To:
References: <5CB0aexpZn0lL8DvUFlOoNhU223th2guno8mMWuNl-w=.03067572-03dc-45a7-ba07-f489c43fc4d0@github.com>
Message-ID:

On Fri, 6 Sep 2024 22:28:51 GMT, Dean Long wrote:

> Thinking about this a little more, I think the real problem is the choice of callee Method* passed to profile_call(). For the non-inlined case, cha_monomorphic_target is not used as the callee, but it is used for the holder. For the inlined case, we do pass cha_monomorphic_target as the callee for profile_call(). If we passed the resolved/selected Method* to try_inline() as an additional parameter to be used by profile_call(), then I believe the problem with can_be_statically_bound() goes away, and should_profile_receiver_type() will approximate what the interpreter does, which was the original intent.

Hi Dean,
I'm not familiar with C1 profiling. IMO, the original idea is that if C1 knows the callee is statically bound, C2 can make the same decision, so C1 needn't profile. I just think there may be another case that breaks it.

    void A() { Child1 p = new Child1(); B(p); }
    void B(Parent p) { C(p); }
    void C(Parent p) { p.function(); }

If C1 decides to inline A->B->C, it knows that p in C has the exact type Child1, and it can skip profiling if Child1::function is final. But when C2 compiles B, it cannot get the profile data.
My thinking is that there may be many such cases to consider, so we could simply do the profiling in C1.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20786#issuecomment-2337292634 From duke at openjdk.org Mon Sep 9 07:31:39 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 9 Sep 2024 07:31:39 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v4] In-Reply-To: References: Message-ID: > I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. > It will happen with these conditions > 1 It's a virtual call and the callee is a final method. So c1 will think it's static bound. > 2 The interpreter never touch the callsite, so interpreter does not add type profile. > 3 In c1 compilation, it will be inlined based on CHA. > 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into c1_cha_type_profile - Simplify should_profile_receiver_type - Modify test case to use whitebox api - 8339299: C1 will miss type profile when inline final method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20786/files - new: https://git.openjdk.org/jdk/pull/20786/files/fc421c9a..8bd9ee11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=02-03 Stats: 13336 lines in 512 files changed: 8175 ins; 2267 del; 2894 mod Patch: https://git.openjdk.org/jdk/pull/20786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20786/head:pull/20786 PR: https://git.openjdk.org/jdk/pull/20786 From rcastanedalo at openjdk.org Mon Sep 9 07:44:13 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Sep 2024 07:44:13 GMT Subject: RFR: 8334060: Implementation 
of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Sat, 7 Sep 2024 12:37:54 GMT, Martin Doerr wrote: > I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. Great, thanks for testing Martin! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337362381 From jbhateja at openjdk.org Mon Sep 9 08:18:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Sep 2024 08:18:54 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v9] In-Reply-To: References: Message-ID: <9NBW5WfztEJCcHs68o3b8O1IhgmdS3LX7UQmZXxbZ8M=.0271ce97-7dbe-4f84-965d-d511b0392c5b@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. 
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Fix jtreg regression. - Addressing Paul's comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/195390fe..4a93042b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=07-08 Stats: 215 lines in 39 files changed: 0 ins; 1 del; 214 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From roland at openjdk.org Mon Sep 9 08:32:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 9 Sep 2024 08:32:33 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() Message-ID: PhaseIdealLoop::do_range_check() sets the control of the new pre and main limits to be the entry control of the pre loop but it eliminates all conditions whose parameters are invariant in the main loop. Most of the time they are also invariant in the pre loop but that's not guaranteed. It does happen sometimes that those parameters are pinned in the pre loop. 
In that case, PhaseIdealLoop::do_range_check() sets wrong controls. This doesn't cause any issue today AFAICT.

Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): pre_head->in(0) is `pre_head`. I fixed that one too.

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.org/jdk/pull/20908/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20908&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339733
  Stats: 74 lines in 2 files changed: 60 ins; 0 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/20908.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20908/head:pull/20908

PR: https://git.openjdk.org/jdk/pull/20908

From syan at openjdk.org Mon Sep 9 09:56:13 2024
From: syan at openjdk.org (SendaoYan)
Date: Mon, 9 Sep 2024 09:56:13 GMT
Subject: RFR: 8339714: Delete tedious bool type define
Message-ID:

Hi all,
This PR deletes the tedious bool type definitions in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347 ([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include <stdbool.h>" to use the bool type directly, as [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) does. This makes the code more concise, and the risk is quite low.

Additional testing:

- [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils
- [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64
- [ ] Jtreg tests(include tier1/tier2/tier3 etc.)
on linux aarch64 ------------- Commit messages: - 8339714: Delete tedious bool type define Changes: https://git.openjdk.org/jdk/pull/20909/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20909&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339714 Stats: 14 lines in 2 files changed: 1 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20909/head:pull/20909 PR: https://git.openjdk.org/jdk/pull/20909 From jbhateja at openjdk.org Mon Sep 9 10:28:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Sep 2024 10:28:08 GMT Subject: RFR: 8329035: New Data Destination instructions support [v3] In-Reply-To: <-RLu8PXYhxqRgLIU7Qljx2Pg1GFuJOhgVqlgms7NK-8=.e8429a61-3439-4a6b-bee0-4c9d4cbaeab6@github.com> References: <-RLu8PXYhxqRgLIU7Qljx2Pg1GFuJOhgVqlgms7NK-8=.e8429a61-3439-4a6b-bee0-4c9d4cbaeab6@github.com> Message-ID: On Fri, 6 Sep 2024 22:00:59 GMT, Sandhya Viswanathan wrote: > @vnkozlov Could you please run this through your testing? @jatin-bhateja Please let us know if the PR looks good to you. We are hoping to get this integrated by next Friday (September 13th). Hi @sviswa7 , sure, please allow me sometime, as @merykitty rightly raised concerns over [PR#20901](https://github.com/openjdk/jdk/pull/20901) about lazy need based introduction of new assembler instructions, I am also in process of prototyping c2 compiler side changes for NDD instructions, which can be floated once these are checked in. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20698#issuecomment-2337735283 From rcastanedalo at openjdk.org Mon Sep 9 11:15:47 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Sep 2024 11:15:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion - riscv port for JEP 475 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/6663433c..94145917 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16-17 Stats: 860 lines in 4 files changed: 771 ins; 49 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 9 11:15:47 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Sep 2024 11:15:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 07:41:06 GMT, Roberto Castañeda Lozano wrote: >> I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. > >> I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. > > Great, thanks for testing Martin! > > > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later. > > > > > > Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset.
> > Tier1-3 & hotspot:tier4 test result is clean on linux-riscv64 platform. No regression observed for performance. (Applied on JDK head). Thanks @feilongjiang, merged now (commit 94145917). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337824882 From rcastanedalo at openjdk.org Mon Sep 9 11:35:13 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Sep 2024 11:35:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 21:33:42 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176: > >> 174: __ jcc(Assembler::zero, runtime); // jump to runtime if index == 0 (full buffer) >> 175: // The buffer is not full, store value into it. >> 176: __ subptr(temp, wordSize); // temp := next index > > Instead of > > __ testptr(temp, temp); > __ jcc(Assembler::zero, runtime); > __ subptr(temp, wordSize); > > it seems like this might be better > > __ subptr(temp, wordSize); > __ jcc(Assembler::below, runtime); > > I think the code in the PR matches what the early expansion generates, so I think a change here > can be deferred to a followup. Good point, thanks! I made a note for follow-up work.
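For readers following the suggestion above: the shorter sequence works because an unsigned subtraction borrows exactly when the index is smaller than the amount subtracted, which (for an index that is always a multiple of the word size) is the same condition as "index == 0". The following is only a minimal C++ model of that reasoning, not HotSpot code; `kWordSize` and both function names are hypothetical, and the "multiple of the word size" precondition is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for wordSize on a 64-bit platform.
constexpr std::size_t kWordSize = 8;

// Models the sequence currently in the PR: test for zero, branch, then decrement.
// Returns false when the slow (runtime) path would be taken.
inline bool try_next_index_test_then_sub(std::size_t index, std::size_t* next) {
  assert(index % kWordSize == 0);  // assumption of this sketch
  if (index == 0) {                // testptr + jcc(zero)
    return false;
  }
  *next = index - kWordSize;       // subptr
  return true;
}

// Models the suggested sequence: decrement first, then branch on borrow.
inline bool try_next_index_sub_then_check(std::size_t index, std::size_t* next) {
  assert(index % kWordSize == 0);         // assumption of this sketch
  std::size_t result = index - kWordSize; // subptr; wraps around on underflow
  bool borrow = index < kWordSize;        // what the carry flag would hold, i.e. jcc(below)
  if (borrow) {
    return false;
  }
  *next = result;
  return true;
}
```

Under the stated assumption both functions take the slow path for the same inputs and produce the same next index otherwise; the second form simply saves the separate test.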
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750088920 From rcastanedalo at openjdk.org Mon Sep 9 11:48:11 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Sep 2024 11:48:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 23:57:59 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354: > >> 352: __ bind(runtime); >> 353: // save the live input values >> 354: RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread)); > > I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here. > Also not sure why we're saving `thread` here for 32bit platforms. > Something to think about for the future. Though maybe the 32bit case will be gone by then :) I'm not sure either, this is in any case pre-existing interpreter code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750105760 From roland at openjdk.org Mon Sep 9 11:51:06 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 9 Sep 2024 11:51:06 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: <4EVmLjNMTfnnGZOmvV41VLHO4OOuLiJLEuu5723qYag=.6309b7e2-e491-499d-b23e-35d440cec5f4@github.com> On Thu, 5 Sep 2024 21:01:00 GMT, Kangcheng Xu wrote: > Hello all.
This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one, should the platform support this feature (e.g., x86-64). > > I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! > > ~This will be a draft PR before GHA tests are confirmed passing.~ src/hotspot/share/opto/compile.cpp line 3163: > 3161: } > 3162: > 3163: void Compile::handle_div_mod_op(Node* n, int div_op, int div_mod_op, bool is_unsigned) { I would pass `bt` here instead of both `div_op` and `div_mod_op`. src/hotspot/share/opto/compile.cpp line 3174: > 3172: } > 3173: > 3174: BasicType bt = div_op == Op_DivI || div_op == Op_UDivI ? T_INT : T_LONG; and add a new function `Op_DivMod(BasicType bt, bool is_unsigned)` similar to `Op_ConIL` that would replace `div_mod_op` here. test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 45: > 43: @Test > 44: @Arguments(values = {Argument.RANDOM_EACH, Argument.RANDOM_EACH}) > 45: @IR(counts = {IRNode.DIV_MOD_I, "1" }) Running the test with both `UseDivMod` on and off and adding the corresponding IR rules would be nice.
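To make the second suggestion concrete, here is one possible shape for such a helper. This is a standalone sketch only: the enums below are simplified stand-ins declared for illustration rather than HotSpot's real `BasicType` and opcode definitions, and the actual helper (if one is added) may look different.

```cpp
#include <cassert>

// Simplified stand-ins for this sketch; not the HotSpot declarations.
enum BasicType { T_INT, T_LONG };
enum Opcode { Op_DivModI, Op_DivModL, Op_UDivModI, Op_UDivModL };

// Selects the combined div/mod opcode from the operand type and signedness,
// so callers pass `bt` and `is_unsigned` instead of a precomputed opcode.
inline Opcode Op_DivMod(BasicType bt, bool is_unsigned) {
  assert(bt == T_INT || bt == T_LONG);
  if (bt == T_INT) {
    return is_unsigned ? Op_UDivModI : Op_DivModI;
  }
  return is_unsigned ? Op_UDivModL : Op_DivModL;
}
```

With a helper of this kind, a function such as `handle_div_mod_op` would only need the node, the basic type, and the signedness flag, and could derive the remaining opcodes internally.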
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1750105333 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1750106629 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1750110873 From jwaters at openjdk.org Mon Sep 9 12:10:09 2024 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 9 Sep 2024 12:10:09 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR delete tedious bool type define in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use bool type directly, like [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) do. > Make code more concision, the risk is quite low. > > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64 > - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux aarch64 Hmm. While I want to put my support behind this change, I recall that an earlier proposal that I made to implement jbooleans with stdbool.h being rejected due to backwards incompatibility, and some places really do expect an int and not a bool type. What are the likelihoods that a place in the code here is actually expecting an int due to ABI issues? src/java.base/unix/native/libjsig/jsig.c line 46: > 44: #include > 45: > 46: #if (__STDC_VERSION__ >= 199901L) Since this does include stdbool.h already, this change looks ok src/utils/hsdis/binutils/hsdis-binutils.c line 67: > 65: #include "hsdis.h" > 66: > 67: #ifndef bool I'm a little worried about this change. hsdis may really need an int here. 
If that turns out to not be the case then I'll retract my concerns ------------- PR Comment: https://git.openjdk.org/jdk/pull/20909#issuecomment-2337942768 PR Review Comment: https://git.openjdk.org/jdk/pull/20909#discussion_r1750135891 PR Review Comment: https://git.openjdk.org/jdk/pull/20909#discussion_r1750137910 From dlunden at openjdk.org Mon Sep 9 12:15:50 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 9 Sep 2024 12:15:50 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v5] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. 
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: - Formatting updates - Update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/95396668..36f9dabf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=03-04 Stats: 1446 lines in 22 files changed: 1125 ins; 192 del; 129 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Mon Sep 9 12:19:07 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 9 Sep 2024 12:19:07 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v4] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 16:09:42 GMT, Vladimir Kozlov wrote: >>> As we discussed on our previous meeting Aarch64 has very small registers mask - only 10 words. Can you look if that enough or we should increase static size of it? It could be separate RFE. >> >> I do have looking at this in my to-do list (as a separate RFE). I'm not sure it is an issue though: the calculation of `RM_SIZE` first ensures that it covers all registers, and then adds three words to cover arguments, locks, and some other things. If it is only 10 words in total on aarch64, it should be because we simply do not have as many registers that we need to refer to. I do not recall from our discussion, is there some particular case where `RM_SIZE` on aarch64 is an issue? > >> is there some particular case where RM_SIZE on aarch64 is an issue? > > Both `register_aarch64.hpp` and `register_x86.hpp` (64-bits) specify `number_of_registers = 32`. So why `RM_SIZE` is different? Thanks for the review @vnkozlov! I just pushed an updated version. @dean: I believe the update addresses your concerns.
Summary of the most important changes:

- Add a (generous) limit to the number of stack slots that `BoxLockNode`s can occupy in total. If we reach the limit, we bail out of compilation.
- Add an upper bound for register mask growth. The upper bound is (or, should be) impossible to reach, and I've added an `assert` to check this whenever a register mask grows.

    // Compute a best-effort (statically known) upper bound for register mask
    // size in 32-bit words. When extending/growing register masks, we should
    // never grow past this size.
    static const unsigned int RM_SIZE_MAX =
      (((RM_SIZE_MIN << 5) +                // Slots for machine registers
        (max_method_parameter_length * 2) + // Slots for incoming arguments
        (max_method_parameter_length * 2) + // Slots for outgoing arguments
        BoxLockNode_slot_limit +            // Slots for locks
        64                                  // Padding, reserved words, etc.
        ) + 31) >> 5;                       // Number of bits -> number of 32-bit words

- Add a `STATIC_ASSERT` that `short` can index the maximum size register mask.
- Add a compilation bailout to `PhaseChaitin::Select` at register mask chunk rollover, if `short` can no longer index the rolled-over mask.
- Require that the user provides the arena in which to extend register masks directly in register mask constructors. This is slightly more verbose compared to using `_comp_arena` by default, but safer and more flexible.
- Optimize memory consumption by allocating extensions of temporary register masks in a separate resource area (not `_comp_arena`).
- Add @robcasloz's register mask tests.
- Improve register mask dumping functionality (I needed it for debugging, could go in a separate RFE).
- Make `overlap` also take the `all-stack` flag into account (bug).
- Rename `TestManyMethodArguments.java` to `TestMaxMethodArguments.java`. It now tests that C2 can compile the maximum allowed number of arguments (according to the JVM spec).
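As an aside on the `RM_SIZE_MAX` expression in the summary above: it sums slot counts in bits and then rounds up to whole 32-bit words with `(bits + 31) >> 5`. The sketch below models only that arithmetic and the growth check. It is not the code from the PR: the function names are hypothetical, and every numeric input used with it is an assumption of the reader's choosing rather than a value taken from HotSpot.

```cpp
#include <cassert>

// Rounds a bit count up to whole 32-bit words, as in "(bits + 31) >> 5".
constexpr unsigned bits_to_words(unsigned bits) {
  return (bits + 31) >> 5;
}

// Models the upper-bound computation. All three parameters are hypothetical
// inputs for this sketch.
constexpr unsigned rm_size_max_words(unsigned rm_size_min_words,
                                     unsigned max_params,
                                     unsigned box_lock_slot_limit) {
  return bits_to_words((rm_size_min_words << 5) +  // slots for machine registers
                       (max_params * 2) +          // slots for incoming arguments
                       (max_params * 2) +          // slots for outgoing arguments
                       box_lock_slot_limit +       // slots for locks
                       64);                        // padding, reserved words, etc.
}

// Models a growth request that must never exceed the precomputed bound.
inline unsigned grow_mask(unsigned current_words, unsigned wanted_words,
                          unsigned max_words) {
  assert(wanted_words <= max_words && "register mask grew past its upper bound");
  return wanted_words > current_words ? wanted_words : current_words;
}

static_assert(bits_to_words(32) == 1, "exactly one word");
static_assert(bits_to_words(33) == 2, "a partial word rounds up");
```

The point of the rounding term is that a bit count which is not a multiple of 32 still gets a full word, so the bound can never be one word too small.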
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2337966119 From jvernee at openjdk.org Mon Sep 9 12:34:07 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 9 Sep 2024 12:34:07 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v5] In-Reply-To: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> References: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> Message-ID: On Fri, 6 Sep 2024 17:51:15 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > remove PC save/restore on s390 src/hotspot/share/prims/upcallLinker.cpp line 160: > 158: > 159: assert(entry->method_holder()->is_initialized(), "no clinit barrier"); > 160: CompilationPolicy::compile_if_required(mh_entry, CHECK_0); Note that this call to `compile_if_required` doesn't make sense here, since the target method can change (through a race). But also: we already call `compile_if_required` for the target of a method handle in `CallInfo::set_common`, which we reach when resolving the target `vmentry` (a `MemberName`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1750172834 From syan at openjdk.org Mon Sep 9 12:53:06 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 9 Sep 2024 12:53:06 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:06:25 GMT, Julian Waters wrote: >> Hi all, >> This PR delete tedious bool type define in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use bool type directly, like [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) do. >> Make code more concision, the risk is quite low. >> >> Additional testing: >> >> - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils >> - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) 
on linux x64 >> - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux aarch64 > > src/java.base/unix/native/libjsig/jsig.c line 46: > >> 44: #include >> 45: >> 46: #if (__STDC_VERSION__ >= 199901L) > > Since this does include stdbool.h already, this change looks ok Okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20909#discussion_r1750201356 From syan at openjdk.org Mon Sep 9 12:59:06 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 9 Sep 2024 12:59:06 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:07:54 GMT, Julian Waters wrote: >> Hi all, >> This PR delete tedious bool type define in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use bool type directly, like [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) do. >> Make code more concision, the risk is quite low. >> >> Additional testing: >> >> - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils >> - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64 >> - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux aarch64 > > src/utils/hsdis/binutils/hsdis-binutils.c line 67: > >> 65: #include "hsdis.h" >> 66: >> 67: #ifndef bool > > I'm a little worried about this change. hsdis may really need an int here. If that turns out to not be the case then I'll retract my concerns I have verified this change locally, including building hsdis.so and checking its functionality with the command `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version`. The verification shows that hsdis.so works normally with this change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20909#discussion_r1750211438 From jwaters at openjdk.org Mon Sep 9 13:06:06 2024 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 9 Sep 2024 13:06:06 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: <55oOaRmeL9m8LqNBmJgL3HKHI5QJ0WHm4ahcOneNt8k=.bfa57937-f3fe-41d3-b514-3d5444bde093@github.com> On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR delete tedious bool type define in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use bool type directly, like [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) do. > Make code more concision, the risk is quite low. > > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64 > - [ ] Jtreg tests(include tier1/tier2/tier3 etc.) on linux aarch64 Marked as reviewed by jwaters (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20909#pullrequestreview-2289852123 From jwaters at openjdk.org Mon Sep 9 13:06:07 2024 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 9 Sep 2024 13:06:07 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:56:56 GMT, SendaoYan wrote: >> src/utils/hsdis/binutils/hsdis-binutils.c line 67: >> >>> 65: #include "hsdis.h" >>> 66: >>> 67: #ifndef bool >> >> I'm a little worried about this change. hsdis may really need an int here. 
If that turns out to not be the case then I'll retract my concerns > > I have verified this change locally, including building hsdis.so and checking its functionality with the command `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version`. The verification shows that hsdis.so works normally with this change. Ok, sounds good ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20909#discussion_r1750222067 From syan at openjdk.org Mon Sep 9 13:20:07 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 9 Sep 2024 13:20:07 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 13:03:49 GMT, Julian Waters wrote: >> I have verified this change locally, including building hsdis.so and checking its functionality with the command `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version`. The verification shows that hsdis.so works normally with this change. > > Ok, sounds good Thanks for the review. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20909#discussion_r1750243576 From dlunden at openjdk.org Mon Sep 9 13:48:11 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 9 Sep 2024 13:48:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v2] In-Reply-To: References: <7O0X1427MXg_MrnMb9qcH64x3lG3feYt47iZhBujztk=.3fee1482-8bcf-42d9-bc4d-0e074f03222e@github.com> Message-ID: On Tue, 13 Aug 2024 10:17:03 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/postaloc.cpp line 765: >> >>> 763: // in both registers. >>> 764: OptoReg::Name nreg_lo = OptoReg::add(nreg,-1); >>> 765: if( !lrgs(lidx).mask().Member(nreg_lo) ) { // Nearly always adjacent >> >> Is the removal of `// Either a spill slot, or` intentional? > > Unintentional, and I have to look closer at this. I suspect the "Either a spill slot" comment refers to if the register is larger than or equal to `LRG::SPILL_REG`, which I believe is implied by `!RegMask::can_represent(nreg_lo)` at this stage of `PhaseChaitin`.
We should probably replace `RegMask::can_represent(nreg_lo)` with an explicit check `nreg_lo < LRG::SPILL_REG`. I cannot observe any case where `nreg_lo` here is larger than or equal to `LRG::SPILL_REG`, so it doesn't look like that is what the comment refers to. I've now added an `assert` that ensures my assumptions regarding removing the "Either a spill slot" line makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1750299868 From rcastanedalo at openjdk.org Mon Sep 9 14:44:17 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Sep 2024 14:44:17 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> On Sat, 7 Sep 2024 03:57:43 GMT, Kim Barrett wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: > >> 110: // The answer is that stores of different sizes can co-exist >> 111: // in the same sequence of RawMem effects. We sometimes initialize >> 112: // a whole 'tile' of array elements with a single jint or jlong.) > > I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two > 32bit oops/narrowOops? But that doesn't have anything to do with jints. I am not sure the complex overlap test is necessary here, this code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. 
jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called. The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest leaving it as-is and possibly investigating how to simplify it as a follow-up task. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750400106 From lmesnik at openjdk.org Mon Sep 9 15:24:24 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 9 Sep 2024 15:24:24 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v4] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 07:31:39 GMT, kuaiwei wrote: >> I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. >> It will happen with these conditions >> 1 It's a virtual call and the callee is a final method. So c1 will think it's static bound. >> 2 The interpreter never touch the callsite, so interpreter does not add type profile. >> 3 In c1 compilation, it will be inlined based on CHA. >> 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. > > kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into c1_cha_type_profile > - Simplify should_profile_receiver_type > - Modify test case to use whitebox api > - 8339299: C1 will miss type profile when inline final method Changes requested by lmesnik (Reviewer). test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 46: > 44: public class TypeProfileFinalMethod { > 45: public static void main(String[] args) throws Exception { > 46: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( `createLimitedTestJavaProcessBuilder` ignores any other VM flags. This mode should be used only if the test is too specific. However, * @requires (vm.opt.TieredStoAtLevel == null | vm.opt.TieredStopAtLevel == 4) means that we are going to run the test with almost any GC/runtime and C2 stress option flags (no C1-only). So `createTestJavaProcessBuilder` is needed to accept all VM flags. You don't need to test any additional VM flags unless you have a reason to suppose that something might fail. Just use 'createTestJavaProcessBuilder' instead. If you think that the test shouldn't accept any additional VM flags, use @requires vm.flagless instead of @requires (vm.opt.TieredStoAtLevel == null | vm.opt.TieredStopAtLevel == 4) so that the test is executed only once and doesn't run if it is going to ignore flags.
------------- PR Review: https://git.openjdk.org/jdk/pull/20786#pullrequestreview-2290244295 PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1750466093 From sviswanathan at openjdk.org Mon Sep 9 17:07:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 9 Sep 2024 17:07:06 GMT Subject: RFR: 8339698: x86 andw/orw/xorw encoding missing 0x66 prefix In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: <_6w8Jl66kcg9gFEzGvJxhs93wmijQORXJjG-1byy7fM=.0d9d4f1f-7f83-41bc-9ece-a9a9b6675105@github.com> On Sat, 7 Sep 2024 17:26:45 GMT, Vladimir Kozlov wrote: >> x86 andw/orw/xorw encoding is missing 0x66 prefix. This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857). >> >> Best Regards, >> Sandhya > > Good. I am ok either way, removing these instructions or fixing the encoding. @vnkozlov Please let me know your thoughts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20901#issuecomment-2338631962 From kvn at openjdk.org Mon Sep 9 17:56:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 9 Sep 2024 17:56:09 GMT Subject: RFR: 8339698: x86 andw/orw/xorw encoding missing 0x66 prefix In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Sat, 7 Sep 2024 17:26:45 GMT, Vladimir Kozlov wrote: >> x86 andw/orw/xorw encoding is missing 0x66 prefix. This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857). >> >> Best Regards, >> Sandhya > > Good. 
> I am ok either way, removing these instructions or fixing the encoding. @vnkozlov Please let me know your thoughts. Remove them. Check `addw(Register dst, Register src)` - I think it is not used too. All these instructions were added for first JEP 8223347: Integration of Vector API (Incubator) May be there were experiments with using them during development. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20901#issuecomment-2338730973 From sviswanathan at openjdk.org Mon Sep 9 18:17:18 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 9 Sep 2024 18:17:18 GMT Subject: RFR: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed [v2] In-Reply-To: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: > x86 andw/orw/xorw encoding is missing 0x66 prefix. This bug was discovered as part of the x86 instruction encoding test generation tool and gtest ([JDK-8339507](https://bugs.openjdk.org/browse/JDK-8339507)). This fix is a precursor to the PR for JDK-8339507 (https://github.com/openjdk/jdk/pull/20857). 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Remove unused andw/orw/xorw/addw (reg, reg) instruction encoding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20901/files - new: https://git.openjdk.org/jdk/pull/20901/files/0ae3b954..faba2cc9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20901&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20901&range=00-01 Stats: 29 lines in 2 files changed: 0 ins; 29 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20901/head:pull/20901 PR: https://git.openjdk.org/jdk/pull/20901 From sviswanathan at openjdk.org Mon Sep 9 18:17:18 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 9 Sep 2024 18:17:18 GMT Subject: RFR: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed [v2] In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: <9-KS7QjKYDN0duYqcIcNgoQmcTsRGWTv293PXY-CuUw=.fc03cb29-4b11-4a79-b306-7ab7279b26ea@github.com> On Mon, 9 Sep 2024 17:53:45 GMT, Vladimir Kozlov wrote: >> Good. > >> I am ok either way, removing these instructions or fixing the encoding. @vnkozlov Please let me know your thoughts. > > Remove them. Check `addw(Register dst, Register src)` - I think it is not used too. > > All these instructions were added for first JEP 8223347: Integration of Vector API (Incubator) > May be there were experiments with using them during development. @vnkozlov @merykitty I have updated the PR implementing your review comments. 
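For readers unfamiliar with the 0x66 prefix discussed in this thread, here is a small sketch of what the missing prefix means for a register-to-register `and`. It is an illustration only: the class and method names are hypothetical, it uses the `21 /r` opcode form and registers 0 to 7 without a REX prefix, and HotSpot's assembler may well emit a different but equivalent form.

```java
// Hypothetical encoder sketch; not taken from HotSpot's assembler_x86.cpp.
final class OperandSizePrefixSketch {
    static final int OPERAND_SIZE_PREFIX = 0x66; // selects 16-bit operands in 32/64-bit mode
    static final int AND_RM_R = 0x21;            // AND r/m16/32, r16/32

    // ModRM byte for register-direct addressing: mod=11, reg=src, rm=dst.
    static int modrm(int dst, int src) {
        return 0xC0 | ((src & 7) << 3) | (dst & 7);
    }

    // Encodes "and dst, src" for register numbers 0..7 (no REX prefix).
    static byte[] andRegReg(int dst, int src, boolean sixteenBit) {
        int modrm = modrm(dst, src);
        return sixteenBit
            ? new byte[] {(byte) OPERAND_SIZE_PREFIX, (byte) AND_RM_R, (byte) modrm}
            : new byte[] {(byte) AND_RM_R, (byte) modrm};
    }

    // Formats bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int eax = 0, ebx = 3;
        System.out.println(hex(andRegReg(eax, ebx, false))); // and eax, ebx
        System.out.println(hex(andRegReg(eax, ebx, true)));  // and ax, bx
    }
}
```

Without the prefix byte, the "word" variant decodes as the 32-bit instruction, which is the kind of encoding bug the thread describes.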
------------- PR Comment: https://git.openjdk.org/jdk/pull/20901#issuecomment-2338784625 From dlong at openjdk.org Mon Sep 9 18:55:06 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 9 Sep 2024 18:55:06 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v3] In-Reply-To: References: <5CB0aexpZn0lL8DvUFlOoNhU223th2guno8mMWuNl-w=.03067572-03dc-45a7-ba07-f489c43fc4d0@github.com> Message-ID: On Mon, 9 Sep 2024 07:04:43 GMT, kuaiwei wrote: > If C1 decide to inline A->B->C, it can know p in C is exact type of Child1. And it can skip profile if Child1::function is final. But when C2 compile B, it can not get the profile data. My think is there may be much cases to think about. We can just simply do profile in c1. That's right. I was just trying to point out why this happens. C1 was checking final/private on Child1 instead of Parent. If we wanted to make C1 behave more like the interpreter, then it should do the final/private check on Parent. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20786#issuecomment-2338850830 From kvn at openjdk.org Mon Sep 9 19:16:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 9 Sep 2024 19:16:06 GMT Subject: RFR: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed [v2] In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Mon, 9 Sep 2024 18:17:18 GMT, Sandhya Viswanathan wrote: >> x86 andw/orw/xorw(Register, Register) encoding is missing 0x66 prefix. These instructions along with addw(Register, Register) are unused and so removed. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused andw/orw/xorw/addw (reg, reg) instruction encoding Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20901#pullrequestreview-2290751296 From duke at openjdk.org Mon Sep 9 19:51:06 2024 From: duke at openjdk.org (duke) Date: Mon, 9 Sep 2024 19:51:06 GMT Subject: RFR: 8339366: [jittester] Make it possible to generate tests without execution In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 07:41:39 GMT, Evgeny Nikitin wrote: > This PR: > > 1. Extracts IR tree generation from execution (left in the Automatic) into a dedicated class IRTreeGenerator; > 2. Introduces a generation result record (named IRTreeGenerator.Test). The record contains main and private classes along with the random seed used for their generation; > 3. Creates CLI-wrapper classes for Java and ByteCode generators to allow generation-only execution; > 4. Add a repeating option to the configuration - to make it possible to specify several main class names. > > Sample usage: > > java -cp build/classes --add-opens java.base/java.util=ALL-UNNAMED \ > jdk.test.lib.jittester.JavaCodeGenerator \ > -k Test_0 -k Test_1 -k Test_10 @lepestock Your change (at version 4d66bfdd90885fa25d0c868f0473e0c0e1a0b3e4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20806#issuecomment-2338949142 From sviswanathan at openjdk.org Mon Sep 9 19:53:08 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 9 Sep 2024 19:53:08 GMT Subject: RFR: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed [v2] In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Mon, 9 Sep 2024 17:53:45 GMT, Vladimir Kozlov wrote: >> Good. > >> I am ok either way, removing these instructions or fixing the encoding. @vnkozlov Please let me know your thoughts. > > Remove them. Check `addw(Register dst, Register src)` - I think it is not used too. 
> > All these instructions were added for first JEP 8223347: Integration of Vector API (Incubator) > May be there were experiments with using them during development. Thanks a lot @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20901#issuecomment-2338953775 From jbhateja at openjdk.org Mon Sep 9 19:58:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Sep 2024 19:58:13 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction Message-ID: - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. - This saves emitting an explicit MOVZX instruction after setCC. - These new instructions are encoded using 4 byte Extended EVEX encoding. Validation performed over stand alone test point using Intel SDE. Best Regards, Jatin ------------- Commit messages: - 8339790: Support Intel APX setzucc instruction. 
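The zero-upper behaviour described in the setzucc message above can be modelled on a 64-bit value. This is a sketch of the semantics as stated in that description only; the class and method names are hypothetical and nothing here reflects how the instruction is encoded.

```java
// Hypothetical model of the register effects described in the message above.
final class SetccSemanticsSketch {
    // Classic SETcc on a byte register: only bits 7..0 change.
    static long setcc(long reg, boolean cond) {
        return (reg & ~0xFFL) | (cond ? 1L : 0L);
    }

    // MOVZX r64, r8 on the same register: keep the low byte, clear the rest.
    static long movzxByte(long reg) {
        return reg & 0xFFL;
    }

    // Zero-upper variant: the upper 56 bits are cleared as part of the set.
    static long setzucc(long reg, boolean cond) {
        return cond ? 1L : 0L;
    }

    public static void main(String[] args) {
        long stale = 0xDEADBEEFCAFEBABEL;
        System.out.println(Long.toHexString(setcc(stale, true)));            // upper bits survive
        System.out.println(Long.toHexString(movzxByte(setcc(stale, true)))); // two steps
        System.out.println(Long.toHexString(setzucc(stale, true)));          // one step
    }
}
```

The two-step and one-step paths produce the same value, which is why the explicit MOVZX can be dropped when the zero-upper form is available.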
Changes: https://git.openjdk.org/jdk/pull/20920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339790 Stats: 53 lines in 7 files changed: 26 ins; 13 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From enikitin at openjdk.org Mon Sep 9 19:59:10 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Mon, 9 Sep 2024 19:59:10 GMT Subject: Integrated: 8339366: [jittester] Make it possible to generate tests without execution In-Reply-To: References: Message-ID: <9K40ixbdn16a4lyK6C2b4RkoXePbdkrKtVhp_lgZq98=.45901627-9380-4dac-b6ab-caf2816e55d2@github.com> On Mon, 2 Sep 2024 07:41:39 GMT, Evgeny Nikitin wrote: > This PR: > > 1. Extracts IR tree generation from execution (left in the Automatic) into a dedicated class IRTreeGenerator; > 2. Introduces a generation result record (named IRTreeGenerator.Test). The record contains main and private classes along with the random seed used for their generation; > 3. Creates CLI-wrapper classes for Java and ByteCode generators to allow generation-only execution; > 4. Add a repeating option to the configuration - to make it possible to specify several main class names. > > Sample usage: > > java -cp build/classes --add-opens java.base/java.util=ALL-UNNAMED \ > jdk.test.lib.jittester.JavaCodeGenerator \ > -k Test_0 -k Test_1 -k Test_10 This pull request has now been integrated. 
Changeset: 559fc711 Author: Evgeny Nikitin Committer: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/559fc711e03cf0086bea399ffb40cf294cbbb6e1 Stats: 247 lines in 7 files changed: 167 ins; 59 del; 21 mod 8339366: [jittester] Make it possible to generate tests without execution Reviewed-by: lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/20806 From jbhateja at openjdk.org Mon Sep 9 19:58:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Sep 2024 19:58:33 GMT Subject: RFR: 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX Message-ID: Currently VM_Supports::supports_apx_f() returns a true value even if user explicitly pass -XX:-UseAPX runtime flag, this enables APX specific code and register set. This bug fix patch turn off the APX_F feature if UseAPX runtime flag is explicitly set to false value. Best Regards, Jatin ------------- Commit messages: - 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX. Changes: https://git.openjdk.org/jdk/pull/20921/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20921&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339793 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20921/head:pull/20921 PR: https://git.openjdk.org/jdk/pull/20921 From kxu at openjdk.org Mon Sep 9 20:46:24 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 9 Sep 2024 20:46:24 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: <7spZ424UdaAr1E-q8E508vHdwGBv7l6ETGK-T53-6xM=.64104918-adeb-4ba9-b623-414fe46b73e6@github.com> > Hello all. 
This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one, should the platform support this feature (e.g., x86-64). > > I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! > > ~This will be a draft PR before GHA tests are confirmed passing.~ Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: remove redundant arguments, test with -XX:-UseDivMod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20877/files - new: https://git.openjdk.org/jdk/pull/20877/files/6dc9ea97..6c604d25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=00-01 Stats: 52 lines in 4 files changed: 37 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From kxu at openjdk.org Mon Sep 9 20:46:24 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 9 Sep 2024 20:46:24 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: <4EVmLjNMTfnnGZOmvV41VLHO4OOuLiJLEuu5723qYag=.6309b7e2-e491-499d-b23e-35d440cec5f4@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <4EVmLjNMTfnnGZOmvV41VLHO4OOuLiJLEuu5723qYag=.6309b7e2-e491-499d-b23e-35d440cec5f4@github.com> Message-ID: On Mon, 9 Sep 2024 11:46:20 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally
with one additional commit since the last revision: >> >> remove redundant arguments, test with -XX:-UseDivMod > > src/hotspot/share/opto/compile.cpp line 3174: > >> 3172: } >> 3173: >> 3174: BasicType bt = div_op == Op_DivI || div_op == Op_UDivI ? T_INT : T_LONG; > > and add a new function `Op_DivMod(BasicType bt, bool is_unsigned)` similar to `Op_ConIL` that would replace `div_mod_op` here. Added new function `Op_DivModIL(...)` as `Op_DivMod` already exists as an opcode constant and cannot be removed without significant changes to the codebase (although never referenced directly). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1750908668 From jkarthikeyan at openjdk.org Tue Sep 10 01:21:25 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Sep 2024 01:21:25 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v16] In-Reply-To: References: Message-ID: On Sun, 8 Sep 2024 05:07:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - change (~v & ones) == 0 to (v & ones) == ones > - Merge branch 'master' into unsignedbounds > - fix builds > - add trivial test cases > - make should return the correct type > - rename tests > - more explanation > - fix build > - fix build > - add more comments, group KnownBits > - ... and 13 more: https://git.openjdk.org/jdk/compare/5b72bbf9...9b70213e This is looking really nice, I'm very excited to see this work continue! The added comments definitely help in understanding the theory and implementation. I think the approach of splitting the domain as `[lo, uhi] U [ulo, hi]` is quite clever, as it nicely avoids needing to consider how the bits interact across signs. I've added some comments below. src/hotspot/share/opto/rangeinference.cpp line 292: > 290: // Trivially canonicalize the bounds so that srange._lo and urange._hi are > 291: // both < 0 or >= 0. The same for srange._hi and urange._ulo > 292: if (S(urange._lo) > S(urange._hi)) { I think it would be worth mentioning in the comments that the case where `ulo` can be greater than `uhi` in the signed domain happens when the range crosses zero, to make it easier for people to understand the purpose of the logic from reading the comments. src/hotspot/share/opto/rangeinference.cpp line 449: > 447: // convergence by abandoning the bounds > 448: template > 449: const Type* int_type_widen(const CT* nt, const CT* ot, const CT* lt) { I think here and in `int_type_narrow` it would be good to write the parameters as `new_type`, `old_type`, and `limit_type` for clarity. 
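The constraint set quoted from the PR description in this thread (`x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`) can be written out as a membership test. The sketch below is an illustration in Java, not the C++ code under review: `IntConstraintSketch` and its fields are hypothetical names, and it only checks membership, it does not perform the canonicalization the PR implements.

```java
// Hypothetical membership check for the constraint set quoted above.
final class IntConstraintSketch {
    final int lo, hi;       // signed bounds
    final int ulo, uhi;     // unsigned bounds, compared as unsigned 32-bit values
    final int zeros, ones;  // bits known to be 0 / known to be 1

    IntConstraintSketch(int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        this.lo = lo;
        this.hi = hi;
        this.ulo = ulo;
        this.uhi = uhi;
        this.zeros = zeros;
        this.ones = ones;
    }

    boolean contains(int x) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0
            && (x & ones) == ones;
    }

    public static void main(String[] args) {
        // Signed range [0, 3] with no information in the other constraints.
        IntConstraintSketch loose = new IntConstraintSketch(0, 3, 0, -1, 0, 0);
        // The same range with the unsigned bounds and known bits tightened.
        IntConstraintSketch tight = new IntConstraintSketch(0, 3, 0, 3, ~3, 0);
        for (int x = -10; x <= 10; x++) {
            if (loose.contains(x) != tight.contains(x)) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("both descriptions accept the same sampled values");
    }
}
```

The `loose` and `tight` instances describe the same values, which is the sense in which the description says the constraints are not independent and need to be tightened before a type is constructed.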
src/hotspot/share/opto/rangeinference.hpp line 112: > 110: } > 111: > 112: return urange._hi - U(srange._lo) + U(srange._hi) - urange._lo + 1; Suggestion: return (urange._hi - U(srange._lo)) + (U(srange._hi) - urange._lo) + 1; The functionality is the same but this makes it more clear that the logic is `(uhi - lo) + (hi - ulo)`. src/hotspot/share/opto/rangeinference.hpp line 146: > 144: > 145: void int_type_dump(const TypeInt* t, outputStream* st, bool verbose); > 146: void int_type_dump(const TypeLong* t, outputStream* st, bool verbose); Maybe this could be called `long_type_dump`? Since it's not a template function and is specific to TypeLong. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2287119764 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1749325904 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1749376527 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1751096189 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1749378575 From jkarthikeyan at openjdk.org Tue Sep 10 01:21:26 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Sep 2024 01:21:26 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:12:55 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix builds src/hotspot/share/opto/rangeinference.cpp line 470: > 468: // Neither contains each other, weird? > 469: // fatal("Integer value range is not subset"); > 470: // return this; This looks like leftover dead code. src/hotspot/share/opto/type.hpp line 621: > 619: bool contains(const TypeInt* t) const; > 620: // Excluding the cases where this and t are the same > 621: bool properly_contains(const TypeInt* t) const; It doesn't look like this method is used in the changeset (or at least, I couldn't find any references with grep.) I think you could potentially remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1747853142 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1747687314 From duke at openjdk.org Tue Sep 10 02:20:06 2024 From: duke at openjdk.org (kuaiwei) Date: Tue, 10 Sep 2024 02:20:06 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v4] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:21:19 GMT, Leonid Mesnik wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' of https://github.com/openjdk/jdk into c1_cha_type_profile >> - Simplify should_profile_receiver_type >> - Modify test case to use whitebox api >> - 8339299: C1 will miss type profile when inline final method > > test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 46: > >> 44: public class TypeProfileFinalMethod { >> 45: public static void main(String[] args) throws Exception { >> 46: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( > > `createLimitedTestJavaProcessBuilder` ignores any other VM flags. This mode should be used only if the test is too specific. > However, > ` * @requires (vm.opt.TieredStoAtLevel == null | vm.opt.TieredStopAtLevel == 4)` > means that we are going to run the test with almost any GC/Runtime and C2 stress option flags (but not C1-only). > So `createTestJavaProcessBuilder` is needed to accept all VM flags. > You don't need to test any additional VM flags unless you have a reason to suppose that something might fail. > Just use `createTestJavaProcessBuilder` instead. > If you think the test shouldn't accept any additional VM flags, use > `@requires vm.flagless` > instead of > `@requires (vm.opt.TieredStoAtLevel == null | vm.opt.TieredStopAtLevel == 4)` > so that the test is executed only once and doesn't run if it is going to ignore flags. Thanks for your suggestions. The test case depends on tiered compilation and type profiling. Is there any other option for these requirements?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1751153777 From jkarthikeyan at openjdk.org Tue Sep 10 03:29:05 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Sep 2024 03:29:05 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:36:51 GMT, Jatin Bhateja wrote: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10425: > 10423: } > 10424: > 10425: void MacroAssembler::setCC(Assembler::Condition comparison, Register dst) { Generally I think we use all lowercase for assembler functions, such as `Assembler::jcc`. I think it would be easier to read if this were named `setcc` (and similar for `esetzucc`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1751195879 From duke at openjdk.org Tue Sep 10 08:35:25 2024 From: duke at openjdk.org (kuaiwei) Date: Tue, 10 Sep 2024 08:35:25 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v5] In-Reply-To: References: Message-ID: > I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. > It will happen with these conditions > 1 It's a virtual call and the callee is a final method. 
So c1 will think it's static bound. > 2 The interpreter never touch the callsite, so interpreter does not add type profile. > 3 In c1 compilation, it will be inlined based on CHA. > 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Modify test case to use createTestJavaProcessBuilder ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20786/files - new: https://git.openjdk.org/jdk/pull/20786/files/8bd9ee11..666ce51f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20786&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20786/head:pull/20786 PR: https://git.openjdk.org/jdk/pull/20786 From dholmes at openjdk.org Tue Sep 10 08:54:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 10 Sep 2024 08:54:13 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR delete tedious bool type define in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use bool type directly, like [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) do. > Make code more concision, the risk is quite low. > > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64 > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) on linux aarch64 This seems trivially fine to me. 
The JEP isn't really relevant for this C code as we have C99 as a minimum for a while now. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20909#pullrequestreview-2291801397 From jwaters at openjdk.org Tue Sep 10 09:05:10 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 10 Sep 2024 09:05:10 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 08:51:56 GMT, David Holmes wrote: > This seems trivially fine to me. The JEP isn't really relevant for this C code as we have C99 as a minimum for a while now. > > Thanks We actually have C11 as a minimum, C99 for Unix and C89 for Windows was our old standard, but that has since changed ------------- PR Comment: https://git.openjdk.org/jdk/pull/20909#issuecomment-2340086782 From dholmes at openjdk.org Tue Sep 10 09:14:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 10 Sep 2024 09:14:11 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: <_3Rjp678JLW7L-IYTVij3Trz4bax7oDxJCM5Kil4oHk=.0e43bcca-0b6c-48b4-80e7-409b1ffe8420@github.com> On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR delete tedious bool type define in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use bool type directly, like [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) do. > Make code more concision, the risk is quite low. > > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64 > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) 
on linux aarch64 Yes I misspoke, I should have said we have required at least C99 for a while now. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20909#issuecomment-2340105221 From kurashige.taizo at fujitsu.com Tue Sep 10 09:53:40 2024 From: kurashige.taizo at fujitsu.com (Taizo Kurashige (Fujitsu)) Date: Tue, 10 Sep 2024 09:53:40 +0000 Subject: Question about JDK-8221092 In-Reply-To: References: Message-ID: Hi Sandhya, Thank you for your response the other day. I'm sorry to bother you again. I'm trying to find specific documentation or evidence regarding the stepping values for certain processors. Specifically, I'm looking for confirmation on the following points: - All Skylake processors have stepping < 5 - CascadeLake processors have stepping >= 5 Despite my efforts, I haven't been able to locate this information in the documentation provided by Intel. If you could point me in the right direction or provide any insights, it would be incredibly helpful. Thank you so much for your time and assistance. Best regards, Taizo ---------------------------------------------------------------------------------------- Taizo Kurashige GitHub account : https://github.com/kurashige23 ________________________________ Hi all, I'm sorry to bother you again. If possible, could anyone please give me some insight on the following? Is there a specification for what the stepping value is for a particular processor? For example, is it defined in any documentation that CascadeLake processors have stepping >= 5? I searched the documentation provided by Intel but couldn't find it. I want some evidence that the following is true. - All Skylake processors have stepping < 5 - CascadeLake processors have stepping >= 5. Thanks. ---------------------------------------------------------------------------------------- Taizo Kurashige GitHub account : https://github.com/kurashige23 ________________________________ Hi all, I'm sorry to bother you again.
If possible, could anyone please give me some insight? Thanks. ---------------------------------------------------------------------------------------- Taizo Kurashige GitHub account : https://github.com/kurashige23 ________________________________ Hi all, If possible, could anyone give me some insight? Thanks. ---------------------------------------------------------------------------------------- Taizo Kurashige GitHub account : https://github.com/kurashige23 ________________________________ Hi Sandhya, Thank you for your response. Thanks to you, I understood the following. - All Skylake processors have cpuid family=6, model=0x55, stepping < 5 - CascadeLake processors have cpuid family=6, model=0x55, stepping >= 5. If possible, I would like you to tell me about the following. Is there a specification for what the stepping value is for a particular processor? For example, is it defined in any documentation that CascadeLake processors have stepping >= 5? I searched the documentation provided by Intel but couldn't find it. I want some evidence that the following is true. - All Skylake processors have stepping < 5 - CascadeLake processors have stepping >= 5. If stepping per processor is specified somewhere, please let me know. Thanks. ---------------------------------------------------------------------------------------- Taizo Kurashige GitHub account : https://github.com/kurashige23 ________________________________ Hi Taizo, UseAVX is set to 2 for all Skylake processors (cpuid family=6, model=0x55, stepping < 5), not just Skylake X. CascadeLake processors have cpuid family=6, model=0x55, stepping >= 5. Hope this helps. Best Regards, Sandhya -------- Forwarded Message -------- Subject: Re: Question about JDK-8221092 Date: Wed, 10 Jul 2024 06:39:43 +0000 From: Taizo Kurashige (Fujitsu) To: hotspot-compiler-dev at openjdk.org Hi all, Could someone please respond to this question if possible? Thank you.
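The rule given in Sandhya's reply in this thread (family 6, model 0x55, stepping below 5 for Skylake and 5 or above for CascadeLake) can be sketched as a decode of the CPUID leaf 1 EAX value. This is an illustration under stated assumptions: the class and method names are hypothetical, the field layout follows the usual CPUID version-information format, the two sample values were chosen simply because they decode to model 0x55 with steppings 4 and 7, and the classification encodes only the rule from the reply, not an Intel specification.

```java
// Hypothetical decoder for the CPUID leaf 1 EAX signature.
final class CpuidSignatureSketch {
    static int stepping(int eax) {
        return eax & 0xF;
    }

    static int family(int eax) {
        int base = (eax >>> 8) & 0xF;
        int ext = (eax >>> 20) & 0xFF;
        // The extended family is added only when the base family is 0xF.
        return base == 0xF ? base + ext : base;
    }

    static int model(int eax) {
        int base = (eax >>> 4) & 0xF;
        int ext = (eax >>> 16) & 0xF;
        int baseFamily = (eax >>> 8) & 0xF;
        // The extended model is used only for base family 6 or 0xF.
        return (baseFamily == 0x6 || baseFamily == 0xF) ? (ext << 4) | base : base;
    }

    // Applies only the rule stated in the reply above.
    static String classify(int eax) {
        if (family(eax) == 6 && model(eax) == 0x55) {
            return stepping(eax) < 5 ? "Skylake" : "CascadeLake";
        }
        return "other";
    }

    public static void main(String[] args) {
        for (int eax : new int[] {0x50654, 0x50657}) {
            System.out.printf("eax=0x%x family=%d model=0x%x stepping=%d -> %s%n",
                    eax, family(eax), model(eax), stepping(eax), classify(eax));
        }
    }
}
```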
----------------------------------------------------------------------------------------
Taizo Kurashige GitHub account : https://github.com/kurashige23
------------------------------------------------------------------------------------------------------------------------
*From:* Kurashige, Taizo
*Sent:* 3 July 2024 15:09
*To:* hotspot-compiler-dev at openjdk.org
*Subject:* Question about JDK-8221092

Hi all,

I have a question about https://bugs.openjdk.org/browse/JDK-8221092. If possible, could someone please provide some insight? Here's what I would like to know:

1. Is it correct to understand that "Skylake (X7) processors" refers to the Skylake processors listed at https://ark.intel.com/content/www/us/en/ark/products/codename/37572/products-formerly-skylake.html, specifically those in the 7000 series with an "X" or "XE" in their names? For example, "Intel® Core™ i9-7920X X-series Processor (16.5M Cache, up to 4.30 GHz)" or "Intel® Core™ i9-7980XE Extreme Edition Processor (24.75M Cache, up to 4.20 GHz)".
2. In the fix for JDK-8221092, if the stepping is less than 5, the processor is considered to be Skylake (X7) or an earlier version. In such cases, UseAVX is set to 2. Is there any documentation that the stepping for Skylake (X7) is 5?

Thank you.

----------------------------------------------------------------------------------------
Taizo Kurashige GitHub account : https://github.com/kurashige23
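
For readers following the stepping question, here is a minimal C++ sketch of how family, model and stepping are extracted from the EAX value returned by CPUID leaf 1, followed by the rule exactly as it is stated in this thread (family 6, model 0x55, stepping < 5 for Skylake, stepping >= 5 for CascadeLake). The struct and function names are hypothetical and are not HotSpot's. The bit layout is the one documented for CPUID in the Intel SDM. The stepping boundary is only what the participants reported; the thread does not cite a document that defines it, and the raw values in `check_examples` are chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Decoded CPUID leaf 1 signature (hypothetical helper type, not HotSpot code).
struct CpuSignature {
    uint32_t family;
    uint32_t model;
    uint32_t stepping;
};

// Splits the raw EAX value of CPUID leaf 1 into its documented fields.
CpuSignature decode_signature(uint32_t eax) {
    uint32_t stepping    = eax & 0xF;           // bits 3:0
    uint32_t base_model  = (eax >> 4) & 0xF;    // bits 7:4
    uint32_t base_family = (eax >> 8) & 0xF;    // bits 11:8
    uint32_t ext_model   = (eax >> 16) & 0xF;   // bits 19:16
    uint32_t ext_family  = (eax >> 20) & 0xFF;  // bits 27:20

    uint32_t family = base_family;
    if (base_family == 0xF) {
        family += ext_family;                   // extended family only applies here
    }
    uint32_t model = base_model;
    if (base_family == 0x6 || base_family == 0xF) {
        model |= ext_model << 4;                // extended model forms the high nibble
    }
    return {family, model, stepping};
}

// The rule as reported in this thread; the threshold 5 is an assumption
// taken from the discussion, not from a cited specification.
bool is_skylake_per_thread(const CpuSignature& s) {
    return s.family == 6 && s.model == 0x55 && s.stepping < 5;
}

bool is_cascadelake_per_thread(const CpuSignature& s) {
    return s.family == 6 && s.model == 0x55 && s.stepping >= 5;
}

// Self-check on two illustrative raw values that differ only in stepping.
void check_examples() {
    CpuSignature a = decode_signature(0x50654);
    assert(a.family == 6 && a.model == 0x55 && a.stepping == 4);
    assert(is_skylake_per_thread(a));

    CpuSignature b = decode_signature(0x50657);
    assert(b.family == 6 && b.model == 0x55 && b.stepping == 7);
    assert(is_cascadelake_per_thread(b));
}
```

Both sample values share family 6 and model 0x55, so under the thread's rule the stepping field alone decides which branch applies.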
From jbhateja at openjdk.org Tue Sep 10 11:28:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Sep 2024 11:28:13 GMT Subject: RFR: 8329035: New Data Destination instructions support [v2] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 21:40:15 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 6686: >> >>> 6684: } >>> 6685: >>> 6686: void Assembler::esall(Register dst, Register src, int imm8, bool no_flags) { >> >> assert(isShiftCount(imm8), "illegal shift count") missing. > > Thanks, done. SAL looks like an incorrect mnemonic; it should be SHL. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751706428 From jbhateja at openjdk.org Tue Sep 10 11:28:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Sep 2024 11:28:11 GMT Subject: RFR: 8329035: New Data Destination instructions support [v3] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 21:44:44 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded.
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > refactoring and fixes based on review comments

Hi @steveatgh, reviewing your patch is a nice refresher of almost all x86 scalar instructions :-) I have some minor comments.

src/hotspot/cpu/x86/assembler_x86.cpp line 1420: > 1418: emit_arith(0x11, 0xC0, src1, src2); > 1419: } > 1420: Encodings for ADC look good, but I could not find uses for these instructions beyond stubs. Do you think we should add them lazily, on an as-needed basis?

src/hotspot/cpu/x86/assembler_x86.cpp line 1452: > 1450: emit_int8(imm8); > 1451: } > 1452: Again, I could not find any use of scalar 16-bit addition, and only one use of 8-bit add. Should we consider adding them lazily?

src/hotspot/cpu/x86/assembler_x86.cpp line 1783: > 1781: emit_arith(0x23, 0xC0, src1, src2); > 1782: } > 1783: We can skip adding the 8-bit and 16-bit flavors of the AND instructions since they are not being used by the compiler or stubs.

src/hotspot/cpu/x86/assembler_x86.cpp line 1795: > 1793: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); > 1794: evex_prefix_ndd(src, dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > 1795: emit_arith_operand(0x81, as_Register(4), src, imm32); Suggestion: emit_arith_operand(0x81, rsp, src, imm32);

src/hotspot/cpu/x86/assembler_x86.cpp line 1840: > 1838: emit_operand(src1, src2, 0); > 1839: } > 1840: Should we keep only one _eandl (REG, REG, ADR)_ and remove the other one, _eandl(REG, ADR, REG)_? Apart from the source operand encodings, there is no other difference between the two instructions.

src/hotspot/cpu/x86/assembler_x86.cpp line 2016: > 2014: void Assembler::ecmovl(Condition cc, Register dst, Register src1, Address src2) { > 2015: InstructionMark im(this); > 2016: NOT_LP64(guarantee(VM_Version::supports_cmov(), "illegal instruction")); APX is only supported by 64-bit (IA-32e mode) targets.
src/hotspot/cpu/x86/assembler_x86.cpp line 2696: > 2694: } > 2695: > 2696: void Assembler::eidivl(Register src, bool no_flags) { // Unsigned Comment should be signed division. src/hotspot/cpu/x86/assembler_x86.cpp line 6801: > 6799: emit_arith_operand(0x81, rbx, src, imm32); > 6800: } > 6801: Don't find any usage of SUB with borrow in existing code. I think we can defer adding it till we have a use. src/hotspot/cpu/x86/assembler_x86.cpp line 12723: > 12721: byte3 |= pre; > 12722: > 12723: // P2: byte 4 as zL'Lbv'aaa or 00L0VF00 where V = V4 and F = NF (no flags) Suggestion: // P2: byte 4 as zL'Lbv'aaa or 00LXVF00 where V = V4, X(extended context) = ND and F = NF (no flags) ------------- PR Review: https://git.openjdk.org/jdk/pull/20698#pullrequestreview-2291465334 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751415531 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751419391 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751475124 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751491312 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751552473 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751565632 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751596776 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751710213 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1751327543 From jbhateja at openjdk.org Tue Sep 10 11:45:25 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Sep 2024 11:45:25 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v2] In-Reply-To: References: Message-ID: <-gmI8CWVHCk3ebpH6M3IaB4avuiG7QpzykKxYzGso2o=.b8072269-55e9-4c57-85e9-9f05fba5d934@github.com> > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). 
Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/bfe6f206..1488b588 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=00-01 Stats: 18 lines in 7 files changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From jbhateja at openjdk.org Tue Sep 10 11:51:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Sep 2024 11:51:06 GMT Subject: RFR: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed [v2] In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Mon, 9 Sep 2024 18:17:18 GMT, Sandhya Viswanathan wrote: >> x86 andw/orw/xorw(Register, Register) encoding is missing 0x66 prefix. These instructions along with addw(Register, Register) are unused and so removed. 
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused andw/orw/xorw/addw (reg, reg) instruction encoding Cleanup is also fine, still looks good. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20901#pullrequestreview-2292198534 From roland at openjdk.org Tue Sep 10 11:54:05 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Sep 2024 11:54:05 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <4EVmLjNMTfnnGZOmvV41VLHO4OOuLiJLEuu5723qYag=.6309b7e2-e491-499d-b23e-35d440cec5f4@github.com> Message-ID: On Mon, 9 Sep 2024 20:43:17 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/compile.cpp line 3174: >> >>> 3172: } >>> 3173: >>> 3174: BasicType bt = div_op == Op_DivI || div_op == Op_UDivI ? T_INT : T_LONG; >> >> and add a new function `Op_DivMod(BasicType bt, bool is_unsigned)` similar to `Op_ConIL` that would replace `div_mod_op` here. > > Added new function `Op_DivModIL(...)` as `Op_DivMod` already exists as an opcode constant and cannot be removed without significant changes to the codebase (although never referenced directly). I don't mind using `Op_DivModIL(...)` but what happens if you remove `Op_DivMod`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1751802963 From roland at openjdk.org Tue Sep 10 11:54:06 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Sep 2024 11:54:06 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: <7spZ424UdaAr1E-q8E508vHdwGBv7l6ETGK-T53-6xM=.64104918-adeb-4ba9-b623-414fe46b73e6@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <7spZ424UdaAr1E-q8E508vHdwGBv7l6ETGK-T53-6xM=.64104918-adeb-4ba9-b623-414fe46b73e6@github.com> Message-ID: On Mon, 9 Sep 2024 20:46:24 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors the `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in the `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations on the same operands into a single operation when the platform supports it (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants, as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > remove redundant arguments, test with -XX:-UseDivMod test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 34: > 32: * @test > 33: * @summary Test DIV and MOD nodes are converted into DIVMOD where possible > 34: * @requires os.arch=="amd64" | os.arch=="x86_64" So this can run on aarch64 now?
test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 47: > 45: @Arguments(values = {Argument.RANDOM_EACH, Argument.RANDOM_EACH}) > 46: @IR(counts = {IRNode.DIV_MOD_I, "1" }, applyIf = {"UseDivMod", "true"}) > 47: @IR(failOn = {IRNode.DIV_MOD_I}, applyIf = {"UseDivMod", "false"}) Rather than check that there's no `DivMod` node, you could check that there are the expected nodes (1 `Mul`, 1 `Sub`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1751801808 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1751801516 From tschatzl at openjdk.org Tue Sep 10 12:02:11 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Sep 2024 12:02:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port for JEP 475 src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 210: > 208: Label& done, > 209: bool new_val_may_be_null) { > 210: // Does store cross heap regions? Suggestion: // Does store cross heap regions?
Indentation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1751721626 From qamai at openjdk.org Tue Sep 10 12:03:06 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 12:03:06 GMT Subject: RFR: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed [v2] In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Mon, 9 Sep 2024 18:17:18 GMT, Sandhya Viswanathan wrote: >> x86 andw/orw/xorw(Register, Register) encoding is missing 0x66 prefix. These instructions along with addw(Register, Register) are unused and so removed. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused andw/orw/xorw/addw (reg, reg) instruction encoding Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20901#pullrequestreview-2292225717 From qamai at openjdk.org Tue Sep 10 12:23:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 12:23:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v17] In-Reply-To: References: Message-ID: <9SRKk6LjXxgsYSEXaPmexK4haR2N7WlnXVJZW_XTAaE=.b2d0892d-c512-4f62-870d-373481e6309f@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add doc to TypeInt, rename parameters, remove unused methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/9b70213e..81f4e15b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=15-16 Stats: 135 lines in 4 files changed: 65 ins; 15 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Tue Sep 10 12:23:29 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 12:23:29 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 08:25:19 GMT, Emanuel Peter wrote: >> Yes I have renamed all signed types to `S` and unsigned types to `U`. Regarding making it a member of `KnownBits`, making it a `static` function has the advantage of visibility to me. > > What do you mean by "advantage of visibility"? I mean it is only used in `rangeinference.cpp` while `KnownBits` can be accessed from other places as well. And updating a `KnownBits` according to a `RangeInt` seems local to the context of `TypeInt` canonicalization. 
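
The constraint set quoted from the PR description above can be made concrete with a small membership check. This is a minimal C++ sketch under stated assumptions: `IntConstraints`, `contains`, and the two example builders are hypothetical names for illustration, and none of this is the `TypeInt` code under review. It spells out the predicate `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones` and then checks the example from the description: a signed range of [0, 3] already implies an unsigned range of [0, 3] with every bit except the lowest two known to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the six constraints described in the PR text.
struct IntConstraints {
    int32_t  lo, hi;     // signed bounds
    uint32_t ulo, uhi;   // unsigned bounds
    uint32_t zeros;      // bits known to be 0
    uint32_t ones;       // bits known to be 1
};

// True if x satisfies every constraint at once.
bool contains(const IntConstraints& t, int32_t x) {
    uint32_t ux = static_cast<uint32_t>(x);
    return x >= t.lo && x <= t.hi &&
           ux >= t.ulo && ux <= t.uhi &&
           (ux & t.zeros) == 0 &&
           (ux & t.ones) == t.ones;
}

// Signed range [0, 3] with no unsigned or bit information stated.
IntConstraints loose_0_to_3() {
    return {0, 3, 0u, UINT32_MAX, 0u, 0u};
}

// The same set with the implied constraints written out:
// unsigned range [0, 3], all bits above the lowest two known zero.
IntConstraints canonical_0_to_3() {
    return {0, 3, 0u, 3u, ~3u, 0u};
}

// Compares membership of two constraint sets over [from, to].
bool same_members(const IntConstraints& a, const IntConstraints& b,
                  int32_t from, int32_t to) {
    assert(from <= to);
    for (int64_t x = from; x <= to; ++x) {
        int32_t v = static_cast<int32_t>(x);
        if (contains(a, v) != contains(b, v)) {
            return false;
        }
    }
    return true;
}
```

Both builders describe the same four values; the canonical form only makes the implied facts explicit, which is the kind of tightening the description says must happen before a `TypeInt/Long` instance is constructed.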
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1751841702 From qamai at openjdk.org Tue Sep 10 12:23:29 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 12:23:29 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: References: Message-ID: <9rtaON0aEhbYU5SrnW9Gkcr37iB6ZlFpTzdBCMxAXmU=.600eebbd-f964-47b0-9fd2-e3ecc7b50aeb@github.com> On Sat, 7 Sep 2024 01:19:56 GMT, Jasmine Karthikeyan wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix builds > > src/hotspot/share/opto/rangeinference.cpp line 470: > >> 468: // Neither contains each other, weird? >> 469: // fatal("Integer value range is not subset"); >> 470: // return this; > > This looks like leftover dead code. Yes it is, but I don't want to remove it. I think it should be the case the `new_type` contains `old_type`, but apparently sometimes it is not the case and I don't understand it. Returning the bottom type instead of `limit_type` is also weird, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1751837343 From qamai at openjdk.org Tue Sep 10 12:23:31 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 12:23:31 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v16] In-Reply-To: References: Message-ID: On Sun, 8 Sep 2024 22:26:20 GMT, Jasmine Karthikeyan wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - change (~v & ones) == 0 to (v & ones) == ones >> - Merge branch 'master' into unsignedbounds >> - fix builds >> - add trivial test cases >> - make should return the correct type >> - rename tests >> - more explanation >> - fix build >> - fix build >> - add more comments, group KnownBits >> - ... 
and 13 more: https://git.openjdk.org/jdk/compare/5b72bbf9...9b70213e > > src/hotspot/share/opto/rangeinference.hpp line 146: > >> 144: >> 145: void int_type_dump(const TypeInt* t, outputStream* st, bool verbose); >> 146: void int_type_dump(const TypeLong* t, outputStream* st, bool verbose); > > Maybe this could be called `long_type_dump`? Since it's not a template function and is specific to TypeLong. I think keeping the name consistent with other functions is also a legit reason. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1751838397 From qamai at openjdk.org Tue Sep 10 12:30:11 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 12:30:11 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v13] In-Reply-To: References: Message-ID: <1idRf5I6TbUSThDzmOaCvpcBwDD5Dsl1kuUAJF0lxcc=.b2dabb18-d926-478d-9f7a-2ca836356712@github.com> On Thu, 5 Sep 2024 16:59:02 GMT, Jasmine Karthikeyan wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - rename tests >> - more explanation > > I think in most cases we expect `TypeInteger::make()` to have well-defined inputs and thus well-defined outputs, so I also think it would be good to keep the use sites as they were before for code cleanliness. For places where TOP is allowed there could be another function, maybe `TypeInteger::try_make()`, to signal explicitly that TOP is being handled by the callee code. @jaskarth Thanks a lot for your reviews and suggestions, I hope I have addressed all of them. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2340577178 From syan at openjdk.org Tue Sep 10 12:45:08 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 10 Sep 2024 12:45:08 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR deletes the tedious bool type definitions in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use the bool type directly, as [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) does. > This makes the code more concise, and the risk is quite low. > > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64 > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) on linux aarch64 Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20909#issuecomment-2340620724 From duke at openjdk.org Tue Sep 10 12:45:08 2024 From: duke at openjdk.org (duke) Date: Tue, 10 Sep 2024 12:45:08 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR deletes the tedious bool type definitions in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include " to use the bool type directly, as [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) does. > This makes the code more concise, and the risk is quite low.
> > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) on linux x64 > - [x] Jtreg tests(include tier1/tier2/tier3 etc.) on linux aarch64 @sendaoYan Your change (at version bf2ceae0fc34a21760414472805c0bda5ca4a3db) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20909#issuecomment-2340629392 From tschatzl at openjdk.org Tue Sep 10 13:03:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Sep 2024 13:03:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port for JEP 475 Marked as reviewed by tschatzl (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2292405233 From kxu at openjdk.org Tue Sep 10 14:44:05 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 14:44:05 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <4EVmLjNMTfnnGZOmvV41VLHO4OOuLiJLEuu5723qYag=.6309b7e2-e491-499d-b23e-35d440cec5f4@github.com> Message-ID: On Tue, 10 Sep 2024 11:51:50 GMT, Roland Westrelin wrote: >> Added new function `Op_DivModIL(...)` as `Op_DivMod` already exists as an opcode constant and cannot be removed without significant changes to the codebase (although never referenced directly). > > I don't mind using `Op_DivModIL(...)` but what happens if you remove `Op_DivMod`?

I would end up with a linker error:

    /usr/bin/ld: /path/to/jdk/build/linux-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/divnode.o: in function `DivModNode::DivModNode(Node*, Node*, Node*)':
    /path/to/jdk/src/hotspot/share/opto/divnode.cpp:1357:(.text+0x7e99): undefined reference to `vtable for DivModNode'

It looks like all nodes (even abstract ones like `Multi` and `Op_Multi`) are processed with `macro` and added to the enum. I don't know why, but it looks intentional.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752124651 From kxu at openjdk.org Tue Sep 10 14:52:12 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 14:52:12 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <7spZ424UdaAr1E-q8E508vHdwGBv7l6ETGK-T53-6xM=.64104918-adeb-4ba9-b623-414fe46b73e6@github.com> Message-ID: On Tue, 10 Sep 2024 11:50:52 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> remove redundant arguments, test with -XX:-UseDivMod > > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 34: > >> 32: * @test >> 33: * @summary Test DIV and MOD nodes are converted into DIVMOD where possible >> 34: * @requires os.arch=="amd64" | os.arch=="x86_64" > > So this can run on aarch64 now? I was under the impression that aarch64 cannot calculate the quotient and remainder with a single div instruction (unlike x86), and I assumed that's why the existing tests on divmod don't include this arch. Looking at the source, only `src/hotspot/cpu/x86/x86_64.ad` and `s390.ad` seem to contain divmod-related instructions.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752138368 From roland at openjdk.org Tue Sep 10 14:57:08 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Sep 2024 14:57:08 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <4EVmLjNMTfnnGZOmvV41VLHO4OOuLiJLEuu5723qYag=.6309b7e2-e491-499d-b23e-35d440cec5f4@github.com> Message-ID: On Tue, 10 Sep 2024 14:41:34 GMT, Kangcheng Xu wrote: >> I don't mind using `Op_DivModIL(...)` but what happens if you remove `Op_DivMod`? > > I would end up with a linker error: > > /usr/bin/ld: /path/to/jdk/build/linux-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/divnode.o: in function `DivModNode::DivModNode(Node*, Node*, Node*)': > /path/to/jdk/src/hotspot/share/opto/divnode.cpp:1357:(.text+0x7e99): undefined reference to `vtable for DivModNode' > > > It looks like all nodes (even abstract ones like `Multi` and `Op_Multi`) are processed with `macro` and added to the enum. I don't know why, but it looks intentional. I think that's because `DivModNode` has a virtual method `Opcode()` that's defined by `macro`. Removing the `Op_DivMod` line removes the virtual method definition but the declaration is still there. So removing `Op_DivMod` requires removing the `Opcode()` declaration in the `DivModNode` class. Anyway, that can be cleaned up in a separate PR if needed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752145910 From roland at openjdk.org Tue Sep 10 15:02:06 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Sep 2024 15:02:06 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <7spZ424UdaAr1E-q8E508vHdwGBv7l6ETGK-T53-6xM=.64104918-adeb-4ba9-b623-414fe46b73e6@github.com> Message-ID: On Tue, 10 Sep 2024 14:49:39 GMT, Kangcheng Xu wrote: >> test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 34: >> >>> 32: * @test >>> 33: * @summary Test DIV and MOD nodes are converted into DIVMOD where possible >>> 34: * @requires os.arch=="amd64" | os.arch=="x86_64" >> >> So this can run on aarch64 now? > > I was under the impression aarch64 cannot calculate quotient and remainder with a single div instruction (unlike x86), and I assumed that's why existing tests on divmod doesn't include this arch. Looking at the source, only `src/hotspot/cpu/x86/x86_64.ad` and `s390.ad` seems to contain divmod related instructions. `UseDivMod` is true on all platforms but if there's no hardware support for `Op_DivModI` then this code: } else { // replace a%b with a-((a/b)*b) Node* mult = new MulINode(d, d->in(2)); Node* sub = new SubINode(d->in(1), mult); n->subsume_by(sub, this); } computes the `ModI` from the `DivI` result and removes the `DivI`. That's what you could match on aarch64. 
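The rewrite in that branch rests on the identity `a % b == a - ((a / b) * b)`, which holds for Java's truncating integer division. A minimal Java-level sketch of the same arithmetic follows; it only illustrates the identity and is not the C2 code itself. The class name `ModViaDiv` and the method `modViaDiv` are hypothetical names chosen for this example.

```java
// Hypothetical illustration: compute a remainder from an existing quotient,
// mirroring the "replace a%b with a-((a/b)*b)" comment quoted above.
class ModViaDiv {
    // int variant: one division, one multiplication, one subtraction.
    static int modViaDiv(int a, int b) {
        int q = a / b;        // corresponds to the existing DivI result
        return a - (q * b);   // corresponds to MulI followed by SubI
    }

    // long variant of the same identity.
    static long modViaDiv(long a, long b) {
        long q = a / b;
        return a - (q * b);
    }

    public static void main(String[] args) {
        // Sampled inputs, including negative operands and the overflow case
        // Integer.MIN_VALUE / -1, where wrap-around arithmetic still agrees with %.
        int[][] cases = {
            {7, 3}, {-7, 3}, {7, -3}, {-7, -3}, {0, 5}, {Integer.MIN_VALUE, -1}
        };
        for (int[] c : cases) {
            int expected = c[0] % c[1];
            int actual = modViaDiv(c[0], c[1]);
            if (expected != actual) {
                throw new AssertionError(c[0] + " % " + c[1] + ": expected "
                        + expected + " but got " + actual);
            }
        }
        System.out.println("identity holds for all sampled inputs");
    }
}
```

Division by zero is not special-cased here: `a / b` throws `ArithmeticException` before the multiplication is reached, just as `a % b` would.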
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752155128 From roland at openjdk.org Tue Sep 10 15:02:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Sep 2024 15:02:09 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <7spZ424UdaAr1E-q8E508vHdwGBv7l6ETGK-T53-6xM=.64104918-adeb-4ba9-b623-414fe46b73e6@github.com> Message-ID: <9oHIOwxZTSxNSzcpzfHHLECOnex2KnvF9oyW8NlwnTA=.6c94aa9d-25c0-4100-a88c-f342068ceb5c@github.com> On Tue, 10 Sep 2024 11:50:36 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> remove redundant arguments, test with -XX:-UseDivMod > > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 47: > >> 45: @Arguments(values = {Argument.RANDOM_EACH, Argument.RANDOM_EACH}) >> 46: @IR(counts = {IRNode.DIV_MOD_I, "1" }, applyIf = {"UseDivMod", "true"}) >> 47: @IR(failOn = {IRNode.DIV_MOD_I}, applyIf = {"UseDivMod", "false"}) > > Rather than check that there's no `DivMod` node, you could check that there are the expected nodes (1 `Mul`, 1 `Sub`). Actually, my comment is incorrect. Sorry for the confusion. The expected nodes are 1 `Div`, 1 `Mod` and no `Div`, no `Mod` when `UseDivMod` is true. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752154518 From aph at openjdk.org Tue Sep 10 15:03:09 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 10 Sep 2024 15:03:09 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v2] In-Reply-To: <-gmI8CWVHCk3ebpH6M3IaB4avuiG7QpzykKxYzGso2o=.b8072269-55e9-4c57-85e9-9f05fba5d934@github.com> References: <-gmI8CWVHCk3ebpH6M3IaB4avuiG7QpzykKxYzGso2o=.b8072269-55e9-4c57-85e9-9f05fba5d934@github.com> Message-ID: On Tue, 10 Sep 2024 11:45:25 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions. src/hotspot/cpu/x86/x86_64.ad line 7095: > 7093: "sete $res\n\t" > 7094: "movzbl $res, $res" %} > 7095: ins_encode %{ Maybe change the format statement to match. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1752157139 From chagedorn at openjdk.org Tue Sep 10 15:50:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Sep 2024 15:50:20 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v4] In-Reply-To: References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> Message-ID: On Fri, 6 Sep 2024 04:41:24 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) >> >> This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Improve cases with two negative ranges, add more documentation Testing looked good! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2341317282 From sviswanathan at openjdk.org Tue Sep 10 15:53:08 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 10 Sep 2024 15:53:08 GMT Subject: RFR: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed [v2] In-Reply-To: References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Tue, 10 Sep 2024 11:48:41 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused andw/orw/xorw/addw (reg, reg) instruction encoding > > Cleanup is also fine, still looks good. Thanks @jatin-bhateja @merykitty for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20901#issuecomment-2341325907 From sviswanathan at openjdk.org Tue Sep 10 15:56:17 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 10 Sep 2024 15:56:17 GMT Subject: Integrated: 8339698: x86 unused andw/orw/xorw/addw encoding could be removed In-Reply-To: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> References: <3i6Yq72VmyFDsHctqefLP-JuAoNFKYhCqygBfHmHN-M=.1d5baff1-b71e-4686-a2d2-e56c60d7938b@github.com> Message-ID: On Sat, 7 Sep 2024 02:00:49 GMT, Sandhya Viswanathan wrote: > x86 andw/orw/xorw(Register, Register) encoding is missing 0x66 prefix. These instructions along with addw(Register, Register) are unused and so removed. > > Best Regards, > Sandhya This pull request has now been integrated. 
Changeset: be0dca04 Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/be0dca046a43ecef2dcd012da6399cbed4cd0454 Stats: 26 lines in 2 files changed: 0 ins; 26 del; 0 mod 8339698: x86 unused andw/orw/xorw/addw encoding could be removed Reviewed-by: kvn, jbhateja, qamai ------------- PR: https://git.openjdk.org/jdk/pull/20901 From kxu at openjdk.org Tue Sep 10 15:57:24 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 15:57:24 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v3] In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: > Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations on the same operands into one, should the platform support this feature (e.g., x86-64). > > I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! > > ~This will be a draft PR before GHA tests are confirmed passing.~ Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Merge branch 'openjdk:master' into refactor-mod-cases - include aarch64 in tests, add more configuration combinations - remove redundant arguments, test with -XX:-UseDivMod - Merge branch 'master' into refactor-mod-cases - Add test and IRNode for signed int/long divmod - created IR tests - passing tier1 tests - refactor divmod ops to handle_div_mod_op ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20877/files - new: https://git.openjdk.org/jdk/pull/20877/files/6c604d25..57e96e10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=01-02 Stats: 19350 lines in 696 files changed: 11315 ins; 4326 del; 3709 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From kxu at openjdk.org Tue Sep 10 15:57:24 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 15:57:24 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v3] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <4EVmLjNMTfnnGZOmvV41VLHO4OOuLiJLEuu5723qYag=.6309b7e2-e491-499d-b23e-35d440cec5f4@github.com> Message-ID: On Tue, 10 Sep 2024 14:54:13 GMT, Roland Westrelin wrote: >> I would end up with some linker error: >> >> /usr/bin/ld: /path/to/jdk/build/linux-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/divnode.o: in function `DivModNode::DivModNode(Node*, Node*, Node*)': >> /path/to/jdk/src/hotspot/share/opto/divnode.cpp:1357:(.text+0x7e99): undefined reference to `vtable for DivModNode' >> >> >> It looks like all nodes (even those abstract ones like `Multi` and `Op_Multi`) is processed with `macro` and added to the enum. I don't know why but it looks intentional. 
> > I think that's because `DivModNode` has a virtual method `Opcode()` that's defined by `macro`. Removing the `Op_DivMod` line removes the virtual method definition but the declaration is still there. So removing `Op_DivMod` requires removing the `Opcode()` declaration in the `DivModNode` class. Anyway, that can be cleaned up in a separate PR if needed. Thanks for explaining. That is reasonable ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752239134 From kxu at openjdk.org Tue Sep 10 15:57:24 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 15:57:24 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v2] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <7spZ424UdaAr1E-q8E508vHdwGBv7l6ETGK-T53-6xM=.64104918-adeb-4ba9-b623-414fe46b73e6@github.com> Message-ID: On Tue, 10 Sep 2024 14:59:40 GMT, Roland Westrelin wrote: >> I was under the impression aarch64 cannot calculate quotient and remainder with a single div instruction (unlike x86), and I assumed that's why existing tests on divmod doesn't include this arch. Looking at the source, only `src/hotspot/cpu/x86/x86_64.ad` and `s390.ad` seems to contain divmod related instructions. > > `UseDivMod` is true on all platforms but if there's no hardware support for `Op_DivModI` then this code: > > } else { > // replace a%b with a-((a/b)*b) > Node* mult = new MulINode(d, d->in(2)); > Node* sub = new SubINode(d->in(1), mult); > n->subsume_by(sub, this); > } > > computes the `ModI` from the `DivI` result and removes the `DivI`. That's what you could match on aarch64. 
I've added test configurations to reflect aarch64 with `-XX:+UseDivMod` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752241162 From epeter at openjdk.org Tue Sep 10 16:11:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Sep 2024 16:11:08 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v3] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: <2H3OQTQw7DAGcEpg2pD20RJ-bhCfOCx_0qEokIk1GBE=.f29fab37-7941-4ae4-9ffc-8bfe559f9c4c@github.com> On Tue, 10 Sep 2024 15:57:24 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations on the same operands into one, should the platform support this feature (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Merge branch 'openjdk:master' into refactor-mod-cases > - include aarch64 in tests, add more configuration combinations > - remove redundant arguments, test with -XX:-UseDivMod > - Merge branch 'master' into refactor-mod-cases > - Add test and IRNode for signed int/long divmod > - created IR tests > - passing tier1 tests > - refactor divmod ops to handle_div_mod_op test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 38: > 36: * @run main/othervm -XX:+UseDivMod compiler.c2.TestDivModNodes > 37: * @run main/othervm -XX:-UseDivMod compiler.c2.TestDivModNodes > 38: */ Drive-by comment: please allow it to run on any platform, and add restrictions on where the IR rules are run. That allows us to test results on other platforms, which is valuable on its own. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752261217 From kxu at openjdk.org Tue Sep 10 16:21:46 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 16:21:46 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v4] In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: > Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations on the same operands into one, should the platform support this feature (e.g., x86-64). > > I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
> > ~This will be a draft PR before GHA tests are confirmed passing.~ Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: remove platform restriction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20877/files - new: https://git.openjdk.org/jdk/pull/20877/files/57e96e10..3584d10c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From kxu at openjdk.org Tue Sep 10 16:21:48 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 16:21:48 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v3] In-Reply-To: <2H3OQTQw7DAGcEpg2pD20RJ-bhCfOCx_0qEokIk1GBE=.f29fab37-7941-4ae4-9ffc-8bfe559f9c4c@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <2H3OQTQw7DAGcEpg2pD20RJ-bhCfOCx_0qEokIk1GBE=.f29fab37-7941-4ae4-9ffc-8bfe559f9c4c@github.com> Message-ID: On Tue, 10 Sep 2024 16:08:21 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into refactor-mod-cases >> - include aarch64 in tests, add more configuration combinations >> - remove redundant arguments, test with -XX:-UseDivMod >> - Merge branch 'master' into refactor-mod-cases >> - Add test and IRNode for signed int/long divmod >> - created IR tests >> - passing tier1 tests >> - refactor divmod ops to handle_div_mod_op > > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 38: > >> 36: * @run main/othervm -XX:+UseDivMod compiler.c2.TestDivModNodes >> 37: * @run main/othervm -XX:-UseDivMod compiler.c2.TestDivModNodes >> 38: */ > > Drive-by comment: please allow it to run on any platform, and add restrictions on where the IR rules are run. That allows us to test results on other platforms, which is valuable on its own. Done. Waiting for GHA to complete. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1752275174 From rcastanedalo at openjdk.org Tue Sep 10 16:26:58 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 10 Sep 2024 16:26:58 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: Message-ID: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Fix indentation in generate_post_barrier_fast_path Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/94145917..0979e41e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 10 16:26:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 10 Sep 2024 16:26:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> References: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> Message-ID: <7epSurWH76D6t-eSs3neVvSHYRdhdGanYobPU0Y_-SM=.5068c4a5-d220-417d-9d8a-0518bfdc61d8@github.com> On Tue, 10 Sep 2024 13:00:05 GMT, Thomas Schatzl wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port for JEP 475 > > Marked as reviewed by tschatzl (Reviewer). Thanks for reviewing, @tschatzl!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2341418514 From jkarthikeyan at openjdk.org Tue Sep 10 16:56:10 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Sep 2024 16:56:10 GMT Subject: RFR: 8335444: Generalize implementation of AndNode mul_ring [v4] In-Reply-To: References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> Message-ID: On Fri, 6 Sep 2024 04:41:24 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) >> >> This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Improve cases with two negative ranges, add more documentation Thanks a lot for the testing, and thank you everyone for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20066#issuecomment-2341474901 From jkarthikeyan at openjdk.org Tue Sep 10 16:56:11 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Sep 2024 16:56:11 GMT Subject: Integrated: 8335444: Generalize implementation of AndNode mul_ring In-Reply-To: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> References: <7l_zt0Q884dkzJbP5kjhfvZPmFQ4CMRgakWoUCopfvw=.fc1307d0-6a2c-40cc-adc6-c1b21424d0dd@github.com> Message-ID: <9cO6aCiMDsjGm1Wc70lSmZIUpdJYnF3478wpLhwR7bI=.f89129ab-aa37-423e-8306-897ed66ed237@github.com> On Mon, 8 Jul 2024 03:37:30 GMT, Jasmine Karthikeyan wrote: > Hi all, > I've written this patch which improves type calculation for bitwise-and functions. Previously, the only cases that were considered were if one of the inputs to the node were a positive constant. I've generalized this behavior, as well as added a case to better estimate the result for arbitrary ranges. Since these are very common patterns to see, this can help propagate more precise types throughout the ideal graph for a large number of methods, making other optimizations and analyses stronger. I was interested in where this patch improves types, so I ran CTW for `java_base` and `java_base_2` and printed out the differences in this gist [here](https://gist.github.com/jaskarth/b45260d81ab621656f4a55cc51cf5292). While I don't think it's particularly complicated I've also added some discussion of the mathematics below, mostly because I thought it was interesting to work through :) > > This patch passes tier1-3 testing on my linux x64 machine. Thoughts and reviews would be very appreciated! This pull request has now been integrated. 
Changeset: 92431049 Author: Jasmine Karthikeyan URL: https://git.openjdk.org/jdk/commit/92431049fd1838ced2019366b7ccb37547ae6127 Stats: 269 lines in 5 files changed: 219 ins; 37 del; 13 mod 8335444: Generalize implementation of AndNode mul_ring Reviewed-by: chagedorn, qamai, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/20066 From kvn at openjdk.org Tue Sep 10 16:59:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 10 Sep 2024 16:59:12 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: <9rtaON0aEhbYU5SrnW9Gkcr37iB6ZlFpTzdBCMxAXmU=.600eebbd-f964-47b0-9fd2-e3ecc7b50aeb@github.com> References: <9rtaON0aEhbYU5SrnW9Gkcr37iB6ZlFpTzdBCMxAXmU=.600eebbd-f964-47b0-9fd2-e3ecc7b50aeb@github.com> Message-ID: On Tue, 10 Sep 2024 12:17:10 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 470: >> >>> 468: // Neither contains each other, weird? >>> 469: // fatal("Integer value range is not subset"); >>> 470: // return this; >> >> This looks like leftover dead code. > > Yes it is, but I don't want to remove it. I think it should be the case the `new_type` contains `old_type`, but apparently sometimes it is not the case and I don't understand it. Returning the bottom type instead of `limit_type` is also weird, too. Unless you want to investigate it this should be removed. File RFE to look on it later. Also you can execute it only in debug VM by using #ifdef ASSERT and `assert()` instead of `fatal()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1752325656 From qamai at openjdk.org Tue Sep 10 18:03:45 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 18:03:45 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v18] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. 
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: remove leftover code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/81f4e15b..a77e8f4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=16-17 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Tue Sep 10 18:03:45 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Sep 2024 18:03:45 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v15] In-Reply-To: References: <9rtaON0aEhbYU5SrnW9Gkcr37iB6ZlFpTzdBCMxAXmU=.600eebbd-f964-47b0-9fd2-e3ecc7b50aeb@github.com> Message-ID: On Tue, 10 Sep 2024 16:56:25 GMT, Vladimir Kozlov wrote: >> Yes it is, but I don't want to remove it. 
I think it should be the case the `new_type` contains `old_type`, but apparently sometimes it is not the case and I don't understand it. Returning the bottom type instead of `limit_type` is also weird, too. > > Unless you want to investigate it this should be removed. File RFE to look on it later. > Also you can execute it only in debug VM by using #ifdef ASSERT and `assert()` instead of `fatal()`. Done! I don't want to risk breaking anything so I will take a look later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1752437178 From kxu at openjdk.org Tue Sep 10 21:58:41 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 10 Sep 2024 21:58:41 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v5] In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: <_Klbmlp0Dqw5U579VCM5b19dNb4HZDGsoh6BbQPhJT4=.2d914a97-ee92-4fa8-b12f-71b40c02702d@github.com> > Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). > > I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
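As a rough illustration of the idea behind the DIVMOD transformation described above (this is not the C2 code): when `a / b` and `a % b` share the same operands, one operation can produce both results. The sketch below uses the standard `std::div`; the helper name `div_and_mod` is made up for this example.

```cpp
#include <cstdlib>
#include <utility>

// Hypothetical helper, for illustration only.
// Returns {quotient, remainder} of a / b from a single library call.
// Precondition: b != 0 and the division does not overflow.
std::pair<int, int> div_and_mod(int a, int b) {
  std::div_t r = std::div(a, b);  // computes quot and rem together
  return {r.quot, r.rem};
}
```

Like Java's `/` and `%`, `std::div` truncates toward zero, so the remainder takes the sign of the dividend.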
> > ~This will be a draft PR before GHA tests are confirmed passing.~ Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: allow one div on platforms without hardware divmod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20877/files - new: https://git.openjdk.org/jdk/pull/20877/files/3584d10c..1ca06ae7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=03-04 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From sdohrmann at openjdk.org Tue Sep 10 22:38:24 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Tue, 10 Sep 2024 22:38:24 GMT Subject: RFR: 8329035: New Data Destination instructions support [v4] In-Reply-To: References: Message-ID: > Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. > > The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. > > Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. > > All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: remove instr. 
functions based on review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20698/files - new: https://git.openjdk.org/jdk/pull/20698/files/385ea567..bd2612bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=02-03 Stats: 228 lines in 2 files changed: 0 ins; 225 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20698/head:pull/20698 PR: https://git.openjdk.org/jdk/pull/20698 From sdohrmann at openjdk.org Tue Sep 10 22:38:25 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Tue, 10 Sep 2024 22:38:25 GMT Subject: RFR: 8329035: New Data Destination instructions support [v3] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 07:31:25 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> refactoring and fixes based on review comments > > src/hotspot/cpu/x86/assembler_x86.cpp line 1420: > >> 1418: emit_arith(0x11, 0xC0, src1, src2); >> 1419: } >> 1420: > > Encodings for ADC look good, but I could not find uses for these instructions beyond stubs. Do you think we should add them lazily, on a need basis? Done. Removed the eadc instruction functions. > src/hotspot/cpu/x86/assembler_x86.cpp line 1452: > >> 1450: emit_int8(imm8); >> 1451: } >> 1452: > > Again I could not find any use of scalar 16 bit addition and only one use of 8 bit add. Should we consider adding them lazily? Done. Removed the addb and eaddw functions. > src/hotspot/cpu/x86/assembler_x86.cpp line 1783: > >> 1781: emit_arith(0x23, 0xC0, src1, src2); >> 1782: } >> 1783: > > We can skip adding 8bit and 16bit flavors of AND instructions since they are not being used by compiler / stubs. Done. Removed the eandb and eandw functions.
> src/hotspot/cpu/x86/assembler_x86.cpp line 1795: > >> 1793: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); >> 1794: evex_prefix_ndd(src, dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); >> 1795: emit_arith_operand(0x81, as_Register(4), src, imm32); > > Suggestion: > > emit_arith_operand(0x81, rsp, src, imm32); Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 1840: > >> 1838: emit_operand(src1, src2, 0); >> 1839: } >> 1840: > > Should we only keep one _eandl (REG, REG, ADR)_ and remove the other one _eandl(REG, ADR, REG)_? Apart from source operand encodings there is no other difference between the two instructions. Done. Removed eandl(reg adr reg). > src/hotspot/cpu/x86/assembler_x86.cpp line 2016: > >> 2014: void Assembler::ecmovl(Condition cc, Register dst, Register src1, Address src2) { >> 2015: InstructionMark im(this); >> 2016: NOT_LP64(guarantee(VM_Version::supports_cmov(), "illegal instruction")); > > APX is only supported by 64 bit (IA-32e mode) targets. Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 2696: > >> 2694: } >> 2695: >> 2696: void Assembler::eidivl(Register src, bool no_flags) { // Unsigned > > Comment should be signed division. Done. > src/hotspot/cpu/x86/assembler_x86.cpp line 6801: > >> 6799: emit_arith_operand(0x81, rbx, src, imm32); >> 6800: } >> 6801: > > Don't find any usage of SUB with borrow in existing code. I think we can defer adding it till we have a use. The ::esall(Reg, Reg, imm) is the NDD version of the ::sall(Reg, imm) function just above it. The ::eshll(Reg, Reg, imm) follows later in the file as the NDD version of ::shll(Reg, imm). I understand that sal and shl encode the same semantics for a given operand pattern, but both were present in non-NDD form so I added NDD forms. Done. Removed esbb functions.
> src/hotspot/cpu/x86/assembler_x86.cpp line 12723: > >> 12721: byte3 |= pre; >> 12722: >> 12723: // P2: byte 4 as zL'Lbv'aaa or 00L0VF00 where V = V4 and F = NF (no flags) > > Suggestion: > > // P2: byte 4 as zL'Lbv'aaa or 00LXVF00 where V = V4, X(extended context) = ND and F = NF (no flags) Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752874266 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752874475 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752874916 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752875481 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752876605 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752876765 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752876877 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752877186 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1752872867 From sdohrmann at openjdk.org Tue Sep 10 23:51:21 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Tue, 10 Sep 2024 23:51:21 GMT Subject: RFR: 8329035: New Data Destination instructions support [v5] In-Reply-To: References: Message-ID: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> > Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. > > The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. > > Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. 
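To make the NDD and NF semantics described above concrete, here is a toy register-machine model. It is only a sketch under stated assumptions: `ToyCpu`, its register file, the single zero flag, and the `eaddl` signature used here are hypothetical and are not the HotSpot `Assembler` API.

```cpp
#include <array>
#include <cstdint>

// Toy machine state (hypothetical): eight 32-bit registers and one zero flag.
struct ToyCpu {
  std::array<uint32_t, 8> regs{};
  bool zero_flag = false;

  // Legacy two-operand form: dst = dst + src, always updates the flag.
  void addl(int dst, int src) {
    regs[dst] += regs[src];
    zero_flag = (regs[dst] == 0);
  }

  // NDD-style three-operand form: dst = src1 + src2, sources are not clobbered.
  // With no_flags set, the flag state is left untouched (NF behavior).
  void eaddl(int dst, int src1, int src2, bool no_flags) {
    regs[dst] = regs[src1] + regs[src2];
    if (!no_flags) {
      zero_flag = (regs[dst] == 0);
    }
  }
};

// Runs two additions and returns the final state.
ToyCpu ndd_demo() {
  ToyCpu cpu;
  cpu.regs[1] = 5;
  cpu.regs[2] = 7;
  cpu.zero_flag = true;
  cpu.eaddl(0, 1, 2, /* no_flags */ true);  // NDD + NF: r0 = r1 + r2
  cpu.eaddl(1, 1, 2, /* no_flags */ true);  // NF without NDD: dst == first source
  return cpu;
}
```

The second call in `ndd_demo` mirrors the remark that NF behavior without NDD semantics is obtained by passing the same register as both the new destination and the first source.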
Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. > > All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge master - remove instr. functions based on review comments - refactoring and fixes based on review comments - function name changes based on review comments - fix 32-bit build name errors, missing no_flags arg, and addw functions - 8329035: New Data Destination instructions support ------------- Changes: https://git.openjdk.org/jdk/pull/20698/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=04 Stats: 1547 lines in 2 files changed: 1525 ins; 2 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/20698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20698/head:pull/20698 PR: https://git.openjdk.org/jdk/pull/20698 From syan at openjdk.org Wed Sep 11 02:15:14 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 11 Sep 2024 02:15:14 GMT Subject: Integrated: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR deletes the tedious bool type definitions in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include <stdbool.h>" to use the bool type directly, as [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) does. > This makes the code more concise, and the risk is quite low. > > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [x] Jtreg tests (including tier1/tier2/tier3 etc.)
on linux x64 > - [x] Jtreg tests (including tier1/tier2/tier3 etc.) on linux aarch64 This pull request has now been integrated. Changeset: a6faf824 Author: SendaoYan Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/a6faf8247b58d73dca199fe1e8b0e914c415f67f Stats: 14 lines in 2 files changed: 1 ins; 12 del; 1 mod 8339714: Delete tedious bool type define Reviewed-by: jwaters, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/20909 From syan at openjdk.org Wed Sep 11 02:30:10 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 11 Sep 2024 02:30:10 GMT Subject: RFR: 8339714: Delete tedious bool type define In-Reply-To: References: Message-ID: <6wPZ40cgyH7U6kJN31R80qymYXzQwTX-ndD-2MKboo0=.9178e030-415a-4b5d-8e13-a298e03e9c4f@github.com> On Mon, 9 Sep 2024 09:50:59 GMT, SendaoYan wrote: > Hi all, > This PR deletes the tedious bool type definitions in `src/java.base/unix/native/libjsig/jsig.c` and `src/utils/hsdis/binutils/hsdis-binutils.c`. After JEP 347([JDK-8246032](https://bugs.openjdk.org/browse/JDK-8246032)), I think we can "#include <stdbool.h>" to use the bool type directly, as [string.h](https://github.com/openjdk/jdk/blob/master/src/java.desktop/unix/native/libpipewire/include/spa/utils/string.h#L13) does. > This makes the code more concise, and the risk is quite low. > > Additional testing: > > - [x] Local build with --with-hsdis=binutils --with-binutils=$HOME/software/binutils > - [x] Jtreg tests (including tier1/tier2/tier3 etc.) on linux x64 > - [x] Jtreg tests (including tier1/tier2/tier3 etc.) on linux aarch64 Thanks for the sponsor.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20909#issuecomment-2342489060 From jbhateja at openjdk.org Wed Sep 11 05:35:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Sep 2024 05:35:12 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v18] In-Reply-To: References: Message-ID: <5rE5jUqKmzecH6jMAXpaObv9xYRz3Xi1SCvCKhAQJ9o=.010bec0b-856d-4d71-94c8-7e02f0402a4e@github.com> On Tue, 10 Sep 2024 18:03:45 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > remove leftover code I will spend some time going over your patch in detail; here are some early comments. src/hotspot/share/opto/rangeinference.hpp line 69: > 67: return (v & _zeros) == 0 && (v & _ones) == _ones; > 68: } > 69: }; It would be good if we added basic operations to KnownBits, like:
KnownBits.getMaxValue() returning ~ZEROS KnownBits.getMinValue() returning ONE KnownBits.and(KnownBits arg) KnownBits.or(KnownBits arg) KnownBits.xor(KnownBits args) KnownBits.not() These can be quite handy during data flow analysis using KnownBits src/hotspot/share/opto/type.hpp line 661: > 659: // the below constraints, see contains(jint) > 660: const jint _lo, _hi; // Lower bound, upper bound in the signed domain > 661: const juint _ulo, _uhi; // Lower bound, upper bound in the unsigned domain Can't we do without explicit fields to record unsigned hi / lo ? We just need to present a unsigned view of signed _lo and _hi which can be done using safe macros. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2293022456 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1753137705 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1753141704 From jbhateja at openjdk.org Wed Sep 11 05:35:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Sep 2024 05:35:13 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v17] In-Reply-To: <9SRKk6LjXxgsYSEXaPmexK4haR2N7WlnXVJZW_XTAaE=.b2d0892d-c512-4f62-870d-373481e6309f@github.com> References: <9SRKk6LjXxgsYSEXaPmexK4haR2N7WlnXVJZW_XTAaE=.b2d0892d-c512-4f62-870d-373481e6309f@github.com> Message-ID: On Tue, 10 Sep 2024 12:23:28 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
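The membership condition quoted above can be written out as a small predicate. This is a minimal sketch for illustration; `IntConstraints` and `zero_to_three` are hypothetical names and do not reflect the layout of HotSpot's `TypeInt`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the constraint set described in the PR text.
struct IntConstraints {
  int32_t lo, hi;        // bounds in the signed domain
  uint32_t ulo, uhi;     // bounds in the unsigned domain
  uint32_t zeros, ones;  // bits known to be 0 / bits known to be 1

  // x s>= lo && x s<= hi && x u>= ulo && x u<= uhi
  //   && (x & zeros) == 0 && (x & ones) == ones
  bool contains(int32_t x) const {
    uint32_t u = static_cast<uint32_t>(x);
    return x >= lo && x <= hi &&
           u >= ulo && u <= uhi &&
           (u & zeros) == 0 && (u & ones) == ones;
  }
};

// The [0, 3] example in tightened form: the signed and unsigned ranges agree,
// and every bit above the low two is known to be zero.
inline IntConstraints zero_to_three() {
  return IntConstraints{0, 3, 0u, 3u, ~3u, 0u};
}
```

In this form the three kinds of constraint describe the same set, which is the dependency the description refers to when it says the constraints are not independent.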
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add doc to TypeInt, rename parameters, remove unused methods src/hotspot/share/opto/rangeinference.cpp line 75: > 73: // ones = 1001 > 74: // zero_violation = 0100, i.e the second bit should be zero, but it is 1 in > 75: // lo. Similarly, one_violation = 0001, i.e the forth bit should be one, but Use LSB (right most) / MSB (left most) terminology. In this case LSB should be one src/hotspot/share/opto/rangeinference.cpp line 86: > 84: } > 85: > 86: // The principal here is that, consider the first bit in result that is Suggestion: // The principle here is that, consider the first bit in result that is ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1752284383 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1752285190 From jbhateja at openjdk.org Wed Sep 11 05:36:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Sep 2024 05:36:11 GMT Subject: RFR: 8329035: New Data Destination instructions support [v5] In-Reply-To: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> References: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> Message-ID: On Tue, 10 Sep 2024 23:51:21 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. 
>> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > - remove instr. functions based on review comments > - refactoring and fixes based on review comments > - function name changes based on review comments > - fix 32-bit build name errors, missing no_flags arg, and addw functions > - 8329035: New Data Destination instructions support LGTM. Thanks @steveatgh ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20698#pullrequestreview-2295568614 From chagedorn at openjdk.org Wed Sep 11 06:30:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Sep 2024 06:30:05 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 08:27:20 GMT, Roland Westrelin wrote: > PhaseIdealLoop::do_range_check() sets the control of the new pre and > main limits to be the entry control of the pre loop but it eliminates > all conditions whose parameters are invariant in the main loop. Most > of the time they are also invariant in the pre loop but that's not > guaranteed. 
It does happen sometimes that those parameters are pinned > in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets > wrong controls. > > This doesn't cause any issue today AFAICT. > > Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): > > > pre_head->in(0) > > > is `pre_head`. I fixed that one too. Overall, the fix looks good! A few suggestions. src/hotspot/share/opto/loopTransform.cpp line 1687: > 1685: > 1686: register_new_node(pre_limit, pre_head->in(LoopNode::EntryControl)); > 1687: register_new_node(pre_opaq , pre_head->in(LoopNode::EntryControl)); Good catch, I've looked at this code many times before but never noticed that. src/hotspot/share/opto/loopTransform.cpp line 2793: > 2791: // to not ever trip end tests > 2792: Node *main_limit = cl->limit(); > 2793: Node* main_limit_c = get_ctrl(main_limit); There seems to be some mix between using `_c` and `_ctrl` as postfix. Should we go with `_ctrl` as postfix everywhere to make it more explicit? src/hotspot/share/opto/loopTransform.cpp line 3095: > 3093: set_ctrl(pre_opaq, new_limit_ctrl); > 3094: set_ctrl(pre_end->in(1), new_limit_ctrl); > 3095: set_ctrl(pre_end->cmp_node(), new_limit_ctrl); Just for better readability, I suggest to flip those such that it matches the top down order opaque -> cmp -> bool. 
Suggestion: set_ctrl(pre_end->cmp_node(), new_limit_ctrl); set_ctrl(pre_end->in(1), new_limit_ctrl); src/hotspot/share/opto/loopTransform.cpp line 3139: > 3137: set_ctrl(opqzm, new_limit_ctrl); > 3138: set_ctrl(iffm->in(1), new_limit_ctrl); > 3139: set_ctrl(iffm->in(1)->in(1), new_limit_ctrl); Same here (flip) Suggestion: set_ctrl(iffm->in(1)->in(1), new_limit_ctrl); set_ctrl(iffm->in(1), new_limit_ctrl); src/hotspot/share/opto/loopTransform.cpp line 3143: > 3141: > 3142: // Adjust control for node and its inputs (and inputs of its inputs) to be above the pre end > 3143: void PhaseIdealLoop::ensure_node_and_inputs_are_above_pre_end(CountedLoopEndNode* pre_end, Node* node, Node*& control) { `control` should be the `ctrl` of `node` here, right (i.e. `control == get_ctrl(node)`)? Maybe we can assert that. But thinking about it, would it hurt to just call `get_ctrl(node)` again here and remove `control` as parameter? Then we can just return the new control and do offset_c = ensure_node_and_inputs_are_above_pre_end(pre_end, offset); at the call-site. src/hotspot/share/opto/loopTransform.cpp line 3157: > 3155: assert(is_dominator(compute_early_ctrl(n, get_ctrl(n)), pre_end), "node pinned on loop exit test?"); > 3156: set_ctrl(n, control); > 3157: for (uint j = 0; j < n->req(); ++j) { For consistency: Suggestion: for (uint j = 0; j < n->req(); j++) { src/hotspot/share/opto/loopnode.hpp line 1188: > 1186: > 1187: Node* dominated_node(Node* c1, Node* c2) { > 1188: return is_dominator(c1, c2) ? c2 : c1; Should we also assert here that `is_dominator(c2, c1)`? 
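The `dominated_node` question above can be illustrated on a toy dominator tree. This is a sketch under assumptions: `DomTree` and its `idom` vector are hypothetical, and the assert shows one reading of the suggestion, namely that the two control nodes must lie on a single dominator chain.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree (hypothetical): idom[n] is the immediate dominator of n,
// and the root is its own immediate dominator.
struct DomTree {
  std::vector<int> idom;

  // True if d dominates n; every node dominates itself.
  bool is_dominator(int d, int n) const {
    while (true) {
      if (n == d) return true;
      if (idom[n] == n) return false;  // reached the root without meeting d
      n = idom[n];
    }
  }

  // Returns whichever of c1, c2 is dominated by the other.
  int dominated_node(int c1, int c2) const {
    // Without this check, two unrelated nodes would silently yield c1.
    assert(is_dominator(c1, c2) || is_dominator(c2, c1));
    return is_dominator(c1, c2) ? c2 : c1;
  }
};
```

For example, in the tree with `idom = {0, 0, 1, 1}`, nodes 2 and 3 are siblings and neither dominates the other, so calling `dominated_node(2, 3)` would trip the assert.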
------------- PR Review: https://git.openjdk.org/jdk/pull/20908#pullrequestreview-2295482534 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753130481 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753131025 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753182663 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753194223 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753168617 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753163523 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753138453 From roland at openjdk.org Wed Sep 11 07:15:07 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Sep 2024 07:15:07 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v5] In-Reply-To: <_Klbmlp0Dqw5U579VCM5b19dNb4HZDGsoh6BbQPhJT4=.2d914a97-ee92-4fa8-b12f-71b40c02702d@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <_Klbmlp0Dqw5U579VCM5b19dNb4HZDGsoh6BbQPhJT4=.2d914a97-ee92-4fa8-b12f-71b40c02702d@github.com> Message-ID: On Tue, 10 Sep 2024 21:58:41 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
>> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > allow one div on platforms without hardware divmod Thanks for making the changes. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20877#pullrequestreview-2295867244 From roland at openjdk.org Wed Sep 11 07:35:39 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Sep 2024 07:35:39 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v2] In-Reply-To: References: Message-ID: > PhaseIdealLoop::do_range_check() sets the control of the new pre and > main limits to be the entry control of the pre loop but it eliminates > all conditions whose parameters are invariant in the main loop. Most > of the time they are also invariant in the pre loop but that's not > guaranteed. It does happen sometimes that those parameters are pinned > in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets > wrong controls. > > This doesn't cause any issue today AFAICT. > > Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): > > > pre_head->in(0) > > > is `pre_head`. I fixed that one too. 
Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20908/files - new: https://git.openjdk.org/jdk/pull/20908/files/387194b3..24889f3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20908&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20908&range=00-01 Stats: 5 lines in 1 file changed: 2 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20908.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20908/head:pull/20908 PR: https://git.openjdk.org/jdk/pull/20908 From roland at openjdk.org Wed Sep 11 08:10:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Sep 2024 08:10:45 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v3] In-Reply-To: References: Message-ID: > PhaseIdealLoop::do_range_check() sets the control of the new pre and > main limits to be the entry control of the pre loop but it eliminates > all conditions whose parameters are invariant in the main loop. Most > of the time they are also invariant in the pre loop but that's not > guaranteed. It does happen sometimes that those parameters are pinned > in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets > wrong controls. > > This doesn't cause any issue today AFAICT. > > Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): > > > pre_head->in(0) > > > is `pre_head`. I fixed that one too. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20908/files - new: https://git.openjdk.org/jdk/pull/20908/files/24889f3b..03d80a49 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20908&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20908&range=01-02 Stats: 21 lines in 2 files changed: 3 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/20908.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20908/head:pull/20908 PR: https://git.openjdk.org/jdk/pull/20908 From roland at openjdk.org Wed Sep 11 08:10:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Sep 2024 08:10:45 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v3] In-Reply-To: References: Message-ID: <5xx1JAsE-fSgFf__Qjhw03dT8_v8s3OlErfLkLxQU7o=.e317426c-faab-4ea1-928c-8ca9fbc5e40e@github.com> On Wed, 11 Sep 2024 06:27:16 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Overall, the fix looks good! A few suggestions. @chhagedorn thanks for reviewing this. I pushed a commit that should address all your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20908#issuecomment-2342947766 From rcastanedalo at openjdk.org Wed Sep 11 08:30:02 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Sep 2024 08:30:02 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute ports for these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Fix a few style issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/0979e41e..141020e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18-19 Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 11 08:32:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Sep 2024 08:32:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> References: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> Message-ID: <8-IYniHv9GgBnsv9w3GggGF1mKKf3MfwxIxGIjEUh3c=.446607ac-5624-4c16-a1a5-a29187526023@github.com> On Tue, 10 Sep 2024 16:26:58 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation in generate_post_barrier_fast_path > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> I just fixed a few more indentation and code style glitches found by clang-format in commit 141020e6 (thanks @dlunde for helping with the setup). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2342993484 From jsjolen at openjdk.org Wed Sep 11 08:41:10 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 11 Sep 2024 08:41:10 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v5] In-Reply-To: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> References: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> Message-ID: <_EAtdYmGvUCfcuP1LlfpU7SxLQIJmuWDnKajOWy96a0=.b9f0869b-2810-4ca2-b326-0f9271729909@github.com> On Fri, 6 Sep 2024 08:52:07 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow.
> > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > assert + static pointer_delta fun Still looks good to me, ship it! ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20774#pullrequestreview-2296078799 From duke at openjdk.org Wed Sep 11 08:44:06 2024 From: duke at openjdk.org (duke) Date: Wed, 11 Sep 2024 08:44:06 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v5] In-Reply-To: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> References: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> Message-ID: On Fri, 6 Sep 2024 08:52:07 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > assert + static pointer_delta fun @caspernorrbin Your change (at version 41804e2e93cac96e5773ca7db91dcc8dc7a535d2) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20774#issuecomment-2343018604 From duke at openjdk.org Wed Sep 11 08:44:05 2024 From: duke at openjdk.org (Casper Norrbin) Date: Wed, 11 Sep 2024 08:44:05 GMT Subject: RFR: 8339242: Fix overflow issues in AdlArena [v5] In-Reply-To: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> References: <3B5BaULSyseHlT_yYiTogHQoMpY9vKQ54MXysGB6eIE=.e37f9f31-96c7-4c3e-8d15-bbd3c97ec5db@github.com> Message-ID: On Fri, 6 Sep 2024 08:52:07 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. >> >> The fix includes: >> >> - Adding assertions to check for potential overflow. >> - Reordering some operations to guard against overflow. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > assert + static pointer_delta fun Thank you everyone for the discussion and the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20774#issuecomment-2343018081 From duke at openjdk.org Wed Sep 11 08:49:13 2024 From: duke at openjdk.org (Casper Norrbin) Date: Wed, 11 Sep 2024 08:49:13 GMT Subject: Integrated: 8339242: Fix overflow issues in AdlArena In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 15:07:46 GMT, Casper Norrbin wrote: > Hi everyone, > > This PR addresses an issue in `adlArena` where some allocations lack checks for overflow. This could potentially result in successful allocations when called with unrealistic values. > > The fix includes: > > - Adding assertions to check for potential overflow. > - Reordering some operations to guard against overflow. This pull request has now been integrated. 
Changeset: 0b3f2e64 Author: Casper Norrbin Committer: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/0b3f2e64e83b589115989f9d14a6c644bc3008aa Stats: 48 lines in 3 files changed: 21 ins; 14 del; 13 mod 8339242: Fix overflow issues in AdlArena Reviewed-by: jsjolen, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/20774 From chagedorn at openjdk.org Wed Sep 11 09:28:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Sep 2024 09:28:11 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v3] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 08:10:45 GMT, Roland Westrelin wrote: >> PhaseIdealLoop::do_range_check() sets the control of the new pre and >> main limits to be the entry control of the pre loop but it eliminates >> all conditions whose parameters are invariant in the main loop. Most >> of the time they are also invariant in the pre loop but that's not >> guaranteed. It does happen sometimes that those parameters are pinned >> in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets >> wrong controls. >> >> This doesn't cause any issue today AFAICT. >> >> Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): >> >> >> pre_head->in(0) >> >> >> is `pre_head`. I fixed that one too. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the update, looks good! src/hotspot/share/opto/loopTransform.cpp line 2894: > 2892: int scale_con= 1; // Assume trip counter not scaled > 2893: > 2894: Node *limit_ctrl = get_ctrl(limit); While at it: Suggestion: Node* limit_ctrl = get_ctrl(limit); src/hotspot/share/opto/loopTransform.cpp line 2923: > 2921: } > 2922: > 2923: Node *offset_ctrl = get_ctrl(offset); Suggestion: Node* offset_ctrl = get_ctrl(offset); ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20908#pullrequestreview-2296188147 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753768884 PR Review Comment: https://git.openjdk.org/jdk/pull/20908#discussion_r1753770237 From roland at openjdk.org Wed Sep 11 11:21:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Sep 2024 11:21:41 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v4] In-Reply-To: References: Message-ID: > PhaseIdealLoop::do_range_check() sets the control of the new pre and > main limits to be the entry control of the pre loop but it eliminates > all conditions whose parameters are invariant in the main loop. Most > of the time they are also invariant in the pre loop but that's not > guaranteed. It does happen sometimes that those parameters are pinned > in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets > wrong controls. > > This doesn't cause any issue today AFAICT. > > Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): > > > pre_head->in(0) > > > is `pre_head`. I fixed that one too. 
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20908/files - new: https://git.openjdk.org/jdk/pull/20908/files/03d80a49..2f4e0763 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20908&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20908&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20908.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20908/head:pull/20908 PR: https://git.openjdk.org/jdk/pull/20908 From roland at openjdk.org Wed Sep 11 11:21:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Sep 2024 11:21:41 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v3] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 09:25:50 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Thanks for the update, looks good! @chhagedorn Updated with your latest tweaks. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20908#issuecomment-2343351045 From chagedorn at openjdk.org Wed Sep 11 12:03:09 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Sep 2024 12:03:09 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v5] In-Reply-To: <_Klbmlp0Dqw5U579VCM5b19dNb4HZDGsoh6BbQPhJT4=.2d914a97-ee92-4fa8-b12f-71b40c02702d@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <_Klbmlp0Dqw5U579VCM5b19dNb4HZDGsoh6BbQPhJT4=.2d914a97-ee92-4fa8-b12f-71b40c02702d@github.com> Message-ID: On Tue, 10 Sep 2024 21:58:41 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). >> >> I took the liberty adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > allow one div on platforms without hardware divmod Generally the clean-up idea is good! A few comments.
src/hotspot/share/opto/compile.cpp line 3169: > 3167: } > 3168: > 3169: // Check if a%b and a/b both exist I suggest to add whitespaces for better readability (same below): Suggestion: // Check if "a % b" and "a / b" both exist src/hotspot/share/opto/compile.cpp line 3171: > 3169: // Check if a%b and a/b both exist > 3170: Node* d = n->find_similar(Op_DivIL(bt, is_unsigned)); > 3171: if (!d) { You should directly compare against `nullptr`: Suggestion: if (d == nullptr) { src/hotspot/share/opto/compile.cpp line 3181: > 3179: n->subsume_by(divmod->mod_proj(), this); > 3180: } else { > 3181: // replace a%b with a-((a/b)*b) Suggestion: // Replace "a % b" with "a - ((a / b) * b)" test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 33: > 31: /* > 32: * @test > 33: * @summary Test DIV and MOD nodes are converted into DIVMOD where possible Suggestion: * @summary Test that DIV and MOD nodes are converted into DIVMOD where possible test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 36: > 34: * @library /test/lib / > 35: * @run main/othervm -XX:+UseDivMod compiler.c2.TestDivModNodes > 36: * @run main/othervm -XX:-UseDivMod compiler.c2.TestDivModNodes You should not pass the flags like that to the IR framework tests. It will not take these flags into account. What you should do instead is using the IR framework provided methods: * @run driver compiler.c2.TestDivModNodes ... public static void main(String[] args) { TestFramework.runWithFlags("-XX:-UseDivMod"); TestFramework.runWithFlags("-XX:+UseDivMod"); } Note that `UseDivMod` is a C2 specific flag. So, you should also add a `@requires vm.compiler2.enabled`. test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 58: > 56: > 57: verifyResult(dividend, divisor, q, r, TestDivModNodes::signedIntDiv, TestDivModNodes::signedIntMod); > 58: } Couldn't you just make `q` and `r` fields and then check them from within a `@Run` method? 
Something like that (not tested): static final Random RANDOM = AbstractInfo.getRandom(); int divResult; int modResult; ... @Test @IR(...) void testSignedIntDivMod(int dividend, int divisor) { divResult = dividend / divisor; modResult = dividend % divisor; } @Run(test="testSignedIntDivMod") void run() { int q = RANDOM.nextInt(); int r = RANDOM.nextInt(); testSignedIntDivMod(q, r); Asserts.assertEQ(q / r, divResult); Asserts.assertEQ(q % r, modResult); } Since the IR framework is single threaded, it should not be a problem to pass the result like that over fields. ------------- Changes requested by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20877#pullrequestreview-2296498167 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1754117764 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1754124805 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1754118969 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1754132715 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1754157503 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1754248381 From chagedorn at openjdk.org Wed Sep 11 12:05:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Sep 2024 12:05:05 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:21:41 GMT, Roland Westrelin wrote: >> PhaseIdealLoop::do_range_check() sets the control of the new pre and >> main limits to be the entry control of the pre loop but it eliminates >> all conditions whose parameters are invariant in the main loop. Most >> of the time they are also invariant in the pre loop but that's not >> guaranteed. It does happen sometimes that those parameters are pinned >> in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets >> wrong controls. 
>> >> This doesn't cause any issue today AFAICT. >> >> Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): >> >> >> pre_head->in(0) >> >> >> is `pre_head`. I fixed that one too. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Christian Hagedorn That looks good to me, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20908#pullrequestreview-2296682919 From yzheng at openjdk.org Wed Sep 11 13:21:42 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 11 Sep 2024 13:21:42 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses Message-ID: https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses ------------- Commit messages: - trim trailing whitespace - make JVMCI aware that some klass pointers are not compressible Changes: https://git.openjdk.org/jdk/pull/20949/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339939 Stats: 63 lines in 7 files changed: 56 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20949/head:pull/20949 PR: https://git.openjdk.org/jdk/pull/20949 From dnsimon at openjdk.org Wed Sep 11 14:01:06 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Sep 2024 14:01:06 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:09:07 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class 
metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses Marked as reviewed by dnsimon (Reviewer). src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 29: > 27: /** > 28: * Marker interface for hotspot specific constants. > 29: */ Let's take this opportunity to improve this javadoc: /** * A value in a space managed by Hotspot (e.g. heap or metaspace). * Some of these values can be referenced with a compressed pointer (32 bits) * instead of a full word-sized pointer. */ ------------- PR Review: https://git.openjdk.org/jdk/pull/20949#pullrequestreview-2297174735 PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1754641618 From thartmann at openjdk.org Wed Sep 11 14:22:35 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Sep 2024 14:22:35 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle Message-ID: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. Unfortunately, this only happened twice in our testing and I was never able to reproduce it. 
Thanks, Tobias ------------- Commit messages: - 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle Changes: https://git.openjdk.org/jdk/pull/20950/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20950&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338566 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20950.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20950/head:pull/20950 PR: https://git.openjdk.org/jdk/pull/20950 From shade at openjdk.org Wed Sep 11 14:31:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 11 Sep 2024 14:31:10 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: <4SqCF1g5oIHFq9Zudal8ynqtpduCkphdl-fvYFtpUqc=.ac02842f-2048-464e-809d-ce9f77eba458@github.com> On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias It looks to me the issue is not exactly in CI, but in the fact that we can publish a racily constructed global JNI handle. If so, shouldn't we be adding `storestore` at the exit path in `JNIHandles::make_global` to be absolutely sure we covered all uses like these? 
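Editorial illustration of the failure mode discussed in this thread, as a minimal sketch only. It is written in Java rather than HotSpot C++, and `PublishSketch`, `Box`, `shared` and `publish` are hypothetical names that do not appear in the patch. It assumes the pattern described above: an object is initialized, then a reference to it is stored where other threads can read it, and a store-store fence between the two keeps the initializing store from being reordered after the publishing store. Only the writer side is shown; under the Java memory model a racy reader would generally also need suitable ordering on its side.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration; none of these names come from HotSpot.
final class PublishSketch {
    static final class Box {
        int payload;
    }

    // Plain (non-volatile) field that other threads may read racily.
    static Box shared;

    static void publish(int value) {
        Box b = new Box();
        b.payload = value;           // initializing store
        // Keep the initializing store above from being reordered
        // after the publishing store below.
        VarHandle.storeStoreFence();
        shared = b;                  // publishing store
    }

    public static void main(String[] args) {
        publish(42);
        System.out.println(shared.payload);
    }
}
```

Placing the fence inside the single publishing helper, rather than at each call site, mirrors the proposal above to put the barrier in one central place so that no individual use can forget it.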
------------- PR Review: https://git.openjdk.org/jdk/pull/20950#pullrequestreview-2297315445 From gdub at openjdk.org Wed Sep 11 14:38:05 2024 From: gdub at openjdk.org (Gilles Duboscq) Date: Wed, 11 Sep 2024 14:38:05 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: <5Hae284Qb3b8eW5zJUliUCw9HqdUBZV3wkZ6tCzTnpg=.342e236a-d7aa-4783-bc41-f4754efd48a7@github.com> On Wed, 11 Sep 2024 13:09:07 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 40: > 38: * Determines if this constant is compressible. > 39: */ > 40: boolean isCompressible(); It might be worth adding a note about the fact that even if this returns true, `compress()` might still throw `IllegalArgumentException` if `isCompressed()` is also true. Or it might be a bit more intuitive to ask for the invariant that `isCompressible()` should return `false` if `isCompressed()` is true, and reword the javadoc below.
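Editorial illustration of the second option in the comment above, as a rough sketch under stated assumptions. `CompressibleSketch` and `KlassPointerSketch` are hypothetical stand-ins, not the JVMCI `HotSpotConstant` interface or any of its implementations; only the method names `isCompressed()`, `isCompressible()` and `compress()` are taken from the discussion. The sketch assumes the invariant that `isCompressible()` returns `false` whenever `isCompressed()` is true, so that a caller who checks `isCompressible()` first never sees `IllegalArgumentException` from `compress()`.

```java
// Hypothetical stand-in, not the JVMCI HotSpotConstant interface.
interface CompressibleSketch {
    boolean isCompressed();

    // Assumed invariant: returns false if isCompressed() is true.
    boolean isCompressible();

    // Throws IllegalArgumentException unless isCompressible() is true.
    CompressibleSketch compress();
}

final class KlassPointerSketch implements CompressibleSketch {
    private final boolean compressed;
    private final boolean abstractOrInterface;

    KlassPointerSketch(boolean compressed, boolean abstractOrInterface) {
        this.compressed = compressed;
        this.abstractOrInterface = abstractOrInterface;
    }

    @Override
    public boolean isCompressed() {
        return compressed;
    }

    @Override
    public boolean isCompressible() {
        // Already compressed values and abstract/interface classes are excluded.
        return !compressed && !abstractOrInterface;
    }

    @Override
    public CompressibleSketch compress() {
        if (!isCompressible()) {
            throw new IllegalArgumentException("not compressible");
        }
        return new KlassPointerSketch(true, abstractOrInterface);
    }
}
```

With this shape, `if (c.isCompressible()) { c = c.compress(); }` is safe without a separate `isCompressed()` check, which is the intuition behind the suggested invariant.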
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1754773712 From thartmann at openjdk.org Wed Sep 11 14:41:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Sep 2024 14:41:04 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias Right, I think it depends on if we want to make `JNIHandles::make_global` thread-safe or not. There are quite a few uses of `JNIHandles::make_global` and I don't know how performance sensitive these are. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2343869620 From shade at openjdk.org Wed Sep 11 14:45:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 11 Sep 2024 14:45:05 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias I think we better fix it in `JNIHandles::make_global`, since both [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923) and this bug shows how fragile and hard to reproduce this failure mode is. Let's hear from others? I don't mind this PR to go in, if we follow it with the RFE that replaces the per-use `storestore`-s with a (pun intended) more global one. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2343878865 From thartmann at openjdk.org Wed Sep 11 14:57:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Sep 2024 14:57:04 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias Right, fine with me. Let's see what others think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2343911536 From kxu at openjdk.org Wed Sep 11 15:57:29 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 11 Sep 2024 15:57:29 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v6] In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: > Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64).
> > I took the liberty adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they are previously missing. Please let me know if they were left intentionally and if there are any other concerns. Thanks! > > ~This will be a draft PR before GHA tests are confirmed passing.~ Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'openjdk:master' into refactor-mod-cases - improve formatting, nullptr comparison, update test flags, use custom @Run tests - allow one div on platforms without hardware divmod - remove platform restriction - Merge branch 'openjdk:master' into refactor-mod-cases - include aarch64 in tests, add more configuration combinations - remove redundant arguments, test with -XX:-UseDivMod - Merge branch 'master' into refactor-mod-cases - Add test and IRNode for signed int/long divmod - created IR tests - ... 
and 2 more: https://git.openjdk.org/jdk/compare/d66e0754...fe7b82ef ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20877/files - new: https://git.openjdk.org/jdk/pull/20877/files/1ca06ae7..fe7b82ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=04-05 Stats: 1560 lines in 50 files changed: 946 ins; 343 del; 271 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From kxu at openjdk.org Wed Sep 11 15:57:29 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 11 Sep 2024 15:57:29 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v5] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> <_Klbmlp0Dqw5U579VCM5b19dNb4HZDGsoh6BbQPhJT4=.2d914a97-ee92-4fa8-b12f-71b40c02702d@github.com> Message-ID: <_uoAlfUixm7tQDpUrss-eLiXD2k5aZWqkGiSu3BMULQ=.2d40079c-b44a-4cb1-a0a5-3428e19e6644@github.com> On Wed, 11 Sep 2024 12:00:46 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> allow one div on platforms without hardware divmod > > Generally the clean-up idea is good! A few comments. Updated PR per @chhagedorn's requests. Waiting for GHA to complete. > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 36: > >> 34: * @library /test/lib / >> 35: * @run main/othervm -XX:+UseDivMod compiler.c2.TestDivModNodes >> 36: * @run main/othervm -XX:-UseDivMod compiler.c2.TestDivModNodes > > You should not pass the flags like that to the IR framework tests. It will not take these flags into account. What you should do instead is using the IR framework provided methods: > > > * @run driver compiler.c2.TestDivModNodes > ... 
> public static void main(String[] args) { > TestFramework.runWithFlags("-XX:-UseDivMod"); > TestFramework.runWithFlags("-XX:+UseDivMod"); > } > > Note that `UseDivMod` is a C2 specific flag. So, you should also add a `@requires vm.compiler2.enabled`. Thanks for pointing out! This is very critical. I've updated tests with `runWithFlags()` > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 58: > >> 56: >> 57: verifyResult(dividend, divisor, q, r, TestDivModNodes::signedIntDiv, TestDivModNodes::signedIntMod); >> 58: } > > Couldn't you just make `q` and `r` fields and then check them from within a `@Run` method? > > Something like that (not tested): > > static final Random RANDOM = AbstractInfo.getRandom(); > int divResult; > int modResult; > ... > @Test > @IR(...) > void testSignedIntDivMod(int dividend, int divisor) { > divResult = dividend / divisor; > modResult = dividend % divisor; > } > > @Run(test="testSignedIntDivMod") > void run() { > int q = RANDOM.nextInt(); > int r = RANDOM.nextInt(); > testSignedIntDivMod(q, r); > Asserts.assertEQ(q / r, divResult); > Asserts.assertEQ(q % r, modResult); > } > > Since the IR framework is single threaded, it should not be a problem to pass the result like that over fields. Good point. I was aiming for generated IR to be as simple as possible. This is even better. I adopted your suggestions, thanks! 
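The field-based pattern suggested above can be tried outside the IR framework with a plain-Java sketch. The class name `DivModFieldsSketch` and the fixed seed are assumptions made for illustration, the `@Test`/`@Run` roles are only indicated in comments, and the zero-divisor guard is an addition that is not part of the suggestion:

```java
import java.util.Random;

// Plain-Java stand-in for the IR framework pattern; all names are illustrative.
final class DivModFieldsSketch {
    static final Random RANDOM = new Random(42); // fixed seed: an assumption, for repeatability
    static int divResult;
    static int modResult;

    // Plays the role of the @Test method: only computes and stores the results.
    static void testSignedIntDivMod(int dividend, int divisor) {
        divResult = dividend / divisor;
        modResult = dividend % divisor;
    }

    // Plays the role of the @Run method: picks inputs, calls the test, checks the fields.
    static void run() {
        int dividend = RANDOM.nextInt();
        int divisor = RANDOM.nextInt();
        if (divisor == 0) {
            divisor = 1; // added guard: division by zero would throw ArithmeticException
        }
        testSignedIntDivMod(dividend, divisor);
        // In Java's wrapping int arithmetic, q * divisor + r == dividend always holds.
        if (divResult * divisor + modResult != dividend) {
            throw new AssertionError("div/mod mismatch for " + dividend + " and " + divisor);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            run();
        }
    }
}
```

Passing results through static fields is only safe here because a single thread calls `run()`, which mirrors the single-threaded assumption stated in the review comment.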
------------- PR Comment: https://git.openjdk.org/jdk/pull/20877#issuecomment-2344054965 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1755040226 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1755045657 From kvn at openjdk.org Wed Sep 11 16:00:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 11 Sep 2024 16:00:07 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20950#pullrequestreview-2297722847 From kvn at openjdk.org Wed Sep 11 17:51:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 11 Sep 2024 17:51:11 GMT Subject: RFR: 8329035: New Data Destination instructions support [v5] In-Reply-To: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> References: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> Message-ID: On Tue, 10 Sep 2024 23:51:21 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. 
>> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > - remove instr. functions based on review comments > - refactoring and fixes based on review comments > - function name changes based on review comments > - fix 32-bit build name errors, missing no_flags arg, and addw functions > - 8329035: New Data Destination instructions support I will start our testing and do review later before approval. Please, wait @steveatgh ------------- PR Comment: https://git.openjdk.org/jdk/pull/20698#issuecomment-2344316158 From simonis at openjdk.org Wed Sep 11 18:54:33 2024 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 11 Sep 2024 18:54:33 GMT Subject: RFR: 8339954: Print JVMCI names with the Compiler.{permap,codelist,CodeHeap_Analytics} diagnostic commands Message-ID: The diagnostic commands `Compiler.codelist`, `Compiler.CodeHeap_Analytics` and `Compiler.perfmap` are handy for analyzing the CodeCache or creating a symbol file for the perf tool. 
However, with the Truffle framework which uses the GraalVM compiler in "hosted" mode, we can end up with hundreds if not thousands of nmethods which are all linked to the same Java method (most prominently `com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot()`). All these nmethods are currently indistinguishable by the two aforementioned diagnostic commands. But nmethods compiled by the GraalVM compiler have a special "JVMCI name" attached to them, which in the case of Truffle corresponds to the guest language function name. Printing this "JVMCI name" in addition to the true Java method name makes it easier to distinguish various nmethods compiled by Truffle or other frameworks which use the GraalVM compiler in hosted mode. For the `Compiler.perfmap` command, it should be mentioned that the format of the created perfmap file is specified here: https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt It only mandates that each line starts with a start and size number in hex and interprets the whole rest of the line (which can even include special characters) as a "symbolname". Taking into account that we already today produce "symbol names" as different as "`throw_range_check_failed Runtime1 stub`", "`Signature Handler Temp Buffer`", "`I2C/C2I adapters`" or "`boolean java.lang.invoke.VarHandleInts$FieldInstanceReadWrite.compareAndSet(java.lang.invoke.VarHandle, java.lang.Object, int, int)`", adding a potential jvmci suffix like "jvmci_name=myFancyJSFunction()#2" to some methods will not cause any compatibility issues. 
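To illustrate why a suffix in the symbol name is harmless for consumers of the file, here is a minimal sketch of a reader for the `START SIZE symbolname` layout described above. The class `PerfMapLineSketch` is hypothetical (it is not part of the JDK or of perf), the sample line in `main` is made up, and accepting an optional `0x` prefix is a defensive assumption rather than something the format description requires:

```java
// Hypothetical helper, not part of the JDK or of perf.
final class PerfMapLineSketch {
    final long start;
    final long size;
    final String symbol;

    private PerfMapLineSketch(long start, long size, String symbol) {
        this.start = start;
        this.size = size;
        this.symbol = symbol;
    }

    // Splits on the first two spaces only, so the symbol name keeps any
    // spaces or special characters it contains.
    static PerfMapLineSketch parse(String line) {
        String[] parts = line.split(" ", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 'START SIZE symbolname': " + line);
        }
        return new PerfMapLineSketch(parseHex(parts[0]), parseHex(parts[1]), parts[2]);
    }

    // Parses a hex number, tolerating an optional "0x" prefix (an assumption).
    private static long parseHex(String s) {
        String digits = (s.startsWith("0x") || s.startsWith("0X")) ? s.substring(2) : s;
        return Long.parseUnsignedLong(digits, 16);
    }

    public static void main(String[] args) {
        // Made-up sample line; the real layout of the JVMCI suffix may differ.
        PerfMapLineSketch e = parse("7f3a10000000 1c0 void Foo.bar() jvmci_name=myFancyJSFunction()#2");
        System.out.println(Long.toHexString(e.start) + " " + Long.toHexString(e.size) + " [" + e.symbol + "]");
    }
}
```

Because everything after the second field is taken verbatim, a reader written this way does not care whether the name ends in a JVMCI suffix.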
..and the output of `Compiler.CodeHeap_Analytics` is unparsable anyway :) ------------- Commit messages: - 8339954: Print JVMCI names with the Compiler.{permap,codelist,CodeHeap_Analytics} diagnostic commands Changes: https://git.openjdk.org/jdk/pull/20954/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20954&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339954 Stats: 35 lines in 2 files changed: 27 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20954.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20954/head:pull/20954 PR: https://git.openjdk.org/jdk/pull/20954 From kvn at openjdk.org Wed Sep 11 19:30:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 11 Sep 2024 19:30:12 GMT Subject: RFR: 8329035: New Data Destination instructions support [v5] In-Reply-To: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> References: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> Message-ID: On Tue, 10 Sep 2024 23:51:21 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. 
> > Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > - remove instr. functions based on review comments > - refactoring and fixes based on review comments > - function name changes based on review comments > - fix 32-bit build name errors, missing no_flags arg, and addw functions > - 8329035: New Data Destination instructions support Changes looks fine. I have only two comments. Testing is running. I will let you know results. src/hotspot/cpu/x86/assembler_x86.cpp line 7496: > 7494: emit_arith(0x33, 0xC0, src1, src2); > 7495: } > 7496: We just removed `xorw(reg, reg)` because it is not used: [#20901](https://git.openjdk.org/jdk/pull/20901) Please, don't add it back. src/hotspot/cpu/x86/assembler_x86.hpp line 796: > 794: void evex_prefix_ndd(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false); > 795: > 796: void evex_prefix_nf(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false); May be split these lines. ------------- PR Review: https://git.openjdk.org/jdk/pull/20698#pullrequestreview-2298438176 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1755454321 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1755452920 From jkarthikeyan at openjdk.org Wed Sep 11 20:12:08 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 11 Sep 2024 20:12:08 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v6] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Wed, 11 Sep 2024 15:57:29 GMT, Kangcheng Xu wrote: >> Hello all. 
This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one, should the platform support this feature (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'openjdk:master' into refactor-mod-cases > - improve formatting, nullptr comparison, update test flags, use custom @Run tests > - allow one div on platforms without hardware divmod > - remove platform restriction > - Merge branch 'openjdk:master' into refactor-mod-cases > - include aarch64 in tests, add more configuration combinations > - remove redundant arguments, test with -XX:-UseDivMod > - Merge branch 'master' into refactor-mod-cases > - Add test and IRNode for signed int/long divmod > - created IR tests > - ... and 2 more: https://git.openjdk.org/jdk/compare/bf2d8c7b...fe7b82ef src/hotspot/share/opto/compile.cpp line 3181: > 3179: n->subsume_by(divmod->mod_proj(), this); > 3180: } else { > 3181: // replace "a % b" with "a - ((a / b) *b)" Suggestion: // Replace "a % b" with "a - ((a / b) * b)" test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 33: > 31: /* > 32: * @test > 33: * @summary Test that DIV and MOD nodes are converted into DIVMOD where possible I think you're missing `@bug 8332442` here.
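The code comment discussed earlier in this review, replacing `a % b` with `a - ((a / b) * b)`, can be checked with a small Java sketch. The class and method names are hypothetical; the sketch only demonstrates the arithmetic identity for Java's truncating division and is not the C2 node transformation itself:

```java
// Hypothetical illustration of the identity a % b == a - ((a / b) * b).
final class ModViaDivSketch {
    // Remainder computed from a single division, a multiply and a subtract.
    static int modViaDiv(int a, int b) {
        return a - ((a / b) * b);
    }

    static long modViaDiv(long a, long b) {
        return a - ((a / b) * b);
    }

    public static void main(String[] args) {
        int[][] pairs = { {7, 3}, {-7, 3}, {7, -3}, {-7, -3}, {Integer.MIN_VALUE, -1} };
        for (int[] p : pairs) {
            // Compare against the built-in remainder operator.
            if (modViaDiv(p[0], p[1]) != p[0] % p[1]) {
                throw new AssertionError(p[0] + " % " + p[1]);
            }
        }
        System.out.println("identity holds for all sample pairs");
    }
}
```

The `Integer.MIN_VALUE / -1` pair is included because the quotient overflows there; the identity still holds in wrapping arithmetic.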
test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 68: > 66: private static void runSignedIntDivMod() { > 67: int dividend = RANDOM.nextInt(); > 68: int divisor = RANDOM.nextInt(); Since `nextInt()` can return 0, this might lead to a very rare test failure since div/mod by 0 results in an `ArithmeticException`. You could do `divisor = (divisor == 0) ? 1 : divisor` to avoid that, like we do in the other division idealization tests: https://github.com/openjdk/jdk/blob/51b85a1f692fed7a66bdc0fae21438a60aafe7c2/test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java#L45-L46 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1755363063 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1755371635 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1755392758 From kxu at openjdk.org Wed Sep 11 20:26:43 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 11 Sep 2024 20:26:43 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: > Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one, should the platform support this feature (e.g., x86-64). > > I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
> > ~This will be a draft PR before GHA tests are confirmed passing.~ Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: format comments, add @bug, avoid zero divisor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20877/files - new: https://git.openjdk.org/jdk/pull/20877/files/fe7b82ef..ef7882b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=05-06 Stats: 22 lines in 2 files changed: 17 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From kxu at openjdk.org Wed Sep 11 20:26:44 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 11 Sep 2024 20:26:44 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v6] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Wed, 11 Sep 2024 18:57:17 GMT, Jasmine Karthikeyan wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into refactor-mod-cases >> - improve formatting, nullptr comparison, update test flags, use custom @Run tests >> - allow one div on platforms without hardware divmod >> - remove platform restriction >> - Merge branch 'openjdk:master' into refactor-mod-cases >> - include aarch64 in tests, add more configuration combinations >> - remove redundant arguments, test with -XX:-UseDivMod >> - Merge branch 'master' into refactor-mod-cases >> - Add test and IRNode for signed int/long divmod >> - created IR tests >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/d6ffaaff...fe7b82ef > > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 33: > >> 31: /* >> 32: * @test >> 33: * @summary Test that DIV and MOD nodes are converted into DIVMOD where possible > > I think you're missing `@bug 8332442` here. Good catch. Thanks! > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 68: > >> 66: private static void runSignedIntDivMod() { >> 67: int dividend = RANDOM.nextInt(); >> 68: int divisor = RANDOM.nextInt(); > > Since `nextInt()` can return 0, this might lead to a very rare test failure since div/mod by 0 results in an `ArithmeticException`. You could do `divisor = (divisor == 0) ? 1 : divisor` to avoid that, like we do in the other division idealization tests: > https://github.com/openjdk/jdk/blob/51b85a1f692fed7a66bdc0fae21438a60aafe7c2/test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java#L45-L46 Changed to `nextNonZeroInt()` and `nextNonZeroLong()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1755537089 PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1755539364 From jkarthikeyan at openjdk.org Wed Sep 11 20:34:06 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 11 Sep 2024 20:34:06 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Wed, 11 Sep 2024 20:26:43 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one, should the platform support this feature (e.g., x86-64).
>> >> I took the liberty adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they are previously missing. Please let me know if they were left intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > format comments, add @bug, avoid zero divisor Looks good, thanks for the update! ------------- Marked as reviewed by jkarthikeyan (Committer). PR Review: https://git.openjdk.org/jdk/pull/20877#pullrequestreview-2298567226 From dholmes at openjdk.org Wed Sep 11 20:54:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 20:54:05 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: <9IEgLiTNqfOAXyA_8ldo3iXyd5ehhHNo1mWcwWGrhNk=.90188fd2-9042-41e2-adc5-5601ce4d0b5d@github.com> On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias Safe publication is the responsibility of the publisher, so this PR seems fine to me in that regard. I don't think `make_global` should end in a storestore! But is this storestore sufficient to make this publication scenario perfectly thread-safe? 
I would have expected to see a release_store/load_acquire pairing in the `ciEnv::*Exception_instance()` methods. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2344679325 From lmesnik at openjdk.org Wed Sep 11 22:13:06 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 11 Sep 2024 22:13:06 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v4] In-Reply-To: References: Message-ID: <3g9H3wRYKtkV9w0d4Qi9k3HiM-8W_X58UQuozCdz07w=.9c7e257f-b0ab-4e40-92b1-df232706d32d@github.com> On Tue, 10 Sep 2024 02:17:18 GMT, kuaiwei wrote: >> test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 46: >> >>> 44: public class TypeProfileFinalMethod { >>> 45: public static void main(String[] args) throws Exception { >>> 46: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( >> >> The `createLimitedTestJavaProcessBuilder` ignores any other VM flags. This mode should be used only if the test is too specific. >> The >> ` * @requires (vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4)` >> means that we are going to run the test with almost any GC/Runtime and C2 stress option flags. (No C1-only) >> So `createTestJavaProcessBuilder` is needed to accept all VM flags. >> You don't need to test any additional VM flags unless you have a reason to suppose that something might fail. >> Just use `createTestJavaProcessBuilder` instead. >> If you think that the test shouldn't accept any additional VM flags, use >> `@requires vm.flagless` >> instead of >> `@requires (vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4)` >> So the test is executed only once and doesn't run if it is going to ignore flags. > > Thanks for your suggestions. The test case is dependent on tiered compilation and type profile. Is there any other option for these requirements?
I think that additionally * @requires vm.opt.TieredCompilation == null | vm.opt.TieredCompilation == true You could check if test fails with ' -XX:-TieredCompilation'. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1755740769 From sdohrmann at openjdk.org Wed Sep 11 23:02:41 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Wed, 11 Sep 2024 23:02:41 GMT Subject: RFR: 8329035: New Data Destination instructions support [v6] In-Reply-To: References: Message-ID: > Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. > > The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. > > Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. > > All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: xorw and long-line changes based on review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20698/files - new: https://git.openjdk.org/jdk/pull/20698/files/4b956df4..ef83fb07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20698&range=04-05 Stats: 17 lines in 2 files changed: 2 ins; 13 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20698/head:pull/20698 PR: https://git.openjdk.org/jdk/pull/20698 From sdohrmann at openjdk.org Wed Sep 11 23:02:43 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Wed, 11 Sep 2024 23:02:43 GMT Subject: RFR: 8329035: New Data Destination instructions support [v5] In-Reply-To: References: <8BgBNjVmkL4aunknvEVEGydKNJ8hAkFnmpwQD5sYkQs=.e019dc2d-04c0-422a-ada3-22e76042fd62@github.com> Message-ID: On Wed, 11 Sep 2024 19:25:15 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> - remove instr. functions based on review comments >> - refactoring and fixes based on review comments >> - function name changes based on review comments >> - fix 32-bit build name errors, missing no_flags arg, and addw functions >> - 8329035: New Data Destination instructions support > > src/hotspot/cpu/x86/assembler_x86.cpp line 7496: > >> 7494: emit_arith(0x33, 0xC0, src1, src2); >> 7495: } >> 7496: > > We just removed `xorw(reg, reg)` because it is not used: [#20901](https://git.openjdk.org/jdk/pull/20901) > Please, don't add it back. Sorry. Done. 
> src/hotspot/cpu/x86/assembler_x86.hpp line 796: > >> 794: void evex_prefix_ndd(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false); >> 795: >> 796: void evex_prefix_nf(Address adr, int ndd_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool no_flags = false); > > May be split these lines. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1755785250 PR Review Comment: https://git.openjdk.org/jdk/pull/20698#discussion_r1755785161 From kvn at openjdk.org Thu Sep 12 01:07:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 01:07:09 GMT Subject: RFR: 8329035: New Data Destination instructions support [v6] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 23:02:41 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > xorw and long-line changes based on review comments My testing passed. You can integrate. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20698#pullrequestreview-2299026270 PR Comment: https://git.openjdk.org/jdk/pull/20698#issuecomment-2345053013 From dlong at openjdk.org Thu Sep 12 02:16:04 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Sep 2024 02:16:04 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias I agree, the ciEnv::*Exception_instance() methods look like a problem. They are using shared static jobjects without synchronization. If there is a race, one compiler thread can overwrite the existing handle that another compiler thread used to resolve the oop. I think we need synchronization or compare-and-swap when assigning to the static handle to prevent a leak. We may be OK without the load-acquire on the read side, if we rely on the data dependency like interpreter/generated code does. 
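The compare-and-swap idea mentioned above can be sketched in Java as an analogy. This is not the HotSpot C++ code: `CasPublishSketch` and `getOrCreate` are hypothetical names, and `AtomicReference` gives volatile (acquire/release) semantics on both sides, which is stronger than relying on a data dependency for the read:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical analogy for publishing a lazily created shared instance once.
final class CasPublishSketch {
    private static final AtomicReference<Object> SHARED = new AtomicReference<>();

    static Object getOrCreate(Supplier<Object> factory) {
        Object existing = SHARED.get(); // volatile read: sees a fully built object or null
        if (existing != null) {
            return existing;
        }
        Object candidate = factory.get();
        // Only one thread can move the slot from null to its candidate.
        if (SHARED.compareAndSet(null, candidate)) {
            return candidate;
        }
        // Lost the race: drop our candidate and use the winner's instance, so the
        // published value is never overwritten after another thread has read it.
        return SHARED.get();
    }

    public static void main(String[] args) {
        Object first = getOrCreate(Object::new);
        Object second = getOrCreate(Object::new);
        System.out.println(first == second);
    }
}
```

In Java the losing candidate is simply left to the garbage collector; a native equivalent would presumably have to release the losing handle explicitly, which is where the leak concern comes from.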
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2345118335 From duke at openjdk.org Thu Sep 12 02:24:05 2024 From: duke at openjdk.org (kuaiwei) Date: Thu, 12 Sep 2024 02:24:05 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v4] In-Reply-To: <3g9H3wRYKtkV9w0d4Qi9k3HiM-8W_X58UQuozCdz07w=.9c7e257f-b0ab-4e40-92b1-df232706d32d@github.com> References: <3g9H3wRYKtkV9w0d4Qi9k3HiM-8W_X58UQuozCdz07w=.9c7e257f-b0ab-4e40-92b1-df232706d32d@github.com> Message-ID: On Wed, 11 Sep 2024 22:10:49 GMT, Leonid Mesnik wrote: >> Thanks for your suggestions. The test case is dependent on tiered compilation and type profile. Is there any other option for these requirements? > > I think that additionally > * @requires vm.opt.TieredCompilation == null | vm.opt.TieredCompilation == true > > You could check if test fails with ' -XX:-TieredCompilation'. I tested with "-XX:-TieredCompilation" and it passed. my test command: > make run-test TEST="test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java" CONF=fastdebug JTREG="VM_OPTIONS=-XX:-TieredCompilation" ...
==============================
Test summary
==============================
   TEST                                                               TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java      1     1     0     0
==============================
TEST SUCCESS
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1755996596 From lmesnik at openjdk.org Thu Sep 12 03:11:08 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 12 Sep 2024 03:11:08 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v4] In-Reply-To: References: <3g9H3wRYKtkV9w0d4Qi9k3HiM-8W_X58UQuozCdz07w=.9c7e257f-b0ab-4e40-92b1-df232706d32d@github.com> Message-ID: On Thu, 12 Sep 2024 02:21:17 GMT, kuaiwei wrote: >> I think that additionally >> * @requires vm.opt.TieredCompilation == null | vm.opt.TieredCompilation == true >> >> You could check if test fails with ' -XX:-TieredCompilation'. > > I tested with "-XX:-TieredCompilation" and it passed. > my test command: > >> make run-test TEST="test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java" CONF=fastdebug JTREG="VM_OPTIONS=-XX:-TieredCompilation" > ... > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java > 1 1 0 0 > ============================== > TEST SUCCESS Thanks. That's fine for me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20786#discussion_r1756033937 From chagedorn at openjdk.org Thu Sep 12 06:21:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Sep 2024 06:21:05 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Wed, 11 Sep 2024 20:26:43 GMT, Kangcheng Xu wrote: >> Hello all.
This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one, should the platform support this feature (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > format comments, add @bug, avoid zero divisor Thanks for the update, that looks good to me! I'll also give this a spin in our testing. Will report back once it's completed. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20877#pullrequestreview-2299312195 From thartmann at openjdk.org Thu Sep 12 06:36:05 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Sep 2024 06:36:05 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:21:41 GMT, Roland Westrelin wrote: >> PhaseIdealLoop::do_range_check() sets the control of the new pre and >> main limits to be the entry control of the pre loop but it eliminates >> all conditions whose parameters are invariant in the main loop. Most >> of the time they are also invariant in the pre loop but that's not >> guaranteed. It does happen sometimes that those parameters are pinned >> in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets >> wrong controls. >> >> This doesn't cause any issue today AFAICT.
>> >> Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): >> >> >> pre_head->in(0) >> >> >> is `pre_head`. I fixed that one too. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Christian Hagedorn Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20908#pullrequestreview-2299335041 From roland at openjdk.org Thu Sep 12 07:23:08 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Sep 2024 07:23:08 GMT Subject: RFR: 8339733: C2: some nodes can have incorrect control after do_range_check() [v4] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 06:33:28 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/loopTransform.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopTransform.cpp >> >> Co-authored-by: Christian Hagedorn > > Looks good to me too. 
@TobiHartmann @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/20908#issuecomment-2345458282 From roland at openjdk.org Thu Sep 12 07:23:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Sep 2024 07:23:09 GMT Subject: Integrated: 8339733: C2: some nodes can have incorrect control after do_range_check() In-Reply-To: References: Message-ID: <0NwmHM4zG2eNRBaOLXQAYXT8kVBuG0HKvvHo1JcrLDI=.c24c5afa-e53b-4514-b0fb-23d523b4e0cc@github.com> On Mon, 9 Sep 2024 08:27:20 GMT, Roland Westrelin wrote: > PhaseIdealLoop::do_range_check() sets the control of the new pre and > main limits to be the entry control of the pre loop but it eliminates > all conditions whose parameters are invariant in the main loop. Most > of the time they are also invariant in the pre loop but that's not > guaranteed. It does happen sometimes that those parameters are pinned > in the pre loop. In that case, PhaseIdealLoop::do_range_check() sets > wrong controls. > > This doesn't cause any issue today AFAICT. > > Also, this seems to be a typo in PhaseIdealLoop::insert_pre_post_loops(): > > > pre_head->in(0) > > > is `pre_head`. I fixed that one too. This pull request has now been integrated. 
Changeset: 315abdf8 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/315abdf8c835e95d9c509f72b7ae21e6b59e4a29 Stats: 86 lines in 2 files changed: 63 ins; 0 del; 23 mod 8339733: C2: some nodes can have incorrect control after do_range_check() Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/20908 From roland at openjdk.org Thu Sep 12 07:41:11 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Sep 2024 07:41:11 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v3] In-Reply-To: References: Message-ID: <8caUrkKdKk4TqQ4OmhNe35rpU0ES5Ja4y9TvXS32nDA=.f901fc91-eb12-422d-bdc6-ef2907fd7328@github.com> On Tue, 23 Jul 2024 16:20:23 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - refactoring >> - Merge branch 'master' into JDK-8333258 >> - review >> - Merge branch 'master' into JDK-8333258 >> - whitespaces >> - tests & fix > > But I would like you to fix the comments here: > > // The relevant stores "nearby" the load consist of a tree rooted > // at initial_mem, with internal nodes of type MergeMem. > // Therefore, the branches visited by the worklist are of this form: > // initial_mem -> (MergeMem ->)* store > // The anti-dependence constraints apply only to the fringe of this tree. > > There are not just `MergeMem` but also `Phi` nodes. @eme64 could you take another look at this one? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19791#issuecomment-2345492703 From thartmann at openjdk.org Thu Sep 12 07:42:07 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Sep 2024 07:42:07 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias Thanks for looking into this, David and Dean. Good points, I agree that we would need to make this completely thread-safe to prevent a leak. Looking at the code again, I wonder why we even do all this lazily, especially since we already create `NullPointerException` and `ArithmeticException` eagerly at VM startup: https://github.com/openjdk/jdk/blob/438121be6bdb085fa13ad14ec53b09ecdbd4757d/src/hotspot/share/memory/universe.cpp#L1086-L1089 Couldn't we do the same for `ArrayIndexOutOfBoundsException`, `ArrayStoreException` and `ClassCastException`? This would save us quite some complexity and I think the startup / footprint overhead is negligible. 
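For readers less familiar with the publication race discussed in this thread, here is a minimal Java analogue. It is only a sketch under stated assumptions: it is not the HotSpot code in `ciEnv::get_or_create_exception`, and every name in it (`LazyPublicationSketch`, `Instance`, `cached`, `getOrCreate`) is hypothetical. It shows the two properties asked for above: the object is fully initialized before its reference becomes visible, and only one thread's instance is ever installed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical Java analogue of lazily creating and publishing a shared object.
// None of these names come from HotSpot.
final class LazyPublicationSketch {

    /** Stand-in for the lazily created object; the field is deliberately non-final. */
    static final class Instance {
        String message;
    }

    private static Instance cached;          // only accessed through CACHED
    private static final VarHandle CACHED;

    static {
        try {
            CACHED = MethodHandles.lookup()
                    .findStaticVarHandle(LazyPublicationSketch.class, "cached", Instance.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Instance getOrCreate() {
        // Acquire load: if we see a non-null reference, we also see its initialized fields.
        Instance seen = (Instance) CACHED.getAcquire();
        if (seen != null) {
            return seen;
        }
        Instance fresh = new Instance();
        fresh.message = "initialized";
        // compareAndSet has volatile semantics, so the store to fresh.message is ordered
        // before the reference is published, and only one thread can install its instance.
        if (CACHED.compareAndSet((Instance) null, fresh)) {
            return fresh;
        }
        // Another thread won the race; use its instance and drop ours.
        return (Instance) CACHED.getAcquire();
    }

    /** Single-threaded smoke check: repeated calls return the same initialized object. */
    static boolean returnsSameInitializedInstance() {
        Instance a = getOrCreate();
        Instance b = getOrCreate();
        return a == b && "initialized".equals(a.message);
    }

    public static void main(String[] args) {
        System.out.println(returnsSameInitializedInstance());
    }
}
```

Creating the objects eagerly at startup, as suggested above, would remove the need for any of this, because nothing is published concurrently.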
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2345494640 From epeter at openjdk.org Thu Sep 12 08:41:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 08:41:12 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v5] In-Reply-To: <4SiD1hBSzfbqKvKT8t6D8WDxQqH4XmAWAMllX929M6E=.8ee795df-2d1e-4b1c-84be-d396cd3274c1@github.com> References: <4SiD1hBSzfbqKvKT8t6D8WDxQqH4XmAWAMllX929M6E=.8ee795df-2d1e-4b1c-84be-d396cd3274c1@github.com> Message-ID: <4TjjcMSCPD7wZa5lOjD2_4xO73MB2CuIngPTkqFU5xg=.d41ea86f-8164-4887-a1e7-a67e873b1b3f@github.com> On Mon, 26 Aug 2024 13:34:41 GMT, Roland Westrelin wrote: >> In a debug build, `PhaseCFG::insert_anti_dependences()` is called >> twice for a single node: once for actual processing, once for >> verification. >> >> In TestAntiDependenciesHighMemUsage, the test has a `Region` that >> merges 337 incoming path. It also has one `Phi` per memory slice that >> are stored to: 1000 `Phi` nodes. Each `Phi` node has 337 inputs that >> are identical except for one. The common input is the memory state on >> method entry. The test has 60 `Load` that needs to be processed for >> anti dependences. All `Load` share the same memory input: the memory >> state on method entry. For each `Load`, all `Phi` nodes are pushed 336 >> times on the work lists for anti dependence processing because all of >> them appear multiple times as uses of each `Load`s memory state: `Phi`s >> are pushed 336 000 on 2 work lists. Memory is not reclaimed on exit >> from `PhaseCFG::insert_anti_dependences()` so memory usage grows as >> `Load` nodes are processed: >> >> 336000 * 2 work lists * 60 loads * 8 bytes pointer = 322 MB. >> >> The fix I propose for this is to not push `Phi` nodes more than once >> when they have the same inputs multiple times. >> >> In TestAntiDependenciesHighMemUsage2, the test has 4000 loads. 
For >> each of them, when processed for anti dependences, all 4000 loads are >> pushed on the work lists because they share the same memory >> input. Then when they are popped from the work list, they are >> discarded because only stores are of interest: >> >> 4000 loads processed * 4000 loads pushed * 2 work lists * 8 bytes pointer = 256 MB. >> >> The fix I propose for this is to test before pushing on the work list >> whether a node is a store or not. >> >> Finally, I propose adding a `ResourceMark` so memory doesn't >> accumulate over calls to `PhaseCFG::insert_anti_dependences()`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more review @rwestrel sorry for the long delay, I was away for over 3 weeks. Looks much better now, I still have a few comments and suggestions. src/hotspot/share/opto/gcm.cpp line 632: > 630: if (use_mem_state->is_MergeMem()) { > 631: // Be sure we don't get into combinatorial problems. > 632: // (Allow phis to be repeated; they can merge two relevant states.) Does this comment not belong to the `Phi` code below? src/hotspot/share/opto/gcm.cpp line 637: > 635: if (_worklist_visited.at(j-1) == use_mem_state) return; // already on work list; do not repeat > 636: } > 637: _worklist_visited.push(use_mem_state); This is fine, but it might be nicer if we had such a `push_if_missing`. `GrowableArray` has that functionality. Maybe you should just use a `GrowableArray` then anyway, right? It would be less code. src/hotspot/share/opto/gcm.cpp line 749: > 747: // initial_mem -> (MergeMem ->)* Memory state modifying node > 748: // Memory state modifying nodes include Store and Phi nodes and any node for which needs_anti_dependence_check() > 749: // returns true. Basically this is the idea: `needs_anti_dependence_check() == true` for Loads `needs_anti_dependence_check() == false` for MergeMem, Phi, Store So your `returns true` is not correct, right? 
Should it not be `returns false`? src/hotspot/share/opto/gcm.cpp line 761: > 759: > 760: uint op = use_mem_state->Opcode(); > 761: assert(!use_mem_state->needs_anti_dependence_check(), "only stores"); The comment in the assert is misleading. It can also be MergeMem, Phi, etc src/hotspot/share/opto/gcm.cpp line 775: > 773: for (DUIterator_Fast imax, i = def_mem_state->fast_outs(imax); i < imax; i++) { > 774: use_mem_state = def_mem_state->fast_out(i); > 775: // If this is not a store, load can't be anti dependent on this node This comment is not helping me, and I think it is also not precise... Instead of `store`, you should be talking about `MergeMem`, `Phi`, `Store` and any other state modifying node. Suggestion: remove comment, and after the check, just before the `continue`, add this comment: use_mem_state is also a kind of load (i.e. needs_anti_dependence_check), and it is not any state modifying node, store, Phi or MergeMem. Hence, load is not anti dependent on this node. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19791#pullrequestreview-2299488433 PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756325982 PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756397476 PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756355053 PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756352150 PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756385142 From epeter at openjdk.org Thu Sep 12 08:41:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 08:41:13 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v5] In-Reply-To: <4TjjcMSCPD7wZa5lOjD2_4xO73MB2CuIngPTkqFU5xg=.d41ea86f-8164-4887-a1e7-a67e873b1b3f@github.com> References: <4SiD1hBSzfbqKvKT8t6D8WDxQqH4XmAWAMllX929M6E=.8ee795df-2d1e-4b1c-84be-d396cd3274c1@github.com> <4TjjcMSCPD7wZa5lOjD2_4xO73MB2CuIngPTkqFU5xg=.d41ea86f-8164-4887-a1e7-a67e873b1b3f@github.com> Message-ID: On Thu, 12 Sep 2024 08:06:05 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> more review > > src/hotspot/share/opto/gcm.cpp line 761: > >> 759: >> 760: uint op = use_mem_state->Opcode(); >> 761: assert(!use_mem_state->needs_anti_dependence_check(), "only stores"); > > The comment in the assert is misleading. It can also be MergeMem, Phi, etc Maybe say `only MergeMem and state modifying nodes`. Or just `no loads`? 
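The first fix proposed in this PR's description (push a `Phi` only once even when it appears many times as a use) can be illustrated outside of C2. The following is a rough Java sketch, not the code in `gcm.cpp`: the class and method names are hypothetical, plain `Object`s stand in for nodes, and an identity set plays the role of the visited list. The byte estimate simply replays the formula from the PR description.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical model of the work list growth described in the PR; not C2 code.
final class WorklistDedupSketch {

    private static Object[] newNodes(int nodeCount) {
        Object[] nodes = new Object[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            nodes[i] = new Object();
        }
        return nodes;
    }

    /** Pushes every use edge: a node that is used `repeats` times is pushed `repeats` times. */
    static int pushesWithoutDedup(int nodeCount, int repeats) {
        Object[] nodes = newNodes(nodeCount);
        Deque<Object> worklist = new ArrayDeque<>();
        for (int r = 0; r < repeats; r++) {
            for (Object node : nodes) {
                worklist.push(node);
            }
        }
        return worklist.size();
    }

    /** Pushes a node only the first time it is seen (identity comparison, like a node pointer). */
    static int pushesWithDedup(int nodeCount, int repeats) {
        Object[] nodes = newNodes(nodeCount);
        Deque<Object> worklist = new ArrayDeque<>();
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int r = 0; r < repeats; r++) {
            for (Object node : nodes) {
                if (visited.add(node)) {   // add() returns false if the node was already pushed
                    worklist.push(node);
                }
            }
        }
        return worklist.size();
    }

    /** pushes * work lists * loads * pointer size, as in the PR description. */
    static long estimatedBytes(long pushes, int worklists, int loads, int pointerBytes) {
        return pushes * worklists * loads * pointerBytes;
    }

    public static void main(String[] args) {
        int without = pushesWithoutDedup(1000, 336);
        int with = pushesWithDedup(1000, 336);
        System.out.println(without + " pushes, about " + estimatedBytes(without, 2, 60, 8) + " bytes");
        System.out.println(with + " pushes, about " + estimatedBytes(with, 2, 60, 8) + " bytes");
    }
}
```

With the numbers from the first test (1000 `Phi` nodes, each seen 336 times, 2 work lists, 60 loads, 8-byte pointers), the unfiltered variant reproduces the 322 MB figure quoted above.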
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756356201 From dlong at openjdk.org Thu Sep 12 09:51:06 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Sep 2024 09:51:06 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: <6REI8keH3KddT31KZ-TAwxekLpClKLkq4h17ZjhtajU=.fb277286-a305-4600-a8ef-10ff0ede873c@github.com> On Thu, 12 Sep 2024 07:39:02 GMT, Tobias Hartmann wrote: >> Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. >> >> Unfortunately, this only happened twice in our testing and I was never able to reproduce it. >> >> Thanks, >> Tobias > > Thanks for looking into this, David and Dean. Good points, I agree that we would need to make this completely thread-safe to prevent a leak. Looking at the code again, I wonder why we even do all this lazily, especially since we already create `NullPointerException` and `ArithmeticException` eagerly at VM startup: > https://github.com/openjdk/jdk/blob/438121be6bdb085fa13ad14ec53b09ecdbd4757d/src/hotspot/share/memory/universe.cpp#L1086-L1089 > > Couldn't we do the same for `ArrayIndexOutOfBoundsException`, `ArrayStoreException` and `ClassCastException`? This would save us quite some complexity and I think the startup / footprint overhead is negligible. @TobiHartmann Yes, that seems like the best idea. I was going to suggest moving the fields into the CompilerThread, which gets rid of the race and limits the redundant objects, but I like your idea better. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2345788504 From simonis at openjdk.org Thu Sep 12 11:00:45 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 12 Sep 2024 11:00:45 GMT Subject: RFR: 8339954: Print JVMCI names with the Compiler.{permap,codelist,CodeHeap_Analytics} diagnostic commands [v2] In-Reply-To: References: Message-ID: > The diagnostic commands `Compiler.codelist`, `Compiler.CodeHeap_Analytics` and `Compiler.perfmap` are handy for analyzing the CodeCache or creating a symbol file for the perf tool. However, with the Truffle framework which uses the GraalVM compiler in "hosted" mode, we can end up with hundreds if not thousands of nmethods which are all linked to the same Java method (most prominently `com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot()`). All these nmethods are currently indistinguishable by the two aforementioned diagnostic commands. > > But nmethods compiled by the GraalVM compiler have a special "JVMCI name" attached to them, which in the case of Truffle corresponds to the guest language function name. Printing this "JVMCI name" in addition to the true Java method name makes it easier to distinguish various nmethods compiled by Truffle or other frameworks which use the GraalVM compiler in hosted mode. > > For the `Compiler.perfmap` command, it should be mentioned that the format of the created perfmap file is specified here: > https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt > > It only mandates that each line starts with a start and size number in hex and interprets the whole rest of the line (which can even include special characters) as a "symbolname". 
Taking into account that we already today produce "symbol names" as different as "`throw_range_check_failed Runtime1 stub`", "`Signature Handler Temp Buffer`", "`I2C/C2I adapters`" or "`boolean java.lang.invoke.VarHandleInts$FieldInstanceReadWrite.compareAndSet(java.lang.invoke.VarHandle, java.lang.Object, int, int)`", adding a potential jvmci suffix like "jvmci_name=myFancyJSFunction()#2" to some methods will not cause any compatibility issues. > > ..and the output of `Compiler.CodeHeap_Analytics` is unparsable anyway :) Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Replace call to ::sprintf() by os::snprintf() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20954/files - new: https://git.openjdk.org/jdk/pull/20954/files/a52e3c91..0c3fbb7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20954&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20954&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20954.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20954/head:pull/20954 PR: https://git.openjdk.org/jdk/pull/20954 From kxu at openjdk.org Thu Sep 12 14:04:12 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 12 Sep 2024 14:04:12 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v8] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Tue, 23 Jul 2024 09:52:00 GMT, Emanuel Peter wrote: >> I'm observing a weird test [failure exposed by GHA](https://github.com/tabjy/jdk/actions/runs/10045677996/job/27764679129) that only happens on Linux x86. While there are no actual loop nodes, `LoopLimit` nodes are matched by the same regex. 
I'm currently looking into this: >> >> - `LoopLimit` nodes are somehow not eliminated together with counted loops on 32bit Linux >> - the same binary bundle downloaded from GHA running on my local machine produces only one failure (instead of two) on `testIntCountedLoopWithIntIVWithRandomStrides(int)` >> - `LoopLimit` and `/(\d+(\s){2}(Loop.*)+(\s){2}===.*)/` being another case of [regex mis-match](https://github.com/openjdk/jdk/pull/18198#issuecomment-2214675206)? >> >> >> 2024-07-22T18:16:53.5547051Z One or more @IR rules failed: >> 2024-07-22T18:16:53.5547411Z >> 2024-07-22T18:16:53.5547613Z Failed IR Rules (2) of Methods (2) >> 2024-07-22T18:16:53.5548249Z ---------------------------------- >> 2024-07-22T18:16:53.5550118Z 1) Method "private static int compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVWithRandomStrides(int)" - [Failed IR rules: 1]: >> 2024-07-22T18:16:53.5553610Z * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", "_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> 2024-07-22T18:16:53.5556263Z > Phase "PrintIdeal": >> 2024-07-22T18:16:53.5557150Z - failOn: Graph contains forbidden nodes: >> 2024-07-22T18:16:53.5557912Z * Constraint 1: "(\d+(\s){2}(Loop.*)+(\s){2}===.*)" >> 2024-07-22T18:16:53.5558668Z - Matched forbidden node: >> 2024-07-22T18:16:53.5559308Z * 127 LoopLimit === _ 22 10 83 [[ 128 ]] >> 2024-07-22T18:16:53.5559791Z >> 2024-07-22T18:16:53.5561343Z 2) Method "private static long compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithLongIVWithRandomStrides(int)" - [Failed IR rules: 1]: >> 2024-07-22T18:16:53.5565173Z * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", 
"_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> 2024-07-22T18:16:53.5567518Z > Phase "PrintIdeal": >> 2... > > @tabjy Yes, looks like the 32-bit VM behaves different. Do you know what makes the difference here? > Yes, looks like regex-matching is also confused here, we should probably address that. > If the difference of the 32-bit vs 64-bit turns out to be expected: you can just do this: > `test/hotspot/jtreg/compiler/loopopts/superword/RedTest_long.java: applyIfPlatform = {"64-bit", "true"},` Hi @eme64! Could you kindly give this a quick look if you have the time. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2346386857 From simonis at openjdk.org Thu Sep 12 15:47:16 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 12 Sep 2024 15:47:16 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v5] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 00:32:40 GMT, leo liang wrote: > Hi there, we are seeing this issue when we run JFR on our services under load, we see a large spike of CPU after JFR is triggered, which cause 500 errors in our service. We are currently using corretto-17 in our service. > > Wondering this fix get back ported to JDK 17? As I can't find this change mentioned in [JDK update](https://wiki.openjdk.org/display/JDKUpdates/Archived+Releases) or in [jdk17u tag compare](https://github.com/openjdk/jdk17u/compare/jdk-17.0.9+9...jdk-17.0.13+1) > > Also, wondering if there is a walk around for this issue if the PR is not back ported to Java 17. `XX:+EnableDynamicAgentLoading` seems to only supported in Java 21, so that wouldn't help for now @leomao10, I'm not sure if this change will ever be downported to older releases like JDK 21 or even JDK 17. 
I personally consider it low risk, but there have been reports of performance regressions in some cases (e.g. [JDK-8336805](https://bugs.openjdk.org/browse/JDK-8336805)). I couldn't reproduce them, but I can imagine that they will make the maintainers of LTS releases even more cautious. The easiest way to work around this issue in JDK 17 would be to set the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities at startup or early in the lifetime of the JVM. There are several ways to do this:
- trigger JFR right after startup. This will still invalidate all the JIT compiled methods but if you do this early enough there won't be many of them. After you've triggered JFR for the first time, the corresponding JVMTI capabilities will be set and all dependencies will be recorded automatically so any subsequent JFR invocation won't suffer from a performance degradation any more.
- attach any other JVMTI agent like for example [async profiler](https://github.com/async-profiler/async-profiler) which requests the corresponding JVMTI capabilities at startup.
- write your own, trivial JVMTI agent which merely requests the corresponding JVMTI capabilities and attach it at startup with `-agentpath:jvmtiAgent.so`.
The agent can be as simple as:

    /* g++ -fPIC -shared -I $JAVA_HOME/include/ -I $JAVA_HOME/include/linux -o jvmtiAgent.so jvmtiAgent.cpp */

    #include <jvmti.h>
    #include <stdio.h>
    #include <string.h>

    extern "C"
    JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM* jvm, char* options, void* reserved) {
      jvmtiEnv* jvmti = NULL;
      jvmtiCapabilities capa;
      jvmtiError error;

      /* Obtain the JVMTI environment from the VM. */
      jint result = jvm->GetEnv((void**) &jvmti, JVMTI_VERSION_1_1);
      if (result != JNI_OK) {
        fprintf(stderr, "Can't access JVMTI!\n");
        return JNI_ERR;
      }

      /* Request the capabilities that make the VM record evol_method dependencies. */
      memset(&capa, 0, sizeof(jvmtiCapabilities));
      capa.can_redefine_classes = 1;
      capa.can_retransform_classes = 1;
      capa.can_generate_breakpoint_events = 1;
      if (jvmti->AddCapabilities(&capa) != JVMTI_ERROR_NONE) {
        fprintf(stderr, "Can't set capabilities!\n");
        return JNI_ERR;
      } else {
        fprintf(stdout, "Added capabilities!\n");
      }
      return JNI_OK;
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-2346646666 From phh at openjdk.org Thu Sep 12 15:56:04 2024 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 12 Sep 2024 15:56:04 GMT Subject: RFR: 8339954: Print JVMCI names with the Compiler.{permap,codelist,CodeHeap_Analytics} diagnostic commands [v2] In-Reply-To: References: Message-ID: <_bcSfBK08N9xKZM4WuNuIsmMqhCDxMfavdwqiAowjZ8=.13c91a0f-1327-42ad-82d4-eee3052caada@github.com> On Thu, 12 Sep 2024 11:00:45 GMT, Volker Simonis wrote: >> The diagnostic commands `Compiler.codelist`, `Compiler.CodeHeap_Analytics` and `Compiler.perfmap` are handy for analyzing the CodeCache or creating a symbol file for the perf tool. However, with the Truffle framework which uses the GraalVM compiler in "hosted" mode, we can end up with hundreds if not thousands of nmethods which are all linked to the same Java method (most prominently `com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot()`). All these nmethods are currently indistinguishable by the two aforementioned diagnostic commands.
>> >> But nmethods compiled by the GraalVM compiler have a special "JVMCI name" attached to them, which in the case of Truffle corresponds to the guest language function name. Printing this "JVMCI name" in addition to the true Java method name makes it easier to distinguish various nmethods compiled by Truffle or other frameworks which use the GraalVM compiler in hosted mode. >> >> For the `Compiler.perfmap` command, it should be mentioned that the format of the created perfmap file is specified here: >> https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt >> >> It only mandates that each line starts with a start and size number in hex and interprets the whole rest of the line (which can even include special characters) as a "symbolname". Taking into account that we already today produce "symbol names" as different as "`throw_range_check_failed Runtime1 stub`", "`Signature Handler Temp Buffer`", "`I2C/C2I adapters`" or "`boolean java.lang.invoke.VarHandleInts$FieldInstanceReadWrite.compareAndSet(java.lang.invoke.VarHandle, java.lang.Object, int, int)`", adding a potential jvmci suffix like "jvmci_name=myFancyJSFunction()#2" to some methods will not cause any compatibility issues. >> >> ..and the output of `Compiler.CodeHeap_Analytics` is unparsable anyway :) > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Replace call to ::sprintf() by os::snprintf() Thanks for adding this. Lgtm. ------------- Marked as reviewed by phh (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20954#pullrequestreview-2300765575 From epeter at openjdk.org Thu Sep 12 15:57:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 15:57:26 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing Message-ID:

**Motivation**

I want to write small dedicated fuzzers:
- Generate `java` and `jasm` source code: just some `String`.
- Quickly compile it (with this framework).
- Execute the compiled code.

The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.

**The CompileFramework**

Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. I implemented a first, simple version of the framework. I added some tests and examples.
**Example**

    CompileFramework comp = new CompileFramework();
    comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
    comp.compile();
    comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);

https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74

**Below some use cases: tests that would have been better with the CompileFramework**

**Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**

I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperWord).

**Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java**

Currently, I generate it with a `generator.py` that I update and keep in the latest related JBS issue. Not great, because the generator code is not properly version-controlled. And it is Python, which is not really used elsewhere in our stack. Basically, I generate different loops with different parameters. I need to iterate over a list of distances and generate one loop for each of them. With this generator.py I must choose a fixed list of values, but with a fuzzer I could both pick a list of fixed values and also sprinkle in some random values for higher coverage.

**Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java**

Test memory segments with different kinds of backing data types (array, native, etc). Then use various different address shapes. Verify that they vectorize as expected. It is very cumbersome to generate all the examples by hand.
**More use cases** I'm currently looking into extending both `MergeStores` and `SuperWord`. This will require more tests. The test coverage could be improved even for current features. Often when I add a big test I seem to catch some already existing bug. ------------- Commit messages: - fix paths for windows, had compile issue - increase compile timeut - rm unnecessary test - Merge branch 'master' into fuzzer-test - Merge branch 'master' into fuzzer-test - name timeout better - stub of TestMergeStoresFuzzer - private source and classes directory per CompileFramework - give javac the classesDir from jasm compilation, so java files can reference jasm classes - make it multi-threading safe - ... and 41 more: https://git.openjdk.org/jdk/compare/8fce5275...881c76bf Changes: https://git.openjdk.org/jdk/pull/20184/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337221 Stats: 1266 lines in 13 files changed: 1266 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From enikitin at openjdk.org Thu Sep 12 15:57:28 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: On Mon, 15 Jul 2024 15:56:10 GMT, Emanuel Peter wrote: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. 
You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. 
This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW... test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 28: > 26: import compiler.lib.compile_framework.SourceCode; > 27: import compiler.lib.compile_framework.CompileFrameworkException; > 28: import compiler.lib.compile_framework.InternalCompileFrameworkException; Not needed? test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 42: > 40: import java.util.List; > 41: import java.util.concurrent.TimeUnit; > 42: import jdk.test.lib.process.ProcessTools; 1. URI and Arrays are not used 2. imports list is unsorted. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 54: > 52: } > 53: > 54: public void printSourceCodes() { I'd decouple debug dumping from the stream it prints to. Not always we agree with filling the `stdout` with garbage. Something like this: public String sourceCodesAsString() {...} public Set getSourceCodes() {...} test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 66: > 64: } > 65: > 66: printSourceCodes(); Make debug printouts controllable? Fuzzing generators compiling thousands files, would generate a whole big data of those printouts. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 73: > 71: switch (sourceCode.kind) { > 72: case SourceCode.Kind.JASM -> { jasmSources.add(sourceCode); } > 73: case SourceCode.Kind.JAVA -> { javaSources.add(sourceCode); } Simplify (and shorten imports also)? Suggestion: case JASM -> { jasmSources.add(sourceCode); } case JAVA -> { javaSources.add(sourceCode); } test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 87: > 85: final String sourceDir; > 86: try { > 87: sourceDir = "compile-framework-sources-" + ProcessTools.getProcessId(); Make it MT-safe, say by utilising the `File.createTempFile(prefix, suffix, directory)`? 
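A minimal sketch of the MT-safe variant suggested in the comment above, assuming that one fresh directory per call is what is wanted. It uses `Files.createTempDirectory(Path, String)` rather than `File.createTempFile`, because a directory (not a file) is needed. The class name `UniqueSourceDirSketch` and the helper `newSourceDir` are hypothetical and are not part of the CompileFramework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class UniqueSourceDirSketch {
    // Creates a fresh directory below `parent`. The JDK generates a unique
    // name that starts with the given prefix, so repeated or concurrent
    // callers do not end up sharing a directory.
    static Path newSourceDir(Path parent) {
        try {
            return Files.createTempDirectory(parent, "compile-framework-sources-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path parent = Paths.get("."); // e.g. the jtreg scratch directory
        Path first = newSourceDir(parent);
        Path second = newSourceDir(parent);
        System.out.println(first + " vs " + second);
        // Clean up the two empty directories created by this demo.
        Files.delete(first);
        Files.delete(second);
    }
}
```

Despite the "temp" in the method name, such a directory is not deleted automatically, so generated sources placed there would still be available after a failure.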
test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 131: > 129: private static String getAsmToolsPath() { > 130: for (String path : getClassPaths()) { > 131: System.out.println("jtreg.jar?: " + path); A weird debug printout here test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 150: > 148: return; > 149: } > 150: System.out.println("Compiling Java sources: " + javaSources.size()); Same wish to make debug printouts controllable. A dedicated (mini-?)logger, a static on/off variable, a dedicated PrintStream the user could set up themselves, etc. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 174: > 172: } > 173: > 174: private static List writeSourcesToFile(String sourceDir, List sources) { `String sourceDir` -> `Path sourceDir` ? test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 185: > 183: } > 184: > 185: private static void writeCodeToFile(String code, String fileName) { `String fileName` -> `Path fullPath` ? test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 189: > 187: File file = new File(fileName); > 188: File dir = file.getAbsoluteFile().getParentFile(); > 189: if (!dir.exists()){ A space is missing Suggestion: if (!dir.exists()) { test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 203: > 201: > 202: ProcessBuilder builder = new ProcessBuilder(command); > 203: builder.redirectErrorStream(true); Same streams question. Not always we'd like to have errors in the `stderr` test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 222: > 220: } > 221: > 222: if (exitCode != 0 || !output.equals("")) { > Note: FuzzerUtils.java uses or overrides a deprecated API. > Note: Recompile with -Xlint:deprecation for details. Warnings could corrupt the output. And we probably need to at least make it controllable. 
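The "controllable printouts" wish in the comments above (a static on/off switch plus a `PrintStream` the user can set) could be sketched roughly as follows. This is only an illustration: the class `VerboseLogSketch` and its methods are hypothetical, and the property name `CompileFrameworkVerbose` is the one proposed in the author's reply elsewhere in this thread, not a confirmed part of the final API.

```java
import java.io.PrintStream;

// Hypothetical mini-logger, for illustration only.
final class VerboseLogSketch {
    // Where debug output goes; stdout by default, replaceable by the user.
    private static volatile PrintStream sink = System.out;

    static void setSink(PrintStream newSink) {
        sink = newSink;
    }

    // True only if the JVM was started with -DCompileFrameworkVerbose=true
    // (or the property was set to "true" programmatically).
    static boolean isVerbose() {
        return Boolean.getBoolean("CompileFrameworkVerbose");
    }

    // Prints the message only when verbose mode is enabled.
    static void debug(String message) {
        if (isVerbose()) {
            sink.println(message);
        }
    }

    public static void main(String[] args) {
        // Silent unless the property above is set to true.
        debug("Compiling Java sources: 3");
    }
}
```

With the flag off by default, a generator that compiles thousands of files would print nothing unless verbosity is requested explicitly.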
test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/CombinedJavaJasmExample.java line 72: > 70: out.println("}"); > 71: out.close(); > 72: return writer.toString(); Use a multi-line string here (and in other code samples)? Suggestion: return """ package p/xyz; super public class XYZJasm { public static Method test:"(I)I" stack 20 locals 20 { iload_0; iconst_2; imul; invokestatic Method p/xyz/XYZJava."mul3":"(I)I"; // reference java class ireturn; } public static Method mul5:"(I)I" stack 20 locals 20 { iload_0; ldc 5; imul; ireturn; } }"""; test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/CombinedJavaJasmExample.java line 90: > 88: out.println(" }"); > 89: out.println("}"); > 90: out.close(); `close()` has no effect here, AFAIR. Could be deleted? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694278322 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694278359 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694236325 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694236668 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694277716 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694239667 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694240795 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694238462 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694238767 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694238681 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694278627 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694240185 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694319177 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1692764990 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20184#discussion_r1692758741 From epeter at openjdk.org Thu Sep 12 15:57:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: On Sun, 28 Jul 2024 16:25:43 GMT, Evgeny Nikitin wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. 
>> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 28: > >> 26: import compiler.lib.compile_framework.SourceCode; >> 27: import compiler.lib.compile_framework.CompileFrameworkException; >> 28: import compiler.lib.compile_framework.InternalCompileFrameworkException; > > Not needed? What do you mean? I have multiple uses in the file. > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 42: > >> 40: import java.util.List; >> 41: import java.util.concurrent.TimeUnit; >> 42: import jdk.test.lib.process.ProcessTools; > > 1. URI and Arrays are not used > 2. imports list is unsorted. thanks! > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 54: > >> 52: } >> 53: >> 54: public void printSourceCodes() { > > I'd decouple debug dumping from the stream it prints to. Not always we agree with filling the `stdout` with garbage. 
> Something like this:
>
> public String sourceCodesAsString() {...}
> public Set getSourceCodes() {...}

Ok, I'll make it dependent on `-DCompileFrameworkVerbose`.

> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 66:
>
>> 64: }
>> 65:
>> 66: printSourceCodes();
>
> Make debug printouts controllable? Fuzzing generators compiling thousands files, would generate a whole big data of those printouts.

Good point! I'll make it dependent on `-DCompileFrameworkVerbose`.

> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 73:
>
>> 71: switch (sourceCode.kind) {
>> 72: case SourceCode.Kind.JASM -> { jasmSources.add(sourceCode); }
>> 73: case SourceCode.Kind.JAVA -> { javaSources.add(sourceCode); }
>
> Simplify (and shorten imports also)?
> Suggestion:
>
> case JASM -> { jasmSources.add(sourceCode); }
> case JAVA -> { javaSources.add(sourceCode); }

Hmm. I think I like it the way it is, a bit more explicit.

> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 87:
>
>> 85: final String sourceDir;
>> 86: try {
>> 87: sourceDir = "compile-framework-sources-" + ProcessTools.getProcessId();
>
> Make it MT-safe, say by utilising the `File.createTempFile(prefix, suffix, directory)`?

I would like the generated and saved files to be available in the `JTWork/scratch` directory. This allows us to download the source code of a reproducer from our CI pipeline when there is a failure. Is that possible with `File.createTempFile` as well?

> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 131:
>
>> 129: private static String getAsmToolsPath() {
>> 130: for (String path : getClassPaths()) {
>> 131: System.out.println("jtreg.jar?: " + path);
>
> A weird debug printout here

Ah, good point.
Needed that for debugging on windows > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 150: > >> 148: return; >> 149: } >> 150: System.out.println("Compiling Java sources: " + javaSources.size()); > > Same wish to make debug printouts controllable. > A dedicated (mini-?)logger, a static on/off variable, a dedicated PrintStream the user could set up themselves, etc. I think the flag will be off by default, so no printing. But if there ever arises a need for a logger like that, we can make this happen in a follow-up RFE. > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 174: > >> 172: } >> 173: >> 174: private static List writeSourcesToFile(String sourceDir, List sources) { > > `String sourceDir` -> `Path sourceDir` ? Good idea! > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 185: > >> 183: } >> 184: >> 185: private static void writeCodeToFile(String code, String fileName) { > > `String fileName` -> `Path fullPath` ? Yes, will do it systematically. > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 189: > >> 187: File file = new File(fileName); >> 188: File dir = file.getAbsoluteFile().getParentFile(); >> 189: if (!dir.exists()){ > > A space is missing > Suggestion: > > if (!dir.exists()) { thanks! > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 203: > >> 201: >> 202: ProcessBuilder builder = new ProcessBuilder(command); >> 203: builder.redirectErrorStream(true); > > Same streams question. Not always we'd like to have errors in the `stderr` But here I think I want everything to go to the `stdout`. That way, I can check below that the compilation had neither any errors nor warnings. See `output.equals("")`. > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 222: > >> 220: } >> 221: >> 222: if (exitCode != 0 || !output.equals("")) { > >> Note: FuzzerUtils.java uses or overrides a deprecated API. 
>> Note: Recompile with -Xlint:deprecation for details. > > Warnings could corrupt the output. And we probably need to at least make it controllable. I see. I have so far not encountered any issues. I would like to keep it simple for now. We can invest in a more complicated solution in a follow up RFE. For now I suppose `FuzzerUtils` would just not be allowed. > test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/CombinedJavaJasmExample.java line 72: > >> 70: out.println("}"); >> 71: out.close(); >> 72: return writer.toString(); > > Use a multi-line string here (and in other code samples)? > > Suggestion: > > return """ > package p/xyz; > > super public class XYZJasm { > public static Method test:"(I)I" > stack 20 locals 20 > { > iload_0; > iconst_2; > imul; > invokestatic Method p/xyz/XYZJava."mul3":"(I)I"; // reference java class > ireturn; > } > > public static Method mul5:"(I)I" > stack 20 locals 20 > { > iload_0; > ldc 5; > imul; > ireturn; > } > }"""; Yes, I'll use a multi-line string, good idea! 
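To illustrate the multi-line string suggestion accepted above: a minimal sketch, assuming Java 15 or newer (text blocks), that builds the same small source string once with `PrintWriter.println` and once with a text block. The class `TextBlockSketch` and the generated `XYZ` source are made up for this example and are not taken from the PR.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical example, for illustration only.
final class TextBlockSketch {
    // Old style: one println call per generated source line.
    static String withPrintWriter() {
        StringWriter writer = new StringWriter();
        PrintWriter out = new PrintWriter(writer);
        out.println("public class XYZ {");
        out.println("    public static int test(int i) {");
        out.println("        return i * 2;");
        out.println("    }");
        out.println("}");
        out.flush();
        return writer.toString();
    }

    // New style: a text block. Indentation shared by all lines (including
    // the closing delimiter) is stripped, and line ends are always "\n".
    static String withTextBlock() {
        return """
               public class XYZ {
                   public static int test(int i) {
                       return i * 2;
                   }
               }
               """;
    }

    public static void main(String[] args) {
        // println uses the platform line separator, so normalize first.
        String normalized = withPrintWriter().replace(System.lineSeparator(), "\n");
        System.out.println(normalized.equals(withTextBlock()));
    }
}
```

One difference worth keeping in mind: `println` emits the platform line separator, while a text block always uses `\n`, so the two strings are only byte-identical after normalization on platforms such as Windows.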
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694644541 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694854892 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694671396 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694698616 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694701904 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694704950 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694753498 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694775690 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694812582 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694812823 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694855298 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694816041 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694819394 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1693257830 From epeter at openjdk.org Thu Sep 12 15:57:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: <6pB8ELwoBubKWKTlOMrGWgpasmMex1FceFItsbL2sEE=.5a08d360-b532-4a1f-a45e-ab57c3ed0ba6@github.com> On Mon, 29 Jul 2024 06:27:20 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 28: >> >>> 26: import compiler.lib.compile_framework.SourceCode; >>> 27: import compiler.lib.compile_framework.CompileFrameworkException; >>> 28: import compiler.lib.compile_framework.InternalCompileFrameworkException; >> >> Not needed? > > What do you mean? I have multiple uses in the file. 
I want to separate exceptions where the user is responsible, and internal exceptions that should never happen (otherwise there is a bug in the CompileFramework). >> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 87: >> >>> 85: final String sourceDir; >>> 86: try { >>> 87: sourceDir = "compile-framework-sources-" + ProcessTools.getProcessId(); >> >> Make it MT-safe, say by utilising the `File.createTempFile(prefix, suffix, directory)`? > > I would like the generated and saved files to be available int the `JTWork/scratch` directory. This allows us to download the source-code of a reproducer from our CI pipeline when there is a failure. Is that possible with `File.createTempFile` as well? But what exactly could go wrong here? The "compile" method is to be called in a single-threaded way. This thread should have a unique `ProcessTools.getProcessId`, right? So no two processes should be generating the same directory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694853936 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1694784308 From enikitin at openjdk.org Thu Sep 12 15:57:28 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: <6pB8ELwoBubKWKTlOMrGWgpasmMex1FceFItsbL2sEE=.5a08d360-b532-4a1f-a45e-ab57c3ed0ba6@github.com> References: <6pB8ELwoBubKWKTlOMrGWgpasmMex1FceFItsbL2sEE=.5a08d360-b532-4a1f-a45e-ab57c3ed0ba6@github.com> Message-ID: On Mon, 29 Jul 2024 08:57:43 GMT, Emanuel Peter wrote: >> What do you mean? I have multiple uses in the file. > > I want to separate exceptions where the user is responsible, and internal exceptions that should never happen (otherwise there is a bug in the CompileFramework). Aren't they in the same package (and therefore the imports are redundant)? 
>> I would like the generated and saved files to be available int the `JTWork/scratch` directory. This allows us to download the source-code of a reproducer from our CI pipeline when there is a failure. Is that possible with `File.createTempFile` as well? > > But what exactly could go wrong here? The "compile" method is to be called in a single-threaded way. This thread should have a unique `ProcessTools.getProcessId`, right? So no two processes should be generating the same directory. > Is that possible with File.createTempFile as well? Yes. 'Directory', the argument is the path to contain the newly created temp file; > But what exactly could go wrong here? The "compile" method is to be called in a single-threaded way. 1. Well, I personally don't like imposing such a heavy (single-threaded-only) requirement while we can easily make it mt-safe; It's a **universal** compiling FW, after all. 2. Among the most obvious users of the FW are automated test generators, the likes of JavaFuzzer and JITTester. The former is multi-threaded (and is loved in the industry for that). The latter presents serious problems (huge run times) because of its single-threaded nature; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1695155601 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1695141058 From enikitin at openjdk.org Thu Sep 12 15:57:28 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: <39MJ4Z21-697VujpVo7eoiux6Vilo92fSmAZxHgYzy8=.ff50f18e-3109-4fa1-ae69-7ee34e4f6d51@github.com> On Mon, 29 Jul 2024 07:11:31 GMT, Emanuel Peter wrote: > Hmm. I think I like it the way it is, a bit more explicit. Up to you then. No objections on having different writing styles :) > See output.equals(""). Well, that doesn't require redirection. 
You check the process' output, not the host (JTReg) output. > But here I think I want everything to go to the stdout. Well, here I'd like to call the same argument of volume. If we generate 100s of files, using not the perfect generators (that generate warnings, can create semi-obsolete code, etc.)... we'd not want to stare at their outputs manually. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1695153284 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1695152034 From epeter at openjdk.org Thu Sep 12 15:57:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: <6pB8ELwoBubKWKTlOMrGWgpasmMex1FceFItsbL2sEE=.5a08d360-b532-4a1f-a45e-ab57c3ed0ba6@github.com> Message-ID: <52PN4jskBRo2Ce-63NwNruAJbc-jAbdyH0Q5hx02wpI=.df2a2823-4803-42e7-8441-e238d5951600@github.com> On Mon, 29 Jul 2024 12:49:55 GMT, Evgeny Nikitin wrote: >> I want to separate exceptions where the user is responsible, and internal exceptions that should never happen (otherwise there is a bug in the CompileFramework). > > Aren't they in the same package (and therefore the imports are redundant)? Correct, fixing that. Thanks! >> But what exactly could go wrong here? The "compile" method is to be called in a single-threaded way. This thread should have a unique `ProcessTools.getProcessId`, right? So no two processes should be generating the same directory. > >> Is that possible with File.createTempFile as well? > Yes. 'Directory', the argument is the path to contain the newly created temp file; > >> But what exactly could go wrong here? The "compile" method is to be called in a single-threaded way. > 1. Well, I personally don't like imposing such a heavy (single-threaded-only) requirement while we can easily make it mt-safe; It's a **universal** compiling FW, after all. > 2. 
Among the most obvious users of the FW are automated test generators, the likes of JavaFuzzer and JITTester. The former is multi-threaded (and is loved in the industry for that). The latter presents serious problems (huge run times) because of its single-threaded nature; @lepestock Why exactly is my solution not multi-threading safe? Does your solution with `File.createTempFile` allow (easy) extraction of the generated files when there is a failure? Where could those files be found? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1696452990 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1696443244 From epeter at openjdk.org Thu Sep 12 15:57:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: <6pB8ELwoBubKWKTlOMrGWgpasmMex1FceFItsbL2sEE=.5a08d360-b532-4a1f-a45e-ab57c3ed0ba6@github.com> <52PN4jskBRo2Ce-63NwNruAJbc-jAbdyH0Q5hx02wpI=.df2a2823-4803-42e7-8441-e238d5951600@github.com> Message-ID: On Tue, 30 Jul 2024 07:26:27 GMT, Emanuel Peter wrote: >> @lepestock Why exactly is my solution not multi-threading safe? >> Does your solution with `File.createTempFile` allow (easy) extraction of the generated files when there is a failure? Where could those files be found? > > BTW: in the IR-Framework, we also use `ProcessTools.getProcessId` to generate unique filenames. Would that also be an issue there? So far we have not seen problems with it though... Thanks for the offline conversation. `ProcessTools.getProcessId` is per process. If there are multiple threads in one process, this leads to issues. Also if I use the `CompileFramework` repeatedly, this could lead to issues. I think I can use `Files.createTempDirectory(Paths.get("."), "compile-framework-sources-");` to create a directory. 
"temp" does not mean the directory is necessarily deleted; that happens only if I also use `File.deleteOnExit()`. So that should be ok, and the files are accessible if there is a failure.

I'm also thinking that I may need a temporary class-file directory. Otherwise, there could be issues if the `CompileFramework` is used repeatedly or in parallel. Currently, I just put them all in the `-Dtest.classes` dir, but that could lead to race conditions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1696502974

From epeter at openjdk.org Thu Sep 12 15:57:28 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 12 Sep 2024 15:57:28 GMT
Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing
In-Reply-To: <52PN4jskBRo2Ce-63NwNruAJbc-jAbdyH0Q5hx02wpI=.df2a2823-4803-42e7-8441-e238d5951600@github.com>
References: <6pB8ELwoBubKWKTlOMrGWgpasmMex1FceFItsbL2sEE=.5a08d360-b532-4a1f-a45e-ab57c3ed0ba6@github.com> <52PN4jskBRo2Ce-63NwNruAJbc-jAbdyH0Q5hx02wpI=.df2a2823-4803-42e7-8441-e238d5951600@github.com>
Message-ID:

On Tue, 30 Jul 2024 07:18:00 GMT, Emanuel Peter wrote:

>>> Is that possible with File.createTempFile as well?
>> Yes. 'Directory', the argument is the path to contain the newly created temp file;
>>
>>> But what exactly could go wrong here? The "compile" method is to be called in a single-threaded way.
>> 1. Well, I personally don't like imposing such a heavy (single-threaded-only) requirement while we can easily make it mt-safe; It's a **universal** compiling FW, after all.
>> 2. Among the most obvious users of the FW are automated test generators, the likes of JavaFuzzer and JITTester. The former is multi-threaded (and is loved in the industry for that). The latter presents serious problems (huge run times) because of its single-threaded nature;
>
> @lepestock Why exactly is my solution not multi-threading safe?
> Does your solution with `File.createTempFile` allow (easy) extraction of the generated files when there is a failure? Where could those files be found? BTW: in the IR-Framework, we also use `ProcessTools.getProcessId` to generate unique filenames. Would that also be an issue there? So far we have not seen problems with it though... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1696455835 From enikitin at openjdk.org Thu Sep 12 15:57:28 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: <39MJ4Z21-697VujpVo7eoiux6Vilo92fSmAZxHgYzy8=.ff50f18e-3109-4fa1-ae69-7ee34e4f6d51@github.com> Message-ID: On Tue, 30 Jul 2024 07:24:02 GMT, Emanuel Peter wrote: >> I'm not sure I understand what you are saying. >> >> `redirectErrorStream`: Tells whether this process builder merges standard error and standard output. >> >> So all I'm doing is merging the stdout and stderr from the process. Now everything from the process goes to the stdout of the process, right? Then I can check `output.equals("")` and that captures that there is neither any thing on the stdout nor on the stderr of the process. >> >> Nothing is actually being printed in the JTREG stdout. Correct? > > BTW: you put not just my text as quotation but also your reply ;) > So all I'm doing is merging the stdout and stderr from the process. .. > Nothing is actually being printed in the JTREG stdout. Correct? You're right. Ignore that comment then. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1696684075 From epeter at openjdk.org Thu Sep 12 15:57:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: <39MJ4Z21-697VujpVo7eoiux6Vilo92fSmAZxHgYzy8=.ff50f18e-3109-4fa1-ae69-7ee34e4f6d51@github.com> References: <39MJ4Z21-697VujpVo7eoiux6Vilo92fSmAZxHgYzy8=.ff50f18e-3109-4fa1-ae69-7ee34e4f6d51@github.com> Message-ID: On Mon, 29 Jul 2024 12:47:09 GMT, Evgeny Nikitin wrote: >> But here I think I want everything to go to the `stdout`. That way, I can check below that the compilation had neither any errors nor warnings. See `output.equals("")`. > >> See output.equals(""). > Well, that doesn't require redirection. You check the process' output, not the host (JTReg) output. > >> But here I think I want everything to go to the stdout. > Well, here I'd like to call the same argument of volume. If we generate 100s of files, using not the perfect generators (that generate warnings, can create semi-obsolete code, etc.)... we'd not want to stare at their outputs manually. I'm not sure I understand what you are saying. `redirectErrorStream`: Tells whether this process builder merges standard error and standard output. So all I'm doing is merging the stdout and stderr from the process. Now everything from the process goes to the stdout of the process, right? Then I can check `output.equals("")` and that captures that there is neither any thing on the stdout nor on the stderr of the process. Nothing is actually being printed in the JTREG stdout. Correct? 
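The point made above about `redirectErrorStream` can be sketched as follows: the child's stderr is merged into the child's stdout, the parent reads that single stream through `Process.getInputStream()`, and nothing reaches the parent's own stdout unless the parent prints it. This is a hedged illustration, not the CompileFramework code; `MergedOutputSketch` and its methods are hypothetical, and `java -version` is used only because it writes its output to stderr.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical example, for illustration only.
final class MergedOutputSketch {
    // Command that launches the currently running JDK with -version.
    static List<String> javaVersionCommand() {
        String java = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        return List.of(java, "-version");
    }

    // Runs the command and returns what the parent can read from the
    // child's stdout. With mergeStderr, the child's stderr is folded into
    // that stream; otherwise stderr is discarded.
    static String runAndCapture(List<String> command, boolean mergeStderr) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            if (mergeStderr) {
                builder.redirectErrorStream(true);
            } else {
                builder.redirectError(ProcessBuilder.Redirect.DISCARD);
            }
            Process process = builder.start();
            byte[] bytes = process.getInputStream().readAllBytes();
            process.waitFor();
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String merged = runAndCapture(javaVersionCommand(), true);
        String stdoutOnly = runAndCapture(javaVersionCommand(), false);
        // Only these two lines print anything in the parent process.
        System.out.println("merged length: " + merged.length());
        System.out.println("stdout-only length: " + stdoutOnly.length());
    }
}
```

A check like `output.equals("")` on the merged stream therefore covers both warnings and errors of the child, at the cost of not being able to tell the two apart afterwards.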
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1696450187 From epeter at openjdk.org Thu Sep 12 15:57:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 15:57:28 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: <39MJ4Z21-697VujpVo7eoiux6Vilo92fSmAZxHgYzy8=.ff50f18e-3109-4fa1-ae69-7ee34e4f6d51@github.com> Message-ID: On Tue, 30 Jul 2024 07:23:16 GMT, Emanuel Peter wrote: >>> See output.equals(""). >> Well, that doesn't require redirection. You check the process' output, not the host (JTReg) output. >> >>> But here I think I want everything to go to the stdout. >> Well, here I'd like to call the same argument of volume. If we generate 100s of files, using not the perfect generators (that generate warnings, can create semi-obsolete code, etc.)... we'd not want to stare at their outputs manually. > > I'm not sure I understand what you are saying. > > `redirectErrorStream`: Tells whether this process builder merges standard error and standard output. > > So all I'm doing is merging the stdout and stderr from the process. Now everything from the process goes to the stdout of the process, right? Then I can check `output.equals("")` and that captures that there is neither any thing on the stdout nor on the stderr of the process. > > Nothing is actually being printed in the JTREG stdout. Correct? 
BTW: you put not just my text as quotation but also your reply ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1696452493 From sdohrmann at openjdk.org Thu Sep 12 16:09:21 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Thu, 12 Sep 2024 16:09:21 GMT Subject: RFR: 8329035: New Data Destination instructions support [v6] In-Reply-To: References: Message-ID: <_R8D-GeFyVkkA2CgxVoU4xsmmF1-FWU42kZ1AQrbAoM=.f117eb17-c38d-4abb-8ec5-47979fcadf18@github.com> On Wed, 11 Sep 2024 23:02:41 GMT, Steve Dohrmann wrote: >> Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. >> >> The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. >> >> Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. >> >> All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > xorw and long-line changes based on review comments Thank you reviewers for your helpful and timely comments! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20698#issuecomment-2346687781 From sviswanathan at openjdk.org Thu Sep 12 16:09:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Sep 2024 16:09:21 GMT Subject: RFR: 8329035: New Data Destination instructions support [v6] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 01:03:34 GMT, Vladimir Kozlov wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> xorw and long-line changes based on review comments > > Good. Thanks a lot @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20698#issuecomment-2346693791 From sdohrmann at openjdk.org Thu Sep 12 16:09:22 2024 From: sdohrmann at openjdk.org (Steve Dohrmann) Date: Thu, 12 Sep 2024 16:09:22 GMT Subject: Integrated: 8329035: New Data Destination instructions support In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 22:44:09 GMT, Steve Dohrmann wrote: > Adds assembler support for APX New Data Destination (NDD) and No Flags (NF) features. > > The NDD feature is supported by new functions that take an additional destination-only register operand. If the instruction also supports NF, a no_flags boolean parameter is present. To use these instructions with NF behavior, but without NDD semantics, the same register can be supplied for both the new destination and the (first) source operand. > > Some instructions support NF but not NDD. These instructions have a new function that just adds a boolean no_flags parameter. Existing functions were not overloaded with a boolean here because of signature collisions (bool / int) with functions that take immediate operands. > > All of the new functions have a letter "e" prefix, to avoid signature collisions and to indicate they will be evex encoded. This pull request has now been integrated. 
Changeset: ab9b72c5 Author: Steve Dohrmann URL: https://git.openjdk.org/jdk/commit/ab9b72c50a5f324e53b8c6535f401cc185b98c75 Stats: 1535 lines in 2 files changed: 1514 ins; 2 del; 19 mod 8329035: New Data Destination instructions support Reviewed-by: kvn, sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/20698 From qamai at openjdk.org Thu Sep 12 16:47:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Sep 2024 16:47:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v19] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refine comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/a77e8f4a..25643785 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=17-18 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu Sep 12 16:52:12 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Sep 2024 16:52:12 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v18] In-Reply-To: <5rE5jUqKmzecH6jMAXpaObv9xYRz3Xi1SCvCKhAQJ9o=.010bec0b-856d-4d71-94c8-7e02f0402a4e@github.com> References: <5rE5jUqKmzecH6jMAXpaObv9xYRz3Xi1SCvCKhAQJ9o=.010bec0b-856d-4d71-94c8-7e02f0402a4e@github.com> Message-ID: On Wed, 11 Sep 2024 05:24:38 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> remove leftover code > > src/hotspot/share/opto/rangeinference.hpp line 69: > >> 67: return (v & _zeros) == 0 && (v & _ones) == _ones; >> 68: } >> 69: }; > > It will be good if we add basic operations to KnowBits like. > KnownBits.getMaxValue() returning ~ZEROS > KnownBits.getMinValue() returning ONE > KnownBits.and(KnownBits arg) > KnownBits.or(KnownBits arg) > KnownBits.xor(KnownBits args) > KnownBits.not() > > > These can be quite handy during data flow analysis using KnownBits Yes I think they would be helpful in later patches when implementing `Value` methods of several nodes to take advantage of additional `TypeInt` information. 
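As a rough illustration of the operations suggested above, here is a Java model of a known-bits pair. It is only a sketch: the class and method names are hypothetical and this is not the HotSpot C++ code in `rangeinference.hpp`. The membership test mirrors the quoted `(v & _zeros) == 0 && (v & _ones) == _ones`, and the bitwise combinators follow the usual known-bits rules (a result bit is known only when the inputs force it).

```java
// Hypothetical Java model of a zeros/ones known-bits pair; names are illustrative only.
final class KnownBits {
    final int zeros; // bits known to be 0
    final int ones;  // bits known to be 1

    KnownBits(int zeros, int ones) {
        if ((zeros & ones) != 0) {
            throw new IllegalArgumentException("a bit cannot be known 0 and known 1");
        }
        this.zeros = zeros;
        this.ones = ones;
    }

    // Same membership test as the quoted C++ snippet.
    boolean isSatisfiedBy(int v) {
        return (v & zeros) == 0 && (v & ones) == ones;
    }

    // Largest and smallest member when the value is read as unsigned 32-bit.
    int unsignedMax() { return ~zeros; }
    int unsignedMin() { return ones; }

    // x & y: a bit is 0 if it is 0 in either input, 1 only if it is 1 in both.
    KnownBits and(KnownBits o) { return new KnownBits(zeros | o.zeros, ones & o.ones); }

    // x | y: a bit is 1 if it is 1 in either input, 0 only if it is 0 in both.
    KnownBits or(KnownBits o) { return new KnownBits(zeros & o.zeros, ones | o.ones); }

    // x ^ y: a bit is known only where it is known in both inputs.
    KnownBits xor(KnownBits o) {
        return new KnownBits((zeros & o.zeros) | (ones & o.ones),
                             (zeros & o.ones) | (ones & o.zeros));
    }

    // ~x: known zeros and known ones swap roles.
    KnownBits not() { return new KnownBits(ones, zeros); }

    public static void main(String[] args) {
        KnownBits small = new KnownBits(0xFFFFFFFC, 0); // all bits but the last 2 unset
        KnownBits odd = new KnownBits(0, 1);            // lowest bit set
        System.out.println(Integer.toUnsignedString(small.unsignedMax()));
        System.out.println(small.or(odd).isSatisfiedBy(2 | 5));
    }
}
```

The `small` value corresponds to the example in the PR description: an int in [0, 3] has all bits but the last 2 unset.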
> src/hotspot/share/opto/type.hpp line 661: > >> 659: // the below constraints, see contains(jint) >> 660: const jint _lo, _hi; // Lower bound, upper bound in the signed domain >> 661: const juint _ulo, _uhi; // Lower bound, upper bound in the unsigned domain > > Can't we do without explicit fields to record unsigned hi / lo ? > We just need to present a unsigned view of signed _lo and _hi which can be done using safe macros. No we can't, consider `TypeInt::NON_ZERO`. It would have `_lo = min_jint`, `_hi = max_jint`, `_zeros = 0`, `_ones = 0`. Which make it impossible to distinguish from `TypeInt::INT` without unsigned bounds. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1757237569 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1757240241 From sviswanathan at openjdk.org Thu Sep 12 23:17:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Sep 2024 23:17:13 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes Message-ID: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. Summary of changes is as follows: 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
2) Intrinsic for wrapIndexes and selectFrom to generate efficient code

For the following source:

    public void test() {
        var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
        for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
            var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
            index.selectFrom(inpvect).intoArray(byteres, j);
        }
    }

The code generated for inner main now looks as follows:

    ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
    0x00007f40d02274d0: movslq %ebx,%r13
    0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
    0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
    0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
    0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
    0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
    0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
    0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
    0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
    0x00007f40d022751f: add $0x40,%ebx
    0x00007f40d0227522: cmp %r8d,%ebx
    0x00007f40d0227525: jl 0x00007f40d02274d0

Best Regards,
Sandhya

-------------

Commit messages:
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into rearrangewrap
 - Some cleanup
 - Some small fixes
 - Initial feedback
 - Optionally partial wrap shuffles during construction
 - Wrap shuffle on rearrange

Changes: https://git.openjdk.org/jdk/pull/20634/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340079
  Stats: 686 lines in 47 files changed: 548 ins; 30 del; 108 mod
  Patch: https://git.openjdk.org/jdk/pull/20634.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634

PR: https://git.openjdk.org/jdk/pull/20634

From psandoz at openjdk.org Thu Sep 12 23:17:13
2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 12 Sep 2024 23:17:13 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya API shapes are good! I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression `v.rearrange(this.toShuffle())`. That way we can better generate an optimal set of instructions? Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference. Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. 
I think this is good enough to promote out of draft and create a CSR for the API changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305377165 PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305412450 PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2346848993 From sviswanathan at openjdk.org Thu Sep 12 23:17:14 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Sep 2024 23:17:14 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Thu, 22 Aug 2024 18:21:50 GMT, Paul Sandoz wrote: > API shapes are good! > > I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression `v.rearrange(this.toShuffle())`. That way we can better generate an optimal set of instructions? > > Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference. Yes, I intrinsified it to generate an optimal set of instructions. In the expression `v.rearrange(this.toShuffle())` we first do a partial wrap as part of `this.toShuffle()` and then a full wrap as part of `rearrange`. In the intrinsic I only do the full wrap. Without the intrinsic, if for whatever reason `this.toShuffle()` is not moved out of the loop by the JIT, we incur the additional overhead of the partial wrap in the hot code path.
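The difference between the two wraps can be sketched with a scalar model. This is an illustration only, not the Vector API implementation: the class and helper names are hypothetical, and the exact partial-wrap rule (out-of-range indexes mapped into [-VLENGTH, -1]) is an assumption about how exceptional shuffle indexes are stored. The model shows that a full wrap applied after a partial wrap gives the same lane as a single full wrap, which is why doing only the full wrap is functionally equivalent but cheaper.

```java
// Hypothetical scalar model of index wrapping; not the jdk.incubator.vector code.
final class WrapIndexModel {

    // Full wrap: every index is mapped into [0, vlength - 1].
    static int fullWrap(int index, int vlength) {
        return Math.floorMod(index, vlength);
    }

    // Partial wrap (assumed rule): in-range indexes are kept, out-of-range
    // indexes become "exceptional" indexes in [-vlength, -1].
    static int partialWrap(int index, int vlength) {
        int wrapped = fullWrap(index, vlength);
        return wrapped == index ? index : wrapped - vlength;
    }

    // Scalar model of a wrapping selectFrom: result[i] = source[wrap(indexes[i])].
    static byte[] selectFrom(byte[] indexes, byte[] source) {
        byte[] result = new byte[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            result[i] = source[fullWrap(indexes[i], source.length)];
        }
        return result;
    }

    public static void main(String[] args) {
        int vlength = 16;
        for (int index : new int[] {3, 17, -1, 40}) {
            int twoStep = fullWrap(partialWrap(index, vlength), vlength);
            int oneStep = fullWrap(index, vlength);
            System.out.println(index + ": two-step=" + twoStep + " one-step=" + oneStep);
        }
    }
}
```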
I saw this happening when the following is run as part of the jmh instead of being called from standalone java with a loop: var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); index.selectFrom(inpvect).intoArray(byteres, j); } The perf difference between the intrinsic and no intrinsic observed in this case then is about 20%. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305521441 From dlong at openjdk.org Fri Sep 13 00:20:07 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 13 Sep 2024 00:20:07 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v4] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:16:47 GMT, Daniel Lundén wrote: > Add a STATIC_ASSERT that short can index the maximum size register mask. This assumes the types in OptoRegPair won't change. Something more future-proof would be to try to construct an OptoRegPair with RM_SIZE_MAX >> 5, then try to read it back. We should probably have OptoRegPair check input values for overflow while we are at it. > Add an upper bound for register mask growth Does this mean "AllStack" no longer means infinite? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2347482489 From rehn at openjdk.org Fri Sep 13 07:12:11 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 13 Sep 2024 07:12:11 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v16] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 10:29:59 GMT, Hamlin Li wrote: >> Hi, >> Can you have a review on this patch to add RoundVF/RoundDF intrinsics? >> >> Current test shows that, it bring performance gain when vlenb >= 32 (which is on bananapi), but bring regression when vlenb == 16 (which is on k230). So I only enable the intrinsic when vlenb >= 32.
>> >> For double, even vlenb == 32, there is still some regression, so in this pr I only enable it when vlenb >= 64. Although there is no hardware to verify it, I think from the trend of performance data on bananapi and k230, it's promising to bring better performance rather than regression for double when vlenb == 64+. Please compare the data of `Benchmark on bananapi, +UseSuperWord` and `Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16`. >> >> Thanks! >> >> ## Tests >> >> test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java >> >> test/jdk/java/lang/Math/RoundTests.java >> >> ## Performance - with Intrinsic >> >> ### on bananapi >> Benchmark on bananapi, +UseSuperWord >> >> Benchmark on bananapi, +UseSuperWord | (TESTSIZE) | Mode | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 2048 | avgt | 10 | 23794.153 | 20557.467 | 2899.266 | ns/op | 0.864 >> FpRoundingBenchmark.test_round_float | 2048 | avgt | 10 | 11531.853 | 16562.431 | 865.779 | ns/op | 1.436 >> >> >> >> ### on k230 (enable intrinsic even when vlenb == 16) >> Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16 >> >> Benchmark on k230, +UseSuperWord, enable RoundVF/D ... > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - comments > - Merge branch 'master' into round-F+D-v > - minor > - minor > - minor > - add additional tests > - enable roundVD when MaxVectorSize >= 64 > - enable intrinsic when MaxVectorSize >= 32 > - Merge branch 'master' into round-F+D-v > - enable when vlenb >= 32 > - ... and 15 more: https://git.openjdk.org/jdk/compare/be34730f...c35fcddc Seems alright, thanks! 
------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17745#pullrequestreview-2302283083 From roland at openjdk.org Fri Sep 13 07:43:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 13 Sep 2024 07:43:49 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jul 2024 16:20:23 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - refactoring >> - Merge branch 'master' into JDK-8333258 >> - review >> - Merge branch 'master' into JDK-8333258 >> - whitespaces >> - tests & fix > > But I would like you to fix the comments here: > > // The relevant stores "nearby" the load consist of a tree rooted > // at initial_mem, with internal nodes of type MergeMem. > // Therefore, the branches visited by the worklist are of this form: > // initial_mem -> (MergeMem ->)* store > // The anti-dependence constraints apply only to the fringe of this tree. > > There are not just `MergeMem` but also `Phi` nodes. @eme64 can you have another look. New commit should address your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19791#issuecomment-2348251836 From roland at openjdk.org Fri Sep 13 07:43:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 13 Sep 2024 07:43:46 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v6] In-Reply-To: References: Message-ID: <4nYVjz-ULoZGwX8qbWOLPvTIMxq6C3IYCWiEdsaC8sk=.527878ef-f3ad-4085-a321-da812e201cea@github.com> > In a debug build, `PhaseCFG::insert_anti_dependences()` is called > twice for a single node: once for actual processing, once for > verification. 
> > In TestAntiDependenciesHighMemUsage, the test has a `Region` that > merges 337 incoming path. It also has one `Phi` per memory slice that > are stored to: 1000 `Phi` nodes. Each `Phi` node has 337 inputs that > are identical except for one. The common input is the memory state on > method entry. The test has 60 `Load` that needs to be processed for > anti dependences. All `Load` share the same memory input: the memory > state on method entry. For each `Load`, all `Phi` nodes are pushed 336 > times on the work lists for anti dependence processing because all of > them appear multiple times as uses of each `Load`s memory state: `Phi`s > are pushed 336 000 on 2 work lists. Memory is not reclaimed on exit > from `PhaseCFG::insert_anti_dependences()` so memory usage grows as > `Load` nodes are processed: > > 336000 * 2 work lists * 60 loads * 8 bytes pointer = 322 MB. > > The fix I propose for this is to not push `Phi` nodes more than once > when they have the same inputs multiple times. > > In TestAntiDependenciesHighMemUsage2, the test has 4000 loads. For > each of them, when processed for anti dependences, all 4000 loads are > pushed on the work lists because they share the same memory > input. Then when they are popped from the work list, they are > discarded because only stores are of interest: > > 4000 loads processed * 4000 loads pushed * 2 work lists * 8 bytes pointer = 256 MB. > > The fix I propose for this is to test before pushing on the work list > whether a node is a store or not. > > Finally, I propose adding a `ResourceMark` so memory doesn't > accumulate over calls to `PhaseCFG::insert_anti_dependences()`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: - review - Merge branch 'master' into JDK-8333258 - more review - more review - Merge branch 'master' into JDK-8333258 - review - Merge branch 'master' into JDK-8333258 - refactoring - Merge branch 'master' into JDK-8333258 - review - ... and 3 more: https://git.openjdk.org/jdk/compare/a01486a6...4511c175 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19791/files - new: https://git.openjdk.org/jdk/pull/19791/files/15a33090..4511c175 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19791&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19791&range=04-05 Stats: 11038 lines in 484 files changed: 6574 ins; 1789 del; 2675 mod Patch: https://git.openjdk.org/jdk/pull/19791.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19791/head:pull/19791 PR: https://git.openjdk.org/jdk/pull/19791 From epeter at openjdk.org Fri Sep 13 07:48:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 07:48:14 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v6] In-Reply-To: <4nYVjz-ULoZGwX8qbWOLPvTIMxq6C3IYCWiEdsaC8sk=.527878ef-f3ad-4085-a321-da812e201cea@github.com> References: <4nYVjz-ULoZGwX8qbWOLPvTIMxq6C3IYCWiEdsaC8sk=.527878ef-f3ad-4085-a321-da812e201cea@github.com> Message-ID: On Fri, 13 Sep 2024 07:43:46 GMT, Roland Westrelin wrote: >> In a debug build, `PhaseCFG::insert_anti_dependences()` is called >> twice for a single node: once for actual processing, once for >> verification. >> >> In TestAntiDependenciesHighMemUsage, the test has a `Region` that >> merges 337 incoming path. It also has one `Phi` per memory slice that >> are stored to: 1000 `Phi` nodes. Each `Phi` node has 337 inputs that >> are identical except for one. The common input is the memory state on >> method entry. The test has 60 `Load` that needs to be processed for >> anti dependences. 
All `Load` share the same memory input: the memory >> state on method entry. For each `Load`, all `Phi` nodes are pushed 336 >> times on the work lists for anti dependence processing because all of >> them appear multiple times as uses of each `Load`s memory state: `Phi`s >> are pushed 336 000 on 2 work lists. Memory is not reclaimed on exit >> from `PhaseCFG::insert_anti_dependences()` so memory usage grows as >> `Load` nodes are processed: >> >> 336000 * 2 work lists * 60 loads * 8 bytes pointer = 322 MB. >> >> The fix I propose for this is to not push `Phi` nodes more than once >> when they have the same inputs multiple times. >> >> In TestAntiDependenciesHighMemUsage2, the test has 4000 loads. For >> each of them, when processed for anti dependences, all 4000 loads are >> pushed on the work lists because they share the same memory >> input. Then when they are popped from the work list, they are >> discarded because only stores are of interest: >> >> 4000 loads processed * 4000 loads pushed * 2 work lists * 8 bytes pointer = 256 MB. >> >> The fix I propose for this is to test before pushing on the work list >> whether a node is a store or not. >> >> Finally, I propose adding a `ResourceMark` so memory doesn't >> accumulate over calls to `PhaseCFG::insert_anti_dependences()`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8333258 > - more review > - more review > - Merge branch 'master' into JDK-8333258 > - review > - Merge branch 'master' into JDK-8333258 > - refactoring > - Merge branch 'master' into JDK-8333258 > - review > - ... and 3 more: https://git.openjdk.org/jdk/compare/c5080834...4511c175 Thanks for all the updates @rwestrel . LGTM :) And thanks for bearing with all my comments.
------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19791#pullrequestreview-2302362938 From roland at openjdk.org Fri Sep 13 07:48:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 13 Sep 2024 07:48:17 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jul 2024 16:20:23 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - refactoring >> - Merge branch 'master' into JDK-8333258 >> - review >> - Merge branch 'master' into JDK-8333258 >> - whitespaces >> - tests & fix > > But I would like you to fix the comments here: > > // The relevant stores "nearby" the load consist of a tree rooted > // at initial_mem, with internal nodes of type MergeMem. > // Therefore, the branches visited by the worklist are of this form: > // initial_mem -> (MergeMem ->)* store > // The anti-dependence constraints apply only to the fringe of this tree. > > There are not just `MergeMem` but also `Phi` nodes. @eme64 thanks for the review @vnkozlov does this still look good to you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19791#issuecomment-2348262038 From mli at openjdk.org Fri Sep 13 07:48:45 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Sep 2024 07:48:45 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v2] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 08:38:12 GMT, Ludovic Henry wrote: >> @Hamlin-Li: Thanks for the quick update. Considering that saving/restoring for FRM could be expensive, I do wonder if we could gather some performance numbers before we go. 
I see people are now testing on RVV-1.0 hardwares [1] and I am also trying to get one (AFAIK, more powerful RVV-1.0 hardwares are also coming later this year, SG2044, SG2380, etc.). Also from discussion on [2], I see there are also other approaches available there without flipping the FP rounding mode. But I am not sure if they make sense for our case or work better without actual testing. >> >> [1] https://github.com/openjdk/jdk/pull/18382#issuecomment-2045145255 >> [2] https://github.com/openjdk/jdk/pull/8204 > > @RealFYang @turbanoff could we please have another review? Thank you! Thanks @luhenry @robehn for you reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17745#issuecomment-2348261551 From mli at openjdk.org Fri Sep 13 07:48:45 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Sep 2024 07:48:45 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v17] In-Reply-To: References: Message-ID: <9bOxRjlJpJjeHfPHA5X9a3Xap_rBR2JDv_xrw2ZcFTc=.96857038-27a9-416d-8c5d-7862200c439c@github.com> > Hi, > Can you have a review on this patch to add RoundVF/RoundDF intrinsics? > > Current test shows that, it bring performance gain when vlenb >= 32 (which is on bananapi), but bring regression when vlenb == 16 (which is on k230). So I only enable the intrinsic when vlenb >= 32. > > For double, even vlenb == 32, there is still some regression, so in this pr I only enable it when vlenb >= 64. Although there is no hardware to verify it, I think from the trend of performance data on bananapi and k230, it's promising to bring better performance rather than regression for double when vlenb == 64+. Please compare the data of `Benchmark on bananapi, +UseSuperWord` and `Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16`. > > Thanks! 
> > ## Tests > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java > > test/jdk/java/lang/Math/RoundTests.java > > ## Performance - with Intrinsic > > ### on bananapi > Benchmark on bananapi, +UseSuperWord > > Benchmark on bananapi, +UseSuperWord | (TESTSIZE) | Mode | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- > FpRoundingBenchmark.test_round_double | 2048 | avgt | 10 | 23794.153 | 20557.467 | 2899.266 | ns/op | 0.864 > FpRoundingBenchmark.test_round_float | 2048 | avgt | 10 | 11531.853 | 16562.431 | 865.779 | ns/op | 1.436 > > > > ### on k230 (enable intrinsic even when vlenb == 16) > Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16 > > Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16 | (TESTSIZE) | Mode | Cnt | Score +intrinsic ... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17745/files - new: https://git.openjdk.org/jdk/pull/17745/files/c35fcddc..68608fbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17745&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17745&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17745.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17745/head:pull/17745 PR: https://git.openjdk.org/jdk/pull/17745 From jbhateja at openjdk.org Fri Sep 13 07:52:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 07:52:38 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Review comments resolution. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 - Review resolutions. - 8339790: Support Intel APX setzucc instruction. 
------------- Changes: https://git.openjdk.org/jdk/pull/20920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=02 Stats: 77 lines in 7 files changed: 26 ins; 25 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From rehn at openjdk.org Fri Sep 13 08:03:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 13 Sep 2024 08:03:07 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v17] In-Reply-To: <9bOxRjlJpJjeHfPHA5X9a3Xap_rBR2JDv_xrw2ZcFTc=.96857038-27a9-416d-8c5d-7862200c439c@github.com> References: <9bOxRjlJpJjeHfPHA5X9a3Xap_rBR2JDv_xrw2ZcFTc=.96857038-27a9-416d-8c5d-7862200c439c@github.com> Message-ID: On Fri, 13 Sep 2024 07:48:45 GMT, Hamlin Li wrote: >> Hi, >> Can you have a review on this patch to add RoundVF/RoundDF intrinsics? >> >> Current test shows that, it bring performance gain when vlenb >= 32 (which is on bananapi), but bring regression when vlenb == 16 (which is on k230). So I only enable the intrinsic when vlenb >= 32. >> >> For double, even vlenb == 32, there is still some regression, so in this pr I only enable it when vlenb >= 64. Although there is no hardware to verify it, I think from the trend of performance data on bananapi and k230, it's promising to bring better performance rather than regression for double when vlenb == 64+. Please compare the data of `Benchmark on bananapi, +UseSuperWord` and `Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16`. >> >> Thanks! 
>> >> ## Tests >> >> test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java >> >> test/jdk/java/lang/Math/RoundTests.java >> >> ## Performance - with Intrinsic >> >> ### on bananapi >> Benchmark on bananapi, +UseSuperWord >> >> Benchmark on bananapi, +UseSuperWord | (TESTSIZE) | Mode | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 2048 | avgt | 10 | 23794.153 | 20557.467 | 2899.266 | ns/op | 0.864 >> FpRoundingBenchmark.test_round_float | 2048 | avgt | 10 | 11531.853 | 16562.431 | 865.779 | ns/op | 1.436 >> >> >> >> ### on k230 (enable intrinsic even when vlenb == 16) >> Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16 >> >> Benchmark on k230, +UseSuperWord, enable RoundVF/D ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Marked as reviewed by rehn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17745#pullrequestreview-2302403374 From mli at openjdk.org Fri Sep 13 08:08:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Sep 2024 08:08:16 GMT Subject: Integrated: 8321010: RISC-V: C2 RoundVF In-Reply-To: References: Message-ID: On Wed, 7 Feb 2024 09:58:35 GMT, Hamlin Li wrote: > Hi, > Can you have a review on this patch to add RoundVF/RoundDF intrinsics? > > Current test shows that, it bring performance gain when vlenb >= 32 (which is on bananapi), but bring regression when vlenb == 16 (which is on k230). So I only enable the intrinsic when vlenb >= 32. > > For double, even vlenb == 32, there is still some regression, so in this pr I only enable it when vlenb >= 64. 
Although there is no hardware to verify it, I think from the trend of performance data on bananapi and k230, it's promising to bring better performance rather than regression for double when vlenb == 64+. Please compare the data of `Benchmark on bananapi, +UseSuperWord` and `Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16`. > > Thanks! > > ## Tests > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java > > test/jdk/java/lang/Math/RoundTests.java > > ## Performance - with Intrinsic > > ### on bananapi > Benchmark on bananapi, +UseSuperWord > > Benchmark on bananapi, +UseSuperWord | (TESTSIZE) | Mode | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- > FpRoundingBenchmark.test_round_double | 2048 | avgt | 10 | 23794.153 | 20557.467 | 2899.266 | ns/op | 0.864 > FpRoundingBenchmark.test_round_float | 2048 | avgt | 10 | 11531.853 | 16562.431 | 865.779 | ns/op | 1.436 > > > > ### on k230 (enable intrinsic even when vlenb == 16) > Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16 > > Benchmark on k230, +UseSuperWord, enable RoundVF/D when vlenb == 16 | (TESTSIZE) | Mode | Cnt | Score +intrinsic ... This pull request has now been integrated. 
Changeset: bacd0460 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/bacd046062bffb4c95ec7a508a1080ad651a94a4 Stats: 921 lines in 11 files changed: 921 ins; 0 del; 0 mod 8321010: RISC-V: C2 RoundVF 8321011: RISC-V: C2 RoundVD Reviewed-by: rehn, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/17745 From dlunden at openjdk.org Fri Sep 13 08:42:10 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 13 Sep 2024 08:42:10 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v4] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 00:17:53 GMT, Dean Long wrote: > This assumes the types in OptoRegPair won't change. Something more future-proof would be to try to construct an OptoRegPair with RM_SIZE_MAX >> 5, then try to read it back. We should probably have OptoRegPair check input values for overflow while we are at it. Right, it's better to not hardcode it to `short`. I'll fix it. > Does this mean "AllStack" no longer means infinite? "AllStack" still works the same way as before; the growth cap is only for the actual representable bits in the register mask. When rolling over register masks in `PhaseChaitin::Select` (the main use of the all-stack flag), I have added a sanity bailout if `short` cannot index the rolled-over mask. I'll change that check to use `OptoRegPair` instead as well.
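The read-back idea discussed above can be sketched roughly as follows. This is a Java illustration, not HotSpot's C++: `NarrowIndexCheck` and `fitsInShort` are hypothetical names, and the 16-bit field is an assumption taken from the `short` mentioned in the reply. The point is only that storing a value in the narrow type and comparing the read-back value with the original detects overflow without hardcoding the field's limits.

```java
// Hypothetical illustration only; not HotSpot code.
final class NarrowIndexCheck {
    // Returns true if 'value' survives a round trip through a 16-bit signed field.
    static boolean fitsInShort(int value) {
        short stored = (short) value;   // narrowing silently drops the high bits
        return stored == value;         // read back and compare with the original
    }

    public static void main(String[] args) {
        System.out.println(fitsInShort(1_000));   // fits in 16 bits
        System.out.println(fitsInShort(40_000));  // exceeds Short.MAX_VALUE
    }
}
```

If the field type later changes, only the cast inside the check needs to follow it; callers keep asking the same question.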
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2348374965 From thartmann at openjdk.org Fri Sep 13 09:14:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 13 Sep 2024 09:14:21 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle [v2] In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Create exceptions eagerly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20950/files - new: https://git.openjdk.org/jdk/pull/20950/files/5b4d9d11..155cd71d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20950&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20950&range=00-01 Stats: 110 lines in 6 files changed: 41 ins; 63 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20950.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20950/head:pull/20950 PR: https://git.openjdk.org/jdk/pull/20950 From thartmann at openjdk.org Fri Sep 13 09:14:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 13 Sep 2024 09:14:21 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias Okay, here's a new version with eager exception creation. Looks much cleaner to me. 
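As a loose Java-level analogy of why eager creation sidesteps the publication problem (the actual change is in HotSpot's C++ and is not reproduced here), the sketch below contrasts a racy lazy cache with an instance created during class initialization. `ExceptionCache` and its members are hypothetical names chosen for illustration.

```java
// Hypothetical illustration only; names are not from HotSpot.
final class ExceptionCache {
    // Lazy variant: the field is deliberately neither volatile nor guarded by a lock.
    // Under the Java memory model a second thread may see the non-null reference
    // before the writes that initialized the object, and two threads may both create one.
    private static RuntimeException lazyInstance;

    static RuntimeException lazyUnsafe() {
        if (lazyInstance == null) {
            lazyInstance = new RuntimeException("preallocated");
        }
        return lazyInstance;
    }

    // Eager variant: created once during class initialization, which the JVM
    // performs under the class initialization lock before any thread can read the field.
    private static final RuntimeException EAGER = new RuntimeException("preallocated");

    static RuntimeException eager() {
        return EAGER;
    }

    public static void main(String[] args) {
        // Every caller observes the same fully constructed instance.
        System.out.println(eager() == eager());
    }
}
```

The eager variant trades a small amount of startup work for not having to reason about barriers at the publication site.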
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2348443203 From shade at openjdk.org Fri Sep 13 10:09:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 13 Sep 2024 10:09:31 GMT Subject: RFR: 8340102: Move assert-only loop in OopMapSort::sort under debug macro Message-ID: Found this papercut when looking at Leyden perf runs. In OopMapSort::sort, there is a loop that apparently is only there for asserts. At least GCC 11.4 apparently not smart enough to eliminate the whole loop in release builds, probably because iterator reads things from the stream. Wrapping the loop with #ifdef ASSERT saves about 144 bytes in code stream. ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20992/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20992&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340102 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20992.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20992/head:pull/20992 PR: https://git.openjdk.org/jdk/pull/20992 From stuefe at openjdk.org Fri Sep 13 10:43:03 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 10:43:03 GMT Subject: RFR: 8340102: Move assert-only loop in OopMapSort::sort under debug macro In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 10:04:01 GMT, Aleksey Shipilev wrote: > Found this papercut when looking at Leyden perf runs. > > In OopMapSort::sort, there is a loop that apparently is only there for asserts. At least GCC 11.4 apparently not smart enough to eliminate the whole loop in release builds, probably because iterator reads things from the stream. Wrapping the loop with #ifdef ASSERT saves about 144 bytes in code stream. Good. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20992#pullrequestreview-2302788641 From fyang at openjdk.org Fri Sep 13 11:00:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 13 Sep 2024 11:00:04 GMT Subject: RFR: 8340102: Move assert-only loop in OopMapSort::sort under debug macro In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 10:04:01 GMT, Aleksey Shipilev wrote: > Found this papercut when looking at Leyden perf runs. > > In OopMapSort::sort, there is a loop that apparently is only there for asserts. At least GCC 11.4 apparently not smart enough to eliminate the whole loop in release builds, probably because iterator reads things from the stream. Wrapping the loop with #ifdef ASSERT saves about 144 bytes in code stream. LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20992#pullrequestreview-2302817493 From shade at openjdk.org Fri Sep 13 11:43:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 13 Sep 2024 11:43:05 GMT Subject: RFR: 8338566: Missing membar in ciEnv::get_or_create_exception before publishing handle [v2] In-Reply-To: <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> Message-ID: On Fri, 13 Sep 2024 09:14:21 GMT, Tobias Hartmann wrote: >> Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. >> >> Unfortunately, this only happened twice in our testing and I was never able to reproduce it. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Create exceptions eagerly Cleaner, right? No concurrency problems, exceptions get CDS archived. This might affect startup a little, but I would not expect it to matter. I suggest to rename the bug into something else, given the whole `ciEnv::get_or_create_exception` is gone. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20950#pullrequestreview-2302892133 From thartmann at openjdk.org Fri Sep 13 12:12:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 13 Sep 2024 12:12:04 GMT Subject: RFR: 8338566: Lazy creation of exception instances is not thread safe [v2] In-Reply-To: <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> Message-ID: On Fri, 13 Sep 2024 09:14:21 GMT, Tobias Hartmann wrote: >> Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. >> >> Unfortunately, this only happened twice in our testing and I was never able to reproduce it. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Create exceptions eagerly Thanks Aleksey! I changed the title. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2348803491 From jkarthikeyan at openjdk.org Fri Sep 13 14:42:08 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 13 Sep 2024 14:42:08 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 07:52:38 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 > - Review resolutions. > - 8339790: Support Intel APX setzucc instruction. This looks good to me! ------------- Marked as reviewed by jkarthikeyan (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2303317737 From epeter at openjdk.org Fri Sep 13 14:48:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:48:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:08:04 GMT, Jatin Bhateja wrote: >> test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: >> >>> 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). >>> 1047: toArray(Object[][]::new); >>> 1048: } >> >> Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? > > Original API did throw IndexOutOfBoundsException, but later on we have moved away from exception throwing semantics to wrapping semantics. > Please find details at following comment > https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606 And do we test that the wrapping works correctly? >> test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: >> >>> 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); >>> 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); >>> 5812: idxv.selectFrom(av, bv).intoArray(r, i); >> >> Would this test catch a bug where the backend would generate vectors that are too long or too short? > > Existing vectorAPI inline expansion entry points explicitly pass lane type and count as intrinsic arguments, this is used to create concrete ideal vector types. That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1758999902 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759002531 From epeter at openjdk.org Fri Sep 13 14:52:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:52:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> Message-ID: On Tue, 3 Sep 2024 11:45:53 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: > >> 542: byte[] vpayload1 = ((ByteVector)v1).vec(); >> 543: byte[] vpayload2 = ((ByteVector)v2).vec(); >> 544: byte[] vpayload3 = ((ByteVector)v3).vec(); > > Is there a reason you are not using more descriptive names here instead of `vpayload1`? > I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? You only gave me a thumbs up and no change - but comment resolved. Is that intentional? Makes me feel like you are ignoring my comments, and that discourages me from reviewing in the future. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759008094 From epeter at openjdk.org Fri Sep 13 14:56:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:56:10 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:13:34 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions. Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. 
Can you please explain why this makes sense and add a comment to `RearrangeNode` describing its semantics? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2349148857 From kvn at openjdk.org Fri Sep 13 16:17:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 16:17:08 GMT Subject: RFR: 8340102: Move assert-only loop in OopMapSort::sort under debug macro In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 10:04:01 GMT, Aleksey Shipilev wrote: > Found this papercut when looking at Leyden perf runs. > > In OopMapSort::sort, there is a loop that apparently is only there for asserts. At least GCC 11.4 apparently not smart enough to eliminate the whole loop in release builds, probably because iterator reads things from the stream. Wrapping the loop with #ifdef ASSERT saves about 144 bytes in code stream. Nice. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20992#pullrequestreview-2303527268 From jbhateja at openjdk.org Fri Sep 13 16:20:28 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 16:20:28 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v10] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max. > . UMIN : Unsigned min. > > > New vector operators are applicable only to integral types, since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range.
Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Documentation change suggerstion from Paul ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/4a93042b..4301c817 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=08-09 Stats: 343 lines in 2 files changed: 173 ins; 8 del; 162 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From psandoz at openjdk.org Fri Sep 13 16:46:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 16:46:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: 
<28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:08:09 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: >> >>> 2768: >>> 2769: /** >>> 2770: * Rearranges the lane elements of two vectors, selecting lanes >> >> I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? > > We already have another flavor of [selectFrom](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#selectFrom(jdk.incubator.vector.Vector)) which permutes single vector, new API extents its semantics to two vector selection, so we kept the nomenclature consistent. Select operates only on vectors where the `this` vector represents the indexes to *select* elements from the other vectors. Rearrange operates on vectors and a shuffle argument that *rearranges* elements from the other vectors. The former behavior can be specified in terms of the latter behavior, and ideally the equivalent expressions should result in ~same generated sequence of instructions. However, we are not there yet and need to further optimize shuffles to make that happen. But, we can optimize `selectFrom` with the dependent change to wrap indexes instead of throwing when out of bounds. (Separately there is an annoying issue with select, that we should not address in this PR. Using a Float/Double Vector for indexes is awkward.) 
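The relationship described above, where select can be specified in terms of rearrange, can be sketched with a scalar model over plain `int[]` arrays. This is not the `jdk.incubator.vector` implementation; `SelectVsRearrangeModel` and its methods are hypothetical, the wrapping in `toShuffle` follows the wrap-instead-of-throw behavior mentioned as the dependent change, and a power-of-two lane count is assumed.

```java
// Scalar model for illustration; not the jdk.incubator.vector implementation.
final class SelectVsRearrangeModel {
    // Assumption: vlen is a power of two, so "& (vlen - 1)" wraps any index into [0, vlen).
    static int[] toShuffle(int[] indexes, int vlen) {
        int[] shuffle = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            shuffle[i] = indexes[i] & (vlen - 1);
        }
        return shuffle;
    }

    // rearrange: the shuffle says which source lane feeds each result lane.
    static int[] rearrange(int[] src, int[] shuffle) {
        int[] result = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            result[i] = src[shuffle[i]];
        }
        return result;
    }

    // selectFrom: the index vector plays the role of "this"; expressed via rearrange.
    static int[] selectFrom(int[] indexes, int[] src) {
        return rearrange(src, toShuffle(indexes, src.length));
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] indexes = {3, 0, 5, -1};   // 5 and -1 are out of bounds and get wrapped
        System.out.println(java.util.Arrays.toString(selectFrom(indexes, src)));
        // prints [40, 10, 20, 40]
    }
}
```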
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759182233 From qamai at openjdk.org Fri Sep 13 17:23:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 13 Sep 2024 17:23:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Given that `rearrange` with 1 vector gets wrapping index semantics, I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` map to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). What do you think?
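The two-vector wrapping proposed above can be sketched as a scalar model. This is only an illustration of the index arithmetic, not Vector API code: `TwoVectorWrapModel` is a hypothetical name, and the model assumes both inputs have the same power-of-two length VLEN.

```java
// Scalar model for illustration; not the jdk.incubator.vector implementation.
final class TwoVectorWrapModel {
    // Assumptions: v1.length == v2.length == indexes.length, and that length is a power of two.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i] & (2 * vlen - 1);   // wrap into [0, 2 * VLEN)
            // [0, VLEN) selects from the first vector, [VLEN, 2 * VLEN) from the second.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, 8, -1};   // 8 wraps to 0, -1 wraps to 7
        System.out.println(java.util.Arrays.toString(selectFrom(indexes, v1, v2)));
        // prints [10, 21, 10, 23]
    }
}
```

Because the mask is applied to the raw index, no separate normalization step is needed before the lookup, which is the property the proposal relies on.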
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2349535266 From jbhateja at openjdk.org Fri Sep 13 17:31:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 17:31:24 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: > 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); > 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); > 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1758203424 From kvn at openjdk.org Fri Sep 13 17:32:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 17:32:10 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: Message-ID: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> On Fri, 13 Sep 2024 07:52:38 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 > - Review resolutions. > - 8339790: Support Intel APX setzucc instruction. Just one comment. src/hotspot/cpu/x86/gc/x/x_x86_64.ad line 129: > 127: format %{ "lock\n\t" > 128: "cmpxchgq $newval, $mem\n\t" > 129: "sete_with_zextl $res\n\t" %} Please, use `setcc` in format to match `ins_encode`. `sete_with_zextl` is used only when it is supported.
------------- PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2303690290 PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759235416 From jbhateja at openjdk.org Fri Sep 13 17:41:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 17:41:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 14:45:29 GMT, Emanuel Peter wrote: >> Existing vectorAPI inline expansion entry points explicitly pass lane type and count as intrinsic arguments, this is used to create concrete ideal vector types. > > That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? Patch includes tests for all the species (combination of vector type and sizes), each vector kernel is validated against equivalent scalar implementation, scenario which you are referring is implicitly handled though tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759246223 From sviswanathan at openjdk.org Fri Sep 13 18:20:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 18:20:07 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 17:20:40 GMT, Quan Anh Mai wrote: > Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). 
Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` map to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2349763832 From sviswanathan at openjdk.org Fri Sep 13 18:27:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 18:27:06 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 05:30:36 GMT, Jatin Bhateja wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: > >> 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); >> 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); >> 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); > > We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. 
The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759296031 From jbhateja at openjdk.org Fri Sep 13 18:30:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 18:30:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 14:53:18 GMT, Emanuel Peter wrote: > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? In case target does not directly support two vector selection instruction we lower the IR to its constituents, this is better than intrinsification failure as it saves costly vector boxing penalties. 
Think in terms of desired compiler IR and not rearrange API semantics, VectorRearrange IR node always expects a shape conformance b/w vector to be permuted and index vector, since shuffle indices are held in byte array based backing storage hence compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding indexes to match the input vector lane. Since selectFrom API already passes the indexes through vector hence we can save emitting redundant toShuffle() + toVector() operations in all cases apart from some target specific scenarios e.g. AVX2 targets [do not support direct short vector permute]instruction "VPERMW", hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate desired permutation using byte permute instruction. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2349801299 From jbhateja at openjdk.org Fri Sep 13 18:40:53 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 18:40:53 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v9] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Documentation suggestions from Paul. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/d3ee3104..1c00f417 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=07-08 Stats: 36 lines in 1 file changed: 23 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Fri Sep 13 19:09:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 19:09:04 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 18:24:04 GMT, Sandhya Viswanathan wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: >> >>> 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); >>> 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); >>> 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); >> >> We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. > > @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here. Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyway passed through a vector. Please let me know if you find the explanation below too constraining.
https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759345567 From sviswanathan at openjdk.org Fri Sep 13 19:17:04 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 19:17:04 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> On Fri, 13 Sep 2024 19:04:12 GMT, Jatin Bhateja wrote: >> @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here. > > Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyway passed through a vector. Please let me know if you find the explanation below too constraining. > https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299 I think VectorLoadShuffle removal optimizations should be a separate PR and well thought out. So far the contract has been that rearrange always gets the shuffle through VectorLoadShuffle and I would like to keep that contract in this PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759361459 From jbhateja at openjdk.org Fri Sep 13 19:23:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 19:23:14 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> References: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> Message-ID: On Fri, 13 Sep 2024 17:27:29 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Review comments resolution. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 >> - Review resolutions. >> - 8339790: Support Intel APX setzucc instruction. > > src/hotspot/cpu/x86/gc/x/x_x86_64.ad line 129: > >> 127: format %{ "lock\n\t" >> 128: "cmpxchgq $newval, $mem\n\t" >> 129: "sete_with_zextl $res\n\t" %} > > Please, use `setcc` in format to match `ins_encode`. `sete_with_zextl` isused only when it is supported. Hi @vnkozlov , setcc used in inst_encoding block is a macro assembly routine which emits either setcc + movzbl or setzucc for APX supported targets, I wanted to use one opto instruction to correctly depict semantics of both the cases. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759379552 From kvn at openjdk.org Fri Sep 13 19:40:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 19:40:05 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> Message-ID: <0q8eO87_rRl1zJTaysGKEpPPhLpgZYobzVUIZpJ8onY=.26218530-9e34-4e2f-8cee-6f567fa18b95@github.com> On Fri, 13 Sep 2024 19:20:07 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/gc/x/x_x86_64.ad line 129: >> >>> 127: format %{ "lock\n\t" >>> 128: "cmpxchgq $newval, $mem\n\t" >>> 129: "sete_with_zextl $res\n\t" %} >> >> Please, use `setcc` in format to match `ins_encode`. `sete_with_zextl` isused only when it is supported. > > Hi @vnkozlov , > setcc used in inst_encoding block is a macro assembly routine which emits either setcc + movzbl or setzucc for APX supported targets, I wanted to use one opto instruction to correctly depict semantics of both the cases. Yes, I see `setcc` definition and using it in `format` is fine since it will match to `inst_encoding`. On other hand, there is no macro or assembler instruction `sete_with_zextl` and it will be confusing. 
If you want, you can add a comment to the format (and you should not use `\n` in the last line): "setcc $res\t# emits sete + movzbl or setzucc for APX" %} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759397114 From psandoz at openjdk.org Fri Sep 13 19:48:04 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 19:48:04 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 18:17:21 GMT, Sandhya Viswanathan wrote: > > The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. Yes, we are trying to take smaller incremental steps. Once we are done with this work we can step back and discuss/review what to do about shuffles. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2350039460 From psandoz at openjdk.org Fri Sep 13 20:09:05 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 20:09:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal.
This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. > 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2439: > 2437: (v1, s_, m_) -> v1.uOp((i, a) -> { > 2438: int ei = s_.laneSource(i); > 2439: return ei < 0 || !m_.laneIsSet(i) ? 0 : v1.lane(ei); The `ei < 0` test is redundant. 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2637: > 2635: * > 2636: * For each lane {@code N} of the shuffle, and for each lane > 2637: * source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle, The pseudo code below starting at line 2644 needs adjusting to: Vector r = this.rearrange(s); return broadcast(0).blend(r, m); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2755: > 2753: * > 2754: * The result is the same as the expression > 2755: * {@code v.rearrange(this.toShuffle().wrapIndexes())}. Since we also adjusted `rearrange` the existing expression is fine, recommend no change here and to the mask accepting version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759431093 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759428672 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759418829 From jbhateja at openjdk.org Fri Sep 13 20:37:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 20:37:27 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/998501e1..c1c42d38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=02-03 Stats: 12 lines in 3 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From jbhateja at openjdk.org Fri Sep 13 20:37:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 20:37:27 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: <0q8eO87_rRl1zJTaysGKEpPPhLpgZYobzVUIZpJ8onY=.26218530-9e34-4e2f-8cee-6f567fa18b95@github.com> References: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> <0q8eO87_rRl1zJTaysGKEpPPhLpgZYobzVUIZpJ8onY=.26218530-9e34-4e2f-8cee-6f567fa18b95@github.com> Message-ID: On Fri, 13 Sep 2024 19:36:59 GMT, Vladimir Kozlov wrote: >> Hi @vnkozlov , >> setcc used in inst_encoding block is a macro assembly routine which emits either setcc + movzbl or setzucc for APX supported targets, I wanted to use one opto instruction to correctly depict semantics of both the cases. > > Yes, I see `setcc` definition and using it in `format` is fine since it will match to `inst_encoding`. > On other hand, there is no macro or assembler instruction `sete_with_zextl` and it will be confusing. 
> > If you want you can add comment to format (and you should not use `\n` in last line): > > "setcc $res\t# emits sete + movzbl or setzucc for APX" %} DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759470761 From kvn at openjdk.org Fri Sep 13 21:40:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 21:40:15 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 20:37:27 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2304195956 From psandoz at openjdk.org Fri Sep 13 21:58:16 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 21:58:16 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v10] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 16:20:28 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . 
SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. 
>> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Documentation change suggerstion from Paul src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 573: > 571: * @see VectorMath#addSaturating(int, int) > 572: */ > 573: public static final Associative SADD = assoc("SADD", "+", VectorSupport.VECTOR_OP_SADD, VO_NOFP+VO_ASSOC); I don't believe saturation arithmetic is associative. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1759556540 From psandoz at openjdk.org Fri Sep 13 22:02:30 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 22:02:30 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v10] In-Reply-To: References: Message-ID: <0PApPK8O06mwyZXwM5XpEGPDdmeqaxhX-llQryWSUpo=.03ae9624-b809-48ca-8e07-242aad3e9df2@github.com> On Fri, 13 Sep 2024 16:20:28 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. 
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Documentation change suggerstion from Paul src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 589: > 587: * @see VectorMath#minUnsigned(int, int) (int, int) > 588: */ > 589: public static final Associative UMIN = assoc("UMIN", "umin", VectorSupport.VECTOR_OP_UMIN, VO_NOFP+VO_ASSOC); We should rename the existing unsigned compare operators to use the same naming scheme i.e., s/UNSIGNED_LT/ULT etc. 
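The associativity concern raised above for `SADD` can be checked with a small scalar sketch. The helper `addSaturating` below is a hand-written stand-in written for illustration, not the `VectorMath` method referenced in the patch, and the class name is hypothetical. It only shows that clamping the intermediate result can change the final value depending on grouping.

```java
// Scalar sketch only: addSaturating is a hypothetical stand-in, not the
// VectorMath.addSaturating method referenced in the patch under review.
class SaturatingAssociativityDemo {
    // Signed saturating addition: clamp the exact sum to the int range.
    static int addSaturating(int a, int b) {
        long sum = (long) a + (long) b;
        if (sum > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (sum < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) sum;
    }

    public static void main(String[] args) {
        int a = Integer.MAX_VALUE, b = 1, c = -1;
        // (a + b) saturates at MAX_VALUE before c is applied.
        int left = addSaturating(addSaturating(a, b), c);
        // (b + c) cancels to 0 first, so nothing saturates.
        int right = addSaturating(a, addSaturating(b, c));
        System.out.println("(a + b) + c = " + left);
        System.out.println("a + (b + c) = " + right);
        System.out.println("same result: " + (left == right));
    }
}
```

With these inputs the two groupings differ by one, which is the kind of counterexample that makes an `Associative` classification questionable for signed saturating addition.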
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1759558844 From kvn at openjdk.org Fri Sep 13 22:12:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 22:12:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> References: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> Message-ID: <-lhMoCYQAGXWEAQ2ySemYzUh_DjKgqi4pG10NdrHils=.b2bc294a-941d-42aa-a00f-149d9260dfeb@github.com> On Mon, 9 Sep 2024 14:41:25 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: >> >>> 110: // The answer is that stores of different sizes can co-exist >>> 111: // in the same sequence of RawMem effects. We sometimes initialize >>> 112: // a whole 'tile' of array elements with a single jint or jlong.) >> >> I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two >> 32bit oops/narrowOops? But that doesn't have anything to do with jints. > > I am not sure the complex overlap test is necessary here, this code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called.
The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest to leave it as-is and possibly investigate how to simplify it as a follow-up task. Yes, the comment refers to combined initialization stores: [memnode.cpp#L4925](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L4925), which are used only for constant stores of primitive types (integers and floats). There was also a recent change by Emanuel to combine stores into primitive arrays: [JDK-8335390](https://bugs.openjdk.org/browse/JDK-8335390). None of the above affects OOP stores. I agree that this code could be left as is for now and optimized later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759565105 From sviswanathan at openjdk.org Fri Sep 13 22:30:36 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 22:30:36 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
> > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. > 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/694aceb5..428f2289 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=00-01 Stats: 14 lines in 8 files changed: 0 ins; 4 del; 10 mod Patch: 
https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: https://git.openjdk.org/jdk/pull/20634 From sviswanathan at openjdk.org Fri Sep 13 22:33:18 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 22:33:18 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote: >>> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? >> >> The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > >> >> The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > > Yes, we are trying to take smaller incremental steps. Once the we are done with this work we can step back and discuss/review what to do about shuffles. @PaulSandoz Thanks a lot for the review. I have addressed your review comments. 
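The difference between checking and wrapping indexes discussed in this thread can be sketched in scalar Java. This is only a model of the described semantics, not the Vector API implementation: the class and method names are hypothetical, arrays stand in for vectors, and the lane count is assumed to be a power of two so that wrapping reduces to a bit mask.

```java
// Scalar model of wrap-style index handling; all names here are hypothetical
// and arrays stand in for vectors. Assumes the lane count is a power of two.
class WrapIndexesDemo {
    // Wrap any int index into [0, vlen) by masking the low bits.
    static int wrapIndex(int index, int vlen) {
        return index & (vlen - 1);
    }

    // Select src lanes by index, wrapping out-of-range indexes instead of throwing.
    static byte[] selectFromWrapped(byte[] indexes, byte[] src) {
        int vlen = src.length;
        byte[] res = new byte[vlen];
        for (int i = 0; i < vlen; i++) {
            res[i] = src[wrapIndex(indexes[i], vlen)];
        }
        return res;
    }

    public static void main(String[] args) {
        byte[] src = {10, 11, 12, 13};
        byte[] idx = {3, 4, -1, 6};   // 4, -1 and 6 are outside [0, 4)
        byte[] res = selectFromWrapped(idx, src);
        System.out.println(java.util.Arrays.toString(res));
    }
}
```

Because no lane can fail, a model like this has no exceptional path in the loop body, which is the property the PR description relies on for generating tighter code.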
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2350535307 From kvn at openjdk.org Fri Sep 13 23:23:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 23:23:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: Message-ID: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> On Wed, 11 Sep 2024 08:30:02 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Fix a few style issues src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241: > 239: assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP"); > 240: TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr(); > 241: uint8_t barrier_data = store->barrier_data(); Should you check barrier data for 0? `is_ptr()` covers a wide set of types. It includes TypeRawPtr, TypeKlassPtr and TypeMetadataPtr. Where are you filtering them? src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65: > 63: #else > 64: make_barrier_set_c2(), > 65: #endif I assume this is temporary until all ports are ready (except maybe 32-bit x86). Right?
src/hotspot/share/opto/matcher.cpp line 1821: > 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) { > 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf), > 1821: "duplicating node that's already been matched"); Why was it removed? src/hotspot/share/opto/matcher.cpp line 2845: > 2843: n->Opcode() == Op_StoreN && > 2844: m->is_EncodeP(); > 2845: } Add a comment that `m` is an input of `n`. I thought about adding an assert too, but I will leave it to you. src/hotspot/share/opto/output.cpp line 2026: > 2024: if (n->is_MachNullCheck()) { > 2025: assert(n->in(1)->as_Mach()->barrier_data() == 0, > 2026: "Implicit null checks on memory accesses with barriers are not yet supported"); I don't see changes in `lcm.cpp` here that would prevent it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604325 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604944 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593453 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593131 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759605704 From jbhateja at openjdk.org Sat Sep 14 08:30:48 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 14 Sep 2024 08:30:48 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v11] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags.
For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorporated. 
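The unsigned operators listed in this description (SUADD, SUSUB, UMAX, UMIN) can be modelled lane by lane in scalar Java. The sketch below is an assumption-laden illustration, not the code from the patch: the class and method names are hypothetical, and only `Integer.compareUnsigned` and `Integer.toUnsignedString` are existing JDK methods.

```java
// Scalar sketch of the unsigned operators on int lanes. Method names are
// hypothetical; only the Integer.* calls are existing JDK methods.
class UnsignedOpsDemo {
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Saturating unsigned add: on wraparound, cap at 0xFFFFFFFF (all bits set).
    static int addSaturatingUnsigned(int a, int b) {
        int sum = a + b;
        // If the wrapped sum is below an operand (unsigned), the add overflowed.
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }

    // Saturating unsigned subtract: results below zero are capped at 0.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    public static void main(String[] args) {
        int big = -1; // bit pattern 0xFFFFFFFF, the largest unsigned int
        System.out.println(Integer.toUnsignedString(umin(big, 1)));
        System.out.println(Integer.toUnsignedString(umax(big, 1)));
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(big, 1)));
        System.out.println(Integer.toUnsignedString(subSaturatingUnsigned(1, big)));
    }
}
```

The point of the model is that the same bit pattern orders differently under signed and unsigned comparison, so the unsigned variants cannot be expressed with the existing signed min, max, add, and subtract.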
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/4301c817..71114d0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=09-10 Stats: 330 lines in 22 files changed: 0 ins; 0 del; 330 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Sat Sep 14 08:40:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 14 Sep 2024 08:40:44 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v12] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. 
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update AARCH64 specific test using UNSIGNED_* comparison operators. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/71114d0d..ec7c7553 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=10-11 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Sat Sep 14 09:11:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 14 Sep 2024 09:11:06 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> Message-ID: On Fri, 13 Sep 2024 19:14:29 GMT, Sandhya Viswanathan wrote: >> Hi 
@sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target-specific hook, since indexes are passed through a vector anyway. Please let me know if you find the explanation below too constraining. >> https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299 > > I think VectorLoadShuffle removal optimizations should be a separate PR and well thought out. So far the contract has been that rearrange always gets the shuffle through VectorLoadShuffle and I would like to keep that contract in this PR. Hi @sviswa7, @PaulSandoz, I will modify PR#20508 accordingly to honor the contract at IR level and address VectorLoadShuffle optimization for both flavors of selectFrom API in a follow-up PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759701508 From ddong at openjdk.org Sun Sep 15 01:22:33 2024 From: ddong at openjdk.org (Denghui Dong) Date: Sun, 15 Sep 2024 01:22:33 GMT Subject: RFR: 8340144: C1: remove unused Compilation::_max_spills Message-ID: Hi, Please review this trivial change that removed the unused field Compilation::_max_spills. Thanks ------------- Commit messages: - 8340144: C1: remove unused Compilation::_max_spills Changes: https://git.openjdk.org/jdk/pull/21007/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21007&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340144 Stats: 5 lines in 2 files changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21007.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21007/head:pull/21007 PR: https://git.openjdk.org/jdk/pull/21007 From ddong at openjdk.org Sun Sep 15 01:26:51 2024 From: ddong at openjdk.org (Denghui Dong) Date: Sun, 15 Sep 2024 01:26:51 GMT Subject: RFR: 8340144: C1: remove unused Compilation::_max_spills [v2] In-Reply-To: References: Message-ID: > Hi, > > Please review this trivial change that removed the unused field Compilation::_max_spills.
> > Thanks Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21007/files - new: https://git.openjdk.org/jdk/pull/21007/files/ff4ec1d2..cb1ff832 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21007&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21007&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21007.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21007/head:pull/21007 PR: https://git.openjdk.org/jdk/pull/21007 From epeter at openjdk.org Sun Sep 15 07:19:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 15 Sep 2024 07:19:15 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 18:27:07 GMT, Jatin Bhateja wrote: > > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > > > > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? 
> > In case the target does not directly support a two-vector selection instruction, we lower the IR to its constituents; this is better than an intrinsification failure, as it avoids costly vector boxing penalties. > > Consider this in terms of the desired compiler IR and not the rearrange API semantics: the VectorRearrange IR node generally expects shape conformance between the vector to be permuted and the index vector. Since shuffle indices are held in byte-array-based backing storage, the compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding the indexes to match the input vector lanes. Since the selectFrom API already passes the indexes through a vector, we can avoid emitting redundant toShuffle() + toVector() operations in some cases, apart from some target-specific scenarios, e.g. AVX2 targets do not support a direct short vector permute instruction ("VPERMW"), hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate the desired permutation using a byte permute instruction. > > The VectorLoadShuffle abstraction hides target-specific index massaging, which is why I am adding a target-specific hook like Matcher::vector_indexes_needs_pruning for the compiler to selectively emit VectorLoadShuffle. I still do not see a **definition of the semantics of RearrangeNode**: what inputs does it accept and what does it do with them? Can you put this explanation as a comment in the code, please? It sounds like this is what the `massaging` / `pruning` is: `emulate the desired permutation using a byte permute instruction.` You should find a correspondingly more suitable name for the method. Maybe it is something like `must_emulate_permutation_with...`. Or maybe it is rather a `supported` kind of question? I leave that up to you.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2351426263 From jbhateja at openjdk.org Sun Sep 15 11:32:40 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 15 Sep 2024 11:32:40 GMT Subject: RFR: 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX [v2] In-Reply-To: References: Message-ID: > Currently VM_Supports::supports_apx_f() returns a true value even if user explicitly pass -XX:-UseAPX runtime flag, this enables APX specific code and register set. > > This bug fix patch turn off the APX_F feature if UseAPX runtime flag is explicitly set to false value. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339793 - 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20921/files - new: https://git.openjdk.org/jdk/pull/20921/files/bb6ebe5a..86d90d7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20921&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20921&range=00-01 Stats: 18761 lines in 468 files changed: 10371 ins; 5668 del; 2722 mod Patch: https://git.openjdk.org/jdk/pull/20921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20921/head:pull/20921 PR: https://git.openjdk.org/jdk/pull/20921 From kvn at openjdk.org Sun Sep 15 18:05:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 15 Sep 2024 18:05:05 GMT Subject: RFR: 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX [v2] In-Reply-To: References: Message-ID: On Sun, 15 Sep 2024 11:32:40 GMT, Jatin Bhateja wrote: >> Currently VM_Supports::supports_apx_f() returns a true value even if user explicitly pass -XX:-UseAPX runtime flag, this enables APX specific code and register set. >> >> This bug fix patch turn off the APX_F feature if UseAPX runtime flag is explicitly set to false value. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339793 > - 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20921#pullrequestreview-2305494555 From jbhateja at openjdk.org Mon Sep 16 02:58:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Sep 2024 02:58:41 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. 
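The two-vector `selectFrom` semantics described in this message can be modelled with a short scalar sketch. It follows the description as quoted in this revision (indexes in `[0, 2*VLEN)`, otherwise an exception) and is not the implementation from the patch: arrays stand in for vectors, and the class and method names are hypothetical.

```java
// Scalar model of two-vector selection as described in this revision of the
// PR text. Names are hypothetical; arrays stand in for vectors of equal length.
class TwoVectorSelectDemo {
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            // Valid two-vector index range is [0, 2 * vlen).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + i + ": index " + idx);
            }
            // Low half of the range selects from v1, high half from v2.
            res[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 7, 4, 3};
        System.out.println(java.util.Arrays.toString(selectFrom(idx, v1, v2)));
    }
}
```

Treating `v1` and `v2` as one concatenated lookup table of length `2*VLEN` is what lets a target with a native two-table permute instruction handle the whole operation in one step.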
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/1c00f417..7c80bfce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=08-09 Stats: 321 lines in 51 files changed: 57 ins; 97 del; 167 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Mon Sep 16 03:02:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Sep 2024 03:02:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <4IqtmftuGBNSj8_1HsI3x9eKBSf4QhpoKELYs1EanLE=.15ae8f1b-f586-403a-88d6-9193bba90fb2@github.com> On Sun, 15 Sep 2024 07:16:17 GMT, Emanuel Peter wrote: > > > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > > > > > > > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? 
> > > > > > In case the target does not directly support a two-vector selection instruction, we lower the IR to its constituents; this is better than an intrinsification failure, as it saves costly vector boxing penalties.
> > Consider this in terms of the desired compiler IR and not the rearrange API semantics. The VectorRearrange IR node generally expects shape conformance between the vector to be permuted and the index vector. Since shuffle indices are held in byte-array-based backing storage, the compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding the indexes to match the input vector lanes. Since the selectFrom API already passes the indexes through a vector, we can avoid emitting redundant toShuffle() + toVector() operations in some cases, apart from some target-specific scenarios: e.g. AVX2 targets do not support the direct short vector permute instruction "VPERMW", hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate the desired permutation using a byte permute instruction.
> > The VectorLoadShuffle abstraction hides target-specific index massaging, which is why a target-specific hook like Matcher::vector_indexes_needs_pruning is added to let the compiler selectively emit VectorLoadShuffle.
> > I still do not see a **definition of the semantics of RearrangeNode**: what inputs does it accept and what does it do with them?
> > Can you put this explanation as a comment in the code, please?
> > It sounds like this is what the `massaging` / `pruning` is: `emulate desired permutation using byte permute instruction.` You should find a more suitable name for the method. Maybe it is something like `must_emulate_permutation_with...`. Or maybe it is rather a `supported` kind of question? I leave that up to you.
Hi @eme64,

As per the discussion on [PR# 20634 ](https://github.com/openjdk/jdk/pull/20634#discussion_r1759701508), we plan to suppress the VectorLoadShuffle bypassing optimization for now and address it as a follow-up optimization for both flavors of the selectFrom API. I have addressed your comments. Kindly verify.

Best Regards,
Jatin

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2351944720

From thartmann at openjdk.org Mon Sep 16 05:20:03 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 16 Sep 2024 05:20:03 GMT
Subject: RFR: 8340144: C1: remove unused Compilation::_max_spills [v2]
In-Reply-To: References: Message-ID:

On Sun, 15 Sep 2024 01:26:51 GMT, Denghui Dong wrote:

>> Hi,
>>
>> Please review this trivial change that removed the unused field Compilation::_max_spills.
>>
>> Thanks
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
>   update

Looks good and trivial.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21007#pullrequestreview-2305755535

From shade at openjdk.org Mon Sep 16 05:35:15 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 16 Sep 2024 05:35:15 GMT
Subject: RFR: 8340102: Move assert-only loop in OopMapSort::sort under debug macro
In-Reply-To: References: Message-ID:

On Fri, 13 Sep 2024 10:04:01 GMT, Aleksey Shipilev wrote:

> Found this papercut when looking at Leyden perf runs.
>
> In OopMapSort::sort, there is a loop that apparently is only there for asserts. At least GCC 11.4 is apparently not smart enough to eliminate the whole loop in release builds, probably because the iterator reads things from the stream. Wrapping the loop with #ifdef ASSERT saves about 144 bytes in the code stream.

Thanks!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20992#issuecomment-2352045821

From shade at openjdk.org Mon Sep 16 05:35:16 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 16 Sep 2024 05:35:16 GMT
Subject: Integrated: 8340102: Move assert-only loop in OopMapSort::sort under debug macro
In-Reply-To: References: Message-ID:

On Fri, 13 Sep 2024 10:04:01 GMT, Aleksey Shipilev wrote:

> Found this papercut when looking at Leyden perf runs.
>
> In OopMapSort::sort, there is a loop that apparently is only there for asserts. At least GCC 11.4 is apparently not smart enough to eliminate the whole loop in release builds, probably because the iterator reads things from the stream. Wrapping the loop with #ifdef ASSERT saves about 144 bytes in the code stream.

This pull request has now been integrated.

Changeset: 0e0f10f9
Author:    Aleksey Shipilev
URL:       https://git.openjdk.org/jdk/commit/0e0f10f95217b5caaed02744a0a341350e4f2bc7
Stats:     4 lines in 1 file changed: 3 ins; 0 del; 1 mod

8340102: Move assert-only loop in OopMapSort::sort under debug macro

Reviewed-by: stuefe, fyang, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/20992

From thartmann at openjdk.org Mon Sep 16 07:18:06 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 16 Sep 2024 07:18:06 GMT
Subject: RFR: 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX [v2]
In-Reply-To: References: Message-ID:

On Sun, 15 Sep 2024 11:32:40 GMT, Jatin Bhateja wrote:

>> Currently VM_Supports::supports_apx_f() returns true even if the user explicitly passes the -XX:-UseAPX runtime flag; this enables APX-specific code and the APX register set.
>>
>> This bug fix patch turns off the APX_F feature if the UseAPX runtime flag is explicitly set to false.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339793 > - 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20921#pullrequestreview-2305887940 From duke at openjdk.org Mon Sep 16 07:18:17 2024 From: duke at openjdk.org (leo liang) Date: Mon, 16 Sep 2024 07:18:17 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v5] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 15:44:21 GMT, Volker Simonis wrote: >> Hi there, we are seeing this issue when we run JFR on our services under load, we see a large spike of CPU after JFR is triggered, which cause 500 errors in our service. We are currently using corretto-17 in our service. >> >> Wondering this fix get back ported to JDK 17? As I can't find this change mentioned in [JDK update](https://wiki.openjdk.org/display/JDKUpdates/Archived+Releases) or in [jdk17u tag compare](https://github.com/openjdk/jdk17u/compare/jdk-17.0.9+9...jdk-17.0.13+1) >> >> Also, wondering if there is a walk around for this issue if the PR is not back ported to Java 17. `XX:+EnableDynamicAgentLoading` seems to only supported in Java 21, so that wouldn't help for now > >> Hi there, we are seeing this issue when we run JFR on our services under load, we see a large spike of CPU after JFR is triggered, which cause 500 errors in our service. We are currently using corretto-17 in our service. >> >> Wondering this fix get back ported to JDK 17? As I can't find this change mentioned in [JDK update](https://wiki.openjdk.org/display/JDKUpdates/Archived+Releases) or in [jdk17u tag compare](https://github.com/openjdk/jdk17u/compare/jdk-17.0.9+9...jdk-17.0.13+1) >> >> Also, wondering if there is a walk around for this issue if the PR is not back ported to Java 17. 
`XX:+EnableDynamicAgentLoading` seems to be supported only in Java 21, so that wouldn't help for now
>
> @leomao10, I'm not sure if this change will ever be downported to older releases like JDK 21 or even JDK 17. I personally consider it low risk, but there have been reports of performance regressions in some cases (e.g. [JDK-8336805](https://bugs.openjdk.org/browse/JDK-8336805)). I couldn't reproduce them, but I can imagine that they will make the maintainers of LTS releases even more cautious.
>
> The easiest way to work around this issue in JDK 17 would be to set the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities at startup or early in the lifetime of the JVM. There are several ways you could do this.
> - trigger JFR right after startup. This will still invalidate all the JIT compiled methods, but if you do this early enough there won't be many of them. After you've triggered JFR for the first time, the corresponding JVMTI capabilities will be set and all dependencies will be recorded automatically, so any subsequent JFR invocation won't suffer from a performance degradation any more.
> - attach any other JVMTI agent, for example [async profiler](https://github.com/async-profiler/async-profiler), which requests the corresponding JVMTI capabilities at startup.
> - write your own, trivial JVMTI agent which merely requests the corresponding JVMTI capabilities and attach it at startup with `-agentpath:jvmtiAgent.so`. The agent can be as simple as:
>
> /*
>
> g++ -fPIC -shared -I $JAVA_HOME/include/ -I $JAVA_HOME/include/linux -o jvmtiAgent.so jvmtiAgent.cpp
> */
>
> #include
> #include
> #include
>
> extern "C"
> JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM* jvm, char* options, void* reserved) {
>   jvmtiEnv* jvmti = NULL;
>   jvmtiCapabilities capa;
>   jvmtiError error;
>
>   jint result = jvm->GetEnv((void**) &jvmti, JVMTI_VERSI...

That is awesome, thanks for the response and workarounds @simonis
------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-2352169138 From epeter at openjdk.org Mon Sep 16 07:51:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:15 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. Looks better, I still have a few comments. src/hotspot/share/opto/vectorIntrinsics.cpp line 2739: > 2737: return true; > 2738: } > 2739: @jatin-bhateja You still have 3x `unbox failed v1` here. I already commented this earlier, and you resolved it and gave it a thumbs up ? Can you please fix it now? src/hotspot/share/opto/vectornode.cpp line 2116: > 2114: const TypeVect* index_vect_type = index_vec->bottom_type()->is_vect(); > 2115: BasicType index_elem_bt = index_vect_type->element_basic_type(); > 2116: assert(!is_floating_point_type(index_elem_bt), ""); Why not verify this also in the constructor of `SelectFromTwoVectorNode`? Can you maybe explicitly verify what it must be rather than **not** be? 
src/hotspot/share/opto/vectornode.cpp line 2122:

> 2120: // index format by subsequent VectorLoadShuffle.
> 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true);
> 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem));

This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256 elements. This is fine for Intel. But as far as I know ARM has in principle longer vectors - up to 2048 bits. Should we maybe add some assert here to make sure we never badly truncate the index?

src/hotspot/share/opto/vectornode.cpp line 2138:

> 2136:
> 2137: // Load indexes from byte vector and appropriatly massage them to target specific
> 2138: // permutation index format.

I would replace `massage` -> `transform` everywhere.

src/hotspot/share/opto/vectornode.hpp line 1625:

> 1623: Node* Ideal(PhaseGVN* phase, bool can_reshape);
> 1624: virtual int Opcode() const;
> 1625: };

`index` -> `indexes` because this is a vector, right? Otherwise I'll assume it is a scalar.
Can you do some pseudo-code that says how exactly the indices are interpreted? What if they are out of bounds? Does it wrap? Or assume they are in bounds? Undefined behaviour?

-------------

Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2305905569 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760651336 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760674461 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760678107 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760680772 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760665944 From epeter at openjdk.org Mon Sep 16 07:51:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:16 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:44:05 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/share/opto/vectornode.cpp line 2104: > >> 2102: // MASK) >> 2103: // This shall prevent an intrinsification failure and associated argument >> 2104: // boxing penalties. > > A quick comment about how the mask is computed could be nice. > `MASK = INDEX < num_elem` @jatin-bhateja very nice, thanks! > src/hotspot/share/opto/vectornode.cpp line 2148: > >> 2146: >> 2147: BoolTest::mask pred = BoolTest::lt; >> 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); > > Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760673419 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760656304 From epeter at openjdk.org Mon Sep 16 07:51:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 16 Sep 2024 07:27:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2148: >> >>> 2146: >>> 2147: BoolTest::mask pred = BoolTest::lt; >>> 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); >> >> Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? > > Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment. I really do think that `as_ConI()` would be the right thing here. In product it is just a cast, and in debug at least we have an assert. 
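
For readers following the lowering discussed in this thread, the scalar sketch below models how a two-vector selection might be decomposed into two single-vector rearranges plus a blend, using the mask from the review comment above (`MASK = INDEX < num_elem`). It is an illustration under assumptions, not the C2 implementation: the class and method names are hypothetical, plain `int[]` arrays stand in for vectors, and wrapping with `& (numElem - 1)` assumes a power-of-two lane count and indexes already known to lie in `[0, 2*numElem)`.

```java
import java.util.Arrays;

public class SelectFromLoweringSketch {

    // Hypothetical scalar stand-in for a single-vector rearrange: out[i] = src[idx[i]].
    static int[] rearrange(int[] src, int[] idx) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[idx[i]];
        }
        return out;
    }

    // Lowered form: wrap the indexes into one vector's range, permute both
    // inputs with the wrapped indexes, then blend the two results with the mask.
    static int[] loweredSelectFrom(int[] index, int[] v1, int[] v2) {
        int numElem = index.length;
        int[] wrapped = new int[numElem];
        boolean[] mask = new boolean[numElem];
        for (int i = 0; i < numElem; i++) {
            wrapped[i] = index[i] & (numElem - 1); // assumes numElem is a power of two
            mask[i] = index[i] < numElem;          // MASK = INDEX < num_elem
        }
        int[] fromV1 = rearrange(v1, wrapped);
        int[] fromV2 = rearrange(v2, wrapped);
        int[] out = new int[numElem];
        for (int i = 0; i < numElem; i++) {
            out[i] = mask[i] ? fromV1[i] : fromV2[i];
        }
        return out;
    }

    // Direct lookup into the concatenation of v1 and v2, used only for comparison.
    static int[] directSelectFrom(int[] index, int[] v1, int[] v2) {
        int numElem = index.length;
        int[] out = new int[numElem];
        for (int i = 0; i < numElem; i++) {
            out[i] = index[i] < numElem ? v1[index[i]] : v2[index[i] - numElem];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] index = {0, 5, 3, 6};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(loweredSelectFrom(index, v1, v2))); // [10, 21, 13, 22]
        System.out.println(Arrays.toString(directSelectFrom(index, v1, v2)));  // [10, 21, 13, 22]
    }
}
```

In this model the blend is what makes the two rearranges safe: each lane reads from both inputs, and the mask decides which read is kept.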
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760657072

From epeter at openjdk.org Mon Sep 16 07:51:18 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 16 Sep 2024 07:51:18 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v10]
In-Reply-To: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com>
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com>
Message-ID: <4VKGFHuL8RSSll0Pnqgg5DeesBdXys8JOZT64yGUBG8=.58b88db6-58c0-49ea-b01c-d2d814a93cae@github.com>

On Mon, 16 Sep 2024 07:35:46 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level.
>
> src/hotspot/share/opto/vectornode.hpp line 1625:
>
>> 1623: Node* Ideal(PhaseGVN* phase, bool can_reshape);
>> 1624: virtual int Opcode() const;
>> 1625: };
>
> `index` -> `indexes` because this is a vector, right? Otherwise I'll assume it is a scalar.
> Can you do some pseudo-code, that says how exactly the indices are interpreted? What if they are out of bounds? Does it wrap? Or assume they are in bounds? Undefined behaviour?

For me, good comments here would be immensely valuable, because they help with other C2 optimizations.
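
One way to approach the question of how the indexes are interpreted is a scalar model of the API semantics quoted in the PR description: the lanes of the index vector select from the concatenation of `v1` and `v2`, and an index outside `[0, 2*VLEN)` raises an exception. The sketch below assumes exactly that description (it does not wrap) and says nothing about what the IR node itself guarantees; the class name and the use of `int[]` in place of vectors are illustrative only and are not taken from the patch.

```java
import java.util.Arrays;

public class SelectFromSemanticsSketch {

    // Scalar model: result[i] = (v1 ++ v2)[index[i]], with a bounds check on every lane.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three inputs must have the same lane count");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                    "lane " + i + ": index " + idx + " not in [0, " + (2 * vlen) + ")");
            }
            // Indexes below vlen pick from the first table, the rest from the second.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFrom(new int[] {7, 0, 4, 3}, v1, v2)));
        try {
            selectFrom(new int[] {8, 0, 0, 0}, v1, v2); // 8 is outside [0, 8)
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A comment on the IR node could state the same contract in a few lines, including whether out-of-range indexes are assumed never to reach the node.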
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760667297 From epeter at openjdk.org Mon Sep 16 07:51:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:18 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> On Fri, 13 Sep 2024 17:38:29 GMT, Jatin Bhateja wrote: >> That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? > > Patch includes tests for all the species (combination of vector type and sizes), each vector kernel is validated against equivalent scalar implementation, scenario which you are referring is implicitly handled though tests. Ok, just so that I can relax, can you please point me to this test that would implicitly verify that the backend has chosen the correct vector size? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760671393

From rcastanedalo at openjdk.org Mon Sep 16 08:19:24 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 16 Sep 2024 08:19:24 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20]
In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID:

On Fri, 13 Sep 2024 23:16:32 GMT, Vladimir Kozlov wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix a few style issues
>
> src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65:
>
>> 63: #else
>> 64: make_barrier_set_c2(),
>> 65: #endif
>
> I assume it is temporary until all ports are ready (except maybe 32-bit x86). Right?

Right, all code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` will be removed before integration.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760716721

From roland at openjdk.org Mon Sep 16 08:39:40 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 16 Sep 2024 08:39:40 GMT
Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert
Message-ID:

When converting a `LongCountedLoop` into a loop nest, C2 needs JVM state to add predicates to the inner loop. For that, it peels an iteration of the loop and uses the state of the safepoint at the end of the loop. That's only legal if there's no side effect between the safepoint and the backedge that goes back into the loop. The assert failure here happens in the code that checks that. That code compares the memory states at the safepoint and at the backedge. If they are the same then there's no side effect.
To check consistency, the `MergeMem` at the safepoint is cloned. As the logic iterates over the backedge state, it clears every component of the state it encounters from the `MergeMem`. Once done, the cloned `MergeMem` should be "empty". In the case of this failure, no side effect is found but the cloned `MergeMem` is not empty. That happens because of EA: it adds edges to the `MergeMem` at the safepoint that it doesn't add to the backedge `Phis`. So it's the verification code that fails and I propose dealing with this by ignoring memory state added by EA in the verification code. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/21009/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21009&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336702 Stats: 71 lines in 2 files changed: 69 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21009.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21009/head:pull/21009 PR: https://git.openjdk.org/jdk/pull/21009 From chagedorn at openjdk.org Mon Sep 16 08:49:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 16 Sep 2024 08:49:15 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: On Mon, 15 Jul 2024 15:56:10 GMT, Emanuel Peter wrote: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? 
It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>
> **The CompileFramework**
>
> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>
> I implemented a first, simple version of the framework. I added some tests and examples.
>
> **Example**
>
>
> CompileFramework comp = new CompileFramework();
> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
> comp.compile();
> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>
>
> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>
> **Below some use cases: tests that would have been better with the CompileFramework**
>
> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>
> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>
> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>
> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...

The framework is a very nice idea!
It will definitely improve our testing coverage. It seems simple to use and also easier to review than having to go through fully generated test files. I did a first pass and left some comments.

Should we also add a `README.md`? You could add the good introduction and motivation from the PR summary to it. Additionally, you can mention the core features, reference your examples, give hints on what the JTreg header comment should look like, and also add a section for future work if there is anything you think should/could be explored (not a conclusive list, just a few ideas of what you could mention in the `README` :-) )

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 41:

> 39: import java.util.List;
> 40:
> 41: public class CompileFramework {

General comment here: I suggest adding Javadocs to the public API methods in this class that the user can call.

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 45:

> 43: private static final boolean VERBOSE = Boolean.getBoolean("CompileFrameworkVerbose");
> 44:
> 45: private List<SourceCode> sourceCodes = new ArrayList<SourceCode>();

The type argument can be omitted, same at other places in code:
Suggestion:

    private List<SourceCode> sourceCodes = new ArrayList<>();

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 53:

> 51: String cp = System.getProperty("java.class.path") +
> 52: File.pathSeparator +
> 53: classesDir.toAbsolutePath().toString();

`toString()` is implicit and can be removed, same at other places in code:
Suggestion:

    classesDir.toAbsolutePath();

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 54:

> 52: File.pathSeparator +
> 53: classesDir.toAbsolutePath().toString();
> 54: return cp.replace("\\", "\\\\"); // For windows paths

Is this really required and not automatically handled correctly by `Path` on Windows?
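
To make the generate/compile/invoke flow under review concrete, here is a minimal sketch that uses only the standard `javax.tools` API. It is not the CompileFramework implementation and does not cover jasm sources or the IR framework; the class `CompileAndInvokeSketch`, its method name, and the generated class `XYZ` are hypothetical, and it assumes the code runs on a full JDK so that `ToolProvider.getSystemJavaCompiler()` returns a compiler.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileAndInvokeSketch {

    // Writes the source to a temp directory, compiles it there, loads the class
    // and invokes the first static method with the given name.
    static Object compileAndInvoke(String className, String source, String methodName, Object... args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available; a full JDK is required");
        }
        try {
            Path dir = Files.createTempDirectory("compile_sketch");
            Path file = dir.resolve(className + ".java");
            Files.writeString(file, source);

            // Passing null for the streams uses System.in / System.out / System.err.
            int exit = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (exit != 0) {
                throw new IllegalStateException("Compilation failed with exit code " + exit);
            }

            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> c = loader.loadClass(className);
                for (Method m : c.getDeclaredMethods()) {
                    if (m.getName().equals(methodName)) {
                        return m.invoke(null, args); // static method, so no receiver
                    }
                }
                throw new NoSuchMethodException(className + "." + methodName);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not compile or invoke " + className, e);
        }
    }

    public static void main(String[] args) {
        // The source is just a String, so a test could generate it randomly.
        String source = "public class XYZ { public static int test(int i) { return 2 * i; } }";
        System.out.println(compileAndInvoke("XYZ", source, "test", 5)); // XYZ.test(5)
    }
}
```

Handing the output directory to the compiler as a plain argument and to the class loader as a URL also means no classpath string has to be escaped by hand in this sketch.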
test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 110:

> 108: }
> 109:
> 110: private void compileJasmSources(List jasmSources) {

The `compileJasmSources()/compileJasmFiles()` and `compileJavaSources()/compileJavaFiles()` methods seem to be very specific to whether it's a Java or Jasm file. I'm wondering if we could do the following:
- Make class `SourceCode` an interface and create two implementing classes `JavaSourceCode` and `JasmSourceCode`.
- Move the `compileJasm/JavaSources()/compileJasm/JavaFiles()` to the corresponding classes (the interface method could be `compile()`).
- This allows us to get rid of the `enum` as well in `SourceCode`.

The interface could look like this:

    interface SourceCode {
        void compile();
        String code();
        String fileExtension();
        String className();
        String filePathName(); // Could probably just provide the current implementation as default
    }

I have not thought it through completely and I might be missing some things - but might be worth a shot :-)

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 111:

> 109:
> 110: private void compileJasmSources(List jasmSources) {
> 111: if (jasmSources.size() == 0) {

Suggestion:

    if (jasmSources.isEmpty()) {

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 123:

> 121:
> 122: private void compileJasmFiles(List paths) {
> 123: // Compile JASM files with asmtools.jar, shipped with jtreg.

Can be moved to be a method comment instead.

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 160:

> 158:
> 159: private void compileJavaSources(List javaSources) {
> 160: if (javaSources.size() == 0) {

Suggestion:

    if (javaSources.isEmpty()) {

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 206:

> 204: } catch (Exception e) {
> 205: throw new CompileFrameworkException("Could not create directory: " + dir.toString(), e);
> 206: }

Could be extracted to a method `ensureDirectoryExists()`.
Same below (i.e. `writeToFile()`).

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 212:

> 210: writer.write(code);
> 211: } catch (Exception e) {
> 212: throw new CompileFrameworkException("Could not write file: " + path.toString(), e);

Suggestion:

    throw new CompileFrameworkException("Could not write file: " + path, e);

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 266:

> 264: }
> 265:
> 266: public Class getClass(String name) {

You should not use `Class` without a type parameter.
Suggestion:

    public Class<?> getClass(String name) {

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 275:

> 273:
> 274: public Object invoke(String className, String methodName, Object[] args) {
> 275: Class c = getClass(className);

Suggestion:

    Class<?> c = getClass(className);

test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 292:

> 290: if (method == null) {
> 291: throw new CompileFrameworkException("Method \"" + methodName + "\" not found in class \n" + className + "\".");
> 292: }

Could be extracted to a method which returns a non-null method object (i.e. `findMethod()`) or something like that.

test/hotspot/jtreg/compiler/lib/compile_framework/SourceCode.java line 29:

> 27: * This class represents the source code of a specific class.
> 28: */
> 29: public class SourceCode {

I'm wondering if this class even needs to be exposed? AFAICS, you always do:

    compileFramework.add(SourceCode.newJava/JasmSourceCode());

This raises the question of whether you should just directly provide an `addJava/JasmSourceCode()` API method in `CompileFramework`.

test/hotspot/jtreg/compiler/lib/compile_framework/SourceCode.java line 34:

> 32: public final String className;
> 33: public final String code;
> 34: public final Kind kind;

I think you can make this `private` since you do not expose this enum.
Suggestion: private final Kind kind; test/hotspot/jtreg/compiler/lib/compile_framework/SourceCode.java line 36: > 34: public final Kind kind; > 35: > 36: public SourceCode(String className, String code, Kind kind) { Since you have static initializers, you can also make this constructor private. Suggestion: private SourceCode(String className, String code, Kind kind) { test/hotspot/jtreg/compiler/lib/compile_framework/SourceCode.java line 57: > 55: StringBuilder builder = new StringBuilder(); > 56: String extension = this.kind.name().toLowerCase(); > 57: builder.append(this.className.replace('.','/')).append(".").append(extension); `this` is not required: Suggestion: return kind.name().toLowerCase(); } public String filePathName() { StringBuilder builder = new StringBuilder(); String extension = kind.name().toLowerCase(); builder.append(className.replace('.','/')).append(".").append(extension); ------------- PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2305813129 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760600362 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760589000 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760589787 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760602492 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760731871 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760639645 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760640431 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760592083 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760733614 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760733767 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760736648 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760736892 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20184#discussion_r1760740227 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760636551 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760620303 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760621415 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760622702 From chagedorn at openjdk.org Mon Sep 16 08:49:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 16 Sep 2024 08:49:15 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 06:35:52 GMT, Christian Hagedorn wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. 
> >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 41: > >> 39: import java.util.List; >> 40: >> 41: public class CompileFramework { > > General comment here: I suggest adding Javadocs to the public API methods in this class that the user can call. You should probably add a class comment since this is the main entry class/entry point to use the framework. Maybe give a short concise summary of what it does. More details can still be found in the README.
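For readers unfamiliar with the compile-then-invoke flow in the **Example** quoted above, here is a minimal sketch of the same idea using only the standard `javax.tools` and reflection APIs. It is not the CompileFramework implementation from this PR. The class name `CompileAndInvokeSketch`, the helper `compileAndInvoke`, and the body of `XYZ.test` are hypothetical, and the sketch assumes it runs on a full JDK so that a system Java compiler is available.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileAndInvokeSketch {

    // Hypothetical helper: writes the source to a temp directory, compiles it,
    // loads the class, and invokes a public static method by name.
    public static Object compileAndInvoke(String className, String code,
                                          String methodName, Object... args) throws Exception {
        Path dir = Files.createTempDirectory("compile_sketch");
        Path src = dir.resolve(className + ".java");
        Files.writeString(src, code);

        // Returns null when no compiler is available (e.g. on a JRE-only installation).
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler available");
        }
        // Passing null for the streams uses System.in/out/err; class files go to 'dir'.
        int exitCode = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (exitCode != 0) {
            throw new IllegalStateException("Compilation failed with exit code " + exitCode);
        }

        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> c = loader.loadClass(className);
            for (Method m : c.getDeclaredMethods()) {
                if (m.getName().equals(methodName)) {
                    return m.invoke(null, args); // null receiver: static method
                }
            }
            throw new NoSuchMethodException(className + "." + methodName);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed test body, only for illustration.
        String code = "public class XYZ { public static int test(int i) { return 2 * i; } }";
        System.out.println(compileAndInvoke("XYZ", code, "test", 5));
    }
}
```

The sketch looks the method up by name only, so overloaded methods would need a stricter lookup. It also ignores jasm sources, which according to the PR description are compiled with `asmtools` rather than `javac`.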
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760753986 From chagedorn at openjdk.org Mon Sep 16 08:49:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 16 Sep 2024 08:49:16 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: On Mon, 29 Jul 2024 08:32:50 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 222: >> >>> 220: } >>> 221: >>> 222: if (exitCode != 0 || !output.equals("")) { >> >>> Note: FuzzerUtils.java uses or overrides a deprecated API. >>> Note: Recompile with -Xlint:deprecation for details. >> >> Warnings could corrupt the output. And we probably need to at least make it controllable. > > I see. I have so far not encountered any issues. I would like to keep it simple for now. We can invest in a more complicated solution in a follow up RFE. For now I suppose `FuzzerUtils` would just not be allowed. You can probably replace this by: Suggestion: if (exitCode != 0 || !output.isEmpty()) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760734794 From luhenry at openjdk.org Mon Sep 16 09:14:05 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 16 Sep 2024 09:14:05 GMT Subject: RFR: 8338407: Support grouping several of existing regs into a new one In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 15:11:32 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add `group` support to operand? > > ### Some background about this pr > > In some platforms, there is some concept like a group of registers, for example on riscv there is vector group, which is a group of other single vectors. For example, m2 could be v2+v3, or v4+v5, m4 could be v4+v5+v6+v7, or v8+v9+v10+v11. > And, it's helpful to represent these vector group explicitly, otherwise it's tedious and error-prone. 
For example, in existing code, there is code like the following: > > instruct vstring_compareUL(iRegP_R11 str1, iRegI_R12 cnt1, iRegP_R13 str2, iRegI_R14 cnt2, > iRegI_R10 result, vReg_V4 v4, vReg_V5 v5, vReg_V6 v6, vReg_V7 v7, > vReg_V8 v8, vReg_V9 v9, vReg_V10 v10, vReg_V11 v11, > iRegP_R28 tmp1, iRegL_R29 tmp2) > // ... > effect(KILL tmp1, KILL tmp2, USE_KILL str1, USE_KILL str2, USE_KILL cnt1, USE_KILL cnt2, > TEMP v4, TEMP v5, TEMP v6, TEMP v7, TEMP v8, TEMP v9, TEMP v10, TEMP v11); > // ... > __ string_compare_v($str1$$Register, $str2$$Register, > $cnt1$$Register, $cnt2$$Register, $result$$Register, > $tmp1$$Register, $tmp2$$Register, > StrIntrinsicNode::UL); > > The potential problems of the above code are that we need to > 1. write v4~v11 explicitly in its `instruct` and its `effect`, it's tedious; > 2. vector groups are represented implicitly, which is not clear and error-prone; > 3. in its encoding `string_compare_v`, we need to specify m4, and v4/v8 explicitly. > 4. if some day we need to adjust from m4 to m2 or m8, it's really tedious and error-prone to make that change in both ad file and macro assembler files. > > > ### This PR > > The proposed solution is to represent a group of vector registers with a real vector group, e.g. `vReg_V4 v4, vReg_V5 v5, vReg_V6 v6, vReg_V7 v7` with `vReg_V4M4 v4m4`, `TEMP v4, TEMP v5, TEMP v6, TEMP v7` with `TEMP v4m4` and in `string_compare_v` implementation, we could query the length of the vector group (i.e. m4 in this case) and set its vtype automatically. > This solution solves the above listed issues, especially the last issue, that means in the future if we need to adjust m4 to m2 or m8, we only need to change the code in ad file and the change is simpler, and no change in string_compare_v is needed. > > ### What it looks like > > For more usage details, please check [here](https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:explicit-v-reg-gro... @RealFYang I would love to have your input on this.
I expect it would really help in expressing some of the vector register groups constraints we have in the runtime in places, and allow us to clean it up and make the code's assumptions clearer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20775#issuecomment-2352381325 From rcastanedalo at openjdk.org Mon Sep 16 09:31:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 09:31:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:18:44 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/opto/output.cpp line 2026: > >> 2024: if (n->is_MachNullCheck()) { >> 2025: assert(n->in(1)->as_Mach()->barrier_data() == 0, >> 2026: "Implicit null checks on memory accesses with barriers are not yet supported"); > > I don't see here changes in `lcm.cpp` which would prevent it. I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check ([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255).
Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760814745 From epeter at openjdk.org Mon Sep 16 09:32:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 09:32:33 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v2] In-Reply-To: References: Message-ID: <3bA6C7Z_rUGLv8PQn1UL8zpxAbh-nGGFSDvwoXVq8SY=.29294065-d72c-4f8f-b0b8-7d33fe8461e4@github.com> > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
> > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: - Merge branch 'fuzzer-test' of https://github.com/eme64/jdk into fuzzer-test - fix paths for windows, had compile issue - increase compile timeut - rm unnecessary test - Merge branch 'master' into fuzzer-test - Merge branch 'master' into fuzzer-test - Merge branch 'master' into fuzzer-test - name timeout better - stub of TestMergeStoresFuzzer - private source and classes directory per CompileFramework - ...
and 43 more: https://git.openjdk.org/jdk/compare/51a90e3f...efe94764 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/881c76bf..efe94764 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=00-01 Stats: 13728 lines in 288 files changed: 8296 ins; 3612 del; 1820 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 09:32:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 09:32:33 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 06:38:08 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: >> >> - Merge branch 'fuzzer-test' of https://github.com/eme64/jdk into fuzzer-test >> - fix paths for windows, had compile issue >> - increase compile timeut >> - rm unnecessary test >> - Merge branch 'master' into fuzzer-test >> - Merge branch 'master' into fuzzer-test >> - Merge branch 'master' into fuzzer-test >> - name timeout better >> - stub of TestMergeStoresFuzzer >> - private source and classes directory per CompileFramework >> - ... 
and 43 more: https://git.openjdk.org/jdk/compare/51a90e3f...efe94764 > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 54: > >> 52: File.pathSeparator + >> 53: classesDir.toAbsolutePath().toString(); >> 54: return cp.replace("\\", "\\\\"); // For windows paths > > Is this really required and not automatically handled correctly by `Path` on Windows? I could also add this at the use-site. The problem is that I'm using it in the generated Java code as a String, and so there we have to escape the backslash. Without this, I got those mysterious timeouts on Windows. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760811842 From epeter at openjdk.org Mon Sep 16 09:37:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 09:37:25 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v3] In-Reply-To: References: Message-ID: <3dXEvpaFS4lMNXq4EdHQq4d9W5T2JRNLpA_nxDktK18=.a5ec1af8-ec2c-44b9-b36f-de7ceb0da39d@github.com> > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. 
Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
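The `MethodHandles.constant(int.class, value)` trick mentioned in the use case above can be sketched as follows. This is an illustration under assumptions, not code from `TestAlignVectorFuzzer.java`: the class name `ConstantHandleSketch`, the system property `sketch.stride`, and the loop shape are all hypothetical. The idea, as described in the PR text, is that the value is picked before compilation and then read through a `static final` handle, which the PR author says C2 sees as a constant.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class ConstantHandleSketch {

    // Hypothetical knob: picked once at class initialization, before the loop is compiled.
    // Clamped to at least 1 so that the loop below always terminates.
    public static final int STRIDE_VALUE = Math.max(1, Integer.getInteger("sketch.stride", 2));

    // A handle of type ()int that always returns STRIDE_VALUE.
    public static final MethodHandle STRIDE = MethodHandles.constant(int.class, STRIDE_VALUE);

    // Sums every STRIDE_VALUE-th element of the array.
    public static int sumWithStride(int[] a) {
        try {
            int stride = (int) STRIDE.invokeExact(); // exact call type: ()int
            int sum = 0;
            for (int i = 0; i < a.length; i += stride) {
                sum += a[i];
            }
            return sum;
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable.
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWithStride(new int[] {1, 2, 3, 4, 5, 6}));
    }
}
```

The drawback named in the PR description shows up here as well: a failing run depends on the chosen value, so a reproducer has to record it separately, whereas generated source code would contain the constant directly.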
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/efe94764..09dc029f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=01-02 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From dnsimon at openjdk.org Mon Sep 16 09:52:04 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 16 Sep 2024 09:52:04 GMT Subject: RFR: 8339954: Print JVMCI names with the Compiler.{perfmap,codelist,CodeHeap_Analytics} diagnostic commands [v2] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 11:00:45 GMT, Volker Simonis wrote: >> The diagnostic commands `Compiler.codelist`, `Compiler.CodeHeap_Analytics` and `Compiler.perfmap` are handy for analyzing the CodeCache or creating a symbol file for the perf tool. However, with the Truffle framework which uses the GraalVM compiler in "hosted" mode, we can end up with hundreds if not thousands of nmethods which are all linked to the same Java method (most prominently `com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot()`). All these nmethods are currently indistinguishable by the two aforementioned diagnostic commands. >> >> But nmethods compiled by the GraalVM compiler have a special "JVMCI name" attached to them, which in the case of Truffle corresponds to the guest language function name. Printing this "JVMCI name" in addition to the true Java method name makes it easier to distinguish various nmethods compiled by Truffle or other frameworks which use the GraalVM compiler in hosted mode. 
>> >> For the `Compiler.perfmap` command, it should be mentioned that the format of the created perfmap file is specified here: >> https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt >> >> It only mandates that each line starts with a start and size number in hex and interprets the whole rest of the line (which can even include special characters) as a "symbolname". Taking into account that we already today produce "symbol names" as different as "`throw_range_check_failed Runtime1 stub`", "`Signature Handler Temp Buffer`", "`I2C/C2I adapters`" or "`boolean java.lang.invoke.VarHandleInts$FieldInstanceReadWrite.compareAndSet(java.lang.invoke.VarHandle, java.lang.Object, int, int)`", adding a potential jvmci suffix like "jvmci_name=myFancyJSFunction()#2" to some methods will not cause any compatibility issues. >> >> ..and the output of `Compiler.CodeHeap_Analytics` is unparsable anyway :) > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Replace call to ::sprintf() by os::snprintf() Looks fine to me. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20954#pullrequestreview-2306187107 From chagedorn at openjdk.org Mon Sep 16 10:32:07 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 16 Sep 2024 10:32:07 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v3] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 09:26:22 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 54: >> >>> 52: File.pathSeparator + >>> 53: classesDir.toAbsolutePath().toString(); >>> 54: return cp.replace("\\", "\\\\"); // For windows paths >> >> Is this really required and not automatically handled correctly by `Path` on Windows? > > I could also add this at the use-site. 
The problem is that I'm using it in the generated Java code as a String, and so there we have to escape the backslash. Without this, I got those mysterious timeouts on Windows. Ah, I see. Maybe you can add a comment to clarify the reason behind. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760894864 From epeter at openjdk.org Mon Sep 16 11:14:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 11:14:44 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v4] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. 
> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
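The review thread on this PR discusses `cp.replace("\\", "\\\\")`, which escapes backslashes because the class path is pasted into generated Java code as a string literal. A minimal sketch of why that matters is shown here. The names `EscapeBackslashSketch`, `escapeForJavaLiteral`, and `generateSource`, and the sample path, are hypothetical and not taken from the PR.

```java
public class EscapeBackslashSketch {

    // Doubles every backslash so the text survives being embedded in a Java string literal.
    public static String escapeForJavaLiteral(String path) {
        return path.replace("\\", "\\\\");
    }

    // Builds Java source text that embeds the class path as a string constant.
    public static String generateSource(String classPath) {
        return "public class Generated { static final String CP = \""
                + escapeForJavaLiteral(classPath) + "\"; }";
    }

    public static void main(String[] args) {
        String windowsPath = "C:\\work\\classes";   // the actual characters: C:\work\classes
        // Without escaping, the generated source would contain "\w" and "\c",
        // which are not valid escape sequences in a Java string literal.
        System.out.println(generateSource(windowsPath));
    }
}
```

On paths without backslashes the helper is a no-op, so applying it unconditionally is harmless on other platforms.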
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - rm some toString calls - fix syntax ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/09dc029f..d57b94a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=02-03 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 11:14:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 11:14:45 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v4] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:29:07 GMT, Christian Hagedorn wrote: >> I could also add this at the use-site. The problem is that I'm using it in the generated Java code as a String, and so there we have to escape the backslash. Without this, I got those mysterious timeouts on Windows. > > Ah, I see. Maybe you can add a comment to clarify the reason behind. // Escape the backslash for Windows paths. We are using the path in the command-line // and Java code, so we always want it to be escaped. Adding this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1760952398 From epeter at openjdk.org Mon Sep 16 11:35:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 11:35:26 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v5] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). 
> - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
> > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: make SourceCode package private ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/d57b94a9..0a3daf6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=03-04 Stats: 42 lines in 11 files changed: 12 ins; 15 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From thartmann at openjdk.org Mon Sep 16 12:43:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 16 Sep 2024 12:43:04 GMT Subject: RFR: 8338566: Lazy creation of exception instances is not thread safe In-Reply-To: <6REI8keH3KddT31KZ-TAwxekLpClKLkq4h17ZjhtajU=.fb277286-a305-4600-a8ef-10ff0ede873c@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> <6REI8keH3KddT31KZ-TAwxekLpClKLkq4h17ZjhtajU=.fb277286-a305-4600-a8ef-10ff0ede873c@github.com> Message-ID: On Thu, 12 Sep 2024 09:48:28 GMT, Dean Long wrote: >> Thanks for looking into this, David and Dean. Good points, I agree that we would need to make this completely thread-safe to prevent a leak.
Looking at the code again, I wonder why we even do all this lazily, especially since we already create `NullPointerException` and `ArithmeticException` eagerly at VM startup: >> https://github.com/openjdk/jdk/blob/438121be6bdb085fa13ad14ec53b09ecdbd4757d/src/hotspot/share/memory/universe.cpp#L1086-L1089 >> >> Couldn't we do the same for `ArrayIndexOutOfBoundsException`, `ArrayStoreException` and `ClassCastException`? This would save us quite some complexity and I think the startup / footprint overhead is negligible. > > @TobiHartmann Yes, that seems like the best idea. I was going to suggest moving the fields into the CompilerThread, which gets rid of the race and limits the redundant objects, but I like your idea better. @dean-long, @vnkozlov please (re-)approve. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2352807733 From epeter at openjdk.org Mon Sep 16 13:00:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 13:00:22 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v5] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:27:24 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> make SourceCode package private > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 110: > >> 108: } >> 109: >> 110: private void compileJasmSources(List jasmSources) { > > The `compileJasmSources()/compileJasmFiles()` and `compileJavaSources()/compileJavaFiles()` method seem to be very specific to whether it's a Java or Jasm file. I'm wondering if we could do the following: > > - Make class `SourceCode` an interface and create two implementing classes `JavaSourceCode` and `JasmSourceCode`. > - Move the `compileJasm/JavaSources()/compileJasm/JavaFiles()` to the corresponding classes (the interface method could be `compile()`). 
> - This allows us to get rid of the `enum` as well in `SourceCode`. The interface could look like this: > > interface SourceCode { > void compile(); > String code(); > String fileExtension(); > String className(); > String filePathName(); // Could probably just provide the current implementation as default > } > > I have not thought it through completely and I might be missing some things - but might be worth a shot :-) The offline discussion was quite fruitful, I'll do it another way now. > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 123: > >> 121: >> 122: private void compileJasmFiles(List paths) { >> 123: // Compile JASM files with asmtools.jar, shipped with jtreg. > > Can be moved to be a method comment instead. ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1761094476 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1761093782 From epeter at openjdk.org Mon Sep 16 13:00:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 13:00:14 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v5] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:43:42 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 41: >> >>> 39: import java.util.List; >>> 40: >>> 41: public class CompileFramework { >> >> General comment here: I suggest to add Javadocs to the public API methods in this class that the user can call. > > You should probably add a class comment since this is the main entry class/entry point to use the framework. Maybe give a short concise summary of what it does. More details can still be found in the README. Will do!
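For reference, the interface split suggested in the review comment quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that ended up in the PR (the author writes that he will do it another way): the wrapper class `SourceCodeSketch`, the two record names, the default `filePathName()` mapping and the printing `compile()` bodies are all hypothetical.

```java
import java.util.List;

public class SourceCodeSketch {
    // Interface with the methods listed in the review comment.
    public interface SourceCode {
        String className();
        String code();
        String fileExtension();

        // Assumption: the file path is derived from the class name;
        // the real framework may compute it differently.
        default String filePathName() {
            return className().replace('.', '/') + "." + fileExtension();
        }

        // Placeholder: a real implementation would invoke javac or asmtools.
        void compile();
    }

    // Hypothetical implementation for Java sources.
    public record JavaSourceCode(String className, String code) implements SourceCode {
        @Override public String fileExtension() { return "java"; }
        @Override public void compile() {
            System.out.println("would compile with javac: " + filePathName());
        }
    }

    // Hypothetical implementation for jasm sources.
    public record JasmSourceCode(String className, String code) implements SourceCode {
        @Override public String fileExtension() { return "jasm"; }
        @Override public void compile() {
            System.out.println("would compile with asmtools: " + filePathName());
        }
    }

    public static void main(String[] args) {
        List<SourceCode> sources = List.of(
            new JavaSourceCode("p.XYZ", "package p; public class XYZ {}"),
            new JasmSourceCode("ABC", "super public class ABC {}"));
        // No enum or type switch is needed: each source knows how to compile itself.
        for (SourceCode s : sources) {
            s.compile();
        }
    }
}
```

With such a split, the framework would no longer need to branch on the source kind, which is the simplification the reviewer was after.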
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1761095049 From epeter at openjdk.org Mon Sep 16 13:00:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 13:00:27 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v5] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:29:32 GMT, Christian Hagedorn wrote: >> I see. I have so far not encountered any issues. I would like to keep it simple for now. We can invest in a more complicated solution in a follow up RFE. For now I suppose `FuzzerUtils` would just not be allowed. > > You can probably replace this by: > Suggestion: > > if (exitCode != 0 || !output.isEmpty()) { done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1761098490 From epeter at openjdk.org Mon Sep 16 13:04:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 13:04:32 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v6] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: split file into 4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/0a3daf6d..5d700f12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=04-05 Stats: 540 lines in 4 files changed: 342 ins; 184 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 13:29:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 13:29:42 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v7] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: refactoring continued ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/5d700f12..0707ae18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=05-06 Stats: 35 lines in 3 files changed: 1 ins; 25 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 13:50:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 13:50:59 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v8] In-Reply-To: References: Message-ID: <6xtiLfVquJIVhRHG6c_TkvJyL466N8weEy1QDoN1Qeg=.838ddb4c-6290-49a6-9684-5a3aa1b55574@github.com>
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: some refactor and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/0707ae18..032952d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=06-07 Stats: 30 lines in 4 files changed: 15 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From shade at openjdk.org Mon Sep 16 14:25:04 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Sep 2024 14:25:04 GMT Subject: RFR: 8340144: C1: remove unused Compilation::_max_spills [v2] In-Reply-To: References: Message-ID: On Sun, 15 Sep 2024 01:26:51 GMT, Denghui Dong wrote: >> Hi, >> >> Please review this trivial change that removed the unused field Compilation::_max_spills. >> >> Thanks > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21007#pullrequestreview-2306835524 From kxu at openjdk.org Mon Sep 16 14:41:10 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Sep 2024 14:41:10 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Thu, 12 Sep 2024 06:18:30 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> format comments, add @bug, avoid zero divisor > > Thanks for the update, that looks good to me! I'll also give this a spinning in our testing. 
Will report back once it's completed. @chhagedorn Sorry for pinging. I'm just wondering if there's any updates on internal testing. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20877#issuecomment-2353115285 From epeter at openjdk.org Mon Sep 16 14:44:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 14:44:49 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v9] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/032952d8..580ecf10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=07-08 Stats: 65 lines in 2 files changed: 58 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 14:47:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 14:47:47 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v10] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix link brackets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/580ecf10..46ce0729 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 14:57:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 14:57:45 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v11] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: even more documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/46ce0729..e68b376f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=09-10 Stats: 17 lines in 2 files changed: 7 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From simonis at openjdk.org Mon Sep 16 14:59:10 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 16 Sep 2024 14:59:10 GMT Subject: Integrated: 8339954: Print JVMCI names with the Compiler.{perfmap,codelist,CodeHeap_Analytics} diagnostic commands In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 18:48:18 GMT, Volker Simonis wrote: > The diagnostic commands `Compiler.codelist`, `Compiler.CodeHeap_Analytics` and `Compiler.perfmap` are handy for analyzing the CodeCache or creating a symbol file for the perf tool. However, with the Truffle framework which uses the GraalVM compiler in "hosted" mode, we can end up with hundreds if not thousands of nmethods which are all linked to the same Java method (most prominently `com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot()`). All these nmethods are currently indistinguishable by the two aforementioned diagnostic commands. > > But nmethods compiled by the GraalVM compiler have a special "JVMCI name" attached to them, which in the case of Truffle corresponds to the guest language function name. Printing this "JVMCI name" in addition to the true Java method name makes it easier to distinguish various nmethods compiled by Truffle or other frameworks which use the GraalVM compiler in hosted mode. 
> > For the `Compiler.perfmap` command, it should be mentioned that the format of the created perfmap file is specified here: > https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt > > It only mandates that each line starts with a start and size number in hex and interprets the whole rest of the line (which can even include special characters) as a "symbolname". Taking into account that we already today produce "symbol names" as different as "`throw_range_check_failed Runtime1 stub`", "`Signature Handler Temp Buffer`", "`I2C/C2I adapters`" or "`boolean java.lang.invoke.VarHandleInts$FieldInstanceReadWrite.compareAndSet(java.lang.invoke.VarHandle, java.lang.Object, int, int)`", adding a potential jvmci suffix like "jvmci_name=myFancyJSFunction()#2" to some methods will not cause any compatibility issues. > > ..and the output of `Compiler.CodeHeap_Analytics` is unparsable anyway :) This pull request has now been integrated. Changeset: 996790c7 Author: Volker Simonis URL: https://git.openjdk.org/jdk/commit/996790c70f902d7840d0649a6b0867bed47c6537 Stats: 35 lines in 2 files changed: 27 ins; 0 del; 8 mod 8339954: Print JVMCI names with the Compiler.{perfmap,codelist,CodeHeap_Analytics} diagnostic commands Reviewed-by: phh, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/20954 From epeter at openjdk.org Mon Sep 16 15:01:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 15:01:09 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v11] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:33:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> even more documentation > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 292: > >> 290: if (method == null) { >> 291: throw new CompileFrameworkException("Method 
\"" + methodName + "\" not found in class \n" + className + "\"."); >> 292: } > > Could be extracted to a method which returns a non-null method object (i.e. `findMethod()`) or something like that. done. > test/hotspot/jtreg/compiler/lib/compile_framework/SourceCode.java line 29: > >> 27: * This class represents the source code of a specific class. >> 28: */ >> 29: public class SourceCode { > > I'm wondering if this class even needs to be exposed? AFAICS, you always do: > > compileFramework.add(SourceCode.newJava/JasmSourceCode()); > ``` > This raises the question if you should just directly provide a `addJava/JasmSourceCode()` API method in `CompileFramework`? changed it accordingly! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1761326918 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1761327821 From epeter at openjdk.org Mon Sep 16 15:05:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 15:05:44 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v12] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. 
In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
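To make the generate / compile / invoke workflow described above concrete, here is a minimal, self-contained sketch. It does not use the CompileFramework from this PR: `MiniCompileSketch` and `compileAndInvoke` are hypothetical names chosen for illustration, and the sketch assumes it runs on a full JDK so that `ToolProvider.getSystemJavaCompiler()` returns a compiler. It covers only plain Java sources; jasm sources and IR-rule integration are left out.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical stand-in for the workflow; NOT the CompileFramework API.
final class MiniCompileSketch {
    // Writes one Java source string to a temp directory, compiles it with the
    // system compiler, loads the class, and invokes a public static method.
    static Object compileAndInvoke(String className, String source,
                                   String methodName, Object... args) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; a JDK is required");
        }
        try {
            Path dir = Files.createTempDirectory("mini-compile");
            Path file = dir.resolve(className + ".java");
            Files.writeString(file, source);

            // Exit code 0 means the compilation succeeded.
            int exitCode = javac.run(null, null, null, "-d", dir.toString(), file.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed for " + className);
            }

            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> clazz = loader.loadClass(className);
                for (Method m : clazz.getMethods()) {
                    if (m.getName().equals(methodName)) {
                        return m.invoke(null, args); // static method, no receiver
                    }
                }
                throw new IllegalStateException("Method not found: " + methodName);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("compileAndInvoke failed", e);
        }
    }
}
```

A generated test would then build the source string at run time, for example `MiniCompileSketch.compileAndInvoke("XYZ", src, "test", 5)`, where `src` declares a public class `XYZ` with a public static `test(int)` method.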
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: one more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/e68b376f..fc363c61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=10-11 Stats: 19 lines in 1 file changed: 10 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 15:05:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 15:05:44 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v12] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:28:41 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> one more > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 206: > >> 204: } catch (Exception e) { >> 205: throw new CompileFrameworkException("Could not create directory: " + dir.toString(), e); >> 206: } > > Could be extracted to a method `ensureDirectoryExists()`. Same below (i.e. `writeToFile()`). I think it is still small enough.... so I'd rather keep it like it is. Splitting it all up creates a lot of boiler plate. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1761332838 From epeter at openjdk.org Mon Sep 16 15:12:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 15:12:33 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples.
> > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: move some code around ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/fc363c61..45abaed4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=11-12 Stats: 113 lines in 3 files changed: 42 ins; 42 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 16 15:12:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 15:12:33 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:46:55 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > The framework is a very nice idea! It will definitely improve our testing coverage. It seems simple to use and also easier to review instead of having the fully generated test files to go through. > > I did a first pass and left some comments. > > Should we also add a `README.md`? You could add the good introduction and motivation from the PR summary to it. Additionally, you can mention the core features, reference your examples, give hints on what the JTreg header comment should look like, and also add a section for future work if there is anything you think should/could be explored (not a conclusive list, just a few ideas for what you could mention in the `README` :-) ) @chhagedorn thanks a lot for the review!
I did some deep refactoring, and now I hope it is more to your liking ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2353199644 From lmesnik at openjdk.org Mon Sep 16 15:14:08 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 16 Sep 2024 15:14:08 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v5] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 08:35:25 GMT, kuaiwei wrote: >> I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. >> It will happen with these conditions >> 1 It's a virtual call and the callee is a final method. So c1 will think it's static bound. >> 2 The interpreter never touch the callsite, so interpreter does not add type profile. >> 3 In c1 compilation, it will be inlined based on CHA. >> 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Modify test case to use createTestJavaProcessBuilder Test looks good. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20786#pullrequestreview-2306965681 From epeter at openjdk.org Mon Sep 16 15:15:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 15:15:14 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Mon, 16 Sep 2024 14:38:29 GMT, Kangcheng Xu wrote: >> Thanks for the update, that looks good to me! I'll also give this a spinning in our testing. Will report back once it's completed. > > @chhagedorn Sorry for pinging. I'm just wondering if there's any updates on internal testing. Thanks! @tabjy The testing looks clean for v06, I just checked @chhagedorn 's run.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20877#issuecomment-2353205290 From epeter at openjdk.org Mon Sep 16 15:18:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 15:18:09 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Wed, 11 Sep 2024 20:26:43 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > format comments, add @bug, avoid zero divisor test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 36: > 34: * @summary Test that DIV and MOD nodes are converted into DIVMOD where possible > 35: * @library /test/lib / > 36: * @requires vm.compiler2.enabled Is `C2` really required for this test? Or could another compiler also benefit from your test?
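For readers following the DIVMOD discussion, the source-level shape that the transformation targets looks roughly like the sketch below: a quotient and a remainder of the same two operands computed next to each other. `DivModSketch` and its methods are hypothetical names, not code from the PR or from `TestDivModNodes.java`. Whether the two operations are actually fused depends on the compiler and platform; the sketch only shows the pattern and the arithmetic results any compiler must preserve.

```java
// Hypothetical illustration of the div/mod pattern; not taken from the PR.
final class DivModSketch {
    // Signed quotient and remainder of the same operands, back to back.
    // This is the kind of pair a compiler may combine into one operation.
    static int[] divMod(int a, int b) {
        int q = a / b;
        int r = a % b;
        return new int[] { q, r };
    }

    // Unsigned variant, using the standard library helpers.
    static int[] divModUnsigned(int a, int b) {
        int q = Integer.divideUnsigned(a, b);
        int r = Integer.remainderUnsigned(a, b);
        return new int[] { q, r };
    }

    // The identity that must hold regardless of how the pair is compiled.
    static boolean identityHolds(int a, int b) {
        int[] qr = divMod(a, b);
        return a == qr[0] * b + qr[1];
    }
}
```

As in the test under review, the divisor should never be zero; `a / 0` throws `ArithmeticException` for integer types.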
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1761357488 From kxu at openjdk.org Mon Sep 16 15:18:07 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Sep 2024 15:18:07 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: <05BrbHzC-KOk6cxVSaeoHftsPYWDx1hiWP1VEKWrTnQ=.04cbf116-b443-4164-a163-65efa4cafb5f@github.com> On Mon, 16 Sep 2024 15:12:21 GMT, Emanuel Peter wrote: >> @chhagedorn Sorry for pinging. I'm just wondering if there's any updates on internal testing. Thanks! > > @tabjy The testing looks clean for v06, I just checked @chhagedorn 's run. @eme64 Thanks for letting me know! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20877#issuecomment-2353211884 From duke at openjdk.org Mon Sep 16 15:18:09 2024 From: duke at openjdk.org (duke) Date: Mon, 16 Sep 2024 15:18:09 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Wed, 11 Sep 2024 20:26:43 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
>> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > format comments, add @bug, avoid zero divisor @tabjy Your change (at version ef7882b0990532fccad167a274d06eacadf68407) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20877#issuecomment-2353213354 From kxu at openjdk.org Mon Sep 16 15:45:21 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Sep 2024 15:45:21 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v8] In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: > Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). > > I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
> > ~This will be a draft PR before GHA tests are confirmed passing.~ Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: remove C2 requirement for tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20877/files - new: https://git.openjdk.org/jdk/pull/20877/files/ef7882b0..bfe017d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20877&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20877/head:pull/20877 PR: https://git.openjdk.org/jdk/pull/20877 From roland at openjdk.org Mon Sep 16 15:45:21 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 16 Sep 2024 15:45:21 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v8] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: <4S8ZY-xBM08WnsOhcyd9H4rXtWAFJevyd0BriQWPxvA=.614888fa-7964-46d7-aa9b-32de37eaea73@github.com> On Mon, 16 Sep 2024 15:42:01 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). >> >> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
>> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > remove C2 requirement for tests Marked as reviewed by roland (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20877#pullrequestreview-2307042816 From kxu at openjdk.org Mon Sep 16 15:45:22 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Sep 2024 15:45:22 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v7] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Mon, 16 Sep 2024 15:15:45 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> format comments, add @bug, avoid zero divisor > > test/hotspot/jtreg/compiler/c2/TestDivModNodes.java line 36: > >> 34: * @summary Test that DIV and MOD nodes are converted into DIVMOD where possible >> 35: * @library /test/lib / >> 36: * @requires vm.compiler2.enabled > > Is `C2` really required for this test? Or could another compiler also benefit from your test? Removed the C2 requirement so that the arithmetic results are validated on other compilers too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20877#discussion_r1761393742 From kvn at openjdk.org Mon Sep 16 15:51:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Sep 2024 15:51:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Mon, 16 Sep 2024 09:28:30 GMT, Roberto Castañeda Lozano wrote: > Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly? Yes, please.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761413544 From kvn at openjdk.org Mon Sep 16 16:02:16 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Sep 2024 16:02:16 GMT Subject: RFR: 8338566: Lazy creation of exception instances is not thread safe [v2] In-Reply-To: <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> Message-ID: On Fri, 13 Sep 2024 09:14:21 GMT, Tobias Hartmann wrote: >> Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. >> >> Unfortunately, this only happened twice in our testing and I was never able to reproduce it. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Create exceptions eagerly Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20950#pullrequestreview-2307089328 From rcastanedalo at openjdk.org Mon Sep 16 16:34:59 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 16:34:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v21] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. 
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision: - Add missing IR test to test run - Skip barrier refining for non-OOP stores and stores without barrier data - Assert that m is input to n in Matcher::is_encode_and_store_pattern ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/141020e6..653f9acf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19-20 Stats: 21 lines in 3 files changed: 16 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 16 16:37:32 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 16:37:32 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 22:51:07 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/opto/matcher.cpp line 2845: > >> 2843: n->Opcode() == Op_StoreN && >> 2844: m->is_EncodeP(); >> 2845: } > > Add comment that `m` is input of `n`.
I thought about adding assert too but I will leave it to you. Added the assertion (commit a480d70b). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761478462 From psandoz at openjdk.org Mon Sep 16 16:47:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 16:47:11 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:36 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Java changes are good (I created a CSR). The approach in HotSpot looks good to me, but need HotSpot reviewers. ------------- Marked as reviewed by psandoz (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2307180561 From rcastanedalo at openjdk.org Mon Sep 16 16:49:25 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 16:49:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:14:19 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241: > >> 239: assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP"); >> 240: TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr(); >> 241: uint8_t barrier_data = store->barrier_data(); > > Should you check barrier data for 0? > `is_ptr()` has a wide set of types. It includes TypeRawPtr, TypeKlassPtr and TypeMetadataPtr. Where are you filtering them? I added the check and excluded pointers other than OOPs, narrow OOPs, and null pointers (needed because null in uncompressed OOP mode is typed as `AnyPtr`) in commit 10bc0d2c. Note that these checks are not strictly required for correctness, because for all other pointers the corresponding barrier data would be 0, and the only potential operations over it would be bit clearing. But I still think they have value in that they communicate more clearly the intent and scope of the optimization.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761494258 From sviswanathan at openjdk.org Mon Sep 16 17:02:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 17:02:09 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <0QUwAu8wCrqU-BSbINCiBATZje4xib3rLEZKgG9mHhE=.fed2bc28-b4c3-417d-b4d6-3b5ce1e34c67@github.com> On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote: >>> Given that `rearrange` with 1 vector gets wrapping-index semantics, I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` map to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? >> >> The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. > >> >> The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. > > Yes, we are trying to take smaller incremental steps. Once we are done with this work we can step back and discuss/review what to do about shuffles. @PaulSandoz Thanks a lot for the review and the CSR. I will look forward to Hotspot review and CSR progress/approval.
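The wrapping semantics discussed in this thread can be modelled on plain arrays. The sketch below is a scalar illustration only, not the Vector API implementation: `WrapIndexSketch` and `rearrangeWrapped` are hypothetical names, the vector length is assumed to be a power of two, and the one-vector mask `VLEN - 1` is an assumption by analogy with the two-vector mask `VLEN * 2 - 1` quoted above.

```java
// Scalar model of wrapping indexes; arrays stand in for vectors.
final class WrapIndexSketch {
    // One-vector case: every index is wrapped into [0, VLEN) instead of
    // being range-checked. Assumes v.length is a power of two.
    static int[] rearrangeWrapped(int[] indexes, int[] v) {
        int vlen = v.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            result[i] = v[indexes[i] & (vlen - 1)];
        }
        return result;
    }

    // Two-vector case: indexes are wrapped into [0, 2 * VLEN); the range
    // [0, VLEN) selects from v1 and [VLEN, 2 * VLEN) selects from v2.
    static int[] rearrangeWrapped(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i] & (2 * vlen - 1);
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }
}
```

Because the mask replaces a bounds check and a possible exception, no lane can fail, which is presumably what allows the straight-line loop body shown in the generated code earlier in the thread.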
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2353449454 From psandoz at openjdk.org Mon Sep 16 17:08:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 17:08:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v12] In-Reply-To: References: Message-ID: On Sat, 14 Sep 2024 08:40:44 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max. >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in corresponding primitive box classes, fallback implementation of vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback.
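As a scalar illustration of the saturating and unsigned semantics listed above, the following sketch shows what each operator computes for a single `int` lane. It is written from the description only: `SaturatingSketch` and its method names are hypothetical and are not the scalar APIs added by the patch.

```java
// Scalar, per-lane model of the operators described above (int lanes only).
final class SaturatingSketch {
    // SADD: signed addition clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int addSaturating(int a, int b) {
        long sum = (long) a + b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, sum));
    }

    // SSUB: signed subtraction clamped to the int range.
    static int subSaturating(int a, int b) {
        long diff = (long) a - b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, diff));
    }

    // SUADD: unsigned addition clamped to 0xFFFFFFFF (all bits set, i.e. -1).
    static int addSaturatingUnsigned(int a, int b) {
        int sum = a + b;
        // Unsigned overflow happened if the wrapped sum is smaller than an operand.
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }

    // SUSUB: unsigned subtraction clamped to 0.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMAX / UMIN: max and min under unsigned comparison.
    static int maxUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int minUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

For example, `Integer.MAX_VALUE + 1` wraps to `Integer.MIN_VALUE` with ordinary addition, whereas `addSaturating(Integer.MAX_VALUE, 1)` stays at `Integer.MAX_VALUE`.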
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update AARCH64 specific test using UNSIGNED_* comparison operators. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 573: > 571: * @see VectorMath#addSaturating(int, int) > 572: */ > 573: public static final Associative SADD = assoc("SADD", "+", VectorSupport.VECTOR_OP_SADD, VO_NOFP); Change from type `Associative` to `Binary` for `SADD` and `SUADD`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1761527956 From sviswanathan at openjdk.org Mon Sep 16 17:41:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 17:41:10 GMT Subject: RFR: 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX [v2] In-Reply-To: References: Message-ID: On Sun, 15 Sep 2024 11:32:40 GMT, Jatin Bhateja wrote: >> Currently VM_Supports::supports_apx_f() returns a true value even if user explicitly pass -XX:-UseAPX runtime flag, this enables APX specific code and register set. >> >> This bug fix patch turn off the APX_F feature if UseAPX runtime flag is explicitly set to false value. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339793 > - 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20921#pullrequestreview-2307314685 From psandoz at openjdk.org Mon Sep 16 18:47:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 18:47:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561:

> 559:         for (int i = 0; i < vlen; i++) {
> 560:             int index = ((int)vecPayload1[i]);
> 561:             res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index];

This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection.

    int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1);
    res[i] = index < vlen ?
        vecPayload2[index] : vecPayload3[index - vlen];

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974:

> 2972:     final $abstractvectortype$ selectFromTemplate(Class> indexVecClass,
> 2973:                                                   $abstractvectortype$ v1, $abstractvectortype$ v2) {
> 2974:         int twoVectorLen = length() * 2;

We should assert that the length is a power of two.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761663646
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761667602

From duke at openjdk.org Mon Sep 16 19:16:10 2024
From: duke at openjdk.org (duke)
Date: Mon, 16 Sep 2024 19:16:10 GMT
Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v8]
In-Reply-To:
References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com>
Message-ID:

On Mon, 16 Sep 2024 15:45:21 GMT, Kangcheng Xu wrote:

>> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transformations. The purpose of the transformation is to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64).
>>
>> I took the liberty of adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they were previously missing. Please let me know if they were left out intentionally and if there are any other concerns. Thanks!
>>
>> ~This will be a draft PR before GHA tests are confirmed passing.~
>
> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:
>
>   remove C2 requirement for tests

@tabjy
Your change (at version bfe017d7c7a12b615f21621a4fdcd3c16f254000) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20877#issuecomment-2353715752 From kxu at openjdk.org Mon Sep 16 20:51:49 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Sep 2024 20:51:49 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: References: Message-ID: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - Merge branch 'openjdk:master' into arithmetic-canonicalization - Merge pull request #1 from tabjy/arithmetic-canonicalization-v2 Arithmetic canonicalization v2 - remove dead code - fix potential void type const nodes - refactor and cleanup - add more test cases - re-implement depth limit on recursion - passes TestIRLShiftIdeal_XPlusX_LShiftC - passes AddI[L]NodeIdealizationTests - revert depth limits - ... 
and 12 more: https://git.openjdk.org/jdk/compare/cf66820f...c8fdb74c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/5923f361..c8fdb74c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=00-01 Stats: 33684 lines in 970 files changed: 20184 ins; 7746 del; 5754 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From sviswanathan at openjdk.org Mon Sep 16 20:53:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 20:53:06 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 20:37:27 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. 
src/hotspot/cpu/x86/assembler_x86.cpp line 16052: > 16050: > 16051: // Encoding Format : eevex_prefix | opcode_cc | modrm > 16052: int encode = vex_prefix_and_encode(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); We could replace this with: int encode = evex_prefix_and_encode_ndd(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10426: > 10424: > 10425: void MacroAssembler::setcc(Assembler::Condition comparison, Register dst) { > 10426: if (VM_Version::supports_apx_f()) { We could check UseAPX here instead of VM_Version::supports_apx_f(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1761638922 PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1761942325 From kxu at openjdk.org Mon Sep 16 21:07:07 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Sep 2024 21:07:07 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: On Mon, 16 Sep 2024 20:51:49 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). 
>> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: > > - Merge branch 'openjdk:master' into arithmetic-canonicalization > - Merge pull request #1 from tabjy/arithmetic-canonicalization-v2 > > Arithmetic canonicalization v2 > - remove dead code > - fix potential void type const nodes > - refactor and cleanup > - add more test cases > - re-implement depth limit on recursion > - passes TestIRLShiftIdeal_XPlusX_LShiftC > - passes AddI[L]NodeIdealizationTests > - revert depth limits > - ... and 12 more: https://git.openjdk.org/jdk/compare/b7b4f7c0...c8fdb74c > If this is your intention, then please ignore this message. Yes, this is my intention. --- My previous approach of identifying optimized `Mul->shift + add/sub` (e.g., `a*6` becomes `(a<<1) + (a<<2)` by `MulNode::Ideal()`) was inherently flawed. I was solely determining this with the number of terms. It is not reliable. In the `TestLargeTreeOfSubNodes` example, it replaces already optimized Mul nodes and a new Mul node and repeats the process, causing performance regression (and timeouts). The new approach matches the exact patterns of optimized `MulNode`s. Additionally, a recursion depth limit of 5 (a rather arbitrary number) is in effect during *iterative* GVN to mitigate the risk of exhausting resources. Untransformed nodes are added to the worklist and will be eventually transformed. 
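
As an aside for readers, the arithmetic identities quoted in the PR description can be checked with a small standalone sketch. This is not C2 code and says nothing about how the IGVN transformation is implemented; it only illustrates that the rewritten forms agree under Java's wrapping `int` arithmetic. The class and method names are hypothetical. Note that the description's `a << C + a` is read here as `(a << C) + a`, since in Java source the unparenthesized form would parse as `a << (C + a)`.

```java
public class AddSeriesSketch {
    // a + a + a, which the description says becomes a*3 and then (a<<1) + a.
    static int addThree(int a)     { return a + a + a; }
    static int addThreeAlt(int a)  { return (a << 1) + a; }

    // a + a + a + a, which the description says becomes a*4 and then a<<2.
    static int addFour(int a)      { return a + a + a + a; }
    static int addFourAlt(int a)   { return a << 2; }

    // (a << 1) + a + a, which the description says also ends up as a<<2.
    static int mixed(int a)        { return (a << 1) + a + a; }

    // Returns true if all rewritten forms agree for the given input.
    static boolean identitiesHold(int a) {
        return addThree(a) == a * 3
            && addThree(a) == addThreeAlt(a)
            && addFour(a) == a * 4
            && addFour(a) == addFourAlt(a)
            && mixed(a) == addFourAlt(a);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (!identitiesHold(a)) {
                throw new AssertionError("identity failed for a = " + a);
            }
        }
        System.out.println("all identities hold");
    }
}
```

The extreme values are included because the identities must hold modulo 2^32, including when the sums overflow.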
Please note, in the case of `TestLargeTreeOfSubNodes` with the flags mentioned above, the compilation is skipped without a large enough `-XX:MaxLabelRootDepth`. This is the same behaviour as the current master.

Please re-review once GHA is confirmed passing. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2354031910

From kvn at openjdk.org Mon Sep 16 21:08:13 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 16 Sep 2024 21:08:13 GMT
Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4]
In-Reply-To:
References:
Message-ID: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com>

On Mon, 16 Sep 2024 20:45:45 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review comments resolution.
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10426:
>
>> 10424:
>> 10425: void MacroAssembler::setcc(Assembler::Condition comparison, Register dst) {
>> 10426:   if (VM_Version::supports_apx_f()) {
>
> We could check UseAPX here instead of VM_Version::supports_apx_f().

I think switching off a feature in the `vm_version` file based on flag settings is correct,
so that in the rest of the code we can simply check `VM_Version::supports_*()`.
Currently not all code follows this, but it is the preferable way.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1761963930 From psandoz at openjdk.org Mon Sep 16 21:21:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 21:21:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2970:

> 2968:
> 2969:
> 2970:     /*package-private*/

I think we can simplify with:

    /*package-private*/
    @ForceInline
    final $abstractvectortype$ selectFromTemplate(Class> indexVecClass,
                                                  $abstractvectortype$ v1, $abstractvectortype$ v2) {
        int twoVectorLenMask = (length() << 1) - 1;
    #if[FP]
        Vector<$Boxbitstype$> wrapped_indexes = this.convert(VectorOperators.{#if[intOrFloat]?F2I:D2L}, 0)
                                                    .lanewise(VectorOperators.AND, twoVectorLenMask);
        return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $bitstype$.class,
                                                   length(), wrapped_indexes, v1, v2,
                                                   (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3)
        );
    #else[FP]
        $abstractvectortype$ wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask);
        return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $type$.class,
                                                   length(), wrapped_indexes, v1, v2,
                                                   (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3)
        );
    #end[FP]
    }

(Note that's without the assert - see separate comment).
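
To make the wrapping semantics in this review thread concrete, the following is a hedged scalar sketch of a two-vector select, assuming the reviewer's suggestion (mask the index into `[0, 2 * VLEN)`, then pick from the first or second table) and a power-of-two lane count. It is not the template code from the PR; the class name, method name, and the use of `int[]` arrays as stand-ins for vectors are all illustrative assumptions.

```java
import java.util.Arrays;

public class SelectFromTwoVectorsSketch {
    // Hypothetical scalar reference: indexes select from the table v1 ++ v2.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all inputs must have the same lane count");
        }
        // Masking only wraps correctly when the lane count is a power of two.
        if (Integer.bitCount(vlen) != 1) {
            throw new IllegalArgumentException("lane count must be a power of two");
        }
        int mask = (vlen << 1) - 1;
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap first, so negative or too-large indexes land in [0, 2 * vlen).
            int index = indexes[i] & mask;
            res[i] = index < vlen ? v1[index] : v2[index - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 4, 7, -1};
        // 4 selects v2[0], 7 selects v2[3], and -1 & 7 == 7 also selects v2[3].
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2))); // [10, 20, 23, 23]
    }
}
```

Wrapping before the `index < vlen` comparison is the point of the review comment: comparing the raw index first would send a negative index to the first table and read out of bounds.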
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761977004 From dlong at openjdk.org Mon Sep 16 22:28:08 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 16 Sep 2024 22:28:08 GMT Subject: RFR: 8338566: Lazy creation of exception instances is not thread safe [v2] In-Reply-To: <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> Message-ID: On Fri, 13 Sep 2024 09:14:21 GMT, Tobias Hartmann wrote: >> Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. >> >> Unfortunately, this only happened twice in our testing and I was never able to reproduce it. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Create exceptions eagerly Nice improvement. ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20950#pullrequestreview-2307940715 From sviswanathan at openjdk.org Mon Sep 16 23:02:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 23:02:12 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com> References: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com> Message-ID: On Mon, 16 Sep 2024 21:05:01 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10426: >> >>> 10424: >>> 10425: void MacroAssembler::setcc(Assembler::Condition comparison, Register dst) { >>> 10426: if (VM_Version::supports_apx_f()) { >> >> We could check UseAPX here instead of VM_Version::supports_apx_f(). > > I think switching off a feature in `vm_version` file based on flags setting is correct. > So that in the rest of code we can simple check `VM_Version::supports_*()`. > Currently not all code follow this but it is preferable way. Sounds good, let us keep it this way (VM_Version::supports_*()). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762059128 From ddong at openjdk.org Tue Sep 17 00:17:09 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 17 Sep 2024 00:17:09 GMT Subject: RFR: 8340144: C1: remove unused Compilation::_max_spills [v2] In-Reply-To: References: Message-ID: On Sun, 15 Sep 2024 01:26:51 GMT, Denghui Dong wrote: >> Hi, >> >> Please review this trivial change that removed the unused field Compilation::_max_spills. >> >> Thanks > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21007#issuecomment-2354247667 From ddong at openjdk.org Tue Sep 17 00:17:09 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 17 Sep 2024 00:17:09 GMT Subject: Integrated: 8340144: C1: remove unused Compilation::_max_spills In-Reply-To: References: Message-ID: On Sun, 15 Sep 2024 01:17:30 GMT, Denghui Dong wrote: > Hi, > > Please review this trivial change that removed the unused field Compilation::_max_spills. > > Thanks This pull request has now been integrated. Changeset: 99d71850 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/99d7185071a5daa695adc6255d37ce382285a9b3 Stats: 7 lines in 2 files changed: 0 ins; 5 del; 2 mod 8340144: C1: remove unused Compilation::_max_spills Reviewed-by: thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/21007 From jbhateja at openjdk.org Tue Sep 17 01:45:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 01:45:08 GMT Subject: Integrated: 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:42:55 GMT, Jatin Bhateja wrote: > Currently VM_Supports::supports_apx_f() returns a true value even if user explicitly pass -XX:-UseAPX runtime flag, this enables APX specific code and register set. > > This bug fix patch turn off the APX_F feature if UseAPX runtime flag is explicitly set to false value. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: a4cf1918
Author:    Jatin Bhateja
URL:       https://git.openjdk.org/jdk/commit/a4cf1918c963cbe0b0eee6db580f0769c0cbdbcc
Stats:     4 lines in 1 file changed: 4 ins; 0 del; 0 mod

8339793: Fix incorrect APX feature enabling with -XX:-UseAPX

Reviewed-by: kvn, thartmann, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/20921

From duke at openjdk.org Tue Sep 17 04:14:16 2024
From: duke at openjdk.org (duke)
Date: Tue, 17 Sep 2024 04:14:16 GMT
Subject: Withdrawn: 8332856: C2: Add new transform for bool eq/ne (cmp (and (urshift X const1) const2) 0)
In-Reply-To:
References:
Message-ID:

On Mon, 20 May 2024 14:15:46 GMT, Tobias Hotz wrote:

> This PR adds a new ideal optimization for the following pattern:
>
> public boolean testFunc(int a) {
>     int mask = 0b101;
>     int shift = 12;
>     return ((a >> shift) & mask) == 0;
> }
>
> Where the mask and shift are constant values and a is a variable. For this optimization to work, the right shift has to be idealized to an unsigned right shift earlier in the pipeline, which happens here: https://github.com/openjdk/jdk/blob/b92bd671835c37cff58e2cdcecd0fe4277557d7f/src/hotspot/share/opto/mulnode.cpp#L731
> If the shift is already an unsigned bit shift, it works as well.
> On AMD64 CPUs, this means that the whole computation can be reduced to a simple `test` instruction.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19310

From jbhateja at openjdk.org Tue Sep 17 04:38:07 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Sep 2024 04:38:07 GMT
Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 16 Sep 2024 18:17:44 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review comments resolution.
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 16052:
>
>> 16050:
>> 16051:   // Encoding Format : eevex_prefix | opcode_cc | modrm
>> 16052:   int encode = vex_prefix_and_encode(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes);
>
> We could replace this with:
> int encode = evex_prefix_and_encode_ndd(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes);

Zero-upper setCC repurposes the NDD bit, which is always set by default.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762273655

From jbhateja at openjdk.org Tue Sep 17 04:38:08 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Sep 2024 04:38:08 GMT
Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4]
In-Reply-To:
References: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com>
Message-ID:

On Mon, 16 Sep 2024 22:59:01 GMT, Sandhya Viswanathan wrote:

>> I think switching off a feature in the `vm_version` file based on flag settings is correct,
>> so that in the rest of the code we can simply check `VM_Version::supports_*()`.
>> Currently not all code follows this, but it is the preferable way.
>
> Sounds good, let us keep it this way (VM_Version::supports_*()).

Yes, the CPU [feature is disabled](https://github.com/openjdk/jdk/commit/a4cf1918c963cbe0b0eee6db580f0769c0cbdbcc#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R1049) if UseAPX is set to false.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762273687

From rcastanedalo at openjdk.org Tue Sep 17 05:20:30 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 17 Sep 2024 05:20:30 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22]
In-Reply-To:
References:
Message-ID:

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms.
See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>
> #### `G1BarrierSetAssembler` Changes
>
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Discard memory accesses with barrier data as implicit null check candidates

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/653f9acf..71a51bfc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20-21

  Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Tue Sep 17 05:20:30 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 17 Sep 2024 05:20:30 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20]
In-Reply-To:
References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID:

On Mon, 16 Sep 2024 15:48:32 GMT, Vladimir Kozlov wrote:

>> I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check
([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255). >> >> Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly? > >> Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly? > > Yes, please. Done (commit 71a51bfc). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1762318179 From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:54 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v11] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. 
> > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Jcheck clearance - Review comments resolution. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/7c80bfce..29530047 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=09-10 Stats: 402 lines in 41 files changed: 98 ins; 98 del; 206 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:54 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: On Mon, 16 Sep 2024 07:45:51 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/hotspot/share/opto/vectornode.cpp line 2122: > >> 2120: // index format by subsequent VectorLoadShuffle. >> 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); >> 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); > > This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for intel. But as far as I know ARM has in principle longer vectors - up to 2048 bytes. 
Should we maybe add some assert here to make sure we never badly truncate the index? Shuffle overall is on our todo list; it is a known limitation which we tried lifting once. Yes, you read it correctly: it will be a limitation for AARCH64 SVE once systems with 2048-bit vectors are available. IIRC the current maximum vector size on any available AARCH64 system is 256 bits, and with Neoverse V2 they shrank the vector size back to 16 bytes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504446 From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:54 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <8QUaed-UNR5ura5MXAeccEXQgaSOUaM_JCHvrUUeCVE=.d895b3db-6e3c-4351-9147-81eb303536f9@github.com> On Mon, 16 Sep 2024 07:27:44 GMT, Emanuel Peter wrote: >> Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment. > > I really do think that `as_ConI()` would be the right thing here. In product it is just a cast, and in debug at least we have an assert.
DONE **It just got overlooked @eme64, we respect reviewer suggestions and value the time you invest in polishing our patches, thanks again :-)** ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504618 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504671 From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:55 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 18:35:42 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: > >> 559: for (int i = 0; i < vlen; i++) { >> 560: int index = ((int)vecPayload1[i]); >> 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; > > This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. > > int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1)); > res[i] = index < vlen ? vecPayload2[index] : vecPayload3[index - vlen]; Hi @PaulSandoz , we already pass wrapped indexes to this helper routine called from fallback implementation. 
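For readers following the index-wrapping exchange above, here is a minimal scalar sketch of the two-vector selection with the wrapping the reviewer proposes. It is only an illustration under stated assumptions: the class and method names are hypothetical, plain `int[]` arrays stand in for the template's payload arrays, and the vector length is assumed to be a power of two. It is not the code from `X-Vector.java.template`.

```java
public class SelectFromSketch {
    // Hypothetical scalar model of a two-vector selectFrom (not the JDK template code).
    // Assumes indexes, v1 and v2 all have the same power-of-two length.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap into [0, 2 * vlen - 1]; this also maps negative indexes into range.
            int index = indexes[i] & ((vlen << 1) - 1);
            // The lower half of the range selects from v1, the upper half from v2.
            res[i] = index < vlen ? v1[index] : v2[index - vlen];
        }
        return res;
    }
}
```

Wrapping inside the helper makes it safe for out-of-range and negative inputs on its own; if the caller already passes wrapped indexes, as the reply above states, the mask is redundant but harmless.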
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974: > >> 2972: final $abstractvectortype$ selectFromTemplate(Class> indexVecClass, >> 2973: $abstractvectortype$ v1, $abstractvectortype$ v2) { >> 2974: int twoVectorLen = length() * 2; > > We should assert that the length is a power of two. The API only accepts vector parameters, and there is no means through the public-facing API to create a vector of non-power-of-two (NPOT) size. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504366 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504318 From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:55 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 14:43:42 GMT, Emanuel Peter wrote: >> Original API did throw IndexOutOfBoundsException, but later on we have moved away from exception throwing semantics to wrapping semantics. >> Please find details at following comment >> https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606 > > And do we test that the wrapping works correctly? The VectorAPI Jtreg framework is based on TestNG. The custom data providers associated with the various test methods generate ranges of values that lie beyond the valid index range, so this should cover the wrapping scenarios.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504894 From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:55 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> Message-ID: On Mon, 16 Sep 2024 07:40:33 GMT, Emanuel Peter wrote: >> Patch includes tests for all the species (combination of vector type and sizes), each vector kernel is validated against equivalent scalar implementation, scenario which you are referring is implicitly handled though tests. > > Ok, just so that I can relax, can you please point me to this test that would implicitly verify that the backend has chosen the correct vector size? Each test method validates the intrinsic code against equivalent scalar implementation, it should catch if backend emits instruction with incorrect vector size. https://github.com/openjdk/jdk/pull/20508/files#diff-95c582657bf90bef3530e67cb143865d070fd2e8e4538849e3dce6061b0d5f2dR4863 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504831 From jbhateja at openjdk.org Tue Sep 17 07:14:57 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:14:57 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . 
UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Missed code fragment from last review comment resolution. 
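The saturating semantics described in the summary above can be illustrated with a small scalar sketch for `int`. This is an assumption-laden illustration only: the class and method names are hypothetical and are not the scalar APIs added by the patch, whose names and signatures are not shown in this message.

```java
public class SaturatingSketch {
    // Signed saturating add: clamp the exact (long) sum to the int range.
    static int saturatingAdd(int a, int b) {
        long r = (long) a + b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Unsigned saturating add: if the wrapped sum is below an operand (unsigned),
    // the addition overflowed, so cap at the unsigned maximum 0xFFFFFFFF (-1 as int).
    static int saturatingUnsignedAdd(int a, int b) {
        int r = a + b;
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // Unsigned saturating subtract: cap at 0 instead of wrapping around.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }
}
```

For example, under these definitions `Integer.MAX_VALUE + 1` stays at `Integer.MAX_VALUE`, whereas ordinary `int` addition would wrap around to `Integer.MIN_VALUE`.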
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/ec7c7553..a6f8ee8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=11-12 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From chagedorn at openjdk.org Tue Sep 17 07:22:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 17 Sep 2024 07:22:22 GMT Subject: RFR: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() [v8] In-Reply-To: References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Mon, 16 Sep 2024 15:45:21 GMT, Kangcheng Xu wrote: >> Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transforamtions. The purpose of the transformation to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). >> >> I took the liberty adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they are previously missing. Please let me know if they were left intentionally and if there are any other concerns. Thanks! >> >> ~This will be a draft PR before GHA tests are confirmed passing.~ > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > remove C2 requirement for tests Sorry for letting you wait! Yes, testing was clean. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20877#issuecomment-2354733807 From kxu at openjdk.org Tue Sep 17 07:22:23 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 17 Sep 2024 07:22:23 GMT Subject: Integrated: 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() In-Reply-To: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> References: <4OynmRZK-BmwMX_P-st_1JmIzLe_Ub5PHf3oJCBHul0=.2e558a3a-31ab-4d71-8feb-dc82ffd54646@github.com> Message-ID: On Thu, 5 Sep 2024 21:01:00 GMT, Kangcheng Xu wrote: > Hello all. This patch addresses [JDK-8332442](https://bugs.openjdk.org/browse/JDK-8332442) and refactors `Op_ModI`/`Op_ModL`/`Op_UModI`/`Op_UModL` cases in `DIVMOD` transforamtions. The purpose of the transformation to convert adjacent div `/` and mod `%` operations of the same operands into one should the platform support this feature (e.g., x86-64). > > I took the liberty adding _signed_ DIVMOD nodes (i.e., `DIV_MOD_I` and `DIV_MOD_L`) to the `IRNode` class constants as they are previously missing. Please let me know if they were left intentionally and if there are any other concerns. Thanks! > > ~This will be a draft PR before GHA tests are confirmed passing.~ This pull request has now been integrated. Changeset: 10050a72 Author: Kangcheng Xu URL: https://git.openjdk.org/jdk/commit/10050a723954926926650af65417d5b828cba387 Stats: 348 lines in 7 files changed: 280 ins; 64 del; 4 mod 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() Reviewed-by: roland, chagedorn, jkarthikeyan ------------- PR: https://git.openjdk.org/jdk/pull/20877 From amitkumar at openjdk.org Tue Sep 17 08:45:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 17 Sep 2024 08:45:32 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 Message-ID: Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21033/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21033&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340269 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21033/head:pull/21033 PR: https://git.openjdk.org/jdk/pull/21033 From jbhateja at openjdk.org Tue Sep 17 08:49:23 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 08:49:23 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v5] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/c1c42d38..dc37dea6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From jbhateja at openjdk.org Tue Sep 17 08:49:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 08:49:24 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 20:37:27 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. 
src/hotspot/cpu/x86/assembler_x86.cpp line 16052: > 16050: > 16051: // Encoding Format : eevex_prefix | opcode_cc | modrm > 16052: int encode = vex_prefix_and_encode(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); Suggestion: int encode = vex_prefix_and_encode(0, 0, dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762668007 From dlunden at openjdk.org Tue Sep 17 09:11:33 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 17 Sep 2024 09:11:33 GMT Subject: RFR: 8340273: Remove CounterHalfLifeTime Message-ID: The CounterHalfLifeTime flag is no longer used and should be removed. ### Changeset Remove CounterHalfLifeTime. ### Testing N/A ------------- Commit messages: - Remove CounterHalfLifeTime Changes: https://git.openjdk.org/jdk/pull/21034/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21034&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340273 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21034/head:pull/21034 PR: https://git.openjdk.org/jdk/pull/21034 From chagedorn at openjdk.org Tue Sep 17 09:17:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 17 Sep 2024 09:17:18 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: References: Message-ID: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> On Mon, 16 Sep 2024 15:12:33 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. 
Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. 
>> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > move some code around Thanks for the updates - already looks much better! Some more comments. test/hotspot/jtreg/compiler/lib/compile_framework/ClassLoaderBuilder.java line 37: > 35: * Build a ClassLoader that loads from classpath and {@code classesDir}. > 36: * Helper class that generates a ClassLoader which allows loading classes > 37: * from the classpath (see {@code Utils.getClassPaths()}) and {@code classesDir}. You can use a `@link` which you can follow inside an IDE: Suggestion: * from the classpath (see {@link Utils#getClassPaths()}) and {@code classesDir}. test/hotspot/jtreg/compiler/lib/compile_framework/ClassLoaderBuilder.java line 38: > 36: * Helper class that generates a ClassLoader which allows loading classes > 37: * from the classpath (see {@code Utils.getClassPaths()}) and {@code classesDir}. > 38: * Suggestion: *
<p>
test/hotspot/jtreg/compiler/lib/compile_framework/ClassLoaderBuilder.java line 52: > 50: try { > 51: // Classpath for all included classes (e.g. IR Framework). > 52: // Get all class paths, convert to urls. Suggestion: // Get all class paths, convert to URLs. test/hotspot/jtreg/compiler/lib/compile_framework/ClassLoaderBuilder.java line 53: > 51: // Classpath for all included classes (e.g. IR Framework). > 52: // Get all class paths, convert to urls. > 53: List urls = new ArrayList(); Suggestion: List urls = new ArrayList<>(); test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 33: > 31: > 32: /** > 33: * This is the entry-point for the Compile Framework. Its purpose it to allow General comment about Javadocs. I think the convention is the indent to the first `*`: Suggestion: /** * This is the entry-point for the Compile Framework. Its purpose it to allow test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 35: > 33: * This is the entry-point for the Compile Framework. Its purpose it to allow > 34: * compilation and execution of Java and Jasm sources generated at runtime. > 35: * Blank lines are ignored in Javadocs: ![image](https://github.com/user-attachments/assets/c9d32083-51c5-4c26-913c-0b08bff7893d) You can add `
<p>
` which will add a new line in between: ![image](https://github.com/user-attachments/assets/7647eb73-5799-4251-afe4-3cebddf9d001) (Rendered Javadocs by IDEA) test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 36: > 34: * compilation and execution of Java and Jasm sources generated at runtime. > 35: * > 36: * Please reference the README.md for more explanation. Details? Suggestion: * Please reference the README.md for more details. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 40: > 38: public class CompileFramework { > 39: private List javaSources = new ArrayList<>(); > 40: private List jasmSources = new ArrayList<>(); Can also be made final Suggestion: private final List javaSources = new ArrayList<>(); private final List jasmSources = new ArrayList<>(); test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 57: > 55: public void addJasmSourceCode(String className, String code) { > 56: jasmSources.add(new SourceCode(className, "jasm", code)); > 57: } Since this is the public API that the user should use, I suggest to also add a parameter description for all public API methods in this class with `@param` for completeness. Something like: /** * Add a Jasm source to the compilation. * * @param className The class name of the Jasm class. * @param code The source code of the Jasm class as string. */ which renders to: ![image](https://github.com/user-attachments/assets/4f0ba893-7876-4f57-b6ea-816f7e7660cd) test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 59: > 57: } > 58: > 59: private String sourceCodesAsString(List sourceCodes) { Just a side node, since this is not C where you need to define the methods first and then use it, I suggest to do it the other way round: Define methods below as they are used. 
This makes it easier to read I think test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 72: > 70: * Java and Jasm sources and store the generated class-files in the classes > 71: * directory. > 72: */ Not sure if it is clear what the sources directory and the classes directory are when just reading this comment. Do you want to mention the actual name of these directories? You could do something like: Suggestion: /** * Compile all sources: store the sources to the {@link sourceDir} directory, compile * Java and Jasm sources and store the generated class-files in the {@link classesDir} * directory. */ test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 87: > 85: > 86: Compile.compileJasmSources(jasmSources, sourceDir, classesDir); > 87: Compile.compileJavaSources(javaSources, sourceDir, classesDir); Suggestion: Instead of having two static methods, how about making the `sourceDir` and `classesDir` fields of `Compile`? Then you can reuse them inside the class without passing them around. test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 2: > 1: # Compile Framework > 2: This compile framework allows the compilation and execution of Java and Jasm sources, which are generated at runtime. I think you can use the given name "Compile Framework" here. Suggestion: The Compile Framework allows the compilation and execution of Java and Jasm sources, which are generated at runtime. test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 7: > 5: We want to be able to generate Java and Jasm source code in the form of Strings at runtime, then compile them, load the classes and invoke some methods. This allows us to write more elaborate tests. For example small dedicated fuzzers that are targetted at some specific compiler optimization. > 6: > 7: This is more powerful than hand-written tests, as we can generalize tests and cover more examples. 
It can also be better than a script-generated test: those are static and often the script is not checked in with the test. Also, the script is only run once, giving a static tests. Compilation at runtime allows us to randomly generate tests each time. Suggestion: This is more powerful than hand-written tests, as we can generalize tests and cover more examples. It can also be better than a script-generated test: those are static and often the script is not integrated with the generated test. Another limitation of a generator script is that it is only run once, creating fixed static tests. Compilation at runtime allows us to randomly generate tests each time. test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 11: > 9: Of course we could compile at runtime without this framework, but it abstracts away the complexity of compilation, and allows the test-writer to focus on the generation of the source code. > 10: > 11: ## How to Use the Framework Suggestion: ## How to Use the Compile Framework test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 18: > 16: > 17: // Create a new CompileFramework instance. > 18: CompileFramework comp = new CompileFramework(); Just a side note: Should we be explicit here and name the variable `compileFramework` since it is an illustrating how-to example? test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 27: > 25: > 26: // Object ret = XYZ.test(5); > 27: Object ret = comp.invoke("XYZ", "test", new Object[] {5}); Same here, should we name it `returnValue` instead? test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 31: > 29: ### Creating a new Compile Framework Instance > 30: > 31: First, one must create a `new CompileFramework()`, which creates two directories: a sources and a classes directory. The sources directory is where all the sources are placed by the Compile Framework, and the classes directory is where all the compiled classes are placed by the Compile Framework. 
You could add here that this is a fixed-named directory created inside the JTreg scratch directory. You can also add a link to the actual name where this is created inside `CompileFramework`. test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 35: > 33: ### Adding Sources to the Compilation > 34: > 35: Java and Jasm sources can be added to the compilation using `comp.addJavaSourceCode` and `comp.addJasmSourceCode`. The source classes can depend on each other, and they can also use the IR-Framework ([TestFrameworkJavaExample](../../../testlibrary_tests/compile_framework/examples/TestFrameworkJavaExample.java)). Suggestion: Java and Jasm sources can be added to the compilation using `comp.addJavaSourceCode()` and `comp.addJasmSourceCode()`. The source classes can depend on each other, and they can also use the IR Framework ([TestFrameworkJavaExample](../../../testlibrary_tests/compile_framework/examples/TestFrameworkJavaExample.java)). test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 39: > 37: ### Compiling > 38: > 39: All sources are compiled with `comp.compile()`. First, the sources are stored to the srouces directory, then compiled, and then the class-files stored in the classes directory. The respective directory names are printed, so that the user can easily access the generated files for debugging. Suggestion: All sources are compiled with `comp.compile()`. First, the sources are stored to the sources directory, then compiled, and then the class-files stored in the classes directory. The respective directory names are printed, so that the user can easily access the generated files for debugging. test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 41: > 39: All sources are compiled with `comp.compile()`. First, the sources are stored to the srouces directory, then compiled, and then the class-files stored in the classes directory. 
The respective directory names are printed, so that the user can easily access the generated files for debugging. > 40: > 41: ### Interacting with the compiled code Suggestion: ### Interacting with the Compiled Code test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 43: > 41: ### Interacting with the compiled code > 42: > 43: The compiled code is then loaded with a ClassLoader. The classes can be accessed directly with `comp.getClass(name)`. Specific methods can also be directly invoked with `comp.invoke`. Suggestion: The compiled code is then loaded with a `ClassLoader`. The classes can be accessed directly with `comp.getClass(name)`. Specific methods can also directly be invoked with `comp.invoke()`. test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 45: > 43: The compiled code is then loaded with a ClassLoader. The classes can be accessed directly with `comp.getClass(name)`. Specific methods can also be directly invoked with `comp.invoke`. > 44: > 45: Should one require the modified classpath that includes the compiled classes, this is available with `comp.getEscapedClassPathOfCompiledClasses()`. This can be necessary if the test launches any other VM's that also access the compiled classes. This is for example necessary when using the IR-Framework. The IR Framework is usually written without `-`: Suggestion: Should one require the modified classpath that includes the compiled classes, this is available with `comp.getEscapedClassPathOfCompiledClasses()`. This can be necessary if the test launches any other VMs that also access the compiled classes. This is for example necessary when using the IR Framework. 
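For readers who have not seen the steps the Compile Framework wraps, here is a minimal sketch of the same compile, load, and invoke cycle using only the standard `javax.tools` API. It is not the Compile Framework's implementation. The class `RuntimeCompileSketch` and the helper `compileAndInvoke` are hypothetical names chosen for illustration, the sketch handles a single Java class only (no Jasm), and it assumes it runs on a full JDK so that a system compiler is available.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class RuntimeCompileSketch {
    /**
     * Hypothetical helper: writes the source to a temporary sources directory,
     * compiles it into a temporary classes directory, loads the class, and
     * invokes a public static method that takes a single int argument.
     */
    public static Object compileAndInvoke(String className, String source,
                                          String methodName, int arg) throws Exception {
        Path sourceDir = Files.createTempDirectory("sketch-sources");
        Path classesDir = Files.createTempDirectory("sketch-classes");
        Path sourceFile = sourceDir.resolve(className + ".java");
        Files.writeString(sourceFile, source);

        // Returns null when only a JRE (no compiler) is available.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required.");
        }
        int exitCode = javac.run(null, null, null,
                                 "-d", classesDir.toString(), sourceFile.toString());
        if (exitCode != 0) {
            throw new IllegalStateException("Compilation failed, exit code: " + exitCode);
        }

        // Load the freshly compiled class from the classes directory and call the method.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { classesDir.toUri().toURL() })) {
            Class<?> clazz = loader.loadClass(className);
            Method method = clazz.getMethod(methodName, int.class);
            return method.invoke(null, arg);
        }
    }

    public static void main(String[] args) throws Exception {
        String source = "public class XYZ { public static int test(int i) { return i * 2; } }";
        Object returnValue = compileAndInvoke("XYZ", source, "test", 5);
        System.out.println("XYZ.test(5) returned " + returnValue);
    }
}
```

The sketch mirrors the `XYZ.test(5)` call from the README excerpt quoted above. A test that launches further VMs would additionally have to put the classes directory on their classpath, which is what `getEscapedClassPathOfCompiledClasses()` is described as providing.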
test/hotspot/jtreg/compiler/lib/compile_framework/Utils.java line 54: > 52: > 53: /** > 54: * Create a temporary directory, with a unique name, so that there can be no collisions Suggestion: * Create a temporary directory with a unique name to avoid collisions test/hotspot/jtreg/compiler/lib/compile_framework/Utils.java line 167: > 165: System.out.println("Compilation failed."); > 166: System.out.println("Exit code: " + exitCode); > 167: System.out.println("Output: '" + output + "'"); You could print to System.err: Suggestion: System.err.println("Compilation failed."); System.err.println("Exit code: " + exitCode); System.err.println("Output: '" + output + "'"); test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/CombinedJavaJasmExample.java line 37: > 35: > 36: /** > 37: * This test shows a compilation of multiple java and jasm source code files. Suggestion: * This test shows a compilation of multiple Java and Jasm source code files. test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/TestFrameworkJavaExample.java line 49: > 47: * classes (see {@code getEscapedClassPathOfCompiledClasses}). > 48: */ > 49: public class TestFrameworkJavaExample { Suggestion: This might be better named `IRFrameworkJavaExample`. It is a little bit unfortunate that I named the IR Framework main class `TestFramework` and not `IRFramework`...but I guess it's too late for that now. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2308665738 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762645614 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762643040 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762648591 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762648148 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762601530 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762562055 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762562856 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762563191 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762573429 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762577022 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762594396 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762608194 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762660512 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762668935 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762671770 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762674541 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762675097 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762677358 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762678789 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762706985 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762710573 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762713487 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762681300 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20184#discussion_r1762720898 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762732870 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762737328 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762689463 From chagedorn at openjdk.org Tue Sep 17 09:17:19 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 17 Sep 2024 09:17:19 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> References: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> Message-ID: On Tue, 17 Sep 2024 08:52:54 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 35: > >> 33: ### Adding Sources to the Compilation >> 34: >> 35: Java and Jasm sources can be added to the compilation using `comp.addJavaSourceCode` and `comp.addJasmSourceCode`. The source classes can depend on each other, and they can also use the IR-Framework ([TestFrameworkJavaExample](../../../testlibrary_tests/compile_framework/examples/TestFrameworkJavaExample.java)). > > Suggestion: > > Java and Jasm sources can be added to the compilation using `comp.addJavaSourceCode()` and `comp.addJasmSourceCode()`. The source classes can depend on each other, and they can also use the IR Framework ([TestFrameworkJavaExample](../../../testlibrary_tests/compile_framework/examples/TestFrameworkJavaExample.java)). 
I suggest also mentioning that IR framework tests must add a * @compile ../../../compiler/lib/ir_framework/TestFramework.java which might easily be overlooked when starting from another Compile Framework example and adding IR tests there. Maybe for easier reference, you can also add a separate subsection for the IR framework, where you can also mention `getEscapedClassPathOfCompiledClasses()` as stated below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1762687994 From chagedorn at openjdk.org Tue Sep 17 09:22:06 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 17 Sep 2024 09:22:06 GMT Subject: RFR: 8340273: Remove CounterHalfLifeTime In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:04:03 GMT, Daniel Lundén wrote: > The CounterHalfLifeTime flag is no longer used and should be removed. > > ### Changeset > > Remove CounterHalfLifeTime. > > ### Testing > > N/A Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21034#pullrequestreview-2308995274 From dholmes at openjdk.org Tue Sep 17 09:33:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 17 Sep 2024 09:33:06 GMT Subject: RFR: 8340273: Remove CounterHalfLifeTime In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:04:03 GMT, Daniel Lundén wrote: > The CounterHalfLifeTime flag is no longer used and should be removed. > > ### Changeset > > Remove CounterHalfLifeTime. > > ### Testing > > N/A LGTM2 ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21034#pullrequestreview-2309076017 From chagedorn at openjdk.org Tue Sep 17 09:36:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 17 Sep 2024 09:36:11 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: On Mon, 16 Sep 2024 21:04:14 GMT, Kangcheng Xu wrote: > Please note, in the case of TestLargeTreeOfSubNodes with flags mentioned above, the compilation is skipped without a large enough -XX:MaxLabelRootDepth. This is the same behaviour as the current master. Have you found out why this is the case? I thought that the original fix wanted to fix the problem of running out of nodes. I gave your patch another spin. We still see various failures and timeouts. For example: `compiler/intrinsics/sha/TestDigest.java` times out with various flag combinations (for example `-server -Xmixed`). Here is the stack at the timeout: Thread 7 (Thread 0x7fc808490700 (LWP 22433)): #0 0x00007fc80d648051 in Node::find_integer_type(BasicType) const () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #1 0x00007fc80c793214 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #2 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so ... 
#90 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #91 0x00007fc80c793082 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #92 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #93 0x00007fc80c793351 in AddNode::convert_serial_additions(PhaseGVN*, bool, BasicType) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #94 0x00007fc80c7937c5 in AddNode::IdealIL(PhaseGVN*, bool, BasicType) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #95 0x00007fc80d73ea47 in PhaseGVN::transform(Node*) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #96 0x00007fc80d715296 in Parse::do_one_bytecode() () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #97 0x00007fc80d7029ca in Parse::do_one_block() () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #98 0x00007fc80d703e86 in Parse::do_all_blocks() () from 
/opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #99 0x00007fc80d707738 in Parse::Parse(JVMState*, ciMethod*, float) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so ... I'm also seeing the live node limit assert with test applications/ctw/modules/java_desktop.java and flags: -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation -Djava.awt.headless=true Assert hit: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S27847/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/118d2aab-d4da-49da-972c-f7b6fd5b5eee/runs/4c2607c3-6f9f-49ea-8d16-a0fb44652951/workspace/open/src/hotspot/share/opto/node.cpp:79), pid=305278, tid=305301 # assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit # # JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-09-17-0714032.christian.hagedorn.jdk-test) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-09-17-0714032.christian.hagedorn.jdk-test, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x146c697] Node::verify_construction()+0x1a7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2355041639 From roland at openjdk.org Tue Sep 17 09:42:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 17 Sep 2024 09:42:09 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> References: 
<-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: On Mon, 16 Sep 2024 20:51:49 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: > > - Merge branch 'openjdk:master' into arithmetic-canonicalization > - Merge pull request #1 from tabjy/arithmetic-canonicalization-v2 > > Arithmetic canonicalization v2 > - remove dead code > - fix potential void type const nodes > - refactor and cleanup > - add more test cases > - re-implement depth limit on recursion > - passes TestIRLShiftIdeal_XPlusX_LShiftC > - passes AddI[L]NodeIdealizationTests > - revert depth limits > - ... and 12 more: https://git.openjdk.org/jdk/compare/e04bd3f5...c8fdb74c src/hotspot/share/opto/addnode.cpp line 427: > 425: } > 426: > 427: Node* con = (bt == T_INT) ? 
(Node*) phase->intcon((jint) factor) : (Node*) phase->longcon(factor); You can use `integercon()` and pass `bt`. src/hotspot/share/opto/addnode.cpp line 441: > 439: bool AddNode::is_optimized_multiplication(Node* node, Node* base) { > 440: // Look for pattern: LShiftNode(a, CON) > 441: if (node->is_LShift() && node->in(2)->is_Con()) { Maybe passing `bt` to this method would make the whole thing more readable. You would use `Op_Lshift(bt)` here. src/hotspot/share/opto/addnode.cpp line 481: > 479: > 480: // MulNode(any, const), e.g., a*2 > 481: if (node->is_Mul() Same here: passing `bt` would turn this into `node->Opcode() == Op_Mul(bt)`. src/hotspot/share/opto/addnode.cpp line 490: > 488: if (bt == T_INT || bt == T_LONG) { // const could potentially be void type > 489: Node* mul_base; > 490: jlong multiplier = extract_base_operand_from_serial_additions(phase, operand_node, &mul_base, depth_limit - 1); Do you need to recurse at all here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1762855801 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1762859687 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1762865643 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1762866777 From dlunden at openjdk.org Tue Sep 17 09:57:09 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 17 Sep 2024 09:57:09 GMT Subject: RFR: 8340273: Remove CounterHalfLifeTime In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:04:03 GMT, Daniel Lundén wrote: > The CounterHalfLifeTime flag is no longer used and should be removed. > > ### Changeset > > Remove CounterHalfLifeTime. > > ### Testing > > N/A Thanks for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21034#issuecomment-2355109821 From dlunden at openjdk.org Tue Sep 17 09:57:10 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 17 Sep 2024 09:57:10 GMT Subject: Integrated: 8340273: Remove CounterHalfLifeTime In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:04:03 GMT, Daniel Lundén wrote: > The CounterHalfLifeTime flag is no longer used and should be removed. > > ### Changeset > > Remove CounterHalfLifeTime. > > ### Testing > > N/A This pull request has now been integrated. Changeset: 8b6e2770 Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/8b6e2770a53002fcc9e07d38b954e6854a644f95 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod 8340273: Remove CounterHalfLifeTime Reviewed-by: chagedorn, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/21034 From mdoerr at openjdk.org Tue Sep 17 09:59:09 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Sep 2024 09:59:09 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 In-Reply-To: References: Message-ID: <0zDxBJohIEtsDXsPpIKq51UFvpxaHTGq_oUF9Mz2L4w=.9712e8c9-eb2b-4124-9d3c-5a7a3ca8a7ae@github.com> On Tue, 17 Sep 2024 08:37:31 GMT, Amit Kumar wrote: > Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. `pd_store_reg` and `pd_load_reg` use `reg2mem_opt` etc. I don't know how large they can get. It's correct if they fit into 16 Bytes. Did you measure the size of a trivial downcall stub? Do you still have some space left? ------------- Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21033#pullrequestreview-2309226991 From lucy at openjdk.org Tue Sep 17 10:42:05 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 17 Sep 2024 10:42:05 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 08:37:31 GMT, Amit Kumar wrote: > Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. Looks good. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21033#pullrequestreview-2309366770 From thartmann at openjdk.org Tue Sep 17 10:42:09 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 17 Sep 2024 10:42:09 GMT Subject: RFR: 8338566: Lazy creation of exception instances is not thread safe [v2] In-Reply-To: <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> <_spGzlVz58-PSyiiwyEgVC2Z5YXsCg-sgF0-WWl_nLE=.55e033e4-c400-453e-a2a1-0c2d2ee7ea9c@github.com> Message-ID: <7HgAvkdMUL_jFNvQ2Hjh3ev2KryJ8pAfpcmiTI-yHpg=.9521c70b-d306-4065-ace3-ce2d516e4ee6@github.com> On Fri, 13 Sep 2024 09:14:21 GMT, Tobias Hartmann wrote: >> Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. >> >> Unfortunately, this only happened twice in our testing and I was never able to reproduce it. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Create exceptions eagerly Thanks for the reviews Vladimir and Dean! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20950#issuecomment-2355264389 From thartmann at openjdk.org Tue Sep 17 10:42:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 17 Sep 2024 10:42:10 GMT Subject: Integrated: 8338566: Lazy creation of exception instances is not thread safe In-Reply-To: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> References: <-TT6n5B3MDywtm44-HL1CfMGEquA43S-LTbTfmxdlNE=.aef0a81a-ec5c-4e75-acf9-db5eb12edbd0@github.com> Message-ID: On Wed, 11 Sep 2024 14:17:30 GMT, Tobias Hartmann wrote: > Similar to [JDK-8251923](https://bugs.openjdk.org/browse/JDK-8251923), we need a store-store barrier before publishing a handle because otherwise another thread could observe the handle before it's fully initialized and read null from it. This affects architectures with a weak memory model like AArch64. > > Unfortunately, this only happened twice in our testing and I was never able to reproduce it. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 269cd38b Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/269cd38b55391364db0f92291eb29c3b6803db94 Stats: 109 lines in 6 files changed: 41 ins; 62 del; 6 mod 8338566: Lazy creation of exception instances is not thread safe Reviewed-by: shade, kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/20950 From stuefe at openjdk.org Tue Sep 17 10:45:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:45:14 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:57:31 GMT, Doug Simon wrote: >> https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. 
JVMCI has to respect this and avoids compressing abstract and interface Klasses > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 29: > >> 27: /** >> 28: * Marker interface for hotspot specific constants. >> 29: */ > > Let's take this opportunity to improve this javadoc: > > /** > * A value in a space managed by Hotspot (e.g. heap or metaspace). > * Some of these values can be referenced with a compressed pointer (32 bits) > * instead of a full word-sized pointer. > */ drive-by comment, 32-bit is an implementation detail. The width of a narrowKlass will be adjustable with the upcoming JEP450. Referring to 32 bit may be obsolete soon. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1763011967 From mdoerr at openjdk.org Tue Sep 17 10:50:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Sep 2024 10:50:37 GMT Subject: RFR: 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity Message-ID: If exactly one of UseCompressedOops and UseCompressedClassPointers is enabled: Only the transformation for the enabled part should be used. ------------- Commit messages: - 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity Changes: https://git.openjdk.org/jdk/pull/21036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340230 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21036/head:pull/21036 PR: https://git.openjdk.org/jdk/pull/21036 From epeter at openjdk.org Tue Sep 17 12:03:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 12:03:18 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization Message-ID: I'm adding some proper JMH benchmarks for vectorized reductions. 
There are already some others, but they are not comprehensive or not JMH. Plus, I wanted to do a performance investigation, hopefully leading to some improvements. **See Future Work below**.

**How I run my benchmarks**

All benchmarks: `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64`

A specific benchmark, with a profiler that tells me which code snippet is hottest: `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"`

**JMH logs**

Run on my AVX512 laptop, with master: [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt)

Run on remote asimd (aarch64, NEON) machine: [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt)

**Results**

I ran it on 2 machines so far. Left on my AVX512 machine, right on an ASIMD/NEON/aarch64 machine.

Here are the interesting `int / long / float / double` results, discussion further below: ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b)

And here are the less spectacular `byte / char / short` results. There is no vectorization of these cases. But there seems to be some issue with over-unrolling on my AVX512 machine: one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2)

Here is the PDF: [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf)

**Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?**

Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors. The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector.

    // Check if reductions are connected
    if (is_marked_reduction(p0)) {
      Node* second_in = p0->in(2);
      Node_List* second_pk = get_pack(second_in);
      if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) {
        // No parent pack or not enough work
        // to cover reduction expansion overhead
        return false;
      } else if (second_pk->size() != p->size()) {
        return false;
      }
    }

But when I disable this code, then I see on the aarch64/ASIMD machine:

    VectorReduction2.NoSuperword.intAddSimple    2048  0  avgt  3  751.453 ± 1603.353  ns/op
    VectorReduction2.WithSuperword.intAddSimple  2048  0  avgt  3  344.263 ±    2.888  ns/op

Hence, this assumption no longer holds. I think it is because we are actually able to move the reductions out of the loop now, and that was not the case when this code was added.

**2-Element Reductions for INT / LONG**

Apparently, all 2-element int and long reductions are currently deemed not profitable, see change: https://github.com/openjdk/jdk/commit/a880f3d1399469c2cd7ef1ace1deb7e04c5ab3d5

This means that the `long` reductions do not vectorize on the ASIMD / aarch64 machine with `MaxVectorSize=16`.

    // Length 2 reductions of INT/LONG do not offer performance benefits
    if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {
      retValue = false;
    } else {
      retValue = ReductionNode::implemented(opc, size, arith_type->basic_type());
    }

**AARCH64 / NEON / ASIMD**

This is why the `float / double` `add / mul` reductions (yellow `MATCHER`) do not vectorize. We might be able to tackle this with an appropriate alternative implementation. Also, all of the `long` cases fail: for one, because they would only be 2-element reductions (since `MaxVectorSize=16`), but also because `MulVL` is apparently not allowed, see below.

    bool Matcher::match_rule_supported_auto_vectorization(int opcode, int vlen, BasicType bt) {
      if (UseSVE == 0) {
        // These operations are not profitable to be vectorized on NEON, because no direct
        // NEON instructions support them. But the match rule support for them is profitable for
        // Vector API intrinsics.
        if ((opcode == Op_VectorCastD2X && bt == T_INT) ||
            (opcode == Op_VectorCastL2X && bt == T_FLOAT) ||
            (opcode == Op_CountLeadingZerosV && bt == T_LONG) ||
            (opcode == Op_CountTrailingZerosV && bt == T_LONG) ||
            // The implementations of Op_AddReductionVD/F in Neon are for the Vector API only.
            // They are not suitable for auto-vectorization because the result would not conform
            // to the JLS, Section Evaluation Order.
            opcode == Op_AddReductionVD || opcode == Op_AddReductionVF ||
            opcode == Op_MulReductionVD || opcode == Op_MulReductionVF ||
            opcode == Op_MulVL) {
          return false;
        }
      }
      return match_rule_supported_vector(opcode, vlen, bt);
    }

**Float / Double with Add and Mul**

On `aarch64` NEON these cases do not vectorize, see last section.

It turns out that many of these cases actually do vectorize (on x64), but the code is just as fast as the scalar code. This is because the reduction order is strict, to maintain correct rounding. Interestingly, the code runs at about the same speed whether it is vectorized or not. It seems that the latency of the reduction is simply the determining factor, no matter if it is vectorized or scalar.
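
Why the strict reduction order matters can be shown with a small, self-contained sketch. Floating-point addition is not associative, so summing in independent lanes (as an unordered vector reduction would) can produce a different result than the sequential sum a scalar Java loop prescribes. The class and method names below are made up for illustration, and the two-lane split is only an assumption standing in for a real vector reduction; it is not how C2 generates code.

```java
public class ReductionOrderSketch {
    // Sequential left-to-right sum: the order a scalar Java loop prescribes.
    public static float strictSum(float[] a) {
        float sum = 0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Reassociated sum: two independent partial sums (even and odd indices),
    // combined only at the end, similar in spirit to an unordered 2-lane reduction.
    public static float twoLaneSum(float[] a) {
        float lane0 = 0f;
        float lane1 = 0f;
        for (int i = 0; i + 1 < a.length; i += 2) {
            lane0 += a[i];
            lane1 += a[i + 1];
        }
        float sum = lane0 + lane1;
        if (a.length % 2 == 1) {
            sum += a[a.length - 1]; // leftover element for odd lengths
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] values = {1e8f, 1f, -1e8f, 1f};
        System.out.println("strict:   " + strictSum(values));
        System.out.println("two-lane: " + twoLaneSum(values));
    }
}
```

For the input `{1e8f, 1f, -1e8f, 1f}` the strict sum is `1.0`, while the two-lane sum is `2.0`: in float precision `1e8f + 1f` rounds back to `1e8f`, so the first `1f` is lost in the sequential order but survives when the large values cancel in their own lane. An auto-vectorized float add reduction must therefore add the lanes one after another in the original order, which keeps the long dependency chain of scalar adds.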
Running this for example shows that the loop-bodies are quite different: make test TEST="micro:vm.compiler.VectorReduction2.*floatAddBig" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm" Scalar loop body, note only scalar xmm registers are used: 0x00007fcee43ffedb: vaddss %xmm3,%xmm6,%xmm6 0x00007fcee43ffedf: vmulss %xmm2,%xmm4,%xmm4 6.69% 0x00007fcee43ffee3: vmulss %xmm2,%xmm1,%xmm1 0x00007fcee43ffee7: vaddss %xmm4,%xmm5,%xmm3 0x00007fcee43ffeeb: vmulss %xmm18,%xmm13,%xmm2 0x00007fcee43ffef1: vaddss %xmm1,%xmm3,%xmm3 0x00007fcee43ffef5: vmulss %xmm12,%xmm10,%xmm1 0x00007fcee43ffefa: vmulss %xmm11,%xmm13,%xmm5 0x00007fcee43ffeff: vaddss %xmm17,%xmm1,%xmm1 0x00007fcee43fff05: vaddss %xmm5,%xmm2,%xmm4 ; {no_reloc} 5.81% 0x00007fcee43fff09: vaddss %xmm14,%xmm1,%xmm1 0x00007fcee43fff0e: vaddss %xmm15,%xmm4,%xmm4 0x00007fcee43fff13: vaddss %xmm9,%xmm4,%xmm2 0x00007fcee43fff18: vaddss %xmm2,%xmm3,%xmm3 24.14% 0x00007fcee43fff1c: vaddss %xmm3,%xmm6,%xmm2 24.50% 0x00007fcee43fff20: vaddss %xmm2,%xmm1,%xmm9 Vector loop uses `zmm`, `ymm` and `xmm` registers, and not just `mul` and `add`, but also `vpshufd` and `vextractf128` instructions to shuffle the values around in the reduction. 
0x00007f4578400c36: vmulps %zmm3,%zmm2,%zmm12 0x00007f4578400c3c: vmulps %zmm9,%zmm11,%zmm13 0.33% 0x00007f4578400c42: vmulps %zmm10,%zmm11,%zmm11 0x00007f4578400c48: vmulps %zmm2,%zmm4,%zmm2 0x00007f4578400c4e: vaddps %zmm13,%zmm11,%zmm11 0x00007f4578400c54: vmulps %zmm3,%zmm4,%zmm3 0x00007f4578400c5a: vmulps %zmm5,%zmm15,%zmm13 0x00007f4578400c60: vaddps %zmm2,%zmm3,%zmm2 0x00007f4578400c66: vmulps %zmm10,%zmm9,%zmm3 0x00007f4578400c6c: vaddps %zmm12,%zmm2,%zmm2 0.39% 0x00007f4578400c72: vaddps %zmm3,%zmm11,%zmm3 ; {no_reloc} 0x00007f4578400c78: vmulps %zmm7,%zmm8,%zmm4 0x00007f4578400c7e: vmulps %zmm6,%zmm8,%zmm8 0.03% 0x00007f4578400c84: vmulps %zmm7,%zmm6,%zmm6 0x00007f4578400c8a: vaddps %zmm8,%zmm4,%zmm4 0x00007f4578400c90: vmulps %zmm5,%zmm16,%zmm5 0x00007f4578400c96: vaddps %zmm6,%zmm4,%zmm4 0x00007f4578400c9c: vmulps %zmm15,%zmm16,%zmm6 0.36% 0x00007f4578400ca2: vaddps %zmm6,%zmm5,%zmm5 0x00007f4578400ca8: vaddps %zmm13,%zmm5,%zmm5 0x00007f4578400cae: vaddss %xmm3,%xmm1,%xmm1 0x00007f4578400cb2: vpshufd $0x1,%xmm3,%xmm13 0.23% 0x00007f4578400cb7: vaddss %xmm13,%xmm1,%xmm1 0.85% 0x00007f4578400cbc: vpshufd $0x2,%xmm3,%xmm13 0x00007f4578400cc1: vaddss %xmm13,%xmm1,%xmm1 1.31% 0x00007f4578400cc6: vpshufd $0x3,%xmm3,%xmm13 0x00007f4578400ccb: vaddss %xmm13,%xmm1,%xmm1 1.61% 0x00007f4578400cd0: vextractf128 $0x1,%ymm3,%xmm13 0x00007f4578400cd6: vaddss %xmm13,%xmm1,%xmm1 1.08% 0x00007f4578400cdb: vpshufd $0x1,%xmm13,%xmm12 0x00007f4578400ce1: vaddss %xmm12,%xmm1,%xmm1 1.48% 0x00007f4578400ce6: vpshufd $0x2,%xmm13,%xmm12 0x00007f4578400cec: vaddss %xmm12,%xmm1,%xmm1 1.48% 0x00007f4578400cf1: vpshufd $0x3,%xmm13,%xmm12 0x00007f4578400cf7: vaddss %xmm12,%xmm1,%xmm1 1.15% 0x00007f4578400cfc: vextracti64x4 $0x1,%zmm3,%ymm12 0x00007f4578400d03: vaddss %xmm12,%xmm1,%xmm1 1.64% 0x00007f4578400d08: vpshufd $0x1,%xmm12,%xmm13 0x00007f4578400d0e: vaddss %xmm13,%xmm1,%xmm1 1.12% 0x00007f4578400d13: vpshufd $0x2,%xmm12,%xmm13 0x00007f4578400d19: vaddss %xmm13,%xmm1,%xmm1 
1.34% 0x00007f4578400d1e: vpshufd $0x3,%xmm12,%xmm13 0x00007f4578400d24: vaddss %xmm13,%xmm1,%xmm1 1.41% 0x00007f4578400d29: vextractf128 $0x1,%ymm12,%xmm13 0x00007f4578400d2f: vaddss %xmm13,%xmm1,%xmm1 1.54% 0x00007f4578400d34: vpshufd $0x1,%xmm13,%xmm12 0x00007f4578400d3a: vaddss %xmm12,%xmm1,%xmm1 1.41% 0x00007f4578400d3f: vpshufd $0x2,%xmm13,%xmm12 0x00007f4578400d45: vaddss %xmm12,%xmm1,%xmm1 1.28% 0x00007f4578400d4a: vpshufd $0x3,%xmm13,%xmm12 0x00007f4578400d50: vaddss %xmm12,%xmm1,%xmm1 1.57% 0x00007f4578400d55: vaddss %xmm4,%xmm1,%xmm1 1.31% 0x00007f4578400d59: vpshufd $0x1,%xmm4,%xmm6 0x00007f4578400d5e: vaddss %xmm6,%xmm1,%xmm1 1.28% 0x00007f4578400d62: vpshufd $0x2,%xmm4,%xmm6 0x00007f4578400d67: vaddss %xmm6,%xmm1,%xmm1 1.34% 0x00007f4578400d6b: vpshufd $0x3,%xmm4,%xmm6 0x00007f4578400d70: vaddss %xmm6,%xmm1,%xmm1 1.05% 0x00007f4578400d74: vextractf128 $0x1,%ymm4,%xmm6 ; {no_reloc} 0x00007f4578400d7a: vaddss %xmm6,%xmm1,%xmm1 1.67% 0x00007f4578400d7e: vpshufd $0x1,%xmm6,%xmm11 0x00007f4578400d83: vaddss %xmm11,%xmm1,%xmm1 1.48% 0x00007f4578400d88: vpshufd $0x2,%xmm6,%xmm11 0x00007f4578400d8d: vaddss %xmm11,%xmm1,%xmm1 1.25% 0x00007f4578400d92: vpshufd $0x3,%xmm6,%xmm11 0x00007f4578400d97: vaddss %xmm11,%xmm1,%xmm1 1.21% 0x00007f4578400d9c: vextracti64x4 $0x1,%zmm4,%ymm11 0x00007f4578400da3: vaddss %xmm11,%xmm1,%xmm1 1.94% 0x00007f4578400da8: vpshufd $0x1,%xmm11,%xmm6 0x00007f4578400dae: vaddss %xmm6,%xmm1,%xmm1 1.21% 0x00007f4578400db2: vpshufd $0x2,%xmm11,%xmm6 0x00007f4578400db8: vaddss %xmm6,%xmm1,%xmm1 1.77% 0x00007f4578400dbc: vpshufd $0x3,%xmm11,%xmm6 0x00007f4578400dc2: vaddss %xmm6,%xmm1,%xmm1 1.57% 0x00007f4578400dc6: vextractf128 $0x1,%ymm11,%xmm6 0x00007f4578400dcc: vaddss %xmm6,%xmm1,%xmm1 1.02% 0x00007f4578400dd0: vpshufd $0x1,%xmm6,%xmm11 0x00007f4578400dd5: vaddss %xmm11,%xmm1,%xmm1 1.48% 0x00007f4578400dda: vpshufd $0x2,%xmm6,%xmm11 0x00007f4578400ddf: vaddss %xmm11,%xmm1,%xmm1 1.31% 0x00007f4578400de4: vpshufd $0x3,%xmm6,%xmm11 
0x00007f4578400de9: vaddss %xmm11,%xmm1,%xmm1 1.54% 0x00007f4578400dee: vaddss %xmm5,%xmm1,%xmm1 1.61% 0x00007f4578400df2: vpshufd $0x1,%xmm5,%xmm9 0x00007f4578400df7: vaddss %xmm9,%xmm1,%xmm1 1.51% 0x00007f4578400dfc: vpshufd $0x2,%xmm5,%xmm9 0x00007f4578400e01: vaddss %xmm9,%xmm1,%xmm1 1.61% 0x00007f4578400e06: vpshufd $0x3,%xmm5,%xmm9 0x00007f4578400e0b: vaddss %xmm9,%xmm1,%xmm1 1.34% 0x00007f4578400e10: vextractf128 $0x1,%ymm5,%xmm9 0x00007f4578400e16: vaddss %xmm9,%xmm1,%xmm1 1.25% 0x00007f4578400e1b: vpshufd $0x1,%xmm9,%xmm8 0x00007f4578400e21: vaddss %xmm8,%xmm1,%xmm1 2.16% 0x00007f4578400e26: vpshufd $0x2,%xmm9,%xmm8 0x00007f4578400e2c: vaddss %xmm8,%xmm1,%xmm1 1.44% 0x00007f4578400e31: vpshufd $0x3,%xmm9,%xmm8 0x00007f4578400e37: vaddss %xmm8,%xmm1,%xmm1 1.38% 0x00007f4578400e3c: vextracti64x4 $0x1,%zmm5,%ymm8 0x00007f4578400e43: vaddss %xmm8,%xmm1,%xmm1 1.51% 0x00007f4578400e48: vpshufd $0x1,%xmm8,%xmm9 0x00007f4578400e4e: vaddss %xmm9,%xmm1,%xmm1 1.74% 0x00007f4578400e53: vpshufd $0x2,%xmm8,%xmm9 0x00007f4578400e59: vaddss %xmm9,%xmm1,%xmm1 1.87% 0x00007f4578400e5e: vpshufd $0x3,%xmm8,%xmm9 0x00007f4578400e64: vaddss %xmm9,%xmm1,%xmm1 1.28% 0x00007f4578400e69: vextractf128 $0x1,%ymm8,%xmm9 0x00007f4578400e6f: vaddss %xmm9,%xmm1,%xmm1 1.67% 0x00007f4578400e74: vpshufd $0x1,%xmm9,%xmm8 ; {no_reloc} 0x00007f4578400e7a: vaddss %xmm8,%xmm1,%xmm1 1.57% 0x00007f4578400e7f: vpshufd $0x2,%xmm9,%xmm8 0x00007f4578400e85: vaddss %xmm8,%xmm1,%xmm1 1.48% 0x00007f4578400e8a: vpshufd $0x3,%xmm9,%xmm8 0x00007f4578400e90: vaddss %xmm8,%xmm1,%xmm1 1.57% 0x00007f4578400e95: vaddss %xmm2,%xmm1,%xmm1 1.57% 0x00007f4578400e99: vpshufd $0x1,%xmm2,%xmm7 0x00007f4578400e9e: vaddss %xmm7,%xmm1,%xmm1 1.61% 0x00007f4578400ea2: vpshufd $0x2,%xmm2,%xmm7 0x00007f4578400ea7: vaddss %xmm7,%xmm1,%xmm1 1.48% 0x00007f4578400eab: vpshufd $0x3,%xmm2,%xmm7 0x00007f4578400eb0: vaddss %xmm7,%xmm1,%xmm1 1.44% 0x00007f4578400eb4: vextractf128 $0x1,%ymm2,%xmm7 0x00007f4578400eba: vaddss 
%xmm7,%xmm1,%xmm1 1.02% 0x00007f4578400ebe: vpshufd $0x1,%xmm7,%xmm10 0x00007f4578400ec3: vaddss %xmm10,%xmm1,%xmm1 1.54% 0x00007f4578400ec8: vpshufd $0x2,%xmm7,%xmm10 0x00007f4578400ecd: vaddss %xmm10,%xmm1,%xmm1 1.31% 0x00007f4578400ed2: vpshufd $0x3,%xmm7,%xmm10 0x00007f4578400ed7: vaddss %xmm10,%xmm1,%xmm1 1.28% 0x00007f4578400edc: vextracti64x4 $0x1,%zmm2,%ymm10 0x00007f4578400ee3: vaddss %xmm10,%xmm1,%xmm1 1.28% 0x00007f4578400ee8: vpshufd $0x1,%xmm10,%xmm7 0x00007f4578400eee: vaddss %xmm7,%xmm1,%xmm1 2.03% 0x00007f4578400ef2: vpshufd $0x2,%xmm10,%xmm7 0x00007f4578400ef8: vaddss %xmm7,%xmm1,%xmm1 1.44% 0x00007f4578400efc: vpshufd $0x3,%xmm10,%xmm7 0x00007f4578400f02: vaddss %xmm7,%xmm1,%xmm1 1.48% 0x00007f4578400f06: vextractf128 $0x1,%ymm10,%xmm7 0x00007f4578400f0c: vaddss %xmm7,%xmm1,%xmm1 1.31% 0x00007f4578400f10: vpshufd $0x1,%xmm7,%xmm10 0x00007f4578400f15: vaddss %xmm10,%xmm1,%xmm1 1.08% 0x00007f4578400f1a: vpshufd $0x2,%xmm7,%xmm10 0x00007f4578400f1f: vaddss %xmm10,%xmm1,%xmm1 1.64% 0x00007f4578400f24: vpshufd $0x3,%xmm7,%xmm10 0x00007f4578400f29: vaddss %xmm10,%xmm1,%xmm1 **No vectorization for longMulBig?** I locally can get vectorization in another setting, but somehow not in the JMH benchmark. This is strange, and I'll have to keep investigating. **No speedup with doubleMinDotProduct** Strangely, on my AVX512 machine, that benchmark did vectorize, but it did not experience any speedup. I'm quite confused about that. Especially because the parallel benchmark `doubleMaxDotProduct` vectorizes just fine. More investigation needed. ---------------------------------------- **Future Work** - [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513): C2: intrinsify Math.max(long,long) and Math.min(long,long) - The `Simple` benchmarks should be profitable -> allow vectorization! - Investigate the results for `byte / char / short`: is it all due to over-unrolling? - `float / double` with `add / mul`: - the main issue is with the strict order of operations. 
Maybe we can add some `Float.addAssociative` so that we can do non-strict reduction? That could be a cool feature for those who care about performance and are willing to give up some rounding precision. - The aarch64 backend simply refuses to vectorize these, because there is no strict order implementation. That could be changed. Though we would need a benchmark where it is worth it... - Investigate the strange results with `longMulBig` and `doubleMinDotProduct`. ------------- Commit messages: - 8340272 Changes: https://git.openjdk.org/jdk/pull/21032/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21032&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340272 Stats: 1454 lines in 1 file changed: 1454 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21032/head:pull/21032 PR: https://git.openjdk.org/jdk/pull/21032 From epeter at openjdk.org Tue Sep 17 12:03:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 12:03:18 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:53:40 GMT, Emanuel Peter wrote: > I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH. > > Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**. 
> > **How I run my benchmarks** > > All benchmarks > `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64` > > Some specific benchmark, with profiler that tells me which code snippet is hottest: > `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"` > > **JMH logs** > > Run on my AVX512 laptop, with master: > [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt) > > Run on remote asimd (aarch64, NEON) machine: > [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt) > > **Results** > > I ran it on 2 machines so far. Left on my AVX512 machine, right on an ASIMD/NEON/aarch64 machine. > > Here are the interesting `int / long / float / double` results, discussion further below: > ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b) > > > And here are the less spectacular `byte / char / short` results. There is no vectorization of these cases. But there seems to be some issue with over-unrolling on my AVX512 machine: one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. > > ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2) > > Here is the PDF: > [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf) > > > **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?** > > Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors. > The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector.
> > // Check if reductions are connected > if (is_marked_reduction(p0)) { > Node* second_in = p0->in(2); > Node_List* second_pk = get_pack(second_in); > if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) { > // No parent pack or not enough work > // to cover reduction expansion overhead > return false; > } else if (second_pk->size() != p->size()) { > return false; > } > } > > > But when I disable this code, then I see on the aarch64/ASIMD machine: > > VectorReduction2.NoSuperword.intAddSimpl... @galderz You can use this JMH benchmark for your work in https://github.com/openjdk/jdk/pull/20098 if you want. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21032#issuecomment-2355504929 From thartmann at openjdk.org Tue Sep 17 13:41:07 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 17 Sep 2024 13:41:07 GMT Subject: RFR: 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:45:09 GMT, Martin Doerr wrote: > If exactly one of UseCompressedOops and UseCompressedClassPointers is enabled: > Only the transformation for the enabled part should be used. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21036#pullrequestreview-2309795518 From kvn at openjdk.org Tue Sep 17 16:12:24 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 16:12:24 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: References: Message-ID: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> On Tue, 17 Sep 2024 05:20:30 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Discard memory accesses with barrier data as implicit null check candidates Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2310210106 From kvn at openjdk.org Tue Sep 17 16:18:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 16:18:12 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v6] In-Reply-To: <4nYVjz-ULoZGwX8qbWOLPvTIMxq6C3IYCWiEdsaC8sk=.527878ef-f3ad-4085-a321-da812e201cea@github.com> References: <4nYVjz-ULoZGwX8qbWOLPvTIMxq6C3IYCWiEdsaC8sk=.527878ef-f3ad-4085-a321-da812e201cea@github.com> Message-ID: <-_h_jEZfru-0qViKUZmd8T3IE8hGuwY4Y6ccpRpc6y0=.845773d5-4e1c-4f12-b07c-6be4d70be8f4@github.com> On Fri, 13 Sep 2024 07:43:46 GMT, Roland Westrelin wrote: >> In a debug build, `PhaseCFG::insert_anti_dependences()` is called >> twice for a single node: once for actual processing, once for >> verification. >> >> In TestAntiDependenciesHighMemUsage, the test has a `Region` that >> merges 337 incoming paths. It also has one `Phi` per memory slice that >> is stored to: 1000 `Phi` nodes. Each `Phi` node has 337 inputs that >> are identical except for one. The common input is the memory state on >> method entry. The test has 60 `Load` nodes that need to be processed for >> anti dependences.
All `Load` share the same memory input: the memory >> state on method entry. For each `Load`, all `Phi` nodes are pushed 336 >> times on the work lists for anti dependence processing because all of >> them appear multiple times as uses of each `Load`s memory state: `Phi`s >> are pushed 336 000 on 2 work lists. Memory is not reclaimed on exit >> from `PhaseCFG::insert_anti_dependences()` so memory usage grows as >> `Load` nodes are processed: >> >> 336000 * 2 work lists * 60 loads * 8 bytes pointer = 322 MB. >> >> The fix I propose for this is to not push `Phi` nodes more than once >> when they have the same inputs multiple times. >> >> In TestAntiDependenciesHighMemUsage2, the test has 4000 loads. For >> each of them, when processed for anti dependences, all 4000 loads are >> pushed on the work lists because they share the same memory >> input. Then when they are popped from the work list, they are >> discarded because only stores are of interest: >> >> 4000 loads processed * 4000 loads pushed * 2 work lists * 8 bytes pointer = 256 MB. >> >> The fix I propose for this is to test before pushing on the work list >> whether a node is a store or not. >> >> Finally, I propose adding a `ResourceMark` so memory doesn't >> accumulate over calls to `PhaseCFG::insert_anti_dependences()`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8333258 > - more review > - more review > - Merge branch 'master' into JDK-8333258 > - review > - Merge branch 'master' into JDK-8333258 > - refactoring > - Merge branch 'master' into JDK-8333258 > - review > - ... and 3 more: https://git.openjdk.org/jdk/compare/6228a21a...4511c175 Latest changes look good to me. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19791#pullrequestreview-2310223556 From kvn at openjdk.org Tue Sep 17 16:32:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 16:32:04 GMT Subject: RFR: 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:45:09 GMT, Martin Doerr wrote: > If exactly one of UseCompressedOops and UseCompressedClassPointers is enabled: > Only the transformation for the enabled part should be used. It is unfortunate that `Matcher::const_*()` methods don't check these flags. Fortunately they are used only in this place so the fix is correct. Maybe we need to rewrite this sometime later. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21036#pullrequestreview-2310253257 From jbhateja at openjdk.org Tue Sep 17 16:35:43 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 16:35:43 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v6] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over a standalone test point using Intel SDE.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Post NDD patch cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/dc37dea6..8673c736 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=04-05 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From mdoerr at openjdk.org Tue Sep 17 16:36:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Sep 2024 16:36:06 GMT Subject: RFR: 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:45:09 GMT, Martin Doerr wrote: > If exactly one of UseCompressedOops and UseCompressedClassPointers is enabled: > Only the transformation for the enabled part should be used. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21036#issuecomment-2356404811 From epeter at openjdk.org Tue Sep 17 16:37:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 16:37:11 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> References: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> Message-ID: On Tue, 17 Sep 2024 08:08:01 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 33: > >> 31: >> 32: /** >> 33: * This is the entry-point for the Compile Framework. Its purpose it to allow > > General comment about Javadocs. I think the convention is the indent to the first `*`: > > Suggestion: > > /** > * This is the entry-point for the Compile Framework. Its purpose it to allow ok, will fix it! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1763551405 From epeter at openjdk.org Tue Sep 17 16:47:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 16:47:09 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> References: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> Message-ID: On Tue, 17 Sep 2024 07:51:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 57: > >> 55: public void addJasmSourceCode(String className, String code) { >> 56: jasmSources.add(new SourceCode(className, "jasm", code)); >> 57: } > > Since this is the public API that the user should use, I suggest to also add a parameter description for all public API methods in this class with `@param` for completeness. Something like: > > /** > * Add a Jasm source to the compilation. > * > * @param className The class name of the Jasm class. > * @param code The source code of the Jasm class as string. 
> */ > > which renders to: > > ![image](https://github.com/user-attachments/assets/4f0ba893-7876-4f57-b6ea-816f7e7660cd) Ok, I'll do that :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1763564199 From epeter at openjdk.org Tue Sep 17 16:54:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 16:54:15 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> References: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> Message-ID: On Tue, 17 Sep 2024 08:12:24 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 87: > >> 85: >> 86: Compile.compileJasmSources(jasmSources, sourceDir, classesDir); >> 87: Compile.compileJavaSources(javaSources, sourceDir, classesDir); > > Suggestion: Instead of having two static methods, how about making the `sourceDir` and `classesDir` fields of `Compile`? Then you can reuse them inside the class without passing them around (with these methods being member methods). That is why those methods were originally part of the `CompileFramework`, as fields... I think it is easier to pass them as arguments, because that way one can directly and easily see which components go into the compilation.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1763571880 From psandoz at openjdk.org Tue Sep 17 17:03:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 17:03:14 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:15 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: >> >>> 559: for (int i = 0; i < vlen; i++) { >>> 560: int index = ((int)vecPayload1[i]); >>> 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; >> >> This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. >> >> int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1); >> res[i] = index < vlen ? vecPayload2[index] : vecPayload3[index - vlen]; > > Hi @PaulSandoz , we already pass wrapped indexes to this helper routine called from fallback implementation. Oops yes, the masking was throwing me off. Can you please add a comment and/or rename the parameters, e.g., so `v1` is renamed to `wrappedIndex`? Also, I would recommend not doing the masking, since it is very misleading; do the subtraction instead.
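For readers following along, the wrap-then-subtract selection discussed in this comment can be sketched as a scalar model in plain Java. This is only an illustration under stated assumptions: it works on `int[]` arrays instead of vector payloads, the class and method names are hypothetical, and it assumes the lane count is a power of two so that masking with `2 * vlen - 1` wraps correctly. It is not the code from `X-Vector.java.template`.

```java
// Hypothetical scalar model of a two-vector select; not the JDK implementation.
final class TwoVectorSelectSketch {

    // For each lane, wrap the index into [0, 2 * vlen - 1], then pick from v1
    // when the wrapped index is below vlen, otherwise from v2 at (index - vlen).
    // Assumption: vlen is a power of two, so the mask (2 * vlen - 1) is valid.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen || indexes.length != vlen) {
            throw new IllegalArgumentException("all inputs must have the same length");
        }
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrappedIndex = indexes[i] & ((vlen << 1) - 1);
            // Subtraction instead of a second mask: the intent is easier to read.
            res[i] = wrappedIndex < vlen ? v1[wrappedIndex] : v2[wrappedIndex - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, -1, 3};
        System.out.println(java.util.Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```

With the inputs in `main`, index `-1` wraps to `7` and therefore selects `v2[3]`, so the result is `[10, 21, 23, 13]`. Wrapping before the comparison is what makes negative indexes safe.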
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1763582764 From psandoz at openjdk.org Tue Sep 17 17:07:16 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 17:07:16 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:12 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974: >> >>> 2972: final $abstractvectortype$ selectFromTemplate(Class> indexVecClass, >>> 2973: $abstractvectortype$ v1, $abstractvectortype$ v2) { >>> 2974: int twoVectorLen = length() * 2; >> >> We should assert that the length is a power of two. > > API only accepts vector parameters and there is no means through the public-facing API to create a vector of NPOT sizes. https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java#L842C58-L843C27 You missed the first bit of the sentence linked to "With the possible exception of the {@linkplain VectorShape#S_Max_BIT maximum shape}". In general, the specification avoids assuming POT where it is not explicitly stated (i.e., the constant shapes). In this case we align with the specification of `VectorShuffle::wrapIndex`. We don't need to implement NPOT but we need a reminder in the implementation where we make that assumption.
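One way to keep such a reminder visible is to make the power-of-two assumption an explicit check next to the mask. The following is a rough sketch in plain Java with hypothetical names; it does not claim to show how `VectorShuffle::wrapIndex` or the template code is implemented. It contrasts a mask-based wrap, which is only valid for power-of-two lengths, with a general modular wrap based on `Math.floorMod`.

```java
// Hypothetical helper, for illustration only.
final class WrapIndexSketch {

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Mask-based wrap into [0, length - 1]. Only correct for power-of-two
    // lengths, so the assumption is checked instead of silently relied upon.
    static int wrapWithMask(int index, int length) {
        if (!isPowerOfTwo(length)) {
            throw new IllegalArgumentException("length must be a power of two: " + length);
        }
        return index & (length - 1);
    }

    // General wrap into [0, length - 1] that also works for non-power-of-two
    // lengths and for negative indexes.
    static int wrapGeneral(int index, int length) {
        return Math.floorMod(index, length);
    }

    public static void main(String[] args) {
        System.out.println(wrapWithMask(-3, 8)); // same value as wrapGeneral(-3, 8)
        System.out.println(wrapGeneral(-3, 8));
        System.out.println(wrapGeneral(7, 6));   // a plain mask (7 & 5) would give a different value here
    }
}
```

For power-of-two lengths both variants agree, for example `-3` wraps to `5` for length `8`. For length `6`, `Math.floorMod(7, 6)` is `1` while `7 & 5` is `5`, which is exactly the case the check guards against.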
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1763587293 From epeter at openjdk.org Tue Sep 17 17:07:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 17:07:50 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v14] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples.
> > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
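The `MethodHandles.constant` approach mentioned above can be sketched as follows. This is a minimal illustration and not the code from `TestAlignVectorFuzzer.java`: the class name, the `sketch.stride` system property and the loop are assumptions made for the example. The value is picked once at class initialization and read through a `static final` handle, which is the pattern the description relies on for the JIT to treat the value as a constant.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical example class; names are not from the PR.
final class ConstantHandleSketch {
    // Chosen before any compilation of sum(); a fuzzer would draw it at random.
    // "sketch.stride" is a made-up property name, defaulting to 2.
    static final MethodHandle STRIDE =
        MethodHandles.constant(int.class, Integer.getInteger("sketch.stride", 2));

    // Reads the value back through the handle.
    static int stride() {
        try {
            return (int) STRIDE.invokeExact();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    // Loop whose stride comes from the handle rather than from a literal.
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i += stride()) {
            s += a[i];
        }
        return s;
    }
}
```

The drawback noted in the description still applies to a sketch like this: a failing configuration lives only in the handle's value, so a standalone reproducer has to be rebuilt by hand.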
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix up CompileFramework.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/45abaed4..647b8aca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=12-13 Stats: 93 lines in 1 file changed: 42 ins; 26 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Tue Sep 17 17:07:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 17:07:50 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> References: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> Message-ID: On Tue, 17 Sep 2024 07:53:19 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 59: > >> 57: } >> 58: >> 59: private String sourceCodesAsString(List sourceCodes) { > > Just a side note, since this is not C where you need to define the methods first and then use them, I suggest doing it the other way round: Define methods below as they are used.
This makes it easier to read I think Ok, I'll keep this in mind :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1763586020 From sviswanathan at openjdk.org Tue Sep 17 17:14:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 17 Sep 2024 17:14:09 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v6] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:35:43 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Post NDD patch cleanups LGTM ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2310353618 From epeter at openjdk.org Tue Sep 17 17:15:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 17:15:24 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v15] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. 
> > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
> > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/647b8aca..5030e5d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=13-14 Stats: 17 lines in 4 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Tue Sep 17 17:33:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 17:33:54 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v16] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script?
It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more fixup for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/5030e5d5..fb57e286 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=14-15 Stats: 59 lines in 5 files changed: 4 ins; 0 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Tue Sep 17 17:33:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Sep 2024 17:33:54 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> References: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> Message-ID: On Tue, 17 Sep 2024 09:14:56 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > Thanks for the updates - already looks much better! Some more comments. @chhagedorn thanks a lot for the second round of review! I finally learned how to do Javadocs, and how to compile them without an IDE: `jdk-fork2/open/test/hotspot/jtreg$ javadoc -sourcepath . -d ./docs -subpackages compiler.lib.compile_framework` > test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/TestFrameworkJavaExample.java line 49: > >> 47: * classes (see {@code getEscapedClassPathOfCompiledClasses}). >> 48: */ >> 49: public class TestFrameworkJavaExample { > > Suggestion: This might be better named `IRFrameworkJavaExample`.
It is a little bit unfortunate that I named the IR Framework main class `TestFramework` and not `IRFramework`...but I guess it's too late for that now. Yeah, unfortunate ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2356513651 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1763620758 From jbhateja at openjdk.org Tue Sep 17 17:49:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 17:49:09 GMT Subject: Integrated: 8339790: Support Intel APX setzucc instruction In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:36:51 GMT, Jatin Bhateja wrote: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 90e92f98 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/90e92f98a6685b196b979853436668cf2b9f2117 Stats: 73 lines in 7 files changed: 22 ins; 25 del; 26 mod 8339790: Support Intel APX setzucc instruction Reviewed-by: sviswanathan, jkarthikeyan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20920 From kvn at openjdk.org Tue Sep 17 18:06:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 18:06:05 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:53:40 GMT, Emanuel Peter wrote: > I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH. > > Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**. > > **How I run my benchmarks** > > All benchmarks > `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64` > > Some specific benchmark, with profiler that tells me which code snippet is hottest: > `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"` > > **JMH logs** > > Run on my AVX512 laptop, with master: > [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt) > > Run on remote asimd (aarch64, NEON) machine: > [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt) > > **Results** > > I ran it on 2 machines so far. Left on my AVX512 machine, right on a ASIMD/NEON/aarch64 machine. > > Here the interesting `int / long / float / double` results, discussion further below: > ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b) > > > And there the less spectacular `byte / char / short` results. There is no vectorization of these cases. 
But there seems to be some issue with over-unrolling on my AVX512 machine: one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. > > ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2) > > Here the PDF: > [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf) > > > **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?** > > Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors. > The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector. > > // Check if reductions are connected > if (is_marked_reduction(p0)) { > Node* second_in = p0->in(2); > Node_List* second_pk = get_pack(second_in); > if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) { > // No parent pack or not enough work > // to cover reduction expansion overhead > return false; > } else if (second_pk->size() != p->size()) { > return false; > } > } > > > But when I disable this code, then I see on the aarch64/ASIMD machine: > > VectorReduction2.NoSuperword.intAddSimpl... Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21032#pullrequestreview-2310463953 From psandoz at openjdk.org Tue Sep 17 18:24:05 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 18:24:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Thu, 22 Aug 2024 18:43:56 GMT, Paul Sandoz wrote: > Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2356611024 From sviswanathan at openjdk.org Tue Sep 17 18:42:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 17 Sep 2024 18:42:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Tue, 17 Sep 2024 18:21:43 GMT, Paul Sandoz wrote: > > Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. > > Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/ Thanks Paul! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2356639277 From duke at openjdk.org Wed Sep 18 01:47:05 2024 From: duke at openjdk.org (duke) Date: Wed, 18 Sep 2024 01:47:05 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method [v5] In-Reply-To: References: Message-ID: <1ou5GZuPCBMmYIsoTL3nsmh8fbX-hw9n-5s69M1KTdE=.9bc59c5a-9432-44c2-b412-8cd12dcfb3dd@github.com> On Tue, 10 Sep 2024 08:35:25 GMT, kuaiwei wrote: >> I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it. >> It will happen with these conditions >> 1 It's a virtual call and the callee is a final method. So c1 will think it's statically bound. >> 2 The interpreter never touches the callsite, so interpreter does not add type profile. >> 3 In c1 compilation, it will be inlined based on CHA. >> 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Modify test case to use createTestJavaProcessBuilder @kuaiwei Your change (at version 666ce51f2f4f2826c323be9fd6da88fc589942c8) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20786#issuecomment-2357319067 From duke at openjdk.org Wed Sep 18 01:47:05 2024 From: duke at openjdk.org (kuaiwei) Date: Wed, 18 Sep 2024 01:47:05 GMT Subject: RFR: 8339299: C1 will miss type profile when inline final method In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 22:23:22 GMT, Vladimir Ivanov wrote: >>> > In c2 compilation, the CHA is broken >>> >>> What do you mean by that? >> >> It looks like the test invalidates the initial CHA optimization that was done when there was only one subclass by loading a 2nd subclass. > >> It looks like the test invalidates the initial CHA optimization that was done when there was only one subclass by loading a 2nd subclass.
> > Moreover, CHA has to discover a final method in order to satisfy `can_be_statically_bound()` predicate. @iwanowww @lmesnik Thanks for your review. Could you help sponsor it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20786#issuecomment-2357320618 From jkarthikeyan at openjdk.org Wed Sep 18 03:01:05 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 18 Sep 2024 03:01:05 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:53:40 GMT, Emanuel Peter wrote: > I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH. > > Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**. > > **How I run my benchmarks** > > All benchmarks > `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64` > > Some specific benchmark, with profiler that tells me which code snippet is hottest: > `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"` > > **JMH logs** > > Run on my AVX512 laptop, with master: > [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt) > > Run on remote asimd (aarch64, NEON) machine: > [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt) > > **Results** > > I ran it on 2 machines so far. Left on my AVX512 machine, right on a ASIMD/NEON/aarch64 machine. > > Here the interesting `int / long / float / double` results, discussion further below: > ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b) > > > And there the less spectacular `byte / char / short` results. There is no vectorization of these cases. 
But there seems to be some issue with over-unrolling on my AVX512 machine: one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. > > ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2) > > Here the PDF: > [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf) > > > **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?** > > Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors. > The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector. > > // Check if reductions are connected > if (is_marked_reduction(p0)) { > Node* second_in = p0->in(2); > Node_List* second_pk = get_pack(second_in); > if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) { > // No parent pack or not enough work > // to cover reduction expansion overhead > return false; > } else if (second_pk->size() != p->size()) { > return false; > } > } > > > But when I disable this code, then I see on the aarch64/ASIMD machine: > > VectorReduction2.NoSuperword.intAddSimpl... Looks nice, the benchmark is very thorough! I was interested to see how it performed on my Zen 3 (AVX2) machine, so I've attached the results here in case it's interesting/useful: [perf_results.txt](https://github.com/user-attachments/files/17037796/perf_results.txt) ------------- Marked as reviewed by jkarthikeyan (Committer).
PR Review: https://git.openjdk.org/jdk/pull/21032#pullrequestreview-2311518449 From epeter at openjdk.org Wed Sep 18 06:53:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 06:53:44 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v17] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples.
> > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: another small suggestion from Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/fb57e286..237ce2ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Wed Sep 18 06:53:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 06:53:44 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v13] In-Reply-To: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> References: <-sIllD9F2YwEbdrKat9o0EkfstPF-zZRGGbGcwqeuRU=.4e115a3e-2fb5-4220-8072-2f66e021c321@github.com> Message-ID: On Tue, 17 Sep 2024 08:51:55 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code around > > test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 31: > >> 29: ### Creating a new Compile Framework Instance >> 30: >> 31: First, one must create a `new CompileFramework()`, which creates two directories: a sources and a classes directory. The sources directory is where all the sources are placed by the Compile Framework, and the classes directory is where all the compiled classes are placed by the Compile Framework. > > You could add here that this is a fixed-named directory created inside the JTreg scratch directory. You can also add a link to the actual name where this is created inside `CompileFramework`. 
Added your offline suggestion: `First, one must create a `new CompileFramework()`, which creates two directories: a sources and a classes directory (see `sourcesDir` and `classesDir` in [CompileFramework](./CompileFramework.java))` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1764496476 From roland at openjdk.org Wed Sep 18 07:10:18 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 18 Sep 2024 07:10:18 GMT Subject: RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v6] In-Reply-To: <-_h_jEZfru-0qViKUZmd8T3IE8hGuwY4Y6ccpRpc6y0=.845773d5-4e1c-4f12-b07c-6be4d70be8f4@github.com> References: <4nYVjz-ULoZGwX8qbWOLPvTIMxq6C3IYCWiEdsaC8sk=.527878ef-f3ad-4085-a321-da812e201cea@github.com> <-_h_jEZfru-0qViKUZmd8T3IE8hGuwY4Y6ccpRpc6y0=.845773d5-4e1c-4f12-b07c-6be4d70be8f4@github.com> Message-ID: On Tue, 17 Sep 2024 16:15:53 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8333258 >> - more review >> - more review >> - Merge branch 'master' into JDK-8333258 >> - review >> - Merge branch 'master' into JDK-8333258 >> - refactoring >> - Merge branch 'master' into JDK-8333258 >> - review >> - ... and 3 more: https://git.openjdk.org/jdk/compare/73db4d82...4511c175 > > Latest changes look good to me. @vnkozlov thanks for the re-review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19791#issuecomment-2357671875 From roland at openjdk.org Wed Sep 18 07:10:20 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 18 Sep 2024 07:10:20 GMT Subject: Integrated: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 12:54:26 GMT, Roland Westrelin wrote: > In a debug build, `PhaseCFG::insert_anti_dependences()` is called > twice for a single node: once for actual processing, once for > verification. > > In TestAntiDependenciesHighMemUsage, the test has a `Region` that > merges 337 incoming paths. It also has one `Phi` per memory slice that > is stored to: 1000 `Phi` nodes. Each `Phi` node has 337 inputs that > are identical except for one. The common input is the memory state on > method entry. The test has 60 `Load` nodes that need to be processed for > anti dependences. All `Load` nodes share the same memory input: the memory > state on method entry. For each `Load`, all `Phi` nodes are pushed 336 > times on the work lists for anti dependence processing because all of > them appear multiple times as uses of each `Load`'s memory state: `Phi`s > are pushed 336,000 times on 2 work lists. Memory is not reclaimed on exit > from `PhaseCFG::insert_anti_dependences()` so memory usage grows as > `Load` nodes are processed: > > 336000 * 2 work lists * 60 loads * 8 bytes pointer = 322 MB. > > The fix I propose for this is to not push `Phi` nodes more than once > when they have the same inputs multiple times. > > In TestAntiDependenciesHighMemUsage2, the test has 4000 loads. For > each of them, when processed for anti dependences, all 4000 loads are > pushed on the work lists because they share the same memory > input. Then when they are popped from the work list, they are > discarded because only stores are of interest: > > 4000 loads processed * 4000 loads pushed * 2 work lists * 8 bytes pointer = 256 MB.
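The two estimates above can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and the figures are the ones quoted in the description (1000 `Phi` nodes pushed 336 times each, 2 work lists, 8-byte pointers, decimal megabytes).

```java
// Hypothetical helper that reproduces the memory estimates quoted above.
final class WorklistMemoryEstimate {
    // Bytes accumulated when nothing is reclaimed between calls.
    static long bytes(long pushesPerLoad, long workLists, long loads, long pointerBytes) {
        return pushesPerLoad * workLists * loads * pointerBytes;
    }

    public static void main(String[] args) {
        // First test: 1000 Phis pushed 336 times for each of 60 loads.
        long first = bytes(1000L * 336, 2, 60, 8);
        // Second test: 4000 loads pushed for each of 4000 processed loads.
        long second = bytes(4000, 2, 4000, 8);
        System.out.println(first / 1_000_000 + " MB, " + second / 1_000_000 + " MB");
    }
}
```

The first product is 322,560,000 bytes and the second is 256,000,000 bytes, which matches the rounded 322 MB and 256 MB figures.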
> > The fix I propose for this is to test before pushing on the work list > whether a node is a store or not. > > Finally, I propose adding a `ResourceMark` so memory doesn't > accumulate over calls to `PhaseCFG::insert_anti_dependences()`. This pull request has now been integrated. Changeset: 5381f553 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/5381f553ad61ddaa44d49c3039a05511cc68bdd0 Stats: 11031 lines in 3 files changed: 10990 ins; 11 del; 30 mod 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/19791 From rcastanedalo at openjdk.org Wed Sep 18 07:18:12 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Sep 2024 07:18:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> References: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> Message-ID: On Tue, 17 Sep 2024 16:09:30 GMT, Vladimir Kozlov wrote: > Looks good to me. Thanks for reviewing, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357686525 From jbhateja at openjdk.org Wed Sep 18 07:21:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 07:21:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. 
> > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
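The semantics described above can be modelled with plain arrays. The following is a minimal scalar sketch, not the `jdk.incubator.vector` implementation: the class name and the `selectFrom(int[], int[], int[])` helper are hypothetical, and it assumes that indices in `[0, VLEN)` pick from `v1` and indices in `[VLEN, 2*VLEN)` pick from `v2`. The out-of-range behaviour follows the description as written in this revision of the PR.

```java
import java.util.Arrays;

public class TwoVectorSelectFromSketch {
    // Scalar model: each lane of `index` selects one element from the table
    // formed by v1 (indices 0..VLEN-1) followed by v2 (indices VLEN..2*VLEN-1).
    public static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            // Per the description, indices outside [0, 2*VLEN) are rejected.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + i + ": index " + idx);
            }
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 4, 3, 7};
        // prints [10, 20, 13, 23]
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

In this model the index vector plays the role of "this" in the declaration, and the two arguments together act as a lookup table of `2*VLEN` entries.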
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Incorporating review and documentation suggestions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/29530047..31a58642 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=10-11 Stats: 96 lines in 8 files changed: 25 ins; 0 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Wed Sep 18 07:21:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 07:21:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> Message-ID: On Fri, 13 Sep 2024 14:49:01 GMT, Emanuel Peter wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: >> >>> 542: byte[] vpayload1 = ((ByteVector)v1).vec(); >>> 543: byte[] vpayload2 = ((ByteVector)v2).vec(); >>> 544: byte[] vpayload3 = ((ByteVector)v3).vec(); >> >> Is there a reason you are not using more descriptive names here instead of `vpayload1`? >> I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? > > You only gave me a thumbs up and no change - but comment resolved. Is that intentional? Makes me feel like you are ignoring my comments, and that discourages me from reviewing in the future. The routine was renamed as per your suggestion, and the first vector argument was also renamed to `wrappedIndex`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1764527888 From epeter at openjdk.org Wed Sep 18 07:26:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 07:26:08 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 02:58:10 GMT, Jasmine Karthikeyan wrote: >> I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH. >> >> Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**. >> >> **How I run my benchmarks** >> >> All benchmarks >> `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64` >> >> Some specific benchmark, with profiler that tells me which code snippet is hottest: >> `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"` >> >> **JMH logs** >> >> Run on my AVX512 laptop, with master: >> [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt) >> >> Run on remote asimd (aarch64, NEON) machine: >> [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt) >> >> **Results** >> >> I ran it on 2 machines so far. Left on my AVX512 machine, right on a ASIMD/NEON/aarch64 machine. >> >> Here the interesting `int / long / float / double` results, discussion further below: >> ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b) >> >> >> And there the less spectacular `byte / char / short` results. There is no vectorization of these cases. But there seems to be some issue with over-unrolling on my AVX512 machine, one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. 
>> >> ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2) >> >> Here the PDF: >> [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf) >> >> >> **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?** >> >> Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors. >> The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector. >> >> // Check if reductions are connected >> if (is_marked_reduction(p0)) { >> Node* second_in = p0->in(2); >> Node_List* second_pk = get_pack(second_in); >> if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) { >> // No parent pack or not enough work >> // to cover reduction expansion overhead >> return false; >> } else if (second_pk->size() != p->size()) { >> return false; >> } >> } >> >> >> ... > > Looks nice, the benchmark is very thorough! I was interested to see how it performed on my Zen 3 (AVX2) machine, I've attached the results here in case it's interesting/useful: [perf_results.txt](https://github.com/user-attachments/files/17037796/perf_results.txt) @jaskarth thanks for the benchmark! I included it in these results now: [benchmark_results.pdf](https://github.com/user-attachments/files/17040018/benchmark_results.pdf) The results are quite comparable to the AVX512 results. Some comments: - `byte / char / short`: there is also some variation here, but it seems slightly different. We might want to investigate that anyway, especially the regressions that are in the `15-25%` range. It is also possible that we could invest more to even vectorize these cases, but the IR is more complicated with all the "cast to byte/char/short", i.e. the right and left shifting required to remove the upper bits. Pattern matching those cases is difficult with the current SuperWord structure, as far as I can see.
I'm open to ideas/suggestions here ;) - `int / float / double` performance is as expected, parallel to ASIMD and AVX512. Good. - `long`: - `MulVL` is not implemented for AVX2 (hardware does not have them as far as I know), so those benchmarks results are as expected. - Your results around the long min/max are a bit unexpected, especially because the vectorization is not supposed to work as far as I know. Could be interesting to investigate more there. ![image](https://github.com/user-attachments/assets/b4e28637-04e8-431f-bd4f-9170d9461133) ![image](https://github.com/user-attachments/assets/1d0caa02-399c-4549-b314-ef460af133f6) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21032#issuecomment-2357701154 From rcastanedalo at openjdk.org Wed Sep 18 07:49:52 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Sep 2024 07:49:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Restore some asserts - Default values for tmp regs of G1PostBarrierStubC2 - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 - 8330685: [arm32] share barrier spilling logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/71a51bfc..13b93bd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21-22 Stats: 614 lines in 12 files changed: 521 ins; 36 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 18 08:00:30 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Sep 2024 08:00:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 07:49:52 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: > > - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms > - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - Restore some asserts > - Default values for tmp regs of G1PostBarrierStubC2 > - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 > - 8330685: [arm32] share barrier spilling logic Thanks for the arm 32-bit port @snazarkin! Merged in commit 3957c03f. Besides the arm 32-bit port, @snazarkin's changeset adds the option of using a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357765066 From mdoerr at openjdk.org Wed Sep 18 08:30:10 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 08:30:10 GMT Subject: RFR: 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity In-Reply-To: References: Message-ID: <_BBBGGNSbPFts8QJ_2wyAEToZnF1h3brs3nX51jK5XQ=.8451f1f3-07c0-4023-b5ea-9c67aaa6ad7b@github.com> On Tue, 17 Sep 2024 10:45:09 GMT, Martin Doerr wrote: > If exactly one of UseCompressedOops and UseCompressedClassPointers is enabled: > Only the transformation for the enabled part should be used. Test results look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21036#issuecomment-2357824368 From mdoerr at openjdk.org Wed Sep 18 08:30:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 08:30:11 GMT Subject: Integrated: 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:45:09 GMT, Martin Doerr wrote: > If exactly one of UseCompressedOops and UseCompressedClassPointers is enabled: > Only the transformation for the enabled part should be used. This pull request has now been integrated. 
Changeset: 3895b8fc Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/3895b8fc0b2c6d187080dba6fe08297adad4a480 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21036 From tholenstein at openjdk.org Wed Sep 18 09:06:22 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 18 Sep 2024 09:06:22 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v2] In-Reply-To: References: Message-ID: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type::OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we call `basic_plus_adr(..)` on a base that is `top()`, which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix > The fix is to bail out in this case > https://github.com/openjdk/jdk/blob/3d5d51e228c19a...
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Fix 2.0 : Add uncast in LibraryCallKit::classify_unsafe_addr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20033/files - new: https://git.openjdk.org/jdk/pull/20033/files/34c6e0de..2ab02f83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20033/head:pull/20033 PR: https://git.openjdk.org/jdk/pull/20033 From tholenstein at openjdk.org Wed Sep 18 09:11:06 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 18 Sep 2024 09:11:06 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access In-Reply-To: <_ExKmq0qRxs73hnA6HBDJG0b_Zqi715jkZ1pIvv_JsA=.68781292-7849-4a01-87c6-86ae65470afc@github.com> References: <_ExKmq0qRxs73hnA6HBDJG0b_Zqi715jkZ1pIvv_JsA=.68781292-7849-4a01-87c6-86ae65470afc@github.com> Message-ID: On Fri, 6 Sep 2024 19:45:43 GMT, Vladimir Ivanov wrote: > > I think Vladimir's question is: how can null_check_oop() return top? AFAIU, it creates a CastPP with 147 as input and that CastPP is transformed to top. How does that happen? What are the steps in the call to _gvn.transform( cast ); that lead to a result of top. > > I think that what should happen when compiler tries to cast a value to an empty type in a dead code. > > Toby's response answered my question: it's a GVN on `CmpP` which determines that both inputs are `NULL` and degenerates the check into an unconditional uncommon trap. (I believe it's `in1->eqv_uncast(in2)` in `SubNode::Value_common()` which does the job.) In such case, performing `base->uncast()` in `LibraryCallKit::classify_unsafe_addr()` seems appropriate to me. 
Yes, this is exactly what's happening - `in1->eqv_uncast(in2)` in `SubNode::Value_common()` determines that both inputs are NULL, which degenerates the check into an unconditional uncommon trap. I've updated the PR to use `base->uncast()` in `LibraryCallKit::classify_unsafe_addr()` as the new fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2357914987 From rehn at openjdk.org Wed Sep 18 09:31:24 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 18 Sep 2024 09:31:24 GMT Subject: RFR: 8333248: VectorGatherMaskFoldingTest.java failed when maximum vector bits is 64 [v5] In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 15:16:32 GMT, Gui Cao wrote: >> Hi, VectorGatherMaskFoldingTest.java Test fails when max vector bits is 64, when max vector bits is 64, LongVector.SPECIES_MAX.length() and DoubleVector.SPECIES_MAX.length() is 1. >> >> We can reproduce this problem in two ways: >> 1. We can use riscv without rvv1.0 board to reproduce this problem >> 2. Run VectorGatherMaskFoldingTest.java on aarch64 client mode without `-XX:+IncrementalInlineForceCleanup` Option, the `-XX:+IncrementalInlineForceCleanup` is C2 Option, so we need to remove this Option from the VectorGatherMaskFoldingTest.main method. error message: >> >> Base Test: @Test testDoubleVectorStoreLoadMaskedVector: >> compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Test method public static void compiler.vectorapi.VectorGatherMaskFoldingTest.testDoubleVectorStoreLoadMaskedVector(). Target: null.
Arguments: >> at compiler.lib.ir_framework.test.BaseTest.invokeTestMethod(BaseTest.java:84) >> at compiler.lib.ir_framework.test.BaseTest.invokeTest(BaseTest.java:71) >> at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98) >> at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861) >> at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252) >> at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165) >> Caused by: java.lang.reflect.InvocationTargetException >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118) >> at java.base/java.lang.reflect.Method.invoke(Method.java:580) >> at compiler.lib.ir_framework.test.BaseTest.invokeTestMethod(BaseTest.java:80) >> ... 5 more >> Caused by: java.lang.RuntimeException: assertNotEquals: expected [1.0] to not equal [1.0] >> at jdk.test.lib.Asserts.fail(Asserts.java:691) >> at jdk.test.lib.Asserts.assertNotEquals(Asserts.java:451) >> at jdk.test.lib.Asserts.assertNotEquals(Asserts.java:435) >> at compiler.vectorapi.VectorGatherMaskFoldingTest.testDoubleVectorStoreLoadMaskedVector(VectorGatherMaskFoldingTest.java:1089) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) >> ... 7 more >> >> >> For example, the following method will be failed: >> >> private static final VectorSpecies L_SPECIES = LongVector.SPECIES_MAX; >> private static final VectorSpecies D_SPECIES = DoubleVector.SPECIES_MAX; >> ... >> @Test >> @IR(counts = { IRNode.STORE_VECTOR_MASKED, ">= 1", IRNode.LOAD_VECTOR_MASKED, ">= 1" }, apply... > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into DK-8333248 > - Add -XX:+IgnoreUnrecognizedVMOptions to mask unrecognized VM option 'IncrementalInlineForceCleanup' in client vm mode > - Fix for some missed > - Fix for Damon comment > - 8333248: VectorGatherMaskFoldingTest.java failed when maximum vector bits is 64 This doesn't seem to have made it to jdk23, AFAICT. Can we backport it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19473#issuecomment-2357958367 From yzheng at openjdk.org Wed Sep 18 09:52:06 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 18 Sep 2024 09:52:06 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:42:12 GMT, Thomas Stuefe wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 29: >> >>> 27: /** >>> 28: * Marker interface for hotspot specific constants. >>> 29: */ >> >> Let's take this opportunity to improve this javadoc: >> >> /** >> * A value in a space managed by Hotspot (e.g. heap or metaspace). >> * Some of these values can be referenced with a compressed pointer (32 bits) >> * instead of a full word-sized pointer. >> */ > > drive-by comment, 32-bit is an implementation detail. The width of a narrowKlass will be adjustable with the upcoming JEP450. Referring to 32 bit may be obsolete soon. Thanks for the note! By adjustable you mean it can go beyond 32 bits?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1764760056 From stuefe at openjdk.org Wed Sep 18 10:12:09 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 18 Sep 2024 10:12:09 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: <7TkgOs_GnhgNa7qogZoIsm3QPfzxx6yMXUSCE8wpFm0=.84f010ef-20fe-4799-863d-c84b84e1b570@github.com> On Wed, 18 Sep 2024 09:49:48 GMT, Yudi Zheng wrote: >> drive-by comment, 32-bit is an implementation detail. The width of a narrowKlass will be adjustable with the upcoming JEP450. Referring to 32 bit may be obsolete soon. > > Thanks for the note! By adjustable you mean it can go beyond 32 bits? No, it will be smaller. Lilliput 1 will probably ship with a 22-bit narrowKlass, and we may reduce this further. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1764786815 From rcastanedalo at openjdk.org Wed Sep 18 10:32:06 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Sep 2024 10:32:06 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR makes it possible to add tag-specific default decorators to UL.
These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! Labeling the PR as `hotspot-compiler` because it proposes changing the default decorators of `jit+inlining`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2358104693 From rehn at openjdk.org Wed Sep 18 10:50:14 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 18 Sep 2024 10:50:14 GMT Subject: RFR: 8333248: VectorGatherMaskFoldingTest.java failed when maximum vector bits is 64 [v5] In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 15:16:32 GMT, Gui Cao wrote: >> Hi, VectorGatherMaskFoldingTest.java Test fails when max vector bits is 64, when max vector bits is 64, LongVector.SPECIES_MAX.length() and DoubleVector.SPECIES_MAX.length() is 1. >> >> We can reproduce this problem in two ways: >> 1. We can use riscv without rvv1.0 board to reproduce this problem >> 2. 
Run VectorGatherMaskFoldingTest.java on aarch64 client mode without `-XX:+IncrementalInlineForceCleanup` Option, the `-XX:+IncrementalInlineForceCleanup` is C2 Option, so we need to remove this Option from the VectorGatherMaskFoldingTest.main method. error message: >> >> Base Test: @Test testDoubleVectorStoreLoadMaskedVector: >> compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Test method public static void compiler.vectorapi.VectorGatherMaskFoldingTest.testDoubleVectorStoreLoadMaskedVector(). Target: null. Arguments: >> at compiler.lib.ir_framework.test.BaseTest.invokeTestMethod(BaseTest.java:84) >> at compiler.lib.ir_framework.test.BaseTest.invokeTest(BaseTest.java:71) >> at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98) >> at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861) >> at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252) >> at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165) >> Caused by: java.lang.reflect.InvocationTargetException >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118) >> at java.base/java.lang.reflect.Method.invoke(Method.java:580) >> at compiler.lib.ir_framework.test.BaseTest.invokeTestMethod(BaseTest.java:80) >> ... 5 more >> Caused by: java.lang.RuntimeException: assertNotEquals: expected [1.0] to not equal [1.0] >> at jdk.test.lib.Asserts.fail(Asserts.java:691) >> at jdk.test.lib.Asserts.assertNotEquals(Asserts.java:451) >> at jdk.test.lib.Asserts.assertNotEquals(Asserts.java:435) >> at compiler.vectorapi.VectorGatherMaskFoldingTest.testDoubleVectorStoreLoadMaskedVector(VectorGatherMaskFoldingTest.java:1089) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) >> ... 
7 more >> >> >> For example, the following method will be failed: >> >> private static final VectorSpecies L_SPECIES = LongVector.SPECIES_MAX; >> private static final VectorSpecies D_SPECIES = DoubleVector.SPECIES_MAX; >> ... >> @Test >> @IR(counts = { IRNode.STORE_VECTOR_MASKED, ">= 1", IRNode.LOAD_VECTOR_MASKED, ">= 1" }, apply... > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into DK-8333248 > - Add -XX:+IgnoreUnrecognizedVMOptions to mask unrecognized VM option 'IncrementalInlineForceCleanup' in client vm mode > - Fix for some missed > - Fix for Damon comment > - 8333248: VectorGatherMaskFoldingTest.java failed when maximum vector bits is 64 > /backport jdk23u Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19473#issuecomment-2358137846 From epeter at openjdk.org Wed Sep 18 12:06:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:06:12 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 02:58:10 GMT, Jasmine Karthikeyan wrote: >> I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH. >> >> Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**. 
>> >> **How I run my benchmarks** >> >> All benchmarks >> `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64` >> >> Some specific benchmark, with profiler that tells me which code snippet is hottest: >> `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"` >> >> **JMH logs** >> >> Run on my AVX512 laptop, with master: >> [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt) >> >> Run on remote asimd (aarch64, NEON) machine: >> [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt) >> >> **Results** >> >> I ran it on 2 machines so far. Left on my AVX512 machine, right on an ASIMD/NEON/aarch64 machine. >> >> Here are the interesting `int / long / float / double` results, discussion further below: >> ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b) >> >> >> And here are the less spectacular `byte / char / short` results. There is no vectorization of these cases. But there seems to be some issue with over-unrolling on my AVX512 machine, one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. >> >> ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2) >> >> Here the PDF: >> [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf) >> >> >> **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?** >> >> Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors. >> The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector.
>> >> // Check if reductions are connected >> if (is_marked_reduction(p0)) { >> Node* second_in = p0->in(2); >> Node_List* second_pk = get_pack(second_in); >> if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) { >> // No parent pack or not enough work >> // to cover reduction expansion overhead >> return false; >> } else if (second_pk->size() != p->size()) { >> return false; >> } >> } >> >> >> ... > > Looks nice, the benchmark is very thorough! I was interested to see how it performed on my Zen 3 (AVX2) machine, I've attached the results here in case it's interesting/useful: [perf_results.txt](https://github.com/user-attachments/files/17037796/perf_results.txt) @jaskarth @vnkozlov thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21032#issuecomment-2358280361 From epeter at openjdk.org Wed Sep 18 12:06:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:06:13 GMT Subject: Integrated: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:53:40 GMT, Emanuel Peter wrote: > I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH. > > Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**. 
> > **How I run my benchmarks** > > All benchmarks > `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64` > > Some specific benchmark, with profiler that tells me which code snippet is hottest: > `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"` > > **JMH logs** > > Run on my AVX512 laptop, with master: > [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt) > > Run on remote asimd (aarch64, NEON) machine: > [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt) > > **Results** > > I ran it on 2 machines so far. Left on my AVX512 machine, right on an ASIMD/NEON/aarch64 machine. > > Here are the interesting `int / long / float / double` results, discussion further below: > ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b) > > > And here are the less spectacular `byte / char / short` results. There is no vectorization of these cases. But there seems to be some issue with over-unrolling on my AVX512 machine, one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. > > ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2) > > Here the PDF: > [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf) > > > **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?** > > Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors. > The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector.
> > // Check if reductions are connected > if (is_marked_reduction(p0)) { > Node* second_in = p0->in(2); > Node_List* second_pk = get_pack(second_in); > if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) { > // No parent pack or not enough work > // to cover reduction expansion overhead > return false; > } else if (second_pk->size() != p->size()) { > return false; > } > } > > > But when I disable this code, then I see on the aarch64/ASIMD machine: > > VectorReduction2.NoSuperword.intAddSimpl... This pull request has now been integrated. Changeset: aeba1ea7 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/aeba1ea7c44d6b378decf8557c8cd9fc7bfb7df5 Stats: 1454 lines in 1 file changed: 1454 ins; 0 del; 0 mod 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization Reviewed-by: kvn, jkarthikeyan ------------- PR: https://git.openjdk.org/jdk/pull/21032 From epeter at openjdk.org Wed Sep 18 12:13:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:13:12 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert In-Reply-To: References: Message-ID: <0sw5s6nN8FKInMD7qNCuBBa4w2uK-FBV505eke63dA4=.1fc70e4e-01e1-4763-ade6-98f841f84b9f@github.com> On Mon, 16 Sep 2024 08:34:44 GMT, Roland Westrelin wrote: > When converting a `LongCountedLoop` into a loop nest, c2 needs jvm > state to add predicates to the inner loop. For that, it peels an > iteration of the loop and uses the state of the safepoint at the end > of the loop. That's only legal if there's no side effect between the > safepoint and the backedge that goes back into the loop. The assert > failure here happens in code that checks that. > > That code compares the memory states at the safepoint and at the > backedge. If they are the same then there's no side effect. To check > consistency, the `MergeMem` at the safepoint is cloned. 
As the logic > iterates over the backedge state, it clears every component of the > state it encounters from the `MergeMem`. Once done, the cloned > `MergeMem` should be "empty". In the case of this failure, no side > effect is found but the cloned `MergeMem` is not empty. That happens > because of EA: it adds edges to the `MergeMem` at the safepoint that > it doesn't add to the backedge `Phis`. > > So it's the verification code that fails and I propose dealing with > this by ignoring memory state added by EA in the verification code. Drive-by comment. I don't understand enough about EA to review the VM part. test/hotspot/jtreg/compiler/longcountedloops/TestSafePointWithEAState.java line 59: > 57: float n; > 58: h(float n) { this.n = n; } > 59: } Java indentation is supposed to be 4 spaces ;) Adding some explicit brackets would also be nice, but that is more subjective. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21009#pullrequestreview-2312486543 PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1764934007 From epeter at openjdk.org Wed Sep 18 12:18:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:18:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. 
API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Incorporating review and documentation suggestions. Generally, from a C2 point of view this looks good now. 
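For readers following the thread, the two-vector `selectFrom` semantics quoted above can be modelled in scalar Java. This is only a sketch under stated assumptions: plain `int[]` arrays stand in for vectors, and the class and method names are hypothetical, not the Vector API implementation from the PR.

```java
import java.util.Arrays;

// Hypothetical scalar model of the two-vector selectFrom semantics described in the PR text.
class TwoVectorSelectFromSketch {

    // indexes plays the role of "this" vector; v1 and v2 together form a table of 2*VLEN entries.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            // Valid two-vector index range is [0, 2*VLEN), as stated in the PR description.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // Lower half of the range selects from v1, upper half from v2.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, 2, 7};
        // Prints [10, 21, 12, 23]
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```

Indexes 0 and 2 pick from `v1`, while 5 and 7 pick lanes 1 and 3 of `v2`, which is the "table lookup" reading of the description.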
------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2312501448 From epeter at openjdk.org Wed Sep 18 12:18:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:18:12 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:20 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.cpp line 2122: >> >>> 2120: // index format by subsequent VectorLoadShuffle. >>> 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); >>> 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); >> >> This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for intel. But as far as I know ARM has in principle longer vectors - up to 2048 bytes. Should we maybe add some assert here to make sure we never badly truncate the index? > > Shuffle overhaul is on our todo list, its a know limitation which we tried lifting once, yes you read it correctly, its a limitation for AARCH64 SVE once a 2048 bits vector systems are available, IIRC current max vector size on any available AARCH64 system is 256 bits, with Neoverse V2 they shrink the vector size back to 16 bytes. Are there any asserts that would catch this? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1764943566 From epeter at openjdk.org Wed Sep 18 12:26:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:26:12 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:36 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? 
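The difference between checking and wrapping shuffle indexes, which is the core of this PR, can be sketched in scalar Java. This is a rough model only: arrays stand in for vectors, all names are hypothetical, and the assumption that "wrap" means reducing each index modulo the vector length into `[0, VLEN)` is mine, not taken from the patch.

```java
import java.util.Arrays;

// Hypothetical scalar model contrasting checkIndexes-style and wrapIndexes-style handling.
class WrapIndexesSketch {

    // Check variant: any out-of-range index raises an exception.
    static int[] checkIndexes(int[] indexes, int vlen) {
        for (int idx : indexes) {
            if (idx < 0 || idx >= vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " outside [0, " + vlen + ")");
            }
        }
        return indexes.clone();
    }

    // Wrap variant (assumed semantics): every index is folded into [0, vlen), no exception path.
    static int[] wrapIndexes(int[] indexes, int vlen) {
        int[] wrapped = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            // For a power-of-two vlen this equals indexes[i] & (vlen - 1).
            wrapped[i] = Math.floorMod(indexes[i], vlen);
        }
        return wrapped;
    }

    // Lane-wise gather from src using already validated or wrapped indexes.
    static int[] rearrange(int[] src, int[] indexes) {
        int[] result = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            result[i] = src[indexes[i]];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] src = {10, 11, 12, 13};
        int[] indexes = {0, 5, -1, 3};
        int[] wrapped = wrapIndexes(indexes, src.length);
        // Prints [0, 1, 3, 3]
        System.out.println(Arrays.toString(wrapped));
        // Prints [10, 11, 13, 13]
        System.out.println(Arrays.toString(rearrange(src, wrapped)));
    }
}
```

Under this reading, the wrap variant has no exceptional path at all, which would be consistent with the branch-free inner loop shown in the generated code above.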
------------- PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2312528484 From epeter at openjdk.org Wed Sep 18 12:54:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:54:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:34:58 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Missed code fragment from last review comment resolution. > > src/hotspot/cpu/x86/x86.ad line 6578: > >> 6576: %} >> 6577: ins_pipe( pipe_slow ); >> 6578: %} > > Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent. Below you also do it inconsistently. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764976079 From epeter at openjdk.org Wed Sep 18 12:54:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:54:16 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:14:57 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in corresponding primitive box classes, fallback implementation of vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Missed code fragment from last review comment resolution. Do we have tests for the publicly exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? src/hotspot/cpu/x86/x86.ad line 6578: > 6576: %} > 6577: ins_pipe( pipe_slow ); > 6578: %} Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent.
src/hotspot/cpu/x86/x86.ad line 10793: > 10791: match(Set dst (SaturatingAddV (Binary dst (LoadVector src)) mask)); > 10792: match(Set dst (SaturatingSubV (Binary dst (LoadVector src)) mask)); > 10793: format %{ "vector_saturating_unsigned_masked $dst, $mask, $src" %} Suggestion: format %{ "vector_saturating_unsigned_subword_masked $dst, $mask, $src" %} src/hotspot/share/opto/vectornode.hpp line 81: > 79: static VectorNode* shift_count(int opc, Node* cnt, uint vlen, BasicType bt); > 80: static VectorNode* make(int opc, Node* n1, Node* n2, uint vlen, BasicType bt, bool is_var_shift = false); > 81: static VectorNode* make(int vopc, Node* n1, Node* n2, const TypeVect* vt, bool is_mask = false, bool is_var_shift = false, bool is_unsigned = false); Feels like this just slowly grows and grows... eventually we will have too many arguments. Not sure what is a better alternative though... src/hotspot/share/opto/vectornode.hpp line 386: > 384: class SaturatingSubVNode : public SaturatingVectorNode { > 385: public: > 386: SaturatingSubVNode(Node* in1, Node* in2, const TypeVect* vt, bool is_unsigned) : SaturatingVectorNode(in1,in2,vt,is_unsigned) {} Suggestion: SaturatingSubVNode(Node* in1, Node* in2, const TypeVect* vt, bool is_unsigned) : SaturatingVectorNode(in1, in2, vt, is_unsigned) {} spaces required by style guide src/hotspot/share/opto/vectornode.hpp line 598: > 596: class UMinVNode : public VectorNode { > 597: public: > 598: UMinVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1,in2,vt) { Suggestion: UMinVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1, in2, vt) { spaces required by style guide src/hotspot/share/opto/vectornode.hpp line 614: > 612: class UMaxVNode : public VectorNode { > 613: public: > 614: UMaxVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1,in2,vt) { Suggestion: UMaxVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1, in2, vt) { spaces required by style guide ------------- Changes requested 
by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2312554183 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764975321 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764982201 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764985438 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764987143 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764987547 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764988807 From mli at openjdk.org Wed Sep 18 13:03:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 13:03:23 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round Message-ID: Hi, Can you help to review this simple improvement? Thanks. ------------- Commit messages: - comments - initial commit Changes: https://git.openjdk.org/jdk/pull/21059/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21059&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339992 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21059.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21059/head:pull/21059 PR: https://git.openjdk.org/jdk/pull/21059 From mli at openjdk.org Wed Sep 18 13:03:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 13:03:23 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:57:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple improvement? > Thanks. @RealFYang As we discussed previously, we're supposed to do following change too, * `vsetvli(x0, failedIdx, Assembler::e8, lmul, Assembler::mu, Assembler::tu);` => remove mu/tu (i.e. 
change to ma/ta) But after studying the spec again, I cannot find the exact wording supporting this change (maybe I missed it), so for safety I will leave it as is. If someone can find exact evidence in the spec that changing from mu/tu => ma/ta is absolutely safe on all devices, please kindly share your information, I'll add the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21059#issuecomment-2358407204 From jbhateja at openjdk.org Wed Sep 18 13:43:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 13:43:32 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v14] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in corresponding primitive box classes, fallback implementation of vector operators is based on them.
> - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/a6f8ee8b..5253706e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=12-13 Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Wed Sep 18 13:47:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 13:47:08 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:51:00 GMT, Emanuel Peter wrote: > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? Nomenclature is suggested by Paul. We have sufficient test coverage of these APIs in JTREG tests. 
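As an illustration of the saturating and unsigned semantics listed in the PR summary (SADD, SSUB, SUADD, SUSUB, UMAX, UMIN), here is a scalar sketch for `int`. The method names are hypothetical and are not claimed to match the scalar APIs added in `VectorMath.java`; the sketch only shows "capped at the bounds of the result type instead of wrapping around".

```java
// Hypothetical scalar model of the saturating / unsigned operators named in the PR summary.
class SaturatingOpsSketch {

    // SADD: signed addition clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int saturatingAdd(int a, int b) {
        long sum = (long) a + b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, sum));
    }

    // SSUB: signed subtraction clamped to the int range.
    static int saturatingSub(int a, int b) {
        long diff = (long) a - b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, diff));
    }

    // SUADD: both operands are read as unsigned; the result is capped at 2^32 - 1 (bit pattern -1).
    static int saturatingUnsignedAdd(int a, int b) {
        long sum = Integer.toUnsignedLong(a) + Integer.toUnsignedLong(b);
        return sum > 0xFFFF_FFFFL ? -1 : (int) sum;
    }

    // SUSUB: unsigned subtraction capped at 0.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMAX / UMIN: comparison treats the bit patterns as unsigned values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1);                                 // wraps: -2147483648
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));                   // 2147483647
        System.out.println(saturatingSub(Integer.MIN_VALUE, 1));                   // -2147483648
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1))); // 4294967295
        System.out.println(saturatingUnsignedSub(1, 2));                           // 0
        System.out.println(Integer.toUnsignedString(unsignedMax(-1, 1)));          // 4294967295
        System.out.println(unsignedMin(-1, 1));                                    // 1
    }
}
```

Widening to `long` keeps the sketch easy to verify; it is chosen for clarity here and says nothing about how the fallback or the intrinsics in the patch are written.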
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358514597 From jbhateja at openjdk.org Wed Sep 18 13:47:10 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 13:47:10 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:35:28 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/x86.ad line 6578: >> >>> 6576: %} >>> 6577: ins_pipe( pipe_slow ); >>> 6578: %} >> >> Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent. > > Below you also do it inconsistently. > Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent. Memory operands are sufficient to implicitly infer memory flavor of opto assembly instruction. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1765088839 From luhenry at openjdk.org Wed Sep 18 13:53:08 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 18 Sep 2024 13:53:08 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:57:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple improvement? > Thanks. Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21059#pullrequestreview-2312768237 From fyang at openjdk.org Wed Sep 18 14:05:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Sep 2024 14:05:07 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:57:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple improvement? > Thanks. Looks good. Thanks for making this change. 
------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21059#pullrequestreview-2312803864 From fyang at openjdk.org Wed Sep 18 14:15:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Sep 2024 14:15:06 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:57:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple improvement? > Thanks. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5361: > 5359: __ vmseq_vi(v0, outputV1, -1); > 5360: __ vfirst_m(failedIdx, v0); > 5361: Label NoFailure, FailureAt0Idx; Nit: Maybe rename `FailureAt0Idx` to `FailureAtIdx0`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21059#discussion_r1765140867 From jkarthikeyan at openjdk.org Wed Sep 18 14:20:14 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 18 Sep 2024 14:20:14 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID: <4wLEjszo6Wf5zVMTKOM-NxX7DcIzctxOG7LIElJyre0=.c77c41c7-e3e4-4ca2-a562-7d7c6cd996d3@github.com> On Tue, 17 Sep 2024 07:53:40 GMT, Emanuel Peter wrote: > I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH. > > Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**. 
> > **How I run my benchmarks** > > All benchmarks > `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64` > > Some specific benchmark, with profiler that tells me which code snippet is hottest: > `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"` > > **JMH logs** > > Run on my AVX512 laptop, with master: > [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt) > > Run on remote asimd (aarch64, NEON) machine: > [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt) > > **Results** > > I ran it on 2 machines so far. Left on my AVX512 machine, right on a ASIMD/NEON/aarch64 machine. > > Here the interesting `int / long / float / double` results, discussion further below: > ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b) > > > And there the less spectacular `byte / char / short` results. There is no vectorization of these cases. But there seems to be some issue with over-unrolling on my AVX512 machine, one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable. > > ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2) > > Here the PDF: > [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf) > > > **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?** > > Apparently, there must be sufficient "work" vectors to outweith the "reduction" vectors. > The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector. 
> > // Check if reductions are connected > if (is_marked_reduction(p0)) { > Node* second_in = p0->in(2); > Node_List* second_pk = get_pack(second_in); > if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) { > // No parent pack or not enough work > // to cover reduction expansion overhead > return false; > } else if (second_pk->size() != p->size()) { > return false; > } > } > > > But when I disable this code, then I see on the aarch64/ASIMD machine: > > VectorReduction2.NoSuperword.intAddSimpl... The subword results seem quite tricky, especially since there are some things that performed well for me (like char) but ended up causing regressions for your machine. The long results are also quite strange, but it may just be random noise. I'll definitely make sure to investigate further. Thanks a lot for the analysis! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21032#issuecomment-2358604647 From epeter at openjdk.org Wed Sep 18 14:25:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 14:25:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:44:11 GMT, Jatin Bhateja wrote: > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > Nomenclature is suggested by Paul. @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > We have sufficient test coverage of these APIs in JTREG tests. @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358617657 From mli at openjdk.org Wed Sep 18 14:28:44 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 14:28:44 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this simple improvement? > Thanks. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21059/files - new: https://git.openjdk.org/jdk/pull/21059/files/9fa797a7..74cf53cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21059&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21059&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21059.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21059/head:pull/21059 PR: https://git.openjdk.org/jdk/pull/21059 From fyang at openjdk.org Wed Sep 18 14:28:44 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Sep 2024 14:28:44 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round [v2] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:25:10 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple improvement? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > rename Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21059#pullrequestreview-2312869400 From mli at openjdk.org Wed Sep 18 14:28:45 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 14:28:45 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round [v2] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:12:35 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> rename > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5361: > >> 5359: __ vmseq_vi(v0, outputV1, -1); >> 5360: __ vfirst_m(failedIdx, v0); >> 5361: Label NoFailure, FailureAt0Idx; > > Nit: Maybe rename `FailureAt0Idx` to `FailureAtIdx0`? Yeah, that makes sense! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21059#discussion_r1765158697 From mli at openjdk.org Wed Sep 18 14:41:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 14:41:13 GMT Subject: RFR: 8339992: RISC-V: some minor improvements of base64_vector_decode_round [v2] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:50:25 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> rename > > Marked as reviewed by luhenry (Committer). Thanks for reviewing! @luhenry @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/21059#issuecomment-2358653799 From mli at openjdk.org Wed Sep 18 14:41:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 14:41:14 GMT Subject: Integrated: 8339992: RISC-V: some minor improvements of base64_vector_decode_round In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:57:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple improvement? > Thanks. This pull request has now been integrated.
Changeset: ae39a660 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/ae39a6603c6c33a36dce30c3290a634b08a6bf05 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod 8339992: RISC-V: some minor improvements of base64_vector_decode_round Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/21059 From sviswanathan at openjdk.org Wed Sep 18 15:30:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 15:30:10 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote: > I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. > > Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? Agree, wrapShuffleIndexes makes more sense. I will make the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2358784233 From dhanalla at openjdk.org Wed Sep 18 16:10:05 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 18 Sep 2024 16:10:05 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v2] In-Reply-To: References: Message-ID: <1wvg7YCL6ne-5LEBQ7Mi7fkrVn-d72W9_UcsrzvKho8=.d29bb5c9-9e83-42a7-a937-c304efb3b4dd@github.com> On Mon, 12 Aug 2024 18:31:06 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. 
>> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both debug and Release builds exhibited the same behavior: the compilation bails out, and execution completes without any issues. >> >> The assert statement is not essential, as it is causing unnecessary failures in the debug build. > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > add test case adding a comment to keep the PR active. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2358875944 From qamai at openjdk.org Wed Sep 18 16:15:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation Message-ID: Hi, This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. Regarding the related issues: - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. Please take a look and leave reviews. Thanks a lot. The description of the original PR: This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: Inefficient conversions between a shuffle and its corresponding vector. 
This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. Upon these changes, a `rearrange` can emit more efficient code: var species = IntVector.SPECIES_128; var v1 = IntVector.fromArray(species, SRC1, 0); var v2 = IntVector.fromArray(species, SRC2, 0); v1.rearrange(v2.toShuffle()).intoArray(DST, 0); Before: movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} vmovdqu 0x10(%r10),%xmm2 movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} vmovdqu 0x10(%r10),%xmm1 movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} vmovdqu 0x10(%r10),%xmm0 vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask ; {external_word} vpackusdw %xmm0,%xmm0,%xmm0 vpackuswb %xmm0,%xmm0,%xmm0 vpmovsxbd %xmm0,%xmm3 vpcmpgtd %xmm3,%xmm1,%xmm3 vtestps %xmm3,%xmm3 jne 0x00007fc2acb4e0d8 vpmovzxbd %xmm0,%xmm0 vpermd %ymm2,%ymm0,%ymm0 movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} vmovdqu %xmm0,0x10(%r10) After: movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} vmovdqu 0x10(%r10),%xmm1 movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} vmovdqu 0x10(%r10),%xmm2 vpxor %xmm0,%xmm0,%xmm0 vpcmpgtd %xmm2,%xmm0,%xmm3 vtestps %xmm3,%xmm3 jne 0x00007fa818b27cb1 vpermd %ymm1,%ymm2,%ymm0 movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} vmovdqu %xmm0,0x10(%r10) ------------- Commit messages: - copyright year - remove LoadShuffle from riscv, whitespace - tighten concrete types - [vectorapi] Refactor VectorShuffle implementation 
Changes: https://git.openjdk.org/jdk/pull/21042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310691 Stats: 4984 lines in 64 files changed: 2984 ins; 981 del; 1019 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Wed Sep 18 16:15:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. 
This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... @PaulSandoz What do you think regarding x86-32? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2356451016 From psandoz at openjdk.org Wed Sep 18 16:15:38 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:59:07 GMT, Quan Anh Mai wrote: > @PaulSandoz What do you think regarding x86-32? I don't see anything obvious in the changes of this PR that would affect x86-32, but i ain't a HotSpot expert. Perhaps this just exacerbates some existing bug?@sviswa7 what do you think? My sense is to proceed, and problem list the failure, and attempt to find the source of the failure. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2357043269 From sviswanathan at openjdk.org Wed Sep 18 16:15:38 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: <5oex8dW9c0zy1RNdWjuA3bpaxACV_QGt3iij6SJ2kZ8=.a3d18f15-ed92-4d79-9df8-9e2d828fb33c@github.com> On Tue, 17 Sep 2024 22:29:01 GMT, Paul Sandoz wrote: > > @PaulSandoz What do you think regarding x86-32? > > I don't see anything obvious in the changes of this PR that would affect x86-32, but i ain't a HotSpot expert. Perhaps this just exacerbates some existing bug?@sviswa7 what do you think? > > My sense is to proceed, and problem list the failure, and attempt to find the source of the failure. Yes, let us proceed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2357222835 From qamai at openjdk.org Wed Sep 18 16:15:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 22:29:01 GMT, Paul Sandoz wrote: >> @PaulSandoz What do you think regarding x86-32? > >> @PaulSandoz What do you think regarding x86-32? > > I don't see anything obvious in the changes of this PR that would affect x86-32, but i ain't a HotSpot expert. Perhaps this just exacerbates some existing bug?@sviswa7 what do you think? > > My sense is to proceed, and problem list the failure, and attempt to find the source of the failure. @PaulSandoz @sviswa7 Thanks for your advice, I have made the PR ready for review @iwanowww Could you take another look at this, please? @jatin-bhateja Could you verify that [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) does not occur? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2358879014 From jbhateja at openjdk.org Wed Sep 18 16:22:30 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 16:22:30 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v15] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. 
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Test cleanups. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/5253706e..f81b2525 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=13-14 Stats: 370 lines in 11 files changed: 10 ins; 360 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Wed Sep 18 16:26:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 16:26:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:22:16 GMT, Emanuel Peter wrote: > > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > We have sufficient test coverage of these APIs in JTREG tests. > > @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. 
https://github.com/openjdk/jdk/pull/20507/files#diff-6031c9066a7d7a90cc002e93a1eb64f0371f09d385f42289d202426cc7516d2fR3019-R3264 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358909462 From psandoz at openjdk.org Wed Sep 18 16:56:13 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 16:56:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:22:16 GMT, Emanuel Peter wrote: > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > It's whatever math functions are required in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. /** * The class {@code VectorMath} contains methods for performing * scalar numeric operations in support of vector numeric operations. */ public final class VectorMath { These are referenced by the vector operators, e.g., /** Produce saturating {@code a+b}. Integral only. * @see VectorMath#addSaturating(int, int) */ public static final Binary SADD = binary("SADD", "+", VectorSupport.VECTOR_OP_SADD, VO_NOFP); And in addition these methods would be used by any tail computation (and the fallback code). At the moment we are uncertain whether such operations should reside elsewhere and we did not want to block progress. I am not beholden to the name, but so far I cannot think of a concise alternative. `VectorOperatorMath` is arguably more precise but more verbose.
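For readers following the thread, the scalar semantics under discussion can be illustrated with a minimal sketch. This is not the `VectorMath` code from the pull request: the class name `SaturatingSketch` and every method name other than `addSaturating(int, int)` are hypothetical, and the approach shown (widening to `long` for the signed cases, `Integer.compareUnsigned` for the unsigned ones) is only one plausible way to write such helpers.

```java
// Illustrative sketch only. Class and method names are hypothetical and this
// is not the VectorMath implementation from the pull request.
public final class SaturatingSketch {
    private SaturatingSketch() {}

    /** Signed saturating a + b: the result is clamped to the int range instead of wrapping. */
    public static int addSaturating(int a, int b) {
        long wide = (long) a + (long) b;          // exact, cannot overflow in 64 bits
        if (wide > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (wide < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) wide;
    }

    /** Signed saturating a - b. */
    public static int subSaturating(int a, int b) {
        long wide = (long) a - (long) b;
        if (wide > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (wide < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) wide;
    }

    /** Unsigned saturating a + b: operands are read as 0..2^32-1, result is capped at 0xFFFFFFFF. */
    public static int addSaturatingUnsigned(int a, int b) {
        int sum = a + b;
        // The wrapped sum is unsigned-smaller than an operand exactly when overflow occurred.
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }

    /** Unsigned saturating a - b: result is floored at 0. */
    public static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    /** Unsigned maximum of a and b. */
    public static int maxUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    /** Unsigned minimum of a and b. */
    public static int minUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(Integer.MAX_VALUE, 1));
        System.out.println(subSaturating(Integer.MIN_VALUE, 1));
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(-1, 1)));
        System.out.println(Integer.toUnsignedString(subSaturatingUnsigned(3, 5)));
    }
}
```

Helpers of this shape can be checked independently of any vector code path, for example by comparing each result against the same operation done in `long` arithmetic.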
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358973116 From duke at openjdk.org Wed Sep 18 16:58:17 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Wed, 18 Sep 2024 16:58:17 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option Message-ID: Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based host method compilations but also prevents the loading of the libjvmci compiler. While this works as expected for host method compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. Expected behavior: With `-XX:+UseGraalJIT`, both host method compilations and Truffle compilations should utilize the libjvmci compiler, if available. With `-XX:+EnableJVMCI`, host method compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available.
------------- Commit messages: - JDK-8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option Changes: https://git.openjdk.org/jdk/pull/21069/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340398 Stats: 9 lines in 2 files changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21069/head:pull/21069 PR: https://git.openjdk.org/jdk/pull/21069 From sviswanathan at openjdk.org Wed Sep 18 17:00:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 17:00:30 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Change method name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/428f2289..87e103ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=01-02 Stats: 45 lines in 37 files changed: 0 ins; 0 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: 
https://git.openjdk.org/jdk/pull/20634 From psandoz at openjdk.org Wed Sep 18 17:07:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 17:07:11 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v15] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:22:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Test cleanups. > > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > We have sufficient test coverage of these APIs in JTREG tests. > > @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. I think Jatin is relying on the vector tests to also test the scalar operations by virtue that eventually the scalar result will be compared with the C2 result. Although both might produced the same result both maybe incorrect! We need some independent scalar tests, especially so if later on these are also made intrinsic. I shall volunteer to add some. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358992714 From psandoz at openjdk.org Wed Sep 18 17:21:04 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 17:21:04 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. 
> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> Upon these changes, a `rearrange` can emit more efficient code:
>
>     var species = IntVector.SPECIES_128;
>     var v1 = IntVector.fromArray(species, SRC1, 0);
>     var v2 = IntVector.fromArray(species, SRC2, 0);
>     v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>
> Before:
>     movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})}
>     vmovdqu 0x10(%r10),%xmm2
>     movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})}
>     vmovdqu 0x10(%r10),%xmm1
>     movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})}
>     vmovdqu 0x10(%r10),%xmm0
>     vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask
>     ; {ex...

Will this have any direct impact on the changes proposed by #20508 and #20634?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2359018872

From sviswanathan at openjdk.org Wed Sep 18 17:26:06 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 18 Sep 2024 17:26:06 GMT
Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation
In-Reply-To:
References:
Message-ID:

On Wed, 18 Sep 2024 17:18:42 GMT, Paul Sandoz wrote:

> Will this have any direct impact on the changes proposed by #20508 and #20634?

I think we should first get the 20508 and 20634 integrated before this one.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2359026443

From rcastanedalo at openjdk.org Wed Sep 18 17:45:51 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 18 Sep 2024 17:45:51 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24]
In-Reply-To:
References:
Message-ID: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>
> We aim to integrate this work in JDK 24.
The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. 
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
 - Remove redundant comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/13b93bd9..d54d67f1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22-23

  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From qamai at openjdk.org Wed Sep 18 17:51:13 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 18 Sep 2024 17:51:13 GMT
Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation
In-Reply-To:
References:
Message-ID:

On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote:

> Hi,
>
> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout.
>
> Regarding the related issues:
>
> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout.
> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate`
> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes.
I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Got it, I think https://github.com/openjdk/jdk/pull/20508 and this PR are unrelated implementation-wise, though. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2359070587 From epeter at openjdk.org Wed Sep 18 18:36:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 18:36:24 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v19] In-Reply-To: References: Message-ID: <5doWlzeBLP0hFJTsfm29lgfbacuS53uLbJzaBiv_exM=.71055f4e-45dc-4fb3-8947-a9761534cf77@github.com> On Thu, 12 Sep 2024 16:47:27 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > refine comments I see there are lots more comments. I'm out of time today, so I'll just drop a few comments. Looking forward to reading the bulk of the code soon! src/hotspot/share/opto/rangeinference.cpp line 44: > 42: return {true, false, {}}; > 43: } > 44: }; And here we have another `optional`, right? This could be some kind of `Status>`. Where `Status` has a `progress` and `payload` (the optional). 
And the `Optional` has a `present` and `payload` (the `T`). Boah not sure if this level of abstration I'm thinking about here is worth it. But it would be nice if your comments were consistently above the field or on the same line. And a quick comment above the class would be appreciated too ;) src/hotspot/share/opto/rangeinference.cpp line 64: > 62: return {false, {}, {}}; > 63: } > 64: }; Yet another `Optional`? src/hotspot/share/opto/rangeinference.hpp line 46: > 44: * Bits that are known to be 0 or 1. A value v satisfies this constraint iff > 45: * (v & zeros) == 0 && (v & ones) == ones. I.e, all bits that is set in zeros > 46: * must be unset in v, and all bits that is set in ones must be set in v. Suggestion: * (v & zeros) == 0 && (v & ones) == ones. I.e, any bit that is set in zeros * must be unset in v, and any bit that is set in ones must be set in v. I think that is more correct "math speak". src/hotspot/share/opto/rangeinference.hpp line 47: > 45: * (v & zeros) == 0 && (v & ones) == ones. I.e, all bits that is set in zeros > 46: * must be unset in v, and all bits that is set in ones must be set in v. > 47: * You could also make a table like this: zeros ones allowed bits 0 0 0 or 1 1 0 0 0 1 1 1 1 none (impossible state) src/hotspot/share/opto/rangeinference.hpp line 75: > 73: > 74: template > 75: class CanonicalizedTypeIntPrototype { Ah, is this basically an **Optional**, like `std::optional`? Maybe add a comment above the class for that! Honestly, you could also just define a `Optional` class. Maybe even in a separate `optional.hpp` file. I'm sure we can use this construct again. src/hotspot/share/opto/rangeinference.hpp line 86: > 84: > 85: template > 86: class TypeIntPrototype { Hmm ok, and what does `Prototype` mean here? Not really "optional". Probably this is the whole type-information needed for an `IntType` or `LongType`? 
A comment for the class would be appreciated :) src/hotspot/share/opto/rangeinference.hpp line 105: > 103: > 104: // The result is tuned down by one since we do not have empty type > 105: // and this is not required to be accurate I don't understand this comment. Why does it not need to be accurate? Maybe you should rather focus on what this method is intuitively supposed to do, and then define what it guarantees for its return? src/hotspot/share/opto/rangeinference.hpp line 107: > 105: // and this is not required to be accurate > 106: template > 107: U cardinality_from_bounds(const RangeInt& srange, const RangeInt& urange) { Do you want to add some `static_assert` for the `S / U` types here? src/hotspot/share/opto/rangeinference.hpp line 146: > 144: > 145: void int_type_dump(const TypeInt* t, outputStream* st, bool verbose); > 146: void int_type_dump(const TypeLong* t, outputStream* st, bool verbose); All these names are exposed globally. Are we sure we want to do that? Or should we maybe put them in a class as static methods? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2312603872
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765526324
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765527916
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765009043
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765005303
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765176515
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765184929
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765190821
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765188144
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765193860

From epeter at openjdk.org Wed Sep 18 18:36:25 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 18 Sep 2024 18:36:25 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v18]
In-Reply-To:
References: <5rE5jUqKmzecH6jMAXpaObv9xYRz3Xi1SCvCKhAQJ9o=.010bec0b-856d-4d71-94c8-7e02f0402a4e@github.com>
Message-ID:

On Thu, 12 Sep 2024 16:48:10 GMT, Quan Anh Mai wrote:

>> src/hotspot/share/opto/rangeinference.hpp line 69:
>>
>>> 67: return (v & _zeros) == 0 && (v & _ones) == _ones;
>>> 68: }
>>> 69: };
>>
>> It would be good to add basic operations to KnownBits, like:
>> KnownBits.getMaxValue() returning ~ZEROS
>> KnownBits.getMinValue() returning ONE
>> KnownBits.and(KnownBits arg)
>> KnownBits.or(KnownBits arg)
>> KnownBits.xor(KnownBits args)
>> KnownBits.not()
>>
>>
>> These can be quite handy during data flow analysis using KnownBits
>
> Yes I think they would be helpful in later patches when implementing `Value` methods of several nodes to take advantage of additional `TypeInt` information.

Sounds good. Yes, it's great that you split this up and do it step by step! Makes it more manageable to review.
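To make the operations suggested in the quoted comment more concrete, here is a minimal Java sketch of a known-bits pair. It is illustrative only: the class `KnownBitsSketch` and all of its method names are hypothetical and not part of HotSpot (where the real data structure is C++), and the `and`/`or`/`xor`/`not` rules are the usual transfer functions for this kind of abstract domain rather than code taken from the PR. The minimum and maximum are computed in the unsigned domain, which is an assumption about what the suggested `getMinValue`/`getMaxValue` would mean.

```java
/**
 * Hypothetical model of a known-bits constraint on a 32-bit value.
 * A bit set in zeros is known to be 0; a bit set in ones is known to be 1.
 */
final class KnownBitsSketch {
    final int zeros;
    final int ones;

    KnownBitsSketch(int zeros, int ones) {
        if ((zeros & ones) != 0) {
            throw new IllegalArgumentException("a bit cannot be known 0 and known 1");
        }
        this.zeros = zeros;
        this.ones = ones;
    }

    /** Same membership test as the quoted C++ snippet. */
    boolean contains(int v) {
        return (v & zeros) == 0 && (v & ones) == ones;
    }

    /** Smallest member in the unsigned domain: every unknown bit cleared. */
    int minUnsigned() {
        return ones;
    }

    /** Largest member in the unsigned domain: every unknown bit set. */
    int maxUnsigned() {
        return ~zeros;
    }

    /** x & y: a result bit is 0 if either input bit is 0, and 1 only if both are 1. */
    KnownBitsSketch and(KnownBitsSketch o) {
        return new KnownBitsSketch(zeros | o.zeros, ones & o.ones);
    }

    /** x | y: a result bit is 1 if either input bit is 1, and 0 only if both are 0. */
    KnownBitsSketch or(KnownBitsSketch o) {
        return new KnownBitsSketch(zeros & o.zeros, ones | o.ones);
    }

    /** x ^ y: a result bit is known only where both input bits are known. */
    KnownBitsSketch xor(KnownBitsSketch o) {
        int known0 = (zeros & o.zeros) | (ones & o.ones);
        int known1 = (zeros & o.ones) | (ones & o.zeros);
        return new KnownBitsSketch(known0, known1);
    }

    /** ~x: known zeros and known ones swap roles. */
    KnownBitsSketch not() {
        return new KnownBitsSketch(ones, zeros);
    }

    public static void main(String[] args) {
        KnownBitsSketch nibble = new KnownBitsSketch(0xFFFFFFF0, 0); // upper 28 bits known 0
        KnownBitsSketch five = new KnownBitsSketch(~5, 5);           // the constant 5
        KnownBitsSketch masked = nibble.and(five);
        System.out.println(Integer.toUnsignedString(masked.minUnsigned())
                + ".." + Integer.toUnsignedString(masked.maxUnsigned()));
    }
}
```

Keeping the two masks disjoint in the constructor mirrors the "impossible state" row (a bit set in both `zeros` and `ones`) mentioned earlier in the review.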
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765172629 From epeter at openjdk.org Wed Sep 18 18:56:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 18:56:14 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v9] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 26 Aug 2024 23:25:47 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 27 additional commits since the last revision: > > - Merge branch 'master' into long-typed-parallel-iv > - use @run driver and Argument.RANDOM_ONCE > - Merge branch 'master' into long-typed-parallel-iv > - add random strides to tests > - fix tests on larger strides > - add more expressive comments and test cases > - Merge branch 'master' into long-typed-parallel-iv > - update comments to clarify on type casting > - add pseudocode for subgraphs before/after the transformation > - remove WIP support for long counted loops > - ... and 17 more: https://git.openjdk.org/jdk/compare/e94d582e...20bdc791 Sorry for the delay, I was out of the office and am going on vacation again soon - lots to do not enough time. Things look much better now! A few more comments. For the optmization to work, we need constant `stride_con` and `stride_con2`. 
But `init2`, `init` and `limit` are variables. I am missing tests where they are actually variables, I think you at most make `init` and `limit` variables, but only individually. And you always have `init2 = 0`. I think it would be quite important to have some variable cases as well. As I said: I'll be away soon, so feel free to ping other reviewers if I don't respond ;) src/hotspot/share/opto/loopnode.cpp line 4000: > 3998: > 3999: // The ratio of the two strides cannot be represented as an int > 4000: // if stride_con2 is min_int and stride_con is -1. You could adjust the comment to mention both `min_jint and min_jlong`. src/hotspot/share/opto/loopnode.cpp line 4052: > 4050: > 4051: Node* ratio_idx = MulNode::make(phi_converted, ratio, stride_con2_bt); > 4052: _igvn.register_new_node_with_optimizer(ratio_idx, phi_converted); This block has some code duplication. Could you refactor it somehow to make it more concise? test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 31: > 29: import compiler.lib.ir_framework.IRNode; > 30: import compiler.lib.ir_framework.Test; > 31: import compiler.lib.ir_framework.TestFramework; Suggestion: import compiler.lib.ir_framework.*; I think that should suffice. But up to you. test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 51: > 49: stride = new Random().nextInt(1, Integer.MAX_VALUE / 16); > 50: stride2 = stride * new Random().nextInt(1, 16); > 51: } So `stride` and `stride2` are compile-time constants. That is intended, right? test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 75: > 73: > 74: return a; > 75: } You have `failOn` for every test. I'm worried that something with the test could be so wrong that we just don't have a `LoopNode` for some completely wrong reason. Can you please add a "control test", that actually has a simple loop, and where you can find it with an `@IR` check? 
test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 255: > 253: > 254: return a; > 255: } I'm also missing some cases where both the `init` and `limit` are variable. You could also sprinkle in a case or two where we use `<=` instead of `<`. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2313484847 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1765532812 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1765538519 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1765539228 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1765545340 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1765542098 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1765544249 From dlong at openjdk.org Wed Sep 18 19:03:14 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 18 Sep 2024 19:03:14 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v18] In-Reply-To: References: <5rE5jUqKmzecH6jMAXpaObv9xYRz3Xi1SCvCKhAQJ9o=.010bec0b-856d-4d71-94c8-7e02f0402a4e@github.com> Message-ID: On Thu, 12 Sep 2024 16:49:22 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.hpp line 661: >> >>> 659: // the below constraints, see contains(jint) >>> 660: const jint _lo, _hi; // Lower bound, upper bound in the signed domain >>> 661: const juint _ulo, _uhi; // Lower bound, upper bound in the unsigned domain >> >> Can't we do without explicit fields to record unsigned hi / lo ? >> We just need to present a unsigned view of signed _lo and _hi which can be done using safe macros. > > No we can't, consider `TypeInt::NON_ZERO`. It would have `_lo = min_jint`, `_hi = max_jint`, `_zeros = 0`, `_ones = 0`. Which make it impossible to distinguish from `TypeInt::INT` without unsigned bounds. 
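A small Java sketch may help show why the signed bounds alone cannot express `NON_ZERO`, as the quoted reply argues. It is illustrative only: the class and method names are hypothetical, the known-bits part of the constraint is left out, and the unsigned bounds used for `NON_ZERO` (`ulo = 1`, `uhi = 0xFFFFFFFF`) are an assumption inferred from the discussion, not values copied from the patch.

```java
/** Hypothetical membership test combining signed and unsigned bounds on an int. */
final class SignedUnsignedBoundsSketch {
    /** True if x lies in [lo, hi] as a signed value and in [ulo, uhi] as an unsigned value. */
    static boolean contains(int x, int lo, int hi, int ulo, int uhi) {
        return x >= lo && x <= hi
                && Integer.compareUnsigned(x, ulo) >= 0
                && Integer.compareUnsigned(x, uhi) <= 0;
    }

    /** All ints: full signed range and full unsigned range. */
    static boolean isInt(int x) {
        return contains(x, Integer.MIN_VALUE, Integer.MAX_VALUE, 0, -1);
    }

    /**
     * Every int except 0: the signed range is still the full range, because both
     * negative and positive values are allowed; only the unsigned lower bound of 1
     * (assumed here) excludes zero.
     */
    static boolean isNonZero(int x) {
        return contains(x, Integer.MIN_VALUE, Integer.MAX_VALUE, 1, -1);
    }

    public static void main(String[] args) {
        for (int x : new int[] {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE}) {
            System.out.println(x + ": int=" + isInt(x) + " nonZero=" + isNonZero(x));
        }
    }
}
```

The two predicates share identical signed bounds and differ only in the unsigned lower bound, which is the distinction the reply says would be lost without explicit unsigned fields.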
Ignoring the unsigned issue for a moment, and going back to Dual, if we had the concept of Complement, we could represent NON_ZERO as the complement of 0 <= x <= 0, which would be x > 0 || x < 0. In general, the complement of lo <= x <= hi would be x > hi || x < lo, in contrast to the dual, which I believe is defined as the non-intuitive hi <= x <= lo. I think complement would allow us to represent more complicated expressions, such as !(x>=lo && x<=hi). If both dual and complement can be used to map between join and meet, then of the two complement seems more attractive and intuitive. But maybe there is another reason we need dual than I'm missing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765559677 From qamai at openjdk.org Wed Sep 18 19:51:14 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 19:51:14 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v19] In-Reply-To: <5doWlzeBLP0hFJTsfm29lgfbacuS53uLbJzaBiv_exM=.71055f4e-45dc-4fb3-8947-a9761534cf77@github.com> References: <5doWlzeBLP0hFJTsfm29lgfbacuS53uLbJzaBiv_exM=.71055f4e-45dc-4fb3-8947-a9761534cf77@github.com> Message-ID: On Wed, 18 Sep 2024 14:33:05 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comments > > src/hotspot/share/opto/rangeinference.hpp line 75: > >> 73: >> 74: template >> 75: class CanonicalizedTypeIntPrototype { > > Ah, is this basically an **Optional**, like `std::optional`? Maybe add a comment above the class for that! > > Honestly, you could also just define a `Optional` class. Maybe even in a separate `optional.hpp` file. I'm sure we can use this construct again. I have thought about that but implementing an `Optional` itself is not very trivial, so I stick to these which has the additional benefit of being specific to what it is used for. 
This is my prototype which I have not thought out too closely yet.

    template <class T>
    class Optional {
    private:
      // A union may have no member active, which is needed here
      // A char[sizeof(T)] would need std::launder to work properly
      union {
        T _value;
      };
      bool _present;

    public:
      Optional() : _present(false) {}

      Optional(const T& val) : _present(true) {
        ::new(&_value) T(val);
      }

      template <class... Ts>
      explicit Optional(InPlace tag, Ts... args) : _present(true) {
        ::new(&_value) T(args...);
      }

      Optional(const Optional& o) : _present(o._present) {
        if (_present) {
          ::new(&_value) T(o._value);
        }
      }

      Optional& operator=(const Optional& o) {
        if (_present) {
          _value.~T();
        }
        _present = o._present;
        if (_present) {
          ::new(&_value) T(o._value);
        }
        return *this;
      }

      ~Optional() {
        if (_present) {
          _value.~T();
        }
      }

      bool has_value() const {
        return _present;
      }

      T& value() {
        assert(_present, "empty optional");
        return _value;
      }

      const T& value() const {
        assert(_present, "empty optional");
        return _value;
      }
    };

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1765622150

From qamai at openjdk.org Wed Sep 18 20:29:32 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 18 Sep 2024 20:29:32 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>
> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
> > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add comments, refactor functions to helper class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/25643785..8a5370a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=18-19 Stats: 138 lines in 3 files changed: 45 ins; 16 del; 77 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From kxu at openjdk.org Wed Sep 18 20:42:41 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 18 Sep 2024 20:42:41 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: <8jdKFn_Bln3lPK1vO8UZyUakbwv_gBvKLd-MutObCg0=.bf55c55e-b906-4d90-9493-e26ba2d87298@github.com> On Tue, 17 Sep 2024 09:37:31 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 22 additional commits since the last revision:
>>
>> - Merge branch 'openjdk:master' into arithmetic-canonicalization
>> - Merge pull request #1 from tabjy/arithmetic-canonicalization-v2
>>
>> Arithmetic canonicalization v2
>> - remove dead code
>> - fix potential void type const nodes
>> - refactor and cleanup
>> - add more test cases
>> - re-implement depth limit on recursion
>> - passes TestIRLShiftIdeal_XPlusX_LShiftC
>> - passes AddI[L]NodeIdealizationTests
>> - revert depth limits
>> - ... and 12 more: https://git.openjdk.org/jdk/compare/72ec1422...c8fdb74c
>
> src/hotspot/share/opto/addnode.cpp line 427:
>
>> 425: }
>> 426:
>> 427: Node* con = (bt == T_INT) ? (Node*) phase->intcon((jint) factor) : (Node*) phase->longcon(factor);
>
> You can use `integercon()` and pass `bt`

I disagree: `integercon()` internally uses `checked_cast<jint>(l)` to prevent information loss during type conversion, and it asserts at runtime if the value is larger than what a `jint` can hold. However, such information loss is intended for integer arithmetic overflows. For example, `Integer.MAX_VALUE * a + a` is extracted to `((jlong) Integer.MAX_VALUE + (jlong) 1) * a`; here we want `Integer.MAX_VALUE + 1` to overflow to `(int) Integer.MIN_VALUE`. If I were to use `integercon()`, the best I could do is `integercon(bt == T_INT ? (jint) factor : factor)`, which is rather pointless.
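The wrap-around argument above can be checked with a short Java sketch. It is illustrative only: the class `FactorWrapSketch` and its methods are hypothetical, `fitsInInt` merely stands in for the kind of range check that the message attributes to `checked_cast`, and none of this is HotSpot code.

```java
/** Hypothetical demonstration that truncating the collapsed factor preserves int semantics. */
final class FactorWrapSketch {
    /** The collapsed factor computed in 64 bits: Integer.MAX_VALUE + 1 = 2^31. */
    static final long FACTOR = (long) Integer.MAX_VALUE + 1L;

    /** The original expression, evaluated with wrapping 32-bit arithmetic. */
    static int original(int a) {
        return Integer.MAX_VALUE * a + a;
    }

    /** The factor after a plain narrowing cast, which wraps to Integer.MIN_VALUE. */
    static int truncatedFactor() {
        return (int) FACTOR;
    }

    /** The collapsed expression: one multiplication by the truncated factor. */
    static int collapsed(int a) {
        return truncatedFactor() * a;
    }

    /** Stand-in for a checked narrowing: true only if v survives a round trip through int. */
    static boolean fitsInInt(long v) {
        return v == (int) v;
    }

    public static void main(String[] args) {
        System.out.println("factor fits in int: " + fitsInInt(FACTOR));
        for (int a : new int[] {0, 1, -1, 7, 12345}) {
            System.out.println(a + ": " + original(a) + " vs " + collapsed(a));
        }
    }
}
```

Both forms agree because 32-bit multiplication and addition are taken modulo 2^32, and 2^31 is congruent to `Integer.MIN_VALUE` under that modulus; a range-checked conversion would reject the factor even though the wrapped value is exactly the one wanted.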
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1765694385

From dholmes at openjdk.org Thu Sep 19 01:33:35 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 19 Sep 2024 01:33:35 GMT
Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote:

> Hi all,
>
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
>
> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied.
> - Additionally, defaults may target a specific log level.
>
> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>
> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
> > Please consider this PR, and thanks! A few comments ... First, these are not "user-specified" decorators but rather developer-specified; more simply, they are "custom default decorators". Second, I'm not clear how tag combinations actually work (is this the "specificity" you refer to? I'm not sure what that means). E.g. is `-Xlog:jit+inlining` treated the same as `-Xlog:inlining+jit`? And what about wildcards? Will `jit*` trigger this `jit+inlining` default? > Decorators are also associated with an output file Right, this is a key design point in UL: decorators are defined per output. But you are now trying to change that so that decorators are associated with tags instead. This seems a significant deviation from the design. In any case I'm unclear exactly what happens in this PR - if we log to stdout we can have different decorators, but if we log to a real file they are all disabled? Or does any attempt to control decorators for tags going to the same output result in them all being ignored? Finally, this is really subjective. You'd really need to socialise the actual proposed changes to the defaults independently of any mechanism to allow it. ------------- PR Review: https://git.openjdk.org/jdk/pull/20988#pullrequestreview-2314164523 From dholmes at openjdk.org Thu Sep 19 01:36:36 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 01:36:36 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. 
This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! 
This affects all hotspot developers using UL so extending coverage: ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2359737246 From amitkumar at openjdk.org Thu Sep 19 05:06:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 19 Sep 2024 05:06:40 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v5] In-Reply-To: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> References: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> Message-ID: On Fri, 6 Sep 2024 17:51:15 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. 
Luckily, this does not really seem to impact performance. >> >>

>> Performance numbers
>>
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > remove PC save/restore on s390 src/hotspot/cpu/s390/stubGenerator_s390.cpp line 3063: > 3061: address start = __ pc(); > 3062: > 3063: __ resolve_jobject(Z_ARG1, Z_tmp_1, Z_tmp_2); @JornVernee is it possible to rebase to head stream ? If there are no issue on other archs :-) I have implemented `resolve_global_jobject` method for s390x and merged changes to head stream with PR https://github.com/openjdk/jdk/pull/20986. Once rebased you can push below change. Suggestion: __ resolve_global_jobject(Z_ARG1, Z_tmp_1, Z_tmp_2); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1766163197 From epeter at openjdk.org Thu Sep 19 06:31:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 06:31:43 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:53:53 GMT, Paul Sandoz wrote: > > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > > > > Nomenclature is suggested by Paul. > > > > > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. Ok. 
I suppose these methods could eventually be moved to `java.lang.Math` or some other `java.lang` class, when the VectorAPI goes out of incubator mode? I feel like these saturating operations, and also the unsigned ops could find a more wider use, away from (explicit) vector usage. For example, the saturating operations are nice because they prevent overflows, and in some cases that would be very nice to have readily available. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2360099389 From epeter at openjdk.org Thu Sep 19 06:40:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 06:40:39 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v2] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 09:06:22 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Fix 2.0 : Add uncast in LibraryCallKit::classify_unsafe_addr Generally looks reasonable. Why does the `null` end up above the `CheckCastPP`, and why does the `CheckCastPP` not get constant folded to `null`? Maybe there is no `IGVN` happening since this pattern was created - and that is expected? test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java line 34: > 32: * -XX:+IgnoreUnrecognizedVMOptions > 33: * -XX:TypeProfileLevel=222 > 34: * -XX:+AlwaysIncrementalInline Could it make sense to have a run without all these extra flags? 
That would allow us to set different values from the outside - maybe that triggers some other (related?) bug down the line. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2314519224 PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1766235052 From jbhateja at openjdk.org Thu Sep 19 06:44:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:44:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v16] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. 
> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Tests for newly added VectorMath.* operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/f81b2525..bc08bab5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=14-15 Stats: 331 lines in 13 files changed: 282 ins; 44 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Sep 19 06:44:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:44:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:22:16 GMT, Emanuel Peter wrote: > > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > We have sufficient test coverage of these APIs in JTREG tests. 
> > @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. Hi @eme64 , @PaulSandoz Yes dedicated test for each of newly added VectorMath operation is justified here. Thanks, let me know if there are other comments. > > > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > > > > > > > Nomenclature is suggested by Paul. > > > > > > > > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > > > > It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. > > Ok. I suppose these methods could eventually be moved to `java.lang.Math` or some other `java.lang` class, when the VectorAPI goes out of incubator mode? > > I feel like these saturating operations, and also the unsigned ops could find a more wider use, away from (explicit) vector usage. For example, the saturating operations are nice because they prevent overflows, and in some cases that would be very nice to have readily available. Hi @eme64 , yes that what our extended plan is, for this patch we want to restrict its use to VectorAPI. > > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > We have sufficient test coverage of these APIs in JTREG tests. 
> > @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. @eme64 , @PaulSandoz , I agree that explicit test for all newly added VectorMath operation for all integral types is justified here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2360118143 From jbhateja at openjdk.org Thu Sep 19 06:53:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:53:15 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v17] In-Reply-To: References: Message-ID: <-L7RYBQd-Q6zLkv5GKU0PDM2SZ-jdm1zAk1VRedDgyM=.c712848d-145b-4ecd-af2f-1a811832559d@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. 
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Tuning extra spaces. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/bc08bab5..eb2960a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=15-16 Stats: 38 lines in 1 file changed: 0 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Sep 19 06:55:45 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:55:45 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 06:41:16 GMT, Jatin Bhateja wrote: > > > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > > > > > > > Nomenclature is suggested by Paul. > > > > > > > > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? 
> > > > > > It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. > > Ok. I suppose these methods could eventually be moved to `java.lang.Math` or some other `java.lang` class, when the VectorAPI goes out of incubator mode? > > I feel like these saturating operations, and also the unsigned ops could find a more wider use, away from (explicit) vector usage. For example, the saturating operations are nice because they prevent overflows, and in some cases that would be very nice to have readily available. Hi @eme64 , as per @PaulSandoz and @jddarcy we should wait till Valhalla preview to add full blown unsigned value type and associated operations, for the time being restricting the scope of these new operations to VectorAPI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2360137469 From jbhateja at openjdk.org Thu Sep 19 07:10:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 07:10:38 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <-SvKZpGY6NbQyh2PnmV5--a8f4oKdSq3VQKV2siSawg=.c812df74-12d4-428b-a7f9-5b1945cdae39@github.com> On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. 
>> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. >> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Change method name src/hotspot/share/opto/vectorIntrinsics.cpp line 772: > 770: > 771: if (elem_klass == nullptr || shuffle_klass == nullptr || shuffle->is_top() || vlen == nullptr) { > 772: return false; // dead code Why dead code in comment ? this is a failed intrinsification condition. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 776: > 774: if (!vlen->is_con() || shuffle_klass->const_oop() == nullptr) { > 775: return false; // not enough info for intrinsification > 776: } Why not combine this with the conditions above, to be consistent with other inline expanders? src/hotspot/share/opto/vectorIntrinsics.cpp line 790: > 788: // Shuffles use byte array based backing storage > 789: BasicType shuffle_bt = T_BYTE; > 790: No need for a blank line between 789 and 791. src/hotspot/share/opto/vectorIntrinsics.cpp line 793: > 791: if (!arch_supports_vector(Op_AndV, num_elem, shuffle_bt, VecMaskNotUsed) || > 792: !arch_supports_vector(Op_Replicate, num_elem, shuffle_bt, VecMaskNotUsed)) { > 793: return false; You should emit a proper intrinsification failure message here. src/hotspot/share/opto/vectorIntrinsics.cpp line 805: > 803: const TypeVect* vt = TypeVect::make(shuffle_bt, num_elem); > 804: const Type* shuffle_type_bt = Type::get_const_basic_type(shuffle_bt); > 805: No need for a blank line here. src/hotspot/share/opto/vectorIntrinsics.cpp line 808: > 806: Node* mod_mask = gvn().makecon(TypeInt::make(num_elem-1)); > 807: Node* bcast_mod_mask = gvn().transform(VectorNode::scalar2vector(mod_mask, num_elem, shuffle_type_bt)); > 808: Remove the redundant blank line.
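For readers following the `mod_mask` snippet above (a constant `num_elem-1` that is broadcast and used with `Op_AndV`), the effect of wrapping indexes instead of checking them can be modelled with scalar Java. This is a sketch under assumptions: `WrapIndexSketch`, `wrapIndex` and `selectFrom` are hypothetical names, the lane count is assumed to be a power of two, and the loop is a scalar stand-in for the behaviour, not the Vector API implementation.

```java
public class WrapIndexSketch {
    // Hypothetical scalar model: maps any index into [0, length) by masking
    // with (length - 1). Only valid when length is a power of two.
    static int wrapIndex(int index, int length) {
        if (length <= 0 || (length & (length - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        return index & (length - 1);
    }

    // Scalar stand-in for a byte selectFrom: each result lane picks the source
    // lane named by the wrapped index, so no exception is thrown for
    // out-of-range indexes.
    static byte[] selectFrom(byte[] indexes, byte[] src) {
        byte[] result = new byte[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            result[i] = src[wrapIndex(indexes[i], src.length)];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] src = {10, 11, 12, 13};
        byte[] indexes = {0, 5, -1, 3}; // 5 wraps to 1, -1 wraps to 3
        System.out.println(java.util.Arrays.toString(selectFrom(indexes, src)));
    }
}
```

Because the mask never produces an out-of-range lane, the bounds check and its exception path disappear from the loop body, which is the motivation the PR description gives for the change.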
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766272449 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273205 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273880 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766274718 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275107 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275345 From rcastanedalo at openjdk.org Thu Sep 19 07:11:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 07:11:37 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. 
Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! Nice proposal, Antón! This will make it possible to migrate lots of debug/trace-level ad-hoc logging in the compiler code to the UL while preserving its current format (e.g. time decorators are hardly needed when examining the output of `-XX:+TraceLoopOpts`). Having said this, I find the following behavior unintuitive. If I run: -Xlog:jit*=debug I get the global default decorators, i.e. `uptime,level,tags`, which is what I expected. But if I run: java -Xlog:jit+compilation=debug,jit+inlining=debug,jit+thread=debug I would expect to get the same decorators, but instead I get the default decorators for `jit+inlining`, i.e. none. Is this intentional? In general, as a HotSpot developer the behavior I would find most natural is to select the union of all decorators for all chosen tags (regardless of whether the decorators for a tag have been chosen actively by the user, specified as default for the tag, or "inherited" from the global default), as in the first option (`-Xlog:jit*=debug`). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2360164889 From epeter at openjdk.org Thu Sep 19 07:14:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 07:14:49 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: <-CV_HwOtB7XCcFDYYJlp4K7ATXCTtq_69V7rZ01AfNc=.360a2c59-01b9-493b-83a8-f29f893f30b4@github.com> On Wed, 18 Sep 2024 20:29:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add comments, refactor functions to helper class Wow you are a champ, thanks for the many comments. Just dropping a first few comments because I'm taking a quick break. src/hotspot/share/opto/rangeinference.hpp line 115: > 113: return {false, {}}; > 114: } > 115: }; Could the fields be const? And the comment above looks like it could be an assert also? 
src/hotspot/share/opto/rangeinference.hpp line 126: > 124: > 125: // Various helper functions for TypeInt/TypeLong operations > 126: class TypeIntHelper { Why not call it `RangeInference`? After all the file name is `rangeinference.hpp`. src/hotspot/share/opto/rangeinference.hpp line 130: > 128: // Calculate the cardinality of a TypeInt/TypeLong ignoring the bits > 129: // constraints, the result is tuned down by 1 to ensure the bottom type is > 130: // correctly calculated So is this an upper bound on the cardinality? And can you explain the "tuned down" part? I'm not following there. src/hotspot/share/opto/rangeinference.hpp line 139: > 137: if (U(srange._lo) == urange._lo) { > 138: return urange._hi - urange._lo; > 139: } Here you check that the two ranges are identical? Maybe add an assert that also the `hi` limit is identical? Or did I get something wrong? src/hotspot/share/opto/rangeinference.hpp line 141: > 139: } > 140: > 141: return (urange._hi - U(srange._lo)) + (U(srange._hi) - urange._lo) + 1; It looks good, but I don't understand it ? Can you please explain? src/hotspot/share/opto/rangeinference.hpp line 157: > 155: return super->_lo <= sub->_lo && super->_hi >= sub->_hi && super->_ulo <= sub->_ulo && super->_uhi >= sub->_uhi && > 156: (super->_bits._zeros &~ sub->_bits._zeros) == 0 && (super->_bits._ones &~ sub->_bits._ones) == 0; > 157: } why are these not member methods of `CT`? It also looks like this could be split for the components - at least the `_bits` could be a member method of `KnownBits`. Having it more modular would make it easier to read and understand. 
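The membership condition quoted from the PR description (`x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`) can be modelled in a few lines of Java. This is only an illustrative sketch: `IntConstraintSketch` and `contains` are hypothetical names, the six parameters mirror the constraint names in the description, and nothing here reflects how the C2 implementation is actually structured.

```java
public class IntConstraintSketch {
    // Hypothetical model of the six constraints: x must lie in the signed
    // range [lo, hi], in the unsigned range [ulo, uhi], have every bit of
    // 'zeros' cleared and every bit of 'ones' set.
    static boolean contains(int lo, int hi, int ulo, int uhi,
                            int zeros, int ones, int x) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0
            && (x & ones) == ones;
    }

    public static void main(String[] args) {
        // Signed range [0, 3]: the matching canonical form has unsigned
        // range [0, 3] and all bits except the low two known to be zero.
        System.out.println(contains(0, 3, 0, 3, ~3, 0, 2));
        System.out.println(contains(0, 3, 0, 3, ~3, 0, 4));

        // Non-canonical input: signed [-1, 1] combined with unsigned [0, 1].
        // -1 satisfies the signed bounds but not the unsigned ones, so the
        // signed lower bound could be tightened to 0.
        System.out.println(contains(-1, 1, 0, 1, 0, 0, -1));
    }
}
```

The last call illustrates why the description says the constraints are not independent and must be canonicalized: the same set of values can be described by looser bounds in one domain that the other domain already rules out.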
-------------

PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2314571573
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766263310
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766265295
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766267425
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766269589
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766271769
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766274950

From rcastanedalo at openjdk.org  Thu Sep 19 07:17:38 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 19 Sep 2024 07:17:38 GMT
Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote:

> Hi all,
> 
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
> 
> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
> 
> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! src/hotspot/share/logging/logDecorators.cpp line 31: > 29: const LogLevelType AnyLevel = LogLevelType::NotMentioned; > 30: #define DEFAULT_DECORATORS \ > 31: DEFAULT_VALUE(mask_from_decorators(NoDecorators), AnyLevel, LOG_TAGS(jit, inlining)) As a compiler developer, I agree with the choice of no decorators by default for `jit+inlining`. When this is the only tag selected, there isn't much value in the information provided by the global default decorators. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20988#discussion_r1766288455 From jbhateja at openjdk.org Thu Sep 19 07:32:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 07:32:38 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. 
This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. >> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Change method name Hi @sviswa7 , some comments, overall patch looks good to me. 
Best Regards,
Jatin

src/hotspot/share/opto/vectorIntrinsics.cpp line 2120:

> 2118: 
> 2119:   if (vector_klass == nullptr || elem_klass == nullptr || vlen == nullptr) {
> 2120:     return false; // dead code

Why is this marked as dead code in the comment?

src/hotspot/share/opto/vectorIntrinsics.cpp line 2129:

> 2127:                   NodeClassNames[argument(2)->Opcode()],
> 2128:                   NodeClassNames[argument(3)->Opcode()]);
> 2129:     return false; // not enough info for intrinsification

Please combine this with the condition above, to be consistent with other inline expanders.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2141:

> 2139:   }
> 2140:   BasicType elem_bt = elem_type->basic_type();
> 2141: 

Remove new line.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2144:

> 2142:   int num_elem = vlen->get_con();
> 2143:   if ((num_elem < 4) || !is_power_of_2(num_elem)) {
> 2144:     log_if_needed("  ** vlen < 4 or not power of two=%d", num_elem);

Will num_elem < 4 not be handled by L2149, since we have an implementation limitation to support less than 32-bit shuffle / masks?

src/hotspot/share/opto/vectorIntrinsics.cpp line 2171:

> 2169:     use_predicate = false;
> 2170:     if(!is_masked_op ||
> 2171:        (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskNotUsed) ||

Suggestion:

    (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) ||

src/hotspot/share/opto/vectorIntrinsics.cpp line 2188:

> 2186: 
> 2187:   if (v1 == nullptr || v2 == nullptr) {
> 2188:     return false; // operand unboxing failed

To be consistent with other expanders, please emit a proper error for the unboxing failure, as on the following line:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426

src/hotspot/share/opto/vectorIntrinsics.cpp line 2197:

> 2195:     mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem);
> 2196:     if (mask == nullptr) {
> 2197:       log_if_needed("  ** not supported: op=selectFrom vlen=%d etype=%s is_masked_op=1",

The error should be an unboxing failure here.
-------------

PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2314643808
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277056
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277739
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766278169
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766297640
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766292679
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766303620
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766304688

From jsjolen at openjdk.org  Thu Sep 19 09:18:37 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 19 Sep 2024 09:18:37 GMT
Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging
In-Reply-To:
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

On Thu, 19 Sep 2024 01:30:30 GMT, David Holmes wrote:

> Finally, this is really subjective. You'd really need to socialise the actual proposed changes to the defaults independent of any mechanism to allow it.

By that you mean the `jit+inlining` default, right? That has been socialized among Oracle's C2 developers if I understand correctly, though it hasn't been done for the wider community. The lack of socialising the changes to the wider community is an oversight on my part.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2360456868

From jsjolen at openjdk.org  Thu Sep 19 09:25:36 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 19 Sep 2024 09:25:36 GMT
Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote:

> Hi all,
> 
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
> 
> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
> 
> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied.
> - Additionally, defaults may target a specific log level.
> 
> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
> > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! src/hotspot/share/logging/logDecorators.cpp line 30: > 28: > 29: const LogLevelType AnyLevel = LogLevelType::NotMentioned; > 30: #define DEFAULT_DECORATORS \ I think this should also have the default decorators that UL already have. That is, all data about default decorators is gathered here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20988#discussion_r1766489613 From epeter at openjdk.org Thu Sep 19 09:38:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 09:38:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 20:29:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add comments, refactor functions to helper class src/hotspot/share/opto/rangeinference.cpp line 37: > 35: public: > 36: bool _progress; // whether there is progress compared to the last iteration > 37: bool _present; // whether the calculation arrives at contradiction Hmm, then why not call it `_is_result_contradictory`? And instead of `_data`, you could name it `_result`. src/hotspot/share/opto/rangeinference.cpp line 88: > 86: // different from the corresponding bit in lo, since result is larger than lo > 87: // the bit must be 0 in lo and 1 in result. As result should be the smallest > 88: // value, this bit should be the rightmost one possible. The grammar is a bit confusing. Can you please specify which are the lsb and msb? Otherwise right/left/first/... are difficult to interpret. src/hotspot/share/opto/rangeinference.cpp line 101: > 99: // bit from lo is the 3rd one, while with x2 it is the 7th one. As a result, > 100: // if both x1 and x2 satisfy bits, x2 would be closer to our true result. > 101: if (zero_violation < one_violation) { Hmm I'm struggling to understand this. Maybe you first should say that this is a case-distinction over the most-significant bit that has a violation. Then explain/prove why the check `zero_violation < one_violation` does that trick. src/hotspot/share/opto/rangeinference.cpp line 134: > 132: } > 133: > 134: // This is more difficult because trying to unset a bit requires us to flip Put this in an else-block, and say that `// This means that the first bit that does not satisfy the bit requirement is a 1 that should be a 0` But use msb rather than first. 
first is always a little tricky, because it depends on big/little endian and from where you look ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766477886 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766490610 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766505516 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766508557 From epeter at openjdk.org Thu Sep 19 09:41:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 09:41:43 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 20:29:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add comments, refactor functions to helper class src/hotspot/share/opto/rangeinference.cpp line 104: > 102: // This means that the first bit that does not satisfy the bit requirement > 103: // is a 0 that should be a 1, this may be the first different bit we want > 104: // to find. I would say this instead: `The msb violation is a one_violation. We must find it and do ...` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766513581 From jvernee at openjdk.org Thu Sep 19 12:20:13 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 19 Sep 2024 12:20:13 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v6] In-Reply-To: References: Message-ID: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. 
The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
> 
> before:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
> 
> after:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
> 
> aarch64:
> 
> before:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
> 
> after:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
> 
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - use resolve_global_jobject on s390 - Merge branch 'master' into LoadVMTraget - remove PC save/restore on s390 - use fatal() - add RISC-V as target platform - Adjust ppc & RISC-V code - Add s390 changes - Merge branch 'master' into LoadVMTraget - Don't save/restore LR/CR + resolve_jobject on s390 - eyeball other platforms - ... and 14 more: https://git.openjdk.org/jdk/compare/2faf8b8d...b703b162 ------------- Changes: https://git.openjdk.org/jdk/pull/20479/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=05 Stats: 333 lines in 23 files changed: 255 ins; 26 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/20479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20479/head:pull/20479 PR: https://git.openjdk.org/jdk/pull/20479 From jvernee at openjdk.org Thu Sep 19 12:20:14 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 19 Sep 2024 12:20:14 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v5] In-Reply-To: References: <9Lc9Ej1toiiI8QFYXndprsPo4l-g20XWPDw5g9l36Fk=.091f48aa-5559-4862-8968-e4283d3fa728@github.com> Message-ID: On Thu, 19 Sep 2024 05:03:50 GMT, Amit Kumar wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> remove PC save/restore on s390 > > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 3063: > >> 3061: address start = __ pc(); >> 3062: >> 3063: __ resolve_jobject(Z_ARG1, Z_tmp_1, Z_tmp_2); > > @JornVernee is it possible to rebase to head stream ? 
If there are no issue on other archs :-) > > I have implemented `resolve_global_jobject` method for s390x and merged changes to head stream with PR https://github.com/openjdk/jdk/pull/20986. Once rebased you can push below change. > > Suggestion: > > __ resolve_global_jobject(Z_ARG1, Z_tmp_1, Z_tmp_2); I've switched s390 to use `resolve_global_jobject` as well now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1766727056 From qamai at openjdk.org Thu Sep 19 14:21:08 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 19 Sep 2024 14:21:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v21] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/8a5370a1..644bcedf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=19-20 Stats: 96 lines in 2 files changed: 25 ins; 4 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu Sep 19 14:21:09 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 19 Sep 2024 14:21:09 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: <_zQ35m-3qBi0YKmfbWwPOENRIjaO3hzTeiSxZKfL5dU=.42f0be5f-cb0e-485c-b552-4796c61e7e0d@github.com> On Thu, 19 Sep 2024 09:23:21 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add comments, refactor functions to helper class > > src/hotspot/share/opto/rangeinference.cpp line 88: > >> 86: // different from the corresponding bit in lo, since result is larger than lo >> 87: // the bit must be 0 in lo and 1 in result. As result should be the smallest >> 88: // value, this bit should be the rightmost one possible. > > The grammar is a bit confusing. Can you please specify which are the lsb and msb? Otherwise right/left/first/... are difficult to interpret. I found it awkward calling the 2nd significant bit or the 4th least significant bit. As a result, I have added explanations that the value is viewed as a binary string, the first bit would be the msb, the last bit is the lsb, etc. 
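
As a reading aid for the discussion of the comment at line 88, the following is a brute-force reference for the quantity being discussed, under the assumption that the goal is the smallest value not less than `lo` that satisfies both `zeros` and `ones`. It is deliberately not the PR's algorithm; the 8-bit width and the names `KnownBits8`, `satisfies` and `adjust_lo_brute_force` are illustrative only.

```cpp
#include <cstdint>

// Hypothetical 8-bit known-bits pair, kept small so brute force is cheap.
struct KnownBits8 {
  uint8_t zeros; // bits that must be 0
  uint8_t ones;  // bits that must be 1
};

// The bits part of the membership predicate from the PR description:
// (x & zeros) == 0 && (x & ones) == ones.
inline bool satisfies(uint8_t x, KnownBits8 bits) {
  return (x & bits.zeros) == 0 && (x & bits.ones) == bits.ones;
}

// Smallest unsigned value v >= lo that satisfies 'bits',
// or -1 if no such 8-bit value exists (a contradiction).
inline int adjust_lo_brute_force(uint8_t lo, KnownBits8 bits) {
  for (int v = lo; v <= 0xFF; v++) {
    if (satisfies(static_cast<uint8_t>(v), bits)) {
      return v;
    }
  }
  return -1;
}
```

Viewing the value as a binary string with the msb first, a call such as `adjust_lo_brute_force(0b11000000, {0b00000100, 0b00001010})` walks upward from `lo` until the bits fit. A reference like this is mainly useful for cross-checking a faster bit-manipulating version on small widths.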
> src/hotspot/share/opto/rangeinference.hpp line 115:
> 
>> 113:     return {false, {}};
>> 114:   }
>> 115: };
> 
> Could the fields be const? And the comment above looks like it could be an assert also?

Maybe, but `const` fields in C++ are really strong (they make the whole object non-assignable), and this is a prototype object that appears only briefly, so non-const seems fine. Regarding the assert, verifying the canonical-ness is not trivial, so I only do so in the constructor of `TypeInt/TypeLong`, which is probably more important.

> src/hotspot/share/opto/rangeinference.hpp line 126:
> 
>> 124: 
>> 125: // Various helper functions for TypeInt/TypeLong operations
>> 126: class TypeIntHelper {
> 
> Why not call it `RangeInference`? After all the file name is `rangeinference.hpp`.

While the file is called `rangeinference.hpp`, there has not been any inference yet; these are helpers that are called from `TypeInt/TypeLong` only.

> src/hotspot/share/opto/rangeinference.hpp line 130:
> 
>> 128:   // Calculate the cardinality of a TypeInt/TypeLong ignoring the bits
>> 129:   // constraints, the result is tuned down by 1 to ensure the bottom type is
>> 130:   // correctly calculated
> 
> So is this an upper bound on the cardinality? And can you explain the "tuned down" part? I'm not following there.

This is the cardinality minus 1, which helps avoid overflow when calculating the size of the bottom type.

> src/hotspot/share/opto/rangeinference.hpp line 141:
> 
>> 139:     }
>> 140: 
>> 141:     return (urange._hi - U(srange._lo)) + (U(srange._hi) - urange._lo) + 1;
> 
> It looks good, but I don't understand it ? Can you please explain?

Basically, `urange` and `srange` are either the same, or their intersection is the union of `[srange._lo, urange._hi]` and `[urange._lo, srange._hi]`. This simply calculates the sizes of the 2 intervals and adds them together.
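
The two cases can be sketched for 32-bit values as follows. This is only an illustration of the quoted two-line computation, assuming the signed and unsigned ranges are already canonical with respect to each other; the struct and function names are hypothetical, not the PR's.

```cpp
#include <cstdint>

// Hypothetical 32-bit stand-ins for the signed and unsigned ranges.
struct SignedRange32 {
  int32_t lo, hi;
};
struct UnsignedRange32 {
  uint32_t lo, hi;
};

// Returns (number of values lying in both ranges) - 1.
// Subtracting 1 keeps the result representable even for the bottom type,
// which contains 2^32 values.
inline uint32_t cardinality_minus_one(SignedRange32 s, UnsignedRange32 u) {
  if (static_cast<uint32_t>(s.lo) == u.lo) {
    // Both ranges describe the same single interval.
    return u.hi - u.lo;
  }
  // Otherwise the set is [s.lo, u.hi] (the negative part, viewed as unsigned)
  // plus [u.lo, s.hi] (the non-negative part).
  // (u.hi - U(s.lo) + 1) + (U(s.hi) - u.lo + 1) - 1 simplifies to the line below.
  return (u.hi - static_cast<uint32_t>(s.lo)) + (static_cast<uint32_t>(s.hi) - u.lo) + 1;
}
```

For example, the signed range [-2, 3] canonicalizes to the full unsigned range, and the set {-2, -1, 0, 1, 2, 3} has 6 elements, so the function yields 5. For the bottom type the result is the maximum `uint32_t` value rather than an overflowed 0.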
> src/hotspot/share/opto/rangeinference.hpp line 157: > >> 155: return super->_lo <= sub->_lo && super->_hi >= sub->_hi && super->_ulo <= sub->_ulo && super->_uhi >= sub->_uhi && >> 156: (super->_bits._zeros &~ sub->_bits._zeros) == 0 && (super->_bits._ones &~ sub->_bits._ones) == 0; >> 157: } > > why are these not member methods of `CT`? It also looks like this could be split for the components - at least the `_bits` could be a member method of `KnownBits`. > > Having it more modular would make it easier to read and understand. They are helper methods that get called from `CT`, the main reason is that `TypeInt` and `TypeLong` are not templates so implementing these would be a huge duplication. This class is used merely to help reduce this duplication. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766926389 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766915973 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766917551 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766918634 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766921978 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766924130 From qamai at openjdk.org Thu Sep 19 14:26:43 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 19 Sep 2024 14:26:43 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 09:38:59 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add comments, refactor functions to helper class > > src/hotspot/share/opto/rangeinference.cpp line 104: > >> 102: // This means that the first bit that does not satisfy the bit requirement >> 103: // is a 0 that should be a 1, this may be the first different bit we want >> 104: // to find. 
> > I would say something like this. > `The msb violation is a one_violation - i.e. it is 0 instead of 1. Intuitively, we want to iteratively add 1 to the number, until that violating bit turns from 0 to 1. This is the lowest number that achieves to remove that one_violation. Note, that for this number all higher bits remain the same, the bit itself is 1, and the lower bits must be all zero.` It would be tempting to do that, but it would make the second case harder to understand. The explanation I settled on makes the 2 cases similar, which I hope would help understand the whole picture more easily. In short, since `result > lo`, there must be a bit position `i` such that `result[i] = 1` and `lo[i] = 0` and all bits before `i` would be the same in `result` and `lo`. In both cases, we find a position `j` such that `result[j] != lo[j]`, which leads to `i >= j`; and prove that `j` leads to an acceptable result. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1766936906 From epeter at openjdk.org Thu Sep 19 15:26:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 15:26:46 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 14:21:08 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > address reviews src/hotspot/share/opto/rangeinference.cpp line 93: > 91: // different from the corresponding bit in lo, since result is larger than lo > 92: // the bit must be 0 in lo and 1 in result. As result should be the smallest > 93: // value, this bit should be the last one possible. The first sentence is a bit strange gramatically. Maybe just drop `The principle here is that`. I also wonder if you want to first mention the paragraph below which says what you want. // The algorithm depends on whether the first violation violates zeros or // ones, if it violates zeros, we have the bit being 1 in zero_violation and // 0 in one_violation. Since all higher bits are 0 in zero_violation and // one_violation, we have zero_violation > one_violation. Similarly, if the // first violation violates ones, we have zero_violation < one_violation. And only then make an example or two about larger / smaller. Generally it is nice to know first where we are heading, and then also the intuition, and only then the details / examples :) src/hotspot/share/opto/rangeinference.cpp line 118: > 116: // have all lower bits being 0. This value satisfies zeros, because all > 117: // bits before the first violation have already satisfied zeros, and all > 118: // bits after the first violation are 0. To satisfy 1, simply | this value Suggestion: // bits after the first violation are 0. 
To satisfy ones, simply OR this value src/hotspot/share/opto/rangeinference.cpp line 131: > 129: // This value must satisfy zeros, because all bits before the 2nd bit have > 130: // already satisfied zeros, and all bits after the 2nd bit are all 0 now. > 131: // Just | this value with ones to obtain the final result. Suggestion: // Just OR this value with ones to obtain the final result. src/hotspot/share/opto/rangeinference.cpp line 144: > 142: // 1 1 0 0 1 0 1 0 > 143: return lo | bits._ones; > 144: } else { Suggestion: } else { // The first bit that does not satisfy the bit requirement is a 1 but should be a 0. src/hotspot/share/opto/rangeinference.cpp line 146: > 144: } else { > 145: // This is more difficult because trying to unset a bit requires us to flip > 146: // some bits before it. Suggestion: // some bits before it (the more significant bits). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767027307 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767018301 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767030195 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767034123 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767032614 From epeter at openjdk.org Thu Sep 19 15:26:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 15:26:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 20:29:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. 
These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add comments, refactor functions to helper class src/hotspot/share/opto/rangeinference.cpp line 118: > 116: // new lo (11000000 -> 11001000 -> 11001010). The final value is our > 117: // result. > 118: // Implementationwise, from 11000000 we can just | with ones to obtain the Suggestion: // Implementationwise, from 11000000 we can just OR with ones to obtain the ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767015403 From epeter at openjdk.org Thu Sep 19 15:26:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 15:26:48 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 15:13:12 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> address reviews > > src/hotspot/share/opto/rangeinference.cpp line 118: > >> 116: // have all lower bits being 0. This value satisfies zeros, because all >> 117: // bits before the first violation have already satisfied zeros, and all >> 118: // bits after the first violation are 0. To satisfy 1, simply | this value > > Suggestion: > > // bits after the first violation are 0. 
To satisfy ones, simply OR this value Ok, this I can understand much better! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767018989 From epeter at openjdk.org Thu Sep 19 15:26:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 15:26:48 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 15:13:36 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.cpp line 118: >> >>> 116: // have all lower bits being 0. This value satisfies zeros, because all >>> 117: // bits before the first violation have already satisfied zeros, and all >>> 118: // bits after the first violation are 0. To satisfy 1, simply | this value >> >> Suggestion: >> >> // bits after the first violation are 0. To satisfy ones, simply OR this value > > Ok, this I can understand much better! Maybe you can even add a line like below, to visually describe the result of the math below: `result = lo[>i], 1, ones[ References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: formality ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/644bcedf..f2d3f3b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=20-21 Stats: 120 lines in 1 file changed: 81 ins; 9 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu Sep 19 17:12:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 19 Sep 2024 17:12:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 14:27:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - fix compile errors >> - Merge branch 'master' into unsignedbounds >> - add comments >> - Merge branch 'master' into unsignedbounds >> - fix release build >> - add comments, group arguments to reduce C-style reference passing arguments >> - fix tests, add verify >> - add unit tests >> - fix template parameter >> - refactor >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d8e4d3f2...d5ad9f1a > > What are your thoughts on how we are going to debug-print the new int-type? We probably do not always want to print the KnownBits, in general that is way too verbose. But in some alignment example, it would be nice to know that it is the `int/long` range, but the lowest 3 bits are always zero, hence 8 byte aligned. @eme64 Thanks to your suggestions, I have managed to come up with a (fairly) formal proof for the algorithm here! 
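For readers following the thread, the constraint set from the PR description (`x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`) can be sketched as a plain membership predicate. This is only an illustration under stated assumptions: the class and method names are hypothetical and this is not HotSpot's `TypeInt` code. It checks the `[0, 3]` example from the description, where tightening the unsigned bounds and known bits should not change which values are accepted.

```java
// Hypothetical illustration only; this is not HotSpot's TypeInt.
final class IntConstraintSketch {
    // True if x satisfies all six constraints from the PR description.
    static boolean contains(int x, int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        return x >= lo && x <= hi                      // signed bounds
            && Integer.compareUnsigned(x, ulo) >= 0    // unsigned lower bound
            && Integer.compareUnsigned(x, uhi) <= 0    // unsigned upper bound
            && (x & zeros) == 0                        // bits known to be 0
            && (x & ones) == ones;                     // bits known to be 1
    }

    // Counts the values in [from, to] on which the loose and the tightened
    // constraint sets for the signed range [0, 3] disagree.
    static int disagreements(int from, int to) {
        int count = 0;
        for (int x = from; x <= to; x++) {
            // Loose: only the signed bounds carry information (uhi = -1 is 0xFFFFFFFF).
            boolean loose = contains(x, 0, 3, 0, -1, 0, 0);
            // Tight: unsigned bounds [0, 3] and all bits but the last 2 known to be 0.
            boolean tight = contains(x, 0, 3, 0, 3, ~3, 0);
            if (loose != tight) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(disagreements(-1000, 1000)); // prints 0
    }
}
```

Both constraint sets describe the same four values, which is the sense in which the tightened form is a canonical representation of the loose one.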
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2361711051 From epeter at openjdk.org Thu Sep 19 18:35:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 18:35:46 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 14:24:12 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 104: >> >>> 102: // This means that the first bit that does not satisfy the bit requirement >>> 103: // is a 0 that should be a 1, this may be the first different bit we want >>> 104: // to find. >> >> I would say something like this. >> >> The msb violation is a one_violation - i.e. lo[i] = 0, instead of 1. >> We construct the result from the high bits of lo, a 1, and the low bits of ones: >> result = lo[i>i], 1, ones[> >> Proof of correctness: >> - TODO: why the high bits must be lo[>i] >> - TODO: why the low bits must be ones[ > It would be tempting to do that, but it would make the second case harder to understand. The explanation I settled on makes the 2 cases similar, which I hope would help understand the whole picture more easily. > > In short, since `result > lo`, there must be a bit position `i` such that `result[i] = 1` and `lo[i] = 0` and all bits before `i` would be the same in `result` and `lo`. In both cases, we find a position `j` such that `result[j] != lo[j]`, which leads to `i >= j`; and prove that `j` leads to an acceptable result. This sounds much better :) >> src/hotspot/share/opto/rangeinference.hpp line 141: >> >>> 139: } >>> 140: >>> 141: return (urange._hi - U(srange._lo)) + (U(srange._hi) - urange._lo) + 1; >> >> It looks good, but I don't understand it ? Can you please explain? > > Basically `urange` and `srange` can be the same, or their intersection is the union of `[srange._lo, urange._hi]` and `[urange._lo, srange._hi]`. This simply calculates the size of 2 intervals and add them together. Ah thanks! 
Can you make that a comment in the code? >> src/hotspot/share/opto/rangeinference.hpp line 157: >> >>> 155: return super->_lo <= sub->_lo && super->_hi >= sub->_hi && super->_ulo <= sub->_ulo && super->_uhi >= sub->_uhi && >>> 156: (super->_bits._zeros &~ sub->_bits._zeros) == 0 && (super->_bits._ones &~ sub->_bits._ones) == 0; >>> 157: } >> >> why are these not member methods of `CT`? It also looks like this could be split for the components - at least the `_bits` could be a member method of `KnownBits`. >> >> Having it more modular would make it easier to read and understand. > > They are helper methods that get called from `CT`, the main reason is that `TypeInt` and `TypeLong` are not templates so implementing these would be a huge duplication. This class is used merely to help reduce this duplication. Ah fair enough. Ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767399737 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767397636 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767397153 From epeter at openjdk.org Thu Sep 19 18:46:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 18:46:43 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v22] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 17:12:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > formality src/hotspot/share/opto/rangeinference.cpp line 65: > 63: }; > 64: > 65: // Find the minimum value that is not less than lo and satisfies bits. What happens if this is impossible? Imagine lo is `max_uint = FFFFFFFF`, but the known bits know that some specific bit must be zero? Or is there some guarantee that this will never happen? Can we have an assert for that? src/hotspot/share/opto/rangeinference.cpp line 90: > 88: } > 89: > 90: /* Everywhere else you use `//`, and that seems to be generally our style, so I'd keep it consistent ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767404352 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1767405832 From kxu at openjdk.org Thu Sep 19 20:08:43 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 19 Sep 2024 20:08:43 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v9] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <2k3we6dqYwZB8Jg3_cJEJBl_SNhMsaU9PPLvUwuq8DY=.a11df197-22cc-4ef8-b5ea-370b9914c13c@github.com> On Wed, 18 Sep 2024 18:47:22 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 27 additional commits since the last revision: >> >> - Merge branch 'master' into long-typed-parallel-iv >> - use @run driver and Argument.RANDOM_ONCE >> - Merge branch 'master' into long-typed-parallel-iv >> - add random strides to tests >> - fix tests on larger strides >> - add more expressive comments and test cases >> - Merge branch 'master' into long-typed-parallel-iv >> - update comments to clarify on type casting >> - add pseudocode for subgraphs before/after the transformation >> - remove WIP support for long counted loops >> - ... and 17 more: https://git.openjdk.org/jdk/compare/f7f19361...20bdc791 > > test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 51: > >> 49: stride = new Random().nextInt(1, Integer.MAX_VALUE / 16); >> 50: stride2 = stride * new Random().nextInt(1, 16); >> 51: } > > So `stride` and `stride2` are compile-time constants. That is intended, right? Yes, this is intended. I followed [your suggestion](https://github.com/openjdk/jdk/pull/18489#discussion_r1638725652) to expand testing with some random numbers. Please let me know if I misunderstood. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1767531620 From sviswanathan at openjdk.org Thu Sep 19 21:43:01 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:43:01 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v4] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. 
This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. > 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/87e103ee..f8e67fb3 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=02-03 Stats: 27 lines in 1 file changed: 9 ins; 8 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: https://git.openjdk.org/jdk/pull/20634 From sviswanathan at openjdk.org Thu Sep 19 21:43:02 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:43:02 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> Message-ID: On Thu, 19 Sep 2024 07:29:11 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Change method name > > Hi @sviswa7 , some comments, overall patch looks good to me. > > Best Regards, > Jatin Thanks a lot @jatin-bhateja. I have implemented your review comments. > src/hotspot/share/opto/vectorIntrinsics.cpp line 772: > >> 770: >> 771: if (elem_klass == nullptr || shuffle_klass == nullptr || shuffle->is_top() || vlen == nullptr) { >> 772: return false; // dead code > > Why dead code in comment ? this is a failed intrinsification condition. Modified comment. > src/hotspot/share/opto/vectorIntrinsics.cpp line 776: > >> 774: if (!vlen->is_con() || shuffle_klass->const_oop() == nullptr) { >> 775: return false; // not enough info for intrinsification >> 776: } > > Why don't you club it with above conditions to be consistent with other inline expanders ? 
Done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2120: > >> 2118: >> 2119: if (vector_klass == nullptr || elem_klass == nullptr || vlen == nullptr) { >> 2120: return false; // dead code > > Why dead code in comments ? Modified comment. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2129: > >> 2127: NodeClassNames[argument(2)->Opcode()], >> 2128: NodeClassNames[argument(3)->Opcode()]); >> 2129: return false; // not enough info for intrinsification > > Please club this with above condition to be consistent with other inline expanders. done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2144: > >> 2142: int num_elem = vlen->get_con(); >> 2143: if ((num_elem < 4) || !is_power_of_2(num_elem)) { >> 2144: log_if_needed(" ** vlen < 4 or not power of two=%d", num_elem); > > Will num_elem < 4 not be handled by L2149 since we have an implementation limitation to support less than 32-bit shuffle / masks. Yes that should handle it. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2171: > >> 2169: use_predicate = false; >> 2170: if(!is_masked_op || >> 2171: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskNotUsed) || > > Suggestion: > > (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) || Here it should be VecMaskNotUsed as this case it using blend to emulate masking. The VecMaskUseLoad case is checked at line 2168. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2188: > >> 2186: >> 2187: if (v1 == nullptr || v2 == nullptr) { >> 2188: return false; // operand unboxing failed > > To be consistent with other expanders please emit proper error for unboxing failure like on following line. 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426

done

> src/hotspot/share/opto/vectorIntrinsics.cpp line 2197:
>
>> 2195: mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem);
>> 2196: if (mask == nullptr) {
>> 2197: log_if_needed(" ** not supported: op=selectFrom vlen=%d etype=%s is_masked_op=1",
>
> Error should be an unboxing failure here.

done

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2362249672
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767601917
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767602096
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767605028
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767605213
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767607670
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767610833
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767615559
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767617255

From sviswanathan at openjdk.org Thu Sep 19 21:45:36 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 19 Sep 2024 21:45:36 GMT
Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2]
In-Reply-To:
References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
Message-ID: <_Q0HCE6Lc7LZY8Sc5XzQvLHg_WdeCDOAGZgMOeEWK4M=.d28c8b11-ee52-4551-92b8-357c04a4d5ef@github.com>

On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Address review comments
>
> I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`.
>
> Are you **shuffling wrap-indexes**?
I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? Thanks a lot @eme64 for the review. I have implemented your review comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2362253398 From kxu at openjdk.org Thu Sep 19 22:02:52 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 19 Sep 2024 22:02:52 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v10] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
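As a rough illustration of the loop shape the description above refers to, the following sketch shows an int counted loop that carries a long-typed parallel induction variable, next to the closed form that depends only on the trip count. The class and method names are hypothetical, and the closed form is an assumption about what such a replacement aims for, not a description of what C2 actually emits.

```java
// Hypothetical illustration only; names are not taken from the PR or its tests.
final class ParallelIvSketch {
    // An int counted loop with a long-typed parallel induction variable `a`.
    static long withLoop(int limit, int stride2) {
        long a = 0;
        for (int i = 0; i < limit; i++) {
            a += stride2; // advances in lockstep with the primary iv `i`
        }
        return a;
    }

    // The same value computed from the trip count alone, without the loop-carried `a`.
    static long closedForm(int limit, int stride2) {
        long trips = Math.max(limit, 0); // the loop body never runs for limit <= 0
        return trips * stride2;
    }

    public static void main(String[] args) {
        System.out.println(withLoop(10, 3) == closedForm(10, 3)); // prints true
    }
}
```

Because `a` is a `long`, the multiplication in the closed form is done in 64-bit arithmetic, matching the accumulation in the loop.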
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: update tests and comments as requested ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/20bdc791..63ff69ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=08-09 Stats: 125 lines in 2 files changed: 73 ins; 5 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From psandoz at openjdk.org Thu Sep 19 23:52:35 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 19 Sep 2024 23:52:35 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. 
Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... > Got it, I think #20508 and this PR are unrelated implementation-wise, though. It would be nice if we can move independently of #20508 as that may take longer to integrate because of API/CSR review. > > @jatin-bhateja What do you think of using this patch and intrinsifing `Vector::rearrange(VectorShuffle, Vector)` instead of introducing the 2 vector `selectFrom` API? IMO the two-vector `selectFrom` API is complementary to the existing single-vector `selectFrom`, and both have equivalent `rearrange` expressions. 
For either use we should ideally get to the point that a similar/identical optimal instruction sequence is generated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2362388082 From amitkumar at openjdk.org Fri Sep 20 04:09:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 04:09:40 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v6] In-Reply-To: References: Message-ID: <6mePQ-uHpmaHQrkZ8IqPNmH9Zbpd9ACeruRj1edeDi8=.d93dfea7-d746-46c6-b9f4-4826f775cf28@github.com> On Thu, 19 Sep 2024 12:20:13 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - use resolve_global_jobject on s390 > - Merge branch 'master' into LoadVMTraget > - remove PC save/restore on s390 > - use fatal() > - add RISC-V as target platform > - Adjust ppc & RISC-V code > - Add s390 changes > - Merge branch 'master' into LoadVMTraget > - Don't save/restore LR/CR + resolve_jobject on s390 > - eyeball other platforms > - ... and 14 more: https://git.openjdk.org/jdk/compare/2faf8b8d...b703b162 I ran tier1-tests on release & fastdebug VMs for s390x. Didn't see any new failure appearing. s390x part seems good to me. Marked as reviewed by amitkumar (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2317183506 PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2317183803 From amitkumar at openjdk.org Fri Sep 20 06:06:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 06:06:09 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 [v2] In-Reply-To: References: Message-ID: <1ToCnqQYL7tnkp-7Ou1HaYo0XARRgt_EhTdlIXkeB34=.24d45cd3-bd9e-4076-8cea-911e05e50ec3@github.com> > Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: update code_base_size & size_per_args ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21033/files - new: https://git.openjdk.org/jdk/pull/21033/files/786ba261..e1063a74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21033&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21033&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21033/head:pull/21033 PR: https://git.openjdk.org/jdk/pull/21033 From amitkumar at openjdk.org Fri Sep 20 06:06:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 06:06:09 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 [v2] In-Reply-To: <0zDxBJohIEtsDXsPpIKq51UFvpxaHTGq_oUF9Mz2L4w=.9712e8c9-eb2b-4124-9d3c-5a7a3ca8a7ae@github.com> References: <0zDxBJohIEtsDXsPpIKq51UFvpxaHTGq_oUF9Mz2L4w=.9712e8c9-eb2b-4124-9d3c-5a7a3ca8a7ae@github.com> Message-ID: On Tue, 17 Sep 2024 09:56:34 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> update code_base_size & size_per_args > > `pd_store_reg` and `pd_load_reg` use `reg2mem_opt` etc. I don't know how large they can get. It's correct if they fit into 16 Bytes. Did you measure the size of a trivial downcall stub? Do you still have some space left? @TheRealMDoerr I have updated the code size & args size required as per your suggestion. Please have a look again; Thank you. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21033#issuecomment-2362895159 From amitkumar at openjdk.org Fri Sep 20 06:06:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 06:06:09 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 08:37:31 GMT, Amit Kumar wrote: > Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. Also I did run the test again and I am not seeing any issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21033#issuecomment-2362895610 From lucy at openjdk.org Fri Sep 20 06:56:35 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 20 Sep 2024 06:56:35 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 [v2] In-Reply-To: <1ToCnqQYL7tnkp-7Ou1HaYo0XARRgt_EhTdlIXkeB34=.24d45cd3-bd9e-4076-8cea-911e05e50ec3@github.com> References: <1ToCnqQYL7tnkp-7Ou1HaYo0XARRgt_EhTdlIXkeB34=.24d45cd3-bd9e-4076-8cea-911e05e50ec3@github.com> Message-ID: <3FMQfMfSWFD6Ruhp90H4Sa-hiHD8Dp2Ntx-nAHe9rKs=.fde35ffe-a015-4244-9ff4-8dee19bf9768@github.com> On Fri, 20 Sep 2024 06:06:09 GMT, Amit Kumar wrote: >> Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > update code_base_size & size_per_args Looks good. Should such a sizing issue reoccur, then please increment the values to 512 and 16. And please wait for the tests to complete successfully! ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21033#pullrequestreview-2317420328 From mli at openjdk.org Fri Sep 20 07:42:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 20 Sep 2024 07:42:04 GMT Subject: RFR: 8340438: RISC-V: minor improvement in base64 Message-ID: <9iQLhe38yLpIDsL8KopVNPSH0lQLkkyqrV7wdX_QMaU=.836090a1-217a-4fdb-98a6-3a60f461bcde@github.com> Hi, Can you help to review this simple patch? Thanks Thanks @RealFYang for spotting this! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/21105/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21105&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340438 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21105.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21105/head:pull/21105 PR: https://git.openjdk.org/jdk/pull/21105 From mdoerr at openjdk.org Fri Sep 20 08:40:34 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 20 Sep 2024 08:40:34 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 [v2] In-Reply-To: <1ToCnqQYL7tnkp-7Ou1HaYo0XARRgt_EhTdlIXkeB34=.24d45cd3-bd9e-4076-8cea-911e05e50ec3@github.com> References: <1ToCnqQYL7tnkp-7Ou1HaYo0XARRgt_EhTdlIXkeB34=.24d45cd3-bd9e-4076-8cea-911e05e50ec3@github.com> Message-ID: <2ofc-oU4x7wVQKXg2fETPewKL9bI1eMjw5QG1QPX17I=.31e6767b-2729-4d80-8dbf-912e88f4e790@github.com> On Fri, 20 Sep 2024 06:06:09 GMT, Amit Kumar wrote: >> Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > update code_base_size & size_per_args LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21033#pullrequestreview-2317638774 From fyang at openjdk.org Fri Sep 20 08:49:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 20 Sep 2024 08:49:35 GMT Subject: RFR: 8340438: RISC-V: minor improvement in base64 In-Reply-To: <9iQLhe38yLpIDsL8KopVNPSH0lQLkkyqrV7wdX_QMaU=.836090a1-217a-4fdb-98a6-3a60f461bcde@github.com> References: <9iQLhe38yLpIDsL8KopVNPSH0lQLkkyqrV7wdX_QMaU=.836090a1-217a-4fdb-98a6-3a60f461bcde@github.com> Message-ID: On Fri, 20 Sep 2024 07:37:20 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Thanks > > Thanks @RealFYang for spotting this! Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21105#pullrequestreview-2317658246 From mli at openjdk.org Fri Sep 20 09:36:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 20 Sep 2024 09:36:38 GMT Subject: RFR: 8340438: RISC-V: minor improvement in base64 In-Reply-To: References: <9iQLhe38yLpIDsL8KopVNPSH0lQLkkyqrV7wdX_QMaU=.836090a1-217a-4fdb-98a6-3a60f461bcde@github.com> Message-ID: On Fri, 20 Sep 2024 08:47:17 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this simple patch? >> Thanks >> >> Thanks @RealFYang for spotting this! > > Looks good. Thanks. Thanks for your reviewing! @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/21105#issuecomment-2363285934 From mli at openjdk.org Fri Sep 20 09:36:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 20 Sep 2024 09:36:39 GMT Subject: Integrated: 8340438: RISC-V: minor improvement in base64 In-Reply-To: <9iQLhe38yLpIDsL8KopVNPSH0lQLkkyqrV7wdX_QMaU=.836090a1-217a-4fdb-98a6-3a60f461bcde@github.com> References: <9iQLhe38yLpIDsL8KopVNPSH0lQLkkyqrV7wdX_QMaU=.836090a1-217a-4fdb-98a6-3a60f461bcde@github.com> Message-ID: On Fri, 20 Sep 2024 07:37:20 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Thanks > > Thanks @RealFYang for spotting this! 
This pull request has now been integrated. Changeset: 3ad6e31d Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/3ad6e31d81bb8a47dc73a6342a6524a901f07687 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod 8340438: RISC-V: minor improvement in base64 Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/21105 From amitkumar at openjdk.org Fri Sep 20 14:50:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 14:50:40 GMT Subject: RFR: 8340269: [s390x] TestLargeStub.java failure after 8338123 [v2] In-Reply-To: <3FMQfMfSWFD6Ruhp90H4Sa-hiHD8Dp2Ntx-nAHe9rKs=.fde35ffe-a015-4244-9ff4-8dee19bf9768@github.com> References: <1ToCnqQYL7tnkp-7Ou1HaYo0XARRgt_EhTdlIXkeB34=.24d45cd3-bd9e-4076-8cea-911e05e50ec3@github.com> <3FMQfMfSWFD6Ruhp90H4Sa-hiHD8Dp2Ntx-nAHe9rKs=.fde35ffe-a015-4244-9ff4-8dee19bf9768@github.com> Message-ID: On Fri, 20 Sep 2024 06:53:55 GMT, Lutz Schmidt wrote: >Should such a sizing issue reoccur, then please increment the values to 512 and 16. Sure Lutz. Thank you both for the review & suggestions; ------------- PR Comment: https://git.openjdk.org/jdk/pull/21033#issuecomment-2363909832 From amitkumar at openjdk.org Fri Sep 20 14:50:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 14:50:40 GMT Subject: Integrated: 8340269: [s390x] TestLargeStub.java failure after 8338123 In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 08:37:31 GMT, Amit Kumar wrote: > Fixes the test case; Ran tier1 with fastdebug-vm, didn't see any regression. This pull request has now been integrated. 
Changeset: e087edeb Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/e087edeb256a9743d1fdb6c295cb5add78d4552e Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8340269: [s390x] TestLargeStub.java failure after 8338123 Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/21033 From qamai at openjdk.org Fri Sep 20 15:44:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 20 Sep 2024 15:44:04 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v23] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: comment adjust_lo empty case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/f2d3f3b1..c440a72a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=21-22 Stats: 12 lines in 1 file changed: 7 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Fri Sep 20 15:44:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 20 Sep 2024 15:44:04 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v22] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 18:37:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> formality > > src/hotspot/share/opto/rangeinference.cpp line 65: > >> 63: }; >> 64: >> 65: // Find the minimum value that is not less than lo and satisfies bits. > > What happens if this is impossible? Imagine lo is `max_uint = FFFFFFFF`, but the known bits know that some specific bit must be zero? Or is there some guarantee that this will never happen? Can we have an assert for that? It will overflow and return `bits._ones` which will `< lo`. I have added that to the function comment and more asserts to ensure that is correct. 
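The constraint set from the PR description, and the "smallest value not less than lo that satisfies bits" question discussed in the review comment above, can be illustrated with a small brute-force model. This is only a sketch under stated assumptions: the class `KnownBitsSketch` and its method names are hypothetical and are not taken from `rangeinference.cpp`, and the search is a linear scan rather than the bit-manipulation algorithm in the patch. Unlike the HotSpot function (which, per the reply, overflows and returns `bits._ones`), this model reports the impossible case as an empty result.

```java
import java.util.OptionalInt;

// Hypothetical illustration class; not part of the JDK sources.
final class KnownBitsSketch {
    // x satisfies the known bits if every bit set in 'zeros' is 0 in x
    // and every bit set in 'ones' is 1 in x.
    static boolean satisfiesBits(int x, int zeros, int ones) {
        return (x & zeros) == 0 && (x & ones) == ones;
    }

    // Membership test for the set described in the PR summary:
    // x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && bits hold.
    static boolean contains(int x, int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && satisfiesBits(x, zeros, ones);
    }

    // Brute force: smallest value u>= lo (unsigned order) that satisfies the bits,
    // or empty if the scan wraps past 0xFFFFFFFF without finding one.
    static OptionalInt minUnsignedAtLeast(int lo, int zeros, int ones) {
        int x = lo;
        do {
            if (satisfiesBits(x, zeros, ones)) {
                return OptionalInt.of(x);
            }
            x++;
        } while (x != 0); // x == 0 means we wrapped around the unsigned range
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // [0, 3] in both domains, with all bits but the last 2 known to be zero.
        System.out.println(contains(2, 0, 3, 0, 3, ~3, 0));
        // lo = 0xFFFFFFFF with the lowest bit known to be zero: no candidate exists.
        System.out.println(minUnsignedAtLeast(0xFFFFFFFF, 1, 0).isPresent());
    }
}
```

The second call in `main` mirrors the reviewer's `max_uint = FFFFFFFF` example: nothing at or above `lo` can have the required zero bit, so the scan wraps and the sketch reports no result.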
> src/hotspot/share/opto/rangeinference.cpp line 90: > >> 88: } >> 89: >> 90: /* > > Everywhere else you use `//`, and that seems to be generally our style, so I'd keep it consistent ;) From https://github.com/openjdk/jdk/pull/9947, I learnt that for such a long paragraph with dense mathematics, reducing `//` noise would make reading it easier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768831332 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768830185 From qamai at openjdk.org Fri Sep 20 15:44:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 20 Sep 2024 15:44:04 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v20] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 18:31:55 GMT, Emanuel Peter wrote: >> Basically `urange` and `srange` can be the same, or their intersection is the union of `[srange._lo, urange._hi]` and `[urange._lo, srange._hi]`. This simply calculates the size of the 2 intervals and adds them together. > > Ah thanks! Can you make that a comment in the code? I have added comments at this point, I don't know why Github does not mark it as outdated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768833854 From jbhateja at openjdk.org Fri Sep 20 17:04:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 20 Sep 2024 17:04:41 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v4] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Thu, 19 Sep 2024 21:43:01 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal.
This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. >> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments Thanks @sviswa7 , LGTM ------------- Marked as reviewed by jbhateja (Reviewer). 
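The behavioural difference between checkIndexes and wrapIndexes described in the PR summary above can be modelled with plain arrays. The following is a scalar sketch only, not the Vector API implementation: the class `SelectFromSketch` and its methods are hypothetical, and it assumes that wrapping an index is equivalent to `Math.floorMod(index, VLENGTH)`, which for power-of-two lengths equals `index & (VLENGTH - 1)`.

```java
// Hypothetical scalar model; not part of jdk.incubator.vector.
final class SelectFromSketch {
    // checkIndexes-style behaviour: any out-of-range index is an error.
    static byte[] selectFromChecked(byte[] index, byte[] src) {
        byte[] result = new byte[src.length];
        for (int i = 0; i < result.length; i++) {
            int idx = index[i];
            if (idx < 0 || idx >= src.length) {
                throw new IndexOutOfBoundsException("lane " + i + " has index " + idx);
            }
            result[i] = src[idx];
        }
        return result;
    }

    // wrapIndexes-style behaviour (assumed): out-of-range indexes wrap into [0, length).
    static byte[] selectFromWrapped(byte[] index, byte[] src) {
        byte[] result = new byte[src.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = src[Math.floorMod(index[i], src.length)];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] src = {10, 20, 30, 40};
        byte[] index = {0, 5, -1, 3};
        // Lane 1 wraps 5 to 1, lane 2 wraps -1 to 3.
        System.out.println(java.util.Arrays.toString(selectFromWrapped(index, src)));
    }
}
```

The wrapped variant has no exceptional path in its loop body, which is consistent with the branch-free `vpshufb` loop shown in the PR description.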
PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2318802240 From jbhateja at openjdk.org Fri Sep 20 17:24:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 20 Sep 2024 17:24:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v22] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 17:12:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > formality src/hotspot/share/opto/rangeinference.cpp line 212: > 210: // first_violation is the position of the violation counting from the > 211: // lowest bit up (0-based), since i == 2, first_difference == 6 > 212: juint first_violation = W - 1 - count_leading_zeros(one_violation); // 6 FTR, A zero violation can never cause a value to be smaller than smallest possible value represented by Known bits, while one violation can. 
src/hotspot/share/opto/rangeinference.hpp line 70: > 68: template > 69: class KnownBits { > 70: static_assert(std::is_unsigned::value, "bit info should be unsigned"); Since you are forcing bit info should correspond to unsigned type, it will be good to add KnownBits.minValue() and KnownBits.maxValue() routines where, MinValue = KnownBits.ONES (assuming all the unknown bits are 0s) MaxValue = ~KnownBits.ZERO (assuming all unknown bits are 1s) src/hotspot/share/opto/rangeinference.hpp line 155: > 153: return t1->_lo == t2->_lo && t1->_hi == t2->_hi && t1->_ulo == t2->_ulo && t1->_uhi == t2->_uhi && > 154: t1->_bits._zeros == t2->_bits._zeros && t1->_bits._ones == t2->_bits._ones; > 155: } For clarity, Expression can be broken into separate signed, unsigned and known bit comparison followed by logical anding. src/hotspot/share/opto/rangeinference.hpp line 159: > 157: template > 158: static bool int_type_subset(const CT* super, const CT* sub) { > 159: return super->_lo <= sub->_lo && super->_hi >= sub->_hi && super->_ulo <= sub->_ulo && super->_uhi >= sub->_uhi && Same as above. src/hotspot/share/opto/type.hpp line 619: > 617: * it by one, which contradicts the assumption of the TypeInt being canonical. > 618: * > 619: * 2. Either _lo == jint(_lo) and _hi == jint(_uhi), or all elements of a Suggestion: * 2. 
Either _lo == jint(_ulo) and _hi == jint(_uhi), or all elements of a ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768434099 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768471879 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768271214 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768271934 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768496674 From qamai at openjdk.org Fri Sep 20 17:35:56 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 20 Sep 2024 17:35:56 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v24] In-Reply-To: References: Message-ID: <1bVd4zmZZvFQMEpZOSmxKNs_UYQwpRbHIPQbinV1kK0=.ff94c815-2595-4939-b244-aef1488a5631@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/c440a72a..4858e12c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=22-23 Stats: 7 lines in 2 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Fri Sep 20 17:35:57 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 20 Sep 2024 17:35:57 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v22] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 11:59:50 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> formality > > src/hotspot/share/opto/rangeinference.hpp line 70: > >> 68: template >> 69: class KnownBits { >> 70: static_assert(std::is_unsigned::value, "bit info should be unsigned"); > > Since you are forcing bit info should correspond to unsigned type, it will be good to add KnownBits.minValue() and KnownBits.maxValue() routines where, > > MinValue = KnownBits.ONES (assuming all the unknown bits are 0s) > MaxValue = ~KnownBits.ZERO (assuming all unknown bits are 1s) I would prefer to add them when they are needed. 
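For reference, the `minValue`/`maxValue` helpers suggested in the review comment above could look roughly like the following. This is a sketch, assuming 32-bit known bits held in `int` fields and unsigned ordering; the class `KnownBitsBoundsSketch` and its methods are hypothetical and, as the reply says, no such routines are being added in this PR.

```java
// Hypothetical illustration of the suggested helpers; not HotSpot code.
final class KnownBitsBoundsSketch {
    // Smallest unsigned value consistent with the bits: all unknown bits are 0.
    static int minValue(int zeros, int ones) {
        return ones;
    }

    // Largest unsigned value consistent with the bits: all unknown bits are 1.
    static int maxValue(int zeros, int ones) {
        return ~zeros;
    }

    // Brute-force check over [0, limit): every value that satisfies the bits
    // lies within [minValue, maxValue] in unsigned order.
    static boolean boundsHoldBelow(int zeros, int ones, int limit) {
        int min = minValue(zeros, ones);
        int max = maxValue(zeros, ones);
        for (int x = 0; x < limit; x++) {
            boolean satisfies = (x & zeros) == 0 && (x & ones) == ones;
            if (satisfies
                && (Integer.compareUnsigned(x, min) < 0 || Integer.compareUnsigned(x, max) > 0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int zeros = ~3; // all bits but the last 2 are known to be zero
        int ones = 1;   // the lowest bit is known to be one
        System.out.println(minValue(zeros, ones) + " .. " + maxValue(zeros, ones));
    }
}
```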
> src/hotspot/share/opto/rangeinference.hpp line 155: > >> 153: return t1->_lo == t2->_lo && t1->_hi == t2->_hi && t1->_ulo == t2->_ulo && t1->_uhi == t2->_uhi && >> 154: t1->_bits._zeros == t2->_bits._zeros && t1->_bits._ones == t2->_bits._ones; >> 155: } > > For clarity, Expression can be broken into separate signed, unsigned and known bit comparison followed by logical anding. That's a really good idea, I have done that. > src/hotspot/share/opto/type.hpp line 619: > >> 617: * it by one, which contradicts the assumption of the TypeInt being canonical. >> 618: * >> 619: * 2. Either _lo == jint(_lo) and _hi == jint(_uhi), or all elements of a > > Suggestion: > > * 2. Either _lo == jint(_ulo) and _hi == jint(_uhi), or all elements of a Thanks for noticing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768965322 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768958772 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1768959515 From kxu at openjdk.org Fri Sep 20 19:54:19 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 20 Sep 2024 19:54:19 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v3] In-Reply-To: References: Message-ID: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). 
> > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. Kangcheng Xu has updated the pull request incrementally with six additional commits since the last revision: - Merge pull request #2 from tabjy/arithmetic-canonicalization-v2 Arithmetic canonicalization v2 - remove unused variables - remove debug printfs - fix detecting optimized power-of-2 multiplication - revert usage of integercon(): truncation during jlong to jint is intended - implement rwestrel's changes, passing TestDigest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/c8fdb74c..30f119c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=01-02 Stats: 43 lines in 2 files changed: 12 ins; 1 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Fri Sep 20 19:56:38 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 20 Sep 2024 19:56:38 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: On Tue, 17 Sep 2024 09:39:35 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 22 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into arithmetic-canonicalization >> - Merge pull request #1 from tabjy/arithmetic-canonicalization-v2 >> >> Arithmetic canonicalization v2 >> - remove dead code >> - fix potential void type const nodes >> - refactor and cleanup >> - add more test cases >> - re-implement depth limit on recursion >> - passes TestIRLShiftIdeal_XPlusX_LShiftC >> - passes AddI[L]NodeIdealizationTests >> - revert depth limits >> - ... and 12 more: https://git.openjdk.org/jdk/compare/71681c74...c8fdb74c > > src/hotspot/share/opto/addnode.cpp line 490: > >> 488: if (bt == T_INT || bt == T_LONG) { // const could potentially be void type >> 489: Node* mul_base; >> 490: jlong multiplier = extract_base_operand_from_serial_additions(phase, operand_node, &mul_base, depth_limit - 1); > > Do you need to recurse at all here? I believe so. Consider the case `(a + a) * 3`. Recursing here allows us to extract `a` and factor `2 * 3 => 6`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1769185209 From duke at openjdk.org Sat Sep 21 06:44:44 2024 From: duke at openjdk.org (duke) Date: Sat, 21 Sep 2024 06:44:44 GMT Subject: Withdrawn: 8333893: Optimization for StringBuilder append boolean & null In-Reply-To: References: Message-ID: <35kiZDoVYk78VZHgNIqXfyWigKKTDqX4jn4ZH3GHCLo=.1367ac27-6bdf-43f0-9550-a4357fbd84e3@github.com> On Mon, 10 Jun 2024 12:12:58 GMT, Shaojin Wen wrote: > After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. > > This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/19626 From fyang at openjdk.org Sat Sep 21 06:48:45 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 21 Sep 2024 06:48:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Wed, 18 Sep 2024 17:45:51 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - Remove redundant comment src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: > 255: RegSet::of($res$$Register) /* no_preserve */); > 256: __ mov($tmp1$$Register, $oldval$$Register); > 257: __ mov($tmp2$$Register, $newval$$Register); Hi, I don't quite understand these two register-register moves here. It seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly, as `cmpxchg` won't modify them, which would save us these two moves. Did I miss anything? Thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1769492955 From liach at openjdk.org Sun Sep 22 02:06:44 2024 From: liach at openjdk.org (Chen Liang) Date: Sun, 22 Sep 2024 02:06:44 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v17] In-Reply-To: References: Message-ID: On Sat, 24 Aug 2024 06:27:26 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - replace unsafe with putChar > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - private static final field `UNSAFE` > - Utf16 case remove `append first utf16 char` > - `delete` -> `setLength` > - copyright 2024 > - optimization for x64 > - ... and 9 more: https://git.openjdk.org/jdk/compare/381c1987...61196ecd src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 640: > 638: private AbstractStringBuilder appendNull() { > 639: ensureCapacityInternal(count + 4); > 640: int count = this.count; We should declare `count` before `ensureCapacityInternal`. Same for append boolean.
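The ordering suggested in the review comment above (read `count` into a local first, then grow the buffer) can be sketched with a toy builder. This is a minimal illustration under stated assumptions: `TinyLatin1Builder` is a hypothetical class that only handles Latin-1 bytes, and it does not reproduce the actual `AbstractStringBuilder` code or the combined-store form that the PR aims for.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy class; not java.lang.AbstractStringBuilder.
final class TinyLatin1Builder {
    private byte[] value = new byte[4];
    private int count;

    // Grow the backing array if it cannot hold minCapacity bytes.
    private void ensureCapacityInternal(int minCapacity) {
        if (minCapacity > value.length) {
            value = Arrays.copyOf(value, Math.max(minCapacity, value.length * 2));
        }
    }

    TinyLatin1Builder appendNull() {
        int count = this.count;            // read the field once, before growing
        ensureCapacityInternal(count + 4);
        byte[] val = this.value;           // re-read: the array may have been replaced
        val[count] = 'n';
        val[count + 1] = 'u';
        val[count + 2] = 'l';
        val[count + 3] = 'l';
        this.count = count + 4;
        return this;
    }

    @Override
    public String toString() {
        return new String(value, 0, count, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(new TinyLatin1Builder().appendNull().appendNull());
    }
}
```

With the local declared first, the capacity check and the four stores all use the same `count` value, and the field is read only once per call.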
test/hotspot/jtreg/compiler/patches/java.base/java/lang/Helper.java line 136: > 134: > 135: public static int putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4) { > 136: return StringUTF16.putCharsAt(value, i, c1, c2, c3, c4); Why do we remove the tests for UTF16? And should we add another set of test for LATIN1 too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19626#discussion_r1769695101 PR Review Comment: https://git.openjdk.org/jdk/pull/19626#discussion_r1769694934 From jbhateja at openjdk.org Sun Sep 22 09:49:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 22 Sep 2024 09:49:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. 
This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... > Got it, I think #20508 and this PR are unrelated implementation-wise, though. > > @jatin-bhateja What do you think of using this patch and intrinsifing `Vector::rearrange(VectorShuffle, Vector)` instead of introducing the 2 vector `selectFrom` API? Hi @merykitty , I had implemented as [similar LoadShuffle bypassing optimization](https://github.com/openjdk/jdk/pull/20508/commits/7c80bfce59f486f6c25aec13f0f0f6a42f5319b1) in my original implementation of PR #20508 , which we decided to address in subsequent patch for both the flavors of selectFromAPI. 
The main difference between the two-vector rearrange and the selectFrom API lies in their signatures and in the index ranges they accept after wrapping. In the latter case, wrapping brings the index range down to [0, 2*VLEN), while in the former case we prune the exceptional indexes into the valid single-vector index range [0, VLEN), augmented with a selection mask that picks elements from the independently permuted vectors to produce the result vector. Unlike the single-vector rearrange, which now favors index wrapping over throwing an IndexOutOfBoundsException for exceptional (negative) indexes, the two-vector rearrange wraps exceptional indexes into the valid single-vector range. Bringing the exceptional indexes into the valid two-vector range will need changes in the wrapping logic to add 2*VECLEN to exceptional indexes, but this may be implemented in a target-specific manner; we can take this up in a follow-up patch after integrating #20508. As Paul mentioned, vector rearrange and selectFrom are complementary APIs with different signatures, and we intend to produce optimal code for both. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2366421736 From dnsimon at openjdk.org Sun Sep 22 10:47:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 22 Sep 2024 10:47:35 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:53:40 GMT, Tomáš Zezula wrote: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. 
> > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. src/hotspot/share/jvmci/jvmci_globals.cpp line 84: > 82: if (EnableJVMCI) { > 83: if (FLAG_IS_DEFAULT(UseJVMCINativeLibrary) && !UseJVMCINativeLibrary) { > 84: char path[JVM_MAXPATHLEN]; This check for enabling `UseJVMCINativeLibrary` should really be: if (UseJVMCICompiler) { if (FLAG_IS_DEFAULT(UseJVMCINativeLibrary) && !UseJVMCINativeLibrary) { - char path[JVM_MAXPATHLEN]; - if (os::dll_locate_lib(path, sizeof(path), Arguments::get_dll_dir(), JVMCI_SHARED_LIBRARY_NAME)) { + if (JVMCI::shared_library_exists()) { // If a JVMCI native library is present, but I will address that as part of [JDK-8340576](https://bugs.openjdk.org/browse/JDK-8340576). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770520057 From dnsimon at openjdk.org Sun Sep 22 11:49:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 22 Sep 2024 11:49:35 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:53:40 GMT, Tomáš Zezula wrote: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. 
> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. src/hotspot/share/jvmci/jvmci_globals.cpp line 82: > 80: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) > 81: > 82: if (EnableJVMCI) { This needs to be `EnableJVMCI || UseJVMCICompiler` (since deriving `EnableJVMCI` from `UseJVMCICompiler` is only done [below](https://github.com/openjdk/jdk/blob/ab06a878f888827026424530781f0af414a8a611/src/hotspot/share/jvmci/jvmci_globals.cpp#L96)). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770532660 From swen at openjdk.org Sun Sep 22 16:17:06 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sun, 22 Sep 2024 16:17:06 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v18] In-Reply-To: References: Message-ID: > After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. > > This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 23 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - revert test - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - replace unsafe with putChar - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - private static final field `UNSAFE` - ... and 13 more: https://git.openjdk.org/jdk/compare/7ba4356c...399c8ef5 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19626/files - new: https://git.openjdk.org/jdk/pull/19626/files/61196ecd..399c8ef5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19626&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19626&range=16-17 Stats: 180711 lines in 1606 files changed: 163525 ins; 8881 del; 8305 mod Patch: https://git.openjdk.org/jdk/pull/19626.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19626/head:pull/19626 PR: https://git.openjdk.org/jdk/pull/19626 From swen at openjdk.org Sun Sep 22 16:17:06 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sun, 22 Sep 2024 16:17:06 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v18] In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 02:01:36 GMT, Chen Liang wrote: >> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 23 additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - revert test >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - replace unsafe with putChar >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - private static final field `UNSAFE` >> - ... and 13 more: https://git.openjdk.org/jdk/compare/7ba4356c...399c8ef5 > > test/hotspot/jtreg/compiler/patches/java.base/java/lang/Helper.java line 136: > >> 134: >> 135: public static int putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4) { >> 136: return StringUTF16.putCharsAt(value, i, c1, c2, c3, c4); > > Why do we remove the tests for UTF16? And should we add another set of test for LATIN1 too? An early version removed putCharsAt, so it was also removed from Helpers. I have added it back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19626#discussion_r1770586891 From swen at openjdk.org Sun Sep 22 16:20:40 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sun, 22 Sep 2024 16:20:40 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v17] In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 02:03:02 GMT, Chen Liang wrote: > We should declare `count` before `ensureCapacitiyInternal`. Same for append boolean. Declaring count before ensureCapacityInternal will cause performance regression under x64. It took a lot of time to find this, but the underlying reason is still unclear. 
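For readers following the `count` ordering discussion above, the two variants can be sketched as below. This is a minimal illustration only: `TinyBuilder` and `appendNullLocalFirst` are hypothetical names, the class is a LATIN1-only stand-in and not the JDK's `AbstractStringBuilder`, and nothing here reproduces the x64 regression mentioned in the thread. Whether a JIT merges the four byte stores is not something this sketch can show.

```java
// Hypothetical, LATIN1-only stand-in for AbstractStringBuilder; not the JDK source.
final class TinyBuilder {
    private byte[] value = new byte[4];
    private int count;

    // Grows the backing array when the requested capacity does not fit.
    private void ensureCapacityInternal(int minimumCapacity) {
        if (minimumCapacity > value.length) {
            value = java.util.Arrays.copyOf(value, Math.max(minimumCapacity, value.length * 2));
        }
    }

    // Ordering kept in the patch: grow first, then cache the field in a local.
    // Before the local is declared, `count` still refers to the field.
    TinyBuilder appendNull() {
        ensureCapacityInternal(count + 4);
        int count = this.count;
        byte[] val = this.value;
        // Four adjacent byte stores; a JIT may or may not merge them into one wider store.
        val[count]     = (byte) 'n';
        val[count + 1] = (byte) 'u';
        val[count + 2] = (byte) 'l';
        val[count + 3] = (byte) 'l';
        this.count = count + 4;
        return this;
    }

    // Ordering suggested in review: cache the field first, then grow.
    TinyBuilder appendNullLocalFirst() {
        int count = this.count;
        ensureCapacityInternal(count + 4);
        byte[] val = this.value;
        val[count]     = (byte) 'n';
        val[count + 1] = (byte) 'u';
        val[count + 2] = (byte) 'l';
        val[count + 3] = (byte) 'l';
        this.count = count + 4;
        return this;
    }

    @Override
    public String toString() {
        return new String(value, 0, count, java.nio.charset.StandardCharsets.ISO_8859_1);
    }
}
```

Both methods produce the same contents; the thread's point is only that the generated code can differ, which would have to be measured on the real JDK class.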
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19626#discussion_r1770587530 From gcao at openjdk.org Mon Sep 23 03:09:11 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 23 Sep 2024 03:09:11 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store Message-ID: Hi, This is a small improvement for RISC-V C2 vector gather load and scattter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. ### Testing - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on Banana Pi BPI-F3 board (with RVV1.0) ------------- Commit messages: - 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store Changes: https://git.openjdk.org/jdk/pull/21123/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21123&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340590 Stats: 8 lines in 1 file changed: 4 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21123.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21123/head:pull/21123 PR: https://git.openjdk.org/jdk/pull/21123 From fyang at openjdk.org Mon Sep 23 04:14:38 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 23 Sep 2024 04:14:38 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 03:03:23 GMT, Gui Cao wrote: > Hi, > This is a small improvement for RISC-V C2 vector gather load and scattter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. > > > ### Testing > - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 Looks fine. Thanks. 
------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21123#pullrequestreview-2321101686 From fyang at openjdk.org Mon Sep 23 05:52:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 23 Sep 2024 05:52:35 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 03:03:23 GMT, Gui Cao wrote: > Hi, > This is a small improvement for RISC-V C2 vector gather load and scattter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. > > > ### Testing > - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 PS: Seems to me that this could be further improved after some more thinking. These `vmv_v_v` instructions could be eliminated if we use `idx` directly as input for `vsll_vi`, like this addon change: diff --git a/src/hotspot/cpu/riscv/riscv_v.ad b/src/hotspot/cpu/riscv/riscv_v.ad index a3c426e71d2..510c0ff5d46 100644 --- a/src/hotspot/cpu/riscv/riscv_v.ad +++ b/src/hotspot/cpu/riscv/riscv_v.ad @@ -4898,8 +4898,7 @@ instruct gather_loadS(vReg dst, indirect mem, vReg idx) %{ BasicType bt = Matcher::vector_element_basic_type(this); Assembler::SEW sew = Assembler::elemtype_to_sew(bt); __ vsetvli_helper(bt, Matcher::vector_length(this)); - __ vmv_v_v(as_VectorRegister($dst$$reg), as_VectorRegister($idx$$reg)); - __ vsll_vi(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), (int)sew); + __ vsll_vi(as_VectorRegister($dst$$reg), as_VectorRegister($idx$$reg), (int)sew); __ vluxei32_v(as_VectorRegister($dst$$reg), as_Register($mem$$base), as_VectorRegister($dst$$reg)); %} @@ -4932,8 +4931,7 @@ instruct gather_loadS_masked(vReg dst, indirect mem, vReg idx, vRegMask_V0 v0, v BasicType bt = Matcher::vector_element_basic_type(this); 
Assembler::SEW sew = Assembler::elemtype_to_sew(bt); __ vsetvli_helper(bt, Matcher::vector_length(this)); - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); - __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); __ vxor_vv(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg)); __ vluxei32_v(as_VectorRegister($dst$$reg), as_Register($mem$$base), @@ -4972,8 +4970,7 @@ instruct scatter_storeS(indirect mem, vReg src, vReg idx, vReg tmp) %{ BasicType bt = Matcher::vector_element_basic_type(this, $src); Assembler::SEW sew = Assembler::elemtype_to_sew(bt); __ vsetvli_helper(bt, Matcher::vector_length(this, $src)); - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); - __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); __ vsuxei32_v(as_VectorRegister($src$$reg), as_Register($mem$$base), as_VectorRegister($tmp$$reg)); %} @@ -5006,8 +5003,7 @@ instruct scatter_storeS_masked(indirect mem, vReg src, vReg idx, vRegMask_V0 v0, BasicType bt = Matcher::vector_element_basic_type(this, $src); Assembler::SEW sew = Assembler::elemtype_to_sew(bt); __ vsetvli_helper(bt, Matcher::vector_length(this, $src)); - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); - __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); __ vsuxei32_v(as_VectorRegister($src$$reg), as_Register($mem$$base), as_VectorRegister($tmp$$reg), Assembler::v0_t); ------------- PR Comment: https://git.openjdk.org/jdk/pull/21123#issuecomment-2367283027 From dzhang at openjdk.org Mon Sep 23 06:20:37 2024 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 23 Sep 2024 06:20:37 GMT Subject: 
RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 05:48:29 GMT, Fei Yang wrote: > PS: Seems to me that this could be further improved after some more thinking. These `vmv_v_v` instructions could be eliminated if we use `idx` directly as input for `vsll_vi`, like this addon change: > > ``` > diff --git a/src/hotspot/cpu/riscv/riscv_v.ad b/src/hotspot/cpu/riscv/riscv_v.ad > index a3c426e71d2..510c0ff5d46 100644 > --- a/src/hotspot/cpu/riscv/riscv_v.ad > +++ b/src/hotspot/cpu/riscv/riscv_v.ad > @@ -4898,8 +4898,7 @@ instruct gather_loadS(vReg dst, indirect mem, vReg idx) %{ > BasicType bt = Matcher::vector_element_basic_type(this); > Assembler::SEW sew = Assembler::elemtype_to_sew(bt); > __ vsetvli_helper(bt, Matcher::vector_length(this)); > - __ vmv_v_v(as_VectorRegister($dst$$reg), as_VectorRegister($idx$$reg)); > - __ vsll_vi(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), (int)sew); > + __ vsll_vi(as_VectorRegister($dst$$reg), as_VectorRegister($idx$$reg), (int)sew); > __ vluxei32_v(as_VectorRegister($dst$$reg), as_Register($mem$$base), > as_VectorRegister($dst$$reg)); > %} > @@ -4932,8 +4931,7 @@ instruct gather_loadS_masked(vReg dst, indirect mem, vReg idx, vRegMask_V0 v0, v > BasicType bt = Matcher::vector_element_basic_type(this); > Assembler::SEW sew = Assembler::elemtype_to_sew(bt); > __ vsetvli_helper(bt, Matcher::vector_length(this)); > - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); > - __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); > + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); > __ vxor_vv(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), > as_VectorRegister($dst$$reg)); > __ vluxei32_v(as_VectorRegister($dst$$reg), as_Register($mem$$base), > @@ -4972,8 +4970,7 @@ instruct scatter_storeS(indirect mem, vReg src, vReg idx, vReg tmp) %{ > 
BasicType bt = Matcher::vector_element_basic_type(this, $src); > Assembler::SEW sew = Assembler::elemtype_to_sew(bt); > __ vsetvli_helper(bt, Matcher::vector_length(this, $src)); > - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); > - __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); > + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); > __ vsuxei32_v(as_VectorRegister($src$$reg), as_Register($mem$$base), > as_VectorRegister($tmp$$reg)); > %} > @@ -5006,8 +5003,7 @@ instruct scatter_storeS_masked(indirect mem, vReg src, vReg idx, vRegMask_V0 v0, > BasicType bt = Matcher::vector_element_basic_type(this, $src); > Assembler::SEW sew = Assembler::elemtype_to_sew(bt); > __ vsetvli_helper(bt, Matcher::vector_length(this, $src)); > - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); > - __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); > + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); > __ vsuxei32_v(as_VectorRegister($src$$reg), as_Register($mem$$base), > as_VectorRegister($tmp$$reg), Assembler::v0_t); > ``` It looks good, thanks for all. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21123#issuecomment-2367314486 From duke at openjdk.org Mon Sep 23 06:21:18 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 23 Sep 2024 06:21:18 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v10] In-Reply-To: References: Message-ID: > This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). 
> > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. > b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. Daniel Skantz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits: - phrasing - nextInt - +1 case - words - Tweak CtwRunner.java; debug only bool flag - merge - +1 whitespace - tweak requires - rename secondary flag v2 - propagate external flags to test - ... and 30 more: https://git.openjdk.org/jdk/compare/10050a72...9ab95aa6 ------------- Changes: https://git.openjdk.org/jdk/pull/19646/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=09 Stats: 199 lines in 17 files changed: 164 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/19646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19646/head:pull/19646 PR: https://git.openjdk.org/jdk/pull/19646 From gcao at openjdk.org Mon Sep 23 07:14:08 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 23 Sep 2024 07:14:08 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store [v2] In-Reply-To: References: Message-ID: <8n5tZswHsXFfTJa63f8EkLCau8wg1NSj5hKS67254A4=.5f642538-6e7e-47ed-9895-bbeb7191bcda@github.com> > Hi, > This is a small improvement for RISC-V C2 vector gather load and scattter store nodes. 
Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. > > > ### Testing > - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Polishing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21123/files - new: https://git.openjdk.org/jdk/pull/21123/files/7f1cb3ec..03f6e3a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21123&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21123&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21123.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21123/head:pull/21123 PR: https://git.openjdk.org/jdk/pull/21123 From fyang at openjdk.org Mon Sep 23 07:19:36 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 23 Sep 2024 07:19:36 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store [v2] In-Reply-To: <8n5tZswHsXFfTJa63f8EkLCau8wg1NSj5hKS67254A4=.5f642538-6e7e-47ed-9895-bbeb7191bcda@github.com> References: <8n5tZswHsXFfTJa63f8EkLCau8wg1NSj5hKS67254A4=.5f642538-6e7e-47ed-9895-bbeb7191bcda@github.com> Message-ID: On Mon, 23 Sep 2024 07:14:08 GMT, Gui Cao wrote: >> Hi, >> This is a small improvement for RISC-V C2 vector gather load and scattter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. 
>> >> >> ### Testing >> - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing Updated change looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21123#pullrequestreview-2321295463 From gcao at openjdk.org Mon Sep 23 07:19:37 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 23 Sep 2024 07:19:37 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scattter store In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 05:48:29 GMT, Fei Yang wrote: >> Hi, >> This is a small improvement for RISC-V C2 vector gather load and scattter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. >> >> >> ### Testing >> - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 > > PS: Seems to me that this could be further improved after some more thinking. 
> These `vmv_v_v` instructions could be eliminated if we use `idx` directly as input for `vsll_vi`, like this addon change: > > > diff --git a/src/hotspot/cpu/riscv/riscv_v.ad b/src/hotspot/cpu/riscv/riscv_v.ad > index a3c426e71d2..510c0ff5d46 100644 > --- a/src/hotspot/cpu/riscv/riscv_v.ad > +++ b/src/hotspot/cpu/riscv/riscv_v.ad > @@ -4898,8 +4898,7 @@ instruct gather_loadS(vReg dst, indirect mem, vReg idx) %{ > BasicType bt = Matcher::vector_element_basic_type(this); > Assembler::SEW sew = Assembler::elemtype_to_sew(bt); > __ vsetvli_helper(bt, Matcher::vector_length(this)); > - __ vmv_v_v(as_VectorRegister($dst$$reg), as_VectorRegister($idx$$reg)); > - __ vsll_vi(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), (int)sew); > + __ vsll_vi(as_VectorRegister($dst$$reg), as_VectorRegister($idx$$reg), (int)sew); > __ vluxei32_v(as_VectorRegister($dst$$reg), as_Register($mem$$base), > as_VectorRegister($dst$$reg)); > %} > @@ -4932,8 +4931,7 @@ instruct gather_loadS_masked(vReg dst, indirect mem, vReg idx, vRegMask_V0 v0, v > BasicType bt = Matcher::vector_element_basic_type(this); > Assembler::SEW sew = Assembler::elemtype_to_sew(bt); > __ vsetvli_helper(bt, Matcher::vector_length(this)); > - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); > - __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); > + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); > __ vxor_vv(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), > as_VectorRegister($dst$$reg)); > __ vluxei32_v(as_VectorRegister($dst$$reg), as_Register($mem$$base), > @@ -4972,8 +4970,7 @@ instruct scatter_storeS(indirect mem, vReg src, vReg idx, vReg tmp) %{ > BasicType bt = Matcher::vector_element_basic_type(this, $src); > Assembler::SEW sew = Assembler::elemtype_to_sew(bt); > __ vsetvli_helper(bt, Matcher::vector_length(this, $src)); > - __ vmv_v_v(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg)); > - __ 
vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($tmp$$reg), (int)sew); > + __ vsll_vi(as_VectorRegister($tmp$$reg), as_VectorRegister($idx$$reg), (int)sew); > __ vsuxei32_v(as_VectorRegister($src$$reg), as_Register($mem$$base), > as_VectorRegister($tmp$$reg)); > %} > @@ -5006,8 +5003,7 @@ instruct scatter_storeS_masked(indirect mem, vR... @RealFYang @DingliZhang : Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21123#issuecomment-2367400321 From duke at openjdk.org Mon Sep 23 07:26:36 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Mon, 23 Sep 2024 07:26:36 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:46:49 GMT, Doug Simon wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. >> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > src/hotspot/share/jvmci/jvmci_globals.cpp line 82: > >> 80: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >> 81: >> 82: if (EnableJVMCI) { > > This needs to be `EnableJVMCI || UseJVMCICompiler` (since deriving `EnableJVMCI` from `UseJVMCICompiler` is only done [below](https://github.com/openjdk/jdk/blob/ab06a878f888827026424530781f0af414a8a611/src/hotspot/share/jvmci/jvmci_globals.cpp#L96)). 
I see, `FLAG_SET_DEFAULT(EnableJVMCI, true)` on [line 99](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.cpp#L99). Maybe moving this block if (!FLAG_IS_DEFAULT(EnableJVMCI) && !EnableJVMCI) { jio_fprintf(defaultStream::error_stream(), "Improperly specified VM option UseJVMCICompiler: EnableJVMCI cannot be disabled\n"); return false; } FLAG_SET_DEFAULT(EnableJVMCI, true); in front of my change makes it more readable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770881033 From duke at openjdk.org Mon Sep 23 07:31:10 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Mon, 23 Sep 2024 07:31:10 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: Message-ID: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: JDK-8340398: Fixed EnableJVMCI handling. 
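The ordering concern raised in the review above (the `EnableJVMCI` guard runs before `EnableJVMCI` is derived from `UseJVMCICompiler`) can be modelled roughly as follows. This is a simplified sketch and not `jvmci_globals.cpp`: the class `JvmciFlagsModel`, its methods, and the `libraryPresent` parameter are hypothetical, and it assumes the guarded block's only effect is to turn on `UseJVMCINativeLibrary` when a library is found.

```java
// Hypothetical model of the flag checks discussed above; field names mirror the
// HotSpot flags, but the logic is a simplified illustration, not jvmci_globals.cpp.
final class JvmciFlagsModel {
    boolean enableJVMCI;
    boolean useJVMCICompiler;
    boolean useJVMCINativeLibrary;

    JvmciFlagsModel(boolean enableJVMCI, boolean useJVMCICompiler) {
        this.enableJVMCI = enableJVMCI;
        this.useJVMCICompiler = useJVMCICompiler;
    }

    // Stand-in for the guarded block: select the native library if one is present.
    private void selectNativeLibrary(boolean libraryPresent) {
        if (enableJVMCI && libraryPresent) {
            useJVMCINativeLibrary = true;
        }
    }

    // Stand-in for deriving EnableJVMCI from UseJVMCICompiler.
    private void deriveEnableJVMCI() {
        if (useJVMCICompiler) {
            enableJVMCI = true;
        }
    }

    // Guard evaluated before the derivation: the guard sees a stale EnableJVMCI.
    JvmciFlagsModel guardThenDerive(boolean libraryPresent) {
        selectNativeLibrary(libraryPresent);
        deriveEnableJVMCI();
        return this;
    }

    // Derivation moved in front of the guard, as proposed in the reply.
    JvmciFlagsModel deriveThenGuard(boolean libraryPresent) {
        deriveEnableJVMCI();
        selectNativeLibrary(libraryPresent);
        return this;
    }
}
```

In this model, starting from only `useJVMCICompiler = true`, `guardThenDerive(true)` leaves `useJVMCINativeLibrary` false while `deriveThenGuard(true)` sets it to true. Widening the guard to `EnableJVMCI || UseJVMCICompiler`, the alternative suggested in the review, would have the same effect here.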
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21069/files - new: https://git.openjdk.org/jdk/pull/21069/files/78f57619..b7550463 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=00-01 Stats: 15 lines in 1 file changed: 9 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21069/head:pull/21069 PR: https://git.openjdk.org/jdk/pull/21069 From dnsimon at openjdk.org Mon Sep 23 07:39:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 23 Sep 2024 07:39:37 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: Message-ID: <2Z_U9xcRXvMmqxA2GKHRb_1j1J-50ULmfK1VGxs-Hqo=.b1af5258-2c74-4a23-8ff9-7b91c3f103f5@github.com> On Mon, 23 Sep 2024 07:21:32 GMT, Tomáš Zezula wrote: >> src/hotspot/share/jvmci/jvmci_globals.cpp line 82: >> >>> 80: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >>> 81: >>> 82: if (EnableJVMCI) { >> >> This needs to be `EnableJVMCI || UseJVMCICompiler` (since deriving `EnableJVMCI` from `UseJVMCICompiler` is only done [below](https://github.com/openjdk/jdk/blob/ab06a878f888827026424530781f0af414a8a611/src/hotspot/share/jvmci/jvmci_globals.cpp#L96)). > > I see, `FLAG_SET_DEFAULT(EnableJVMCI, true)` on [line 99](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.cpp#L99). > Maybe moving this block > > if (!FLAG_IS_DEFAULT(EnableJVMCI) && !EnableJVMCI) { > jio_fprintf(defaultStream::error_stream(), > "Improperly specified VM option UseJVMCICompiler: EnableJVMCI cannot be disabled\n"); > return false; > } > FLAG_SET_DEFAULT(EnableJVMCI, true); > > in front of my change makes it more readable. It's not obvious to me how that's clearer than just expanding the guard on line 82 to be `EnableJVMCI || UseJVMCICompiler`.
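[Editorial sketch, not part of the original message.] The two orderings discussed above can be compared with a small model. This is plain Java, not HotSpot code: the class `JvmciFlagModel` and both method names are hypothetical, a `null` argument stands in for `FLAG_IS_DEFAULT(EnableJVMCI)`, and the JVMCI-specific block at line 82 is reduced to a boolean saying whether it would run. It assumes only what the thread states, namely that `UseJVMCICompiler` forces `EnableJVMCI` on and that explicitly disabling `EnableJVMCI` alongside `UseJVMCICompiler` is rejected.

```java
import java.util.Optional;

public class JvmciFlagModel {
    /**
     * Variant A: keep the derivation below the guard and widen the guard to
     * "EnableJVMCI || UseJVMCICompiler". An empty result models the error return.
     */
    public static Optional<Boolean> widenedGuard(Boolean enableJvmci, boolean useJvmciCompiler) {
        boolean enabled = enableJvmci != null && enableJvmci; // model assumption: default is off
        boolean setupRuns = enabled || useJvmciCompiler;
        if (useJvmciCompiler && enableJvmci != null && !enableJvmci) {
            return Optional.empty(); // "EnableJVMCI cannot be disabled"
        }
        return Optional.of(setupRuns);
    }

    /**
     * Variant B: move the check-and-derive block in front of the guard,
     * so the guard can stay a plain "EnableJVMCI" test.
     */
    public static Optional<Boolean> deriveFirst(Boolean enableJvmci, boolean useJvmciCompiler) {
        boolean enabled = enableJvmci != null && enableJvmci;
        if (useJvmciCompiler) {
            if (enableJvmci != null && !enableJvmci) {
                return Optional.empty();
            }
            enabled = true; // models FLAG_SET_DEFAULT(EnableJVMCI, true)
        }
        return Optional.of(enabled);
    }

    public static void main(String[] args) {
        Boolean[] enableValues = {null, Boolean.TRUE, Boolean.FALSE};
        boolean[] useValues = {false, true};
        for (Boolean e : enableValues) {
            for (boolean u : useValues) {
                Optional<Boolean> a = widenedGuard(e, u);
                Optional<Boolean> b = deriveFirst(e, u);
                System.out.println("EnableJVMCI=" + e + " UseJVMCICompiler=" + u
                        + " -> " + a + " / " + b + (a.equals(b) ? "" : "  MISMATCH"));
            }
        }
    }
}
```

In this model both variants agree on all six input combinations, which suggests the choice between them is one of readability. One difference the model hides: in variant A the guarded block would already have run by the time the error is reported.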
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770895929 From dnsimon at openjdk.org Mon Sep 23 07:39:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 23 Sep 2024 07:39:37 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: <2Z_U9xcRXvMmqxA2GKHRb_1j1J-50ULmfK1VGxs-Hqo=.b1af5258-2c74-4a23-8ff9-7b91c3f103f5@github.com> References: <2Z_U9xcRXvMmqxA2GKHRb_1j1J-50ULmfK1VGxs-Hqo=.b1af5258-2c74-4a23-8ff9-7b91c3f103f5@github.com> Message-ID: On Mon, 23 Sep 2024 07:34:32 GMT, Doug Simon wrote: >> I see, `FLAG_SET_DEFAULT(EnableJVMCI, true)` on [line 99](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.cpp#L99). >> Maybe moving this block >> >> if (!FLAG_IS_DEFAULT(EnableJVMCI) && !EnableJVMCI) { >> jio_fprintf(defaultStream::error_stream(), >> "Improperly specified VM option UseJVMCICompiler: EnableJVMCI cannot be disabled\n"); >> return false; >> } >> FLAG_SET_DEFAULT(EnableJVMCI, true); >> >> in front of my change makes it more readable. > > It's not obvious to me how that's clearer than just expanding the guard on line 82 to be `EnableJVMCI || UseJVMCICompiler`. Now that I see your change and understand what you meant, it is better - thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770898087 From rcastanedalo at openjdk.org Mon Sep 23 07:48:16 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 23 Sep 2024 07:48:16 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. 
The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. 
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 46 additional commits since the last revision: - Merge jdk-24+16 - Ensure that detected encode-and-store patterns are matched - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Remove redundant comment - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Restore some asserts - Default values for tmp regs of G1PostBarrierStubC2 - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 - ...
and 36 more: https://git.openjdk.org/jdk/compare/bdb0e33c...47c982ba ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/d54d67f1..47c982ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23-24 Stats: 170497 lines in 1328 files changed: 155223 ins; 8073 del; 7201 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 23 07:57:52 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 23 Sep 2024 07:57:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 22:51:59 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 46 additional commits since the last revision: >> >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Remove redundant comment >> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms >> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Restore some asserts >> - Default values for tmp regs of G1PostBarrierStubC2 >> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 >> - ... and 36 more: https://git.openjdk.org/jdk/compare/da906826...47c982ba > > src/hotspot/share/opto/matcher.cpp line 1821: > >> 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) { >> 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf), >> 1821: "duplicating node that's already been matched"); > > Why it was removed? The assertion was failing due to it being too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched. There are other cases (all of them harmless as far as I can see) in which this assertion can fail. I am investigating whether they can be avoided so that the assertion can be restored, and what would be the impact on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1770925777 From dnsimon at openjdk.org Mon Sep 23 08:14:39 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 23 Sep 2024 08:14:39 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:31:10 GMT, Tomáš Zezula wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. >> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8340398: Fixed EnableJVMCI handling. Marked as reviewed by dnsimon (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21069#pullrequestreview-2321435416 From dzhang at openjdk.org Mon Sep 23 08:33:35 2024 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 23 Sep 2024 08:33:35 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scatter store [v2] In-Reply-To: <8n5tZswHsXFfTJa63f8EkLCau8wg1NSj5hKS67254A4=.5f642538-6e7e-47ed-9895-bbeb7191bcda@github.com> References: <8n5tZswHsXFfTJa63f8EkLCau8wg1NSj5hKS67254A4=.5f642538-6e7e-47ed-9895-bbeb7191bcda@github.com> Message-ID: On Mon, 23 Sep 2024 07:14:08 GMT, Gui Cao wrote: >> Hi, >> This is a small improvement for RISC-V C2 vector gather load and scatter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. >> >> >> ### Testing >> - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing Look good to me, thanks. (Not a Reviewer) ------------- Marked as reviewed by dzhang (Author). PR Review: https://git.openjdk.org/jdk/pull/21123#pullrequestreview-2321477521 From chagedorn at openjdk.org Mon Sep 23 08:54:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 23 Sep 2024 08:54:47 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v17] In-Reply-To: References: Message-ID: <4ZwCxZHzk-6BcoFVe9XpNXBY8V0mMNMa39lbHpGaexs=.228624ad-f288-444b-a8f4-87152465bf66@github.com> On Wed, 18 Sep 2024 06:53:44 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. 
>> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. 
>> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > another small suggestion from Christian Almost there, some minor nits left, mostly code style. Otherwise, looks good to me now! test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 31: > 29: > 30: /** > 31: * Helper class for compilation of Java and Jasm {@code SourceCode}. Suggestion: * Helper class for compilation of Java and Jasm {@link SourceCode}. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 48: > 46: * Set up a new Compile Framework instance, for a new compilation unit. > 47: */ > 48: public CompileFramework() {} You can probably omit that since it's empty. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 106: > 104: * > 105: * @param name Name of the class to be retrieved. > 106: * @return A class corresponding to the {@code name}. The? Suggestion: * @return The class corresponding to the {@code name}. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 117: > 115: > 116: /** > 117: * Invoke a static method from the compiled code. Maybe add a new line for separation: Suggestion: * Invoke a static method from the compiled code. test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 157: > 155: > 156: /** > 157: * Returns the classpath appended with the {@code classesDir}, where Suggestion: * Returns the classpath appended with the {@link classesDir}, where test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 160: > 158: * the compiled classes are stored. 
This enables another VM to load > 159: * the compiled classes. Note, the string is already backslash escaped, > 160: * so that the windows paths which use backslashes can be used directly Suggestion: * so that Windows paths which use backslashes can be used directly test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 169: > 167: } > 168: } > 169: Suggestion: test/hotspot/jtreg/compiler/lib/compile_framework/Utils.java line 127: > 125: * Write sources to file. > 126: */ > 127: public static List writeSourcesToFile(List sources, Path sourceDir) { Since you write multiple files, should we name this "writeSourcesToFile**s**"? Suggestion: /** * Write each source in {@code sources} to a file inside {@code sourceDir}. */ public static List writeSourcesToFiles(List sources, Path sourceDir) { test/hotspot/jtreg/compiler/lib/compile_framework/Utils.java line 128: > 126: */ > 127: public static List writeSourcesToFile(List sources, Path sourceDir) { > 128: List storedFiles = new ArrayList(); Suggestion: List storedFiles = new ArrayList<>(); test/hotspot/jtreg/compiler/lib/compile_framework/Utils.java line 170: > 168: throw new CompileFrameworkException("Compilation failed."); > 169: } > 170: } These methods only seem to be used by the class `Compile` and look specific to the needs of that class. Maybe they better live there (or are put into separate classes). What do you think? test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/CombinedJavaJasmExample.java line 87: > 85: } > 86: > 87: public static void main(String args[]) { You should use the Java style instead of C-style and put the `[]` at the type: Suggestion: public static void main(String[] args) { Same in other examples. 
test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/CombinedJavaJasmExample.java line 113: > 111: throw new RuntimeException("wrong value: " + i); > 112: } > 113: Suggestion: test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java line 45: > 43: * might not compile it because it is not present in the class, only in the dynamically compiled > 44: * code. > 45: * Suggestion: *

test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java line 47: > 45: * > 46: * Additionally, we must set the classpath for the Test-VM, so that it has access to all compiled > 47: * classes (see {@code getEscapedClassPathOfCompiledClasses}). Suggestion: * classes (see {@link CompileFramework#getEscapedClassPathOfCompiledClasses}). test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java line 57: > 55: > 56: // Generate a source java file as String > 57: public static String generate_X1(CompileFramework comp) { Generally, since these are motivating example, we should probably not use underlines in the method names since Java advocates camelCase. test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java line 129: > 127: > 128: // Load the compiled class. > 129: Class c = comp.getClass("X2"); Suggestion: Class c = comp.getClass("X2"); test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/MultiFileJavaExample.java line 73: > 71: comp.compile(); > 72: > 73: Suggestion: ------------- PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2321341939 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770900687 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770923762 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770969825 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770970223 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770971689 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770973056 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770973518 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770987188 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770982248 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770913796 
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770989589 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770989881 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770992003 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770991812 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770995166 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770992852 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1770996920 From duke at openjdk.org Mon Sep 23 09:36:43 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 23 Sep 2024 09:36:43 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v9] In-Reply-To: <1v-WNWigdtAWl6wS1BE3S4kikAZo6zuyOc9Q9KxxmZo=.1b5c9937-3043-440d-ab77-839e7d152bf3@github.com> References: <1v-WNWigdtAWl6wS1BE3S4kikAZo6zuyOc9Q9KxxmZo=.1b5c9937-3043-440d-ab77-839e7d152bf3@github.com> Message-ID: On Fri, 23 Aug 2024 15:03:14 GMT, Daniel Skantz wrote: >> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: >> >> +1 whitespace > > Comment to avoid timeout. > @danielogh is this ready for further reviews or are you still working on the suggestion that @eme64 had? I think it can be reviewed again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19646#issuecomment-2367683001 From duke at openjdk.org Mon Sep 23 09:36:48 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 23 Sep 2024 09:36:48 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v10] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 06:21:18 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. 
It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits: > > - phrasing > - nextInt > - +1 case > - words > - Tweak CtwRunner.java; debug only bool flag > - merge > - +1 whitespace > - tweak requires > - rename secondary flag v2 > - propagate external flags to test > - ... and 30 more: https://git.openjdk.org/jdk/compare/10050a72...9ab95aa6 src/hotspot/share/opto/compile.cpp line 1124: > 1122: */ > 1123: StartNode* Compile::start() const { > 1124: assert (!failing_internal() || C->failure_is_artificial(), "Must not have pending failure. Reason is: %s", failure_reason()); Having the stress mode in debug builds requires weakening asserts since debug builds assert these paths are not taken. 
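[Editorial sketch, not part of the original message.] The mechanism described in this thread, injecting a bailout at a `failing()` check with a user-given probability and tolerating a pending failure only when it is artificial, can be illustrated with a toy model. This is plain Java and not the HotSpot implementation: the class `StressBailoutModel`, its fields, and the `compile` driver are hypothetical names chosen for illustration, and the seed and probability parameters stand in for whatever flags the patch actually adds.

```java
import java.util.Random;

public class StressBailoutModel {
    private final Random rng;
    private final double probability;
    private String failureReason; // null while no bailout is pending
    private boolean artificial;   // true if the pending bailout was injected

    public StressBailoutModel(long seed, double probability) {
        this.rng = new Random(seed);
        this.probability = probability;
    }

    /** Records a real bailout. Invariant (b): no bailout may already be pending. */
    public void recordFailure(String reason) {
        if (failureReason != null) {
            throw new IllegalStateException(
                    "bailout recorded while one is pending: " + failureReason);
        }
        failureReason = reason;
    }

    /** Models a failing() check: may first inject an artificial bailout. */
    public boolean failing() {
        if (failureReason == null && rng.nextDouble() < probability) {
            failureReason = "artificial stress bailout";
            artificial = true;
        }
        return failureReason != null;
    }

    /** Models the weakened assert: a pending failure is tolerated only if artificial. */
    public void checkNoRealFailurePending() {
        if (failureReason != null && !artificial) {
            throw new IllegalStateException("Must not have pending failure. Reason is: "
                    + failureReason);
        }
    }

    /** Runs up to 'phases' phases, unwinding at the first failing() check (invariant a). */
    public static int compile(StressBailoutModel c, int phases) {
        int completed = 0;
        for (int i = 0; i < phases; i++) {
            if (c.failing()) {
                break; // stop optimizing as soon as a bailout is observed
            }
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        // Probability 0.0 never injects; probability 1.0 injects at the first check.
        System.out.println(compile(new StressBailoutModel(42L, 0.0), 10));
        System.out.println(compile(new StressBailoutModel(42L, 1.0), 10));
    }
}
```

Seeding the generator keeps a failing run reproducible, which is presumably why the CTW runner change below passes a stress seed alongside the stress mode.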
test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java line 309: > 307: "-XX:StressSeed=" + rng.nextInt(Integer.MAX_VALUE))); > 308: > 309: // Use this stress mode 10% of the time as it could make some long-running compilations likely to abort. I apply this stress mode randomly as it could hypothetically hide an infinite compilation bug if used unconditionally. If it is later used in another test suite we might need similar randomization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1771061037 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1771061584 From roland at openjdk.org Mon Sep 23 11:42:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 23 Sep 2024 11:42:37 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: <8jdKFn_Bln3lPK1vO8UZyUakbwv_gBvKLd-MutObCg0=.bf55c55e-b906-4d90-9493-e26ba2d87298@github.com> References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> <8jdKFn_Bln3lPK1vO8UZyUakbwv_gBvKLd-MutObCg0=.bf55c55e-b906-4d90-9493-e26ba2d87298@github.com> Message-ID: <9WxAeM7QHqfmAcs90-IeCy8zME-pe_HY4onDTFwJfMQ=.fb286084-6b1c-47b9-8151-1349e0d37a08@github.com> On Wed, 18 Sep 2024 20:39:06 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.cpp line 427: >> >>> 425: } >>> 426: >>> 427: Node* con = (bt == T_INT) ? (Node*) phase->intcon((jint) factor) : (Node*) phase->longcon(factor); >> >> You can use `integercon()` and pass `bt` > > I disagree: `integercon()` internally uses `checked_cast(l)` to make prevent information loss during type conversion and asserts at runtime if the value is larger what a `jint` can hold. However, such an information loss is intended for integer arithmetic overflows. (e.g., `Integer.MAX_VALUE * a + a` is extracted to `((jlong) Integer.MAX_VALUE + (jlong) 1) * a`. 
Here we want `Integer.MAX_VALUE + 1` to overflow to `(int) Integer.MIN_VALUE`). > > If I were to use `integercon()`, the best I could do is `intgercon(bt == T_INT ? (jint) factor : factor)` which is rather pointless. Good catch. It makes sense to add a comment about that in the source code. Do you have a test case for that corner case? >> src/hotspot/share/opto/addnode.cpp line 490: >> >>> 488: if (bt == T_INT || bt == T_LONG) { // const could potentially be void type >>> 489: Node* mul_base; >>> 490: jlong multiplier = extract_base_operand_from_serial_additions(phase, operand_node, &mul_base, depth_limit - 1); >> >> Do you need to recurse at all here? > > I believe so. Consider the case `(a + a) * 3`. Recurse here allows us to extract `a` and factor `2 * 3 => 6` That case would be better handled in 2 steps, I think: `a+a` into `a*2` with a `AddNode` transformation `(a*2)*3` into `a*6` with a `MulNode` (or `LShift`) transformation. Can you check if it already exists? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1771244410 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1771246595 From dqu at openjdk.org Mon Sep 23 12:06:12 2024 From: dqu at openjdk.org (Daohan Qu) Date: Mon, 23 Sep 2024 12:06:12 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi Message-ID: # Description [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. 
More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. This might cause infinite split which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7) and [JDK-8038348](https://github.com/openjdk/jdk/commit/913622a64157c4c2ce496ecddf7a8c4315e1ff84). The infinite split results in "Out of nodes" and make the method "not compilable". Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with make test \ TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ MICRO="FORK=1;WARMUP_ITER=2" \ TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" shows performance improvement after this PR applied. (`-XX:+UseParallelGC` is only for reproduce this bug, all the bms in the following table are run with this option.) 

| benchmark (throughput, unit: ops/s) | jdk-before-this-patch | jdk-after-this-patch |
|---|---|---|
| org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
| org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
| org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |

# Reproduction

Compile and run the reduced test case `Test.java` in the appendix below using

    java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test

and you will find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like "

And when `-XX:+AbortVMOnCompilationFailure` is used, the VM will crash.

# Solutions

You could set `AggressiveUnboxing` to `false` to disable this optimization and thus bypass this bug, but I'm not sure whether that would cause other regressions, since it has been set to `true` for several years. This PR tries to fix this bug with minimal side effects.

# Tests

I have run the following tests locally, and only see the `vmTestbase/jit/misctests/fpustack/GraphApplet.java` failure due to `DISPLAY` being unset, which seems to be unrelated to this patch.

- [x] `jtreg:test/hotspot/jtreg/compiler`
- [x] `jtreg:test/hotspot/jtreg/vmTestbase`

# Appendix

    import java.util.Random;

    public class Test {
        static class Obj {
            final Integer[] array;
            final int start;
            final int end;
            Integer max = Integer.MIN_VALUE;

            Obj(Integer[] array, int start, int end) {
                this.array = array;
                this.start = start;
                this.end = end;
            }

            Integer cmp(Integer i, Integer j) {
                return i > j ? i : j;
            }

            void calc() {
                int i = start;
                do {
                    max = cmp(max, array[i]);
                    i++;
                } while (i < end);
            }
        }

        static final int LEN = 2000;
        static final Integer[] a = new Integer[LEN];

        static {
            Random r = new Random(0x30052012);
            for (int i = 0; i < LEN; i++) {
                a[i] = new Integer(r.nextInt());
            }
        }

        public static void main (String[] args) {
            System.out.println("Start");
            Obj o = new Obj(a, 0, LEN);
            for (int i = 0; i < 1000; i++) {
                o.calc();
            }
            System.out.println(o.max);
        }
    }

-------------

Commit messages:
 - Check duplicate split in case of base_is_phi

Changes: https://git.openjdk.org/jdk/pull/21134/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21134&range=00
 Issue: https://bugs.openjdk.org/browse/JDK-8340602
 Stats: 100 lines in 3 files changed: 63 ins; 29 del; 8 mod
 Patch: https://git.openjdk.org/jdk/pull/21134.diff
 Fetch: git fetch https://git.openjdk.org/jdk.git pull/21134/head:pull/21134

PR: https://git.openjdk.org/jdk/pull/21134

From epeter at openjdk.org Mon Sep 23 12:10:16 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 23 Sep 2024 12:10:16 GMT
Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v18]
In-Reply-To: References: Message-ID:

> **Motivation**
>
> I want to write small dedicated fuzzers:
> - Generate `java` and `jasm` source code: just some `String`.
> - Quickly compile it (with this framework).
> - Execute the compiled code.
>
> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily.
> But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very, very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>
> **The CompileFramework**
>
> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
> An important factor: integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>
> I implemented a first, simple version of the framework. I added some tests and examples.
>
> **Example**
>
>
>     CompileFramework comp = new CompileFramework();
>     comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
>     comp.compile();
>     comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>
>
> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>
> **Below some use cases: tests that would have been better with the CompileFramework**
>
> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>
> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>
> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>
> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but it is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
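
The generate / compile / execute cycle described above can be sketched with only the standard `javax.tools` API. This is not the CompileFramework implementation: the class `StringCompileSketch` and its helpers `generate` and `compileAndInvoke` are hypothetical names made up for this illustration, and the sketch assumes it runs on a full JDK so that a system Java compiler is available.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical illustration only; not part of the JDK test library.
class StringCompileSketch {

    // "Generate": the source is just a String, with a chosen constant baked in.
    static String generate(int factor) {
        return "public class XYZ {\n"
             + "    public static int test(int i) { return i * " + factor + "; }\n"
             + "}\n";
    }

    // "Compile" and "execute": write the source to a temp directory, run the
    // system Java compiler on it, load the class and call a public static method.
    static Object compileAndInvoke(String className, String source,
                                   String methodName, Object... args) {
        try {
            Path dir = Files.createTempDirectory("generated");
            Path file = dir.resolve(className + ".java");
            Files.writeString(file, source);

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("no system Java compiler (JRE only?)");
            }
            int rc = javac.run(null, null, null, "-d", dir.toString(), file.toString());
            if (rc != 0) {
                throw new IllegalStateException("compilation failed with exit code " + rc);
            }

            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> clazz = loader.loadClass(className);
                // Pick the first public method with the requested name.
                for (Method m : clazz.getMethods()) {
                    if (m.getName().equals(methodName)) {
                        return m.invoke(null, args);
                    }
                }
                throw new NoSuchMethodException(methodName);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A different constant on every run: the "mini-fuzzer" idea in miniature.
        int factor = new Random().nextInt(100);
        Object result = compileAndInvoke("XYZ", generate(factor), "test", 5);
        System.out.println("factor = " + factor + ", XYZ.test(5) = " + result);
    }
}
```

Because the constant is part of the generated source text, a failing run can be turned into a reproducer simply by saving the generated `String` to a file.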
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/237ce2ed..46362809 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=16-17 Stats: 15 lines in 6 files changed: 1 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 23 12:22:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Sep 2024 12:22:14 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v19] In-Reply-To: References: Message-ID: <4Jye5n7-R7bQpEPQ8oZE35G6J1u3sSmxPVMcRbPC4LE=.aa58a45e-2ff6-49b0-9f53-ec0bcdeed0e2@github.com>
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/46362809..7a525e0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=17-18 Stats: 11 lines in 3 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Sep 23 12:32:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Sep 2024 12:32:13 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v20] In-Reply-To: References: Message-ID: <2rMSs735Jx5tvnRv16rdizKVtCtk_frMIkTVzkjav2M=.d31c45a2-4d88-4ff5-90e5-945cac86eaa3@github.com>
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  move some code for Christian

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20184/files
  - new: https://git.openjdk.org/jdk/pull/20184/files/7a525e0b..041da5d4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=18-19

  Stats: 187 lines in 2 files changed: 92 ins; 90 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/20184.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184

PR: https://git.openjdk.org/jdk/pull/20184

From epeter at openjdk.org  Mon Sep 23 12:34:40 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 23 Sep 2024 12:34:40 GMT
Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v17]
In-Reply-To: <4ZwCxZHzk-6BcoFVe9XpNXBY8V0mMNMa39lbHpGaexs=.228624ad-f288-444b-a8f4-87152465bf66@github.com>
References: <4ZwCxZHzk-6BcoFVe9XpNXBY8V0mMNMa39lbHpGaexs=.228624ad-f288-444b-a8f4-87152465bf66@github.com>
Message-ID:

On Mon, 23 Sep 2024 08:51:58 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> another small suggestion from Christian
>
> Almost there, some minor nits left, mostly code style. Otherwise, looks good to me now!

@chhagedorn thanks for reviewing yet again! I addressed all your points.

> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 48:
>
>> 46: * Set up a new Compile Framework instance, for a new compilation unit.
>> 47: */
>> 48: public CompileFramework() {}
>
> You can probably omit that since it's empty.

I wanted to have a javadoc string for the constructor though. Let me know if you think I should remove it.

------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2368080856 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771315878 From yzheng at openjdk.org Mon Sep 23 13:11:10 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 23 Sep 2024 13:11:10 GMT Subject: RFR: 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers Message-ID: Graal does not constant fold unaligned long/double reads from primitive stable arrays. Update UnsafeGetStableArrayElement.java accordingly. ------------- Commit messages: - compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers Changes: https://git.openjdk.org/jdk/pull/21136/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21136&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340585 Stats: 12 lines in 1 file changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21136/head:pull/21136 PR: https://git.openjdk.org/jdk/pull/21136 From epeter at openjdk.org Mon Sep 23 13:30:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Sep 2024 13:30:14 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v21] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/041da5d4..ad3865bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=19-20 Stats: 8 lines in 8 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From chagedorn at openjdk.org Mon Sep 23 13:30:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 23 Sep 2024 13:30:15 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v20] In-Reply-To: <2rMSs735Jx5tvnRv16rdizKVtCtk_frMIkTVzkjav2M=.d31c45a2-4d88-4ff5-90e5-945cac86eaa3@github.com> References: <2rMSs735Jx5tvnRv16rdizKVtCtk_frMIkTVzkjav2M=.d31c45a2-4d88-4ff5-90e5-945cac86eaa3@github.com> Message-ID: On Mon, 23 Sep 2024 12:32:13 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily.
But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > move some code for Christian Looks great now! Some last nits but I'd consider it done now from my side. 
Thanks for the patience and applying all the suggestions :-) test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java line 51: > 49: public class IRFrameworkJavaExample { > 50: > 51: public static void main(String args[]) { Suggestion: public static void main(String[] args) { test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/MultiFileJasmExample.java line 65: > 63: } > 64: > 65: public static void main(String args[]) { Suggestion: public static void main(String[] args) { test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/MultiFileJavaExample.java line 61: > 59: } > 60: > 61: public static void main(String args[]) { Suggestion: public static void main(String[] args) { test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJasmExample.java line 57: > 55: } > 56: > 57: public static void main(String args[]) { Suggestion: public static void main(String[] args) { test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java line 53: > 51: } > 52: > 53: public static void main(String args[]) { Suggestion: public static void main(String[] args) { test/hotspot/jtreg/testlibrary_tests/compile_framework/tests/TestBadJasmCompilation.java line 47: > 45: } > 46: > 47: public static void main(String args[]) { Suggestion: public static void main(String[] args) { test/hotspot/jtreg/testlibrary_tests/compile_framework/tests/TestBadJavaCompilation.java line 47: > 45: } > 46: > 47: public static void main(String args[]) { Suggestion: public static void main(String[] args) { test/hotspot/jtreg/testlibrary_tests/compile_framework/tests/TestConcurrentCompilation.java line 90: > 88: } > 89: > 90: public static void main(String args[]) { Suggestion: public static void main(String[] args) { ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2322198277 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771412267 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771412897 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771413618 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771414312 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771414822 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771415535 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771417239 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771418447 From epeter at openjdk.org Mon Sep 23 13:30:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Sep 2024 13:30:15 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v20] In-Reply-To: References: <2rMSs735Jx5tvnRv16rdizKVtCtk_frMIkTVzkjav2M=.d31c45a2-4d88-4ff5-90e5-945cac86eaa3@github.com> Message-ID: On Mon, 23 Sep 2024 13:24:30 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move some code for Christian > > Looks great now! Some last nits but I'd consider it done now from my side. Thanks for the patience and applying all the suggestions :-) @chhagedorn thanks! Applied them all. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2368245707 From chagedorn at openjdk.org Mon Sep 23 13:30:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 23 Sep 2024 13:30:15 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v17] In-Reply-To: References: <4ZwCxZHzk-6BcoFVe9XpNXBY8V0mMNMa39lbHpGaexs=.228624ad-f288-444b-a8f4-87152465bf66@github.com> Message-ID: On Mon, 23 Sep 2024 12:32:16 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/compile_framework/CompileFramework.java line 48: >> >>> 46: * Set up a new Compile Framework instance, for a new compilation unit. >>> 47: */ >>> 48: public CompileFramework() {} >> >> You can probably omit that since it's empty. > > I wanted to hava a javadoc string for the constructor though. Let me know if you think I should remove it. I see, I guess then it's fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1771419780 From chagedorn at openjdk.org Mon Sep 23 13:42:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 23 Sep 2024 13:42:39 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v21] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 13:30:14 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. 
But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> >
> > Apply suggestions from code review
> >
> > Co-authored-by: Christian Hagedorn

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2322282011

From svkamath at openjdk.org  Mon Sep 23 15:08:38 2024
From: svkamath at openjdk.org (Smita Kamath)
Date: Mon, 23 Sep 2024 15:08:38 GMT
Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v3]
In-Reply-To: <8VoBChxBjiW1bW9pDuPZqpLetYklTvVTERvUUQjxlQM=.d3e29df2-8b01-4786-8648-a3fda9a4a0d4@github.com>
References: <8VoBChxBjiW1bW9pDuPZqpLetYklTvVTERvUUQjxlQM=.d3e29df2-8b01-4786-8648-a3fda9a4a0d4@github.com>
Message-ID: <-hMoerjDqe__PSeOn8MMGuRjuD2s_PTdr2oF3ErbS7Q=.82ae0656-a50d-4687-8ff0-fd68cefbee4e@github.com>

On Mon, 2 Sep 2024 10:28:37 GMT, Jatin Bhateja wrote:

>> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Updated copyright dates and addressed review comments
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 286:
>
>> 284: __ push(r15);//holds number of rounds
>> 285: __ push(rbx);//scratch register
>> 286: #ifdef _WIN64
>
> Should we replace these stack access with GPR to scratch register XMM and vice-versa transfers.

I am using all XMM registers from 0-31 in the code, so I won't be able to make this change.

> src/hotspot/cpu/x86/stubGenerator_x86_64_ghash.cpp line 60:
>
>> 58: // Polynomial x^128+x^127+x^126+x^121+1
>> 59: ATTRIBUTE_ALIGNED(16) static const uint64_t GHASH_POLYNOMIAL[] = {
>> 60: 0x0000000000000001ULL, 0xC200000000000000ULL,
>
> As per https://www.intel.com/content/dam/develop/external/us/en/documents/clmul-wp-rev-2-02-2014-04-20.pdf and https://www.intel.com/content/dam/www/public/us/en/documents/software-support/enabling-high-performance-gcm.pdf
> reduction polynomial for GHASH should be "x^128 + x^7 + x^2 + x + 1".
> > Also the polynomial defined in comments is not matching with the bit representation 1100 0010 <119 zeros> 1 The polynomial comes from the implementation mentioned in https://github.com/intel/intel-ipsec-mb/blob/main/lib/include/gcm_vaes_avx512.inc ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1771638716 PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1771638949 From kxu at openjdk.org Mon Sep 23 16:19:03 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 23 Sep 2024 16:19:03 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v11] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <2gIAMHwbpfcfrT44pb2Har9bXZIWeJqPEAMo2rD3-C0=.4da5eccb-b126-4112-94a5-3090c1dec85c@github.com> > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: - Merge branch 'openjdk:master' into long-typed-parallel-iv - refactor I/L conversion nodes - update tests and comments as requested - Merge branch 'master' into long-typed-parallel-iv - use @run driver and Argument.RANDOM_ONCE - Merge branch 'master' into long-typed-parallel-iv - add random strides to tests - fix tests on larger strides - add more expressive comments and test cases - Merge branch 'master' into long-typed-parallel-iv - ... 
and 20 more: https://git.openjdk.org/jdk/compare/0f9f7775...e1e112da ------------- Changes: https://git.openjdk.org/jdk/pull/18489/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=10 Stats: 427 lines in 3 files changed: 414 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From svkamath at openjdk.org Mon Sep 23 16:19:44 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 23 Sep 2024 16:19:44 GMT Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v3] In-Reply-To: <8VoBChxBjiW1bW9pDuPZqpLetYklTvVTERvUUQjxlQM=.d3e29df2-8b01-4786-8648-a3fda9a4a0d4@github.com> References: <8VoBChxBjiW1bW9pDuPZqpLetYklTvVTERvUUQjxlQM=.d3e29df2-8b01-4786-8648-a3fda9a4a0d4@github.com> Message-ID: On Fri, 6 Sep 2024 09:04:50 GMT, Jatin Bhateja wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated copyright dates and addressed review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3001: > >> 2999: if (do_reduction) { >> 3000: //new reduction >> 3001: __ evmovdquq(ZTMPB, ExternalAddress(ghash_polynomial_addr()), Assembler::AVX_512bit, rbx /*rscratch*/); > > Is this based on aggregate reduction method ? > Can you please add some comments to narrate the reduction algorithm. 
The reduction algorithm is mentioned in the paper - https://github.com/intel/intel-ipsec-mb/wiki/doc/advanced-encryption-standard-galois-counter-mode-optimized-ghash-function-technology-guide-1693300747.pdf ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1771745843 From svkamath at openjdk.org Mon Sep 23 16:51:36 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 23 Sep 2024 16:51:36 GMT Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v4] In-Reply-To: References: Message-ID: > Hi, > I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you.
>
> Benchmark | Datasize | BaseJDK (ops/s) | Patch (ops/s) | %Gain
> -- | -- | -- | -- | --
> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67
> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72
> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73
> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34
> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66
> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75
> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42
> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6
> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54
> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45
> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39
> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26
> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52
> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08
> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94
> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91
> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81
> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597

Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17515/files - new: https://git.openjdk.org/jdk/pull/17515/files/ed10bcca..12eab515 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17515&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17515&range=02-03 Stats: 3 lines in 3 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17515.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17515/head:pull/17515 PR: https://git.openjdk.org/jdk/pull/17515 From kxu at openjdk.org Mon Sep 23 16:55:18 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 23 Sep 2024 16:55:18 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v12] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <6T8i0DZcooO3e9rS9cVk3r_WquXnm9I-fXW80qbg-Ck=.2883943a-d3e2-41a8-aa06-dcfd27b94125@github.com> > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized.
> > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: omit source BasicType ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/e1e112da..64bf036e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=10-11 Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Mon Sep 23 17:10:50 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 23 Sep 2024 17:10:50 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v4] In-Reply-To: References: Message-ID: <7VsMJqcILHXihypqfE0TCb3OvHb4Fl5vRX_te3Ou_vM=.20e26d66-70e0-45bd-b566-0120dcfd5d19@github.com> > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). 
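Editorial illustration: the identities in the description above depend only on Java's wrapping two's-complement `int` arithmetic, so they can be checked at the source level. The sketch below is not taken from the PR or its IR tests; the class name `SerialAddIdentities` and its methods are hypothetical, and the constant `C = 3` is chosen arbitrarily. It says nothing about what C2 actually emits; it only shows that the rewritten forms compute the same values, including the overflow case where the factor `Integer.MAX_VALUE + 1` wraps to `Integer.MIN_VALUE`.

```java
// Illustrative sketch only: class and method names are hypothetical,
// not taken from the PR or its IR tests.
final class SerialAddIdentities {
    // Series of additions of the same operand: a + a + a + a
    static int addFourTimes(int a) {
        return a + a + a + a;
    }

    // C * a + a with C = 3, which should equal (C + 1) * a
    static int mulThenAdd(int a) {
        return 3 * a + a;
    }

    // (a << C) + a with C = 3, which should equal (2^C + 1) * a
    static int shiftThenAdd(int a) {
        return (a << 3) + a;
    }

    // Overflow corner case: the factor MAX_VALUE + 1 wraps to MIN_VALUE
    static int maxMulThenAdd(int a) {
        return Integer.MAX_VALUE * a + a;
    }

    // Throws if an identity does not hold for the given operand
    static void check(boolean ok, String what, int a) {
        if (!ok) {
            throw new AssertionError(what + " failed for a=" + a);
        }
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            check(addFourTimes(a) == 4 * a, "a+a+a+a == 4*a", a);
            check(addFourTimes(a) == (a << 2), "a+a+a+a == a<<2", a);
            check(mulThenAdd(a) == 4 * a, "3*a + a == 4*a", a);
            check(shiftThenAdd(a) == 9 * a, "(a<<3) + a == 9*a", a);
            check(maxMulThenAdd(a) == Integer.MIN_VALUE * a, "MAX*a + a == MIN*a", a);
        }
        System.out.println("all identities hold");
    }
}
```

Note that the shift form is written as `(a << C) + a` here because in Java `<<` binds less tightly than `+`, so `a << C + a` would parse as `a << (C + a)`.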
> > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2` > > See included IR unit tests for more. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: add comments about intentional type narrowing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/30f119c5..f9ca1124 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=02-03 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Mon Sep 23 17:10:50 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 23 Sep 2024 17:10:50 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: <9WxAeM7QHqfmAcs90-IeCy8zME-pe_HY4onDTFwJfMQ=.fb286084-6b1c-47b9-8151-1349e0d37a08@github.com> References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> <8jdKFn_Bln3lPK1vO8UZyUakbwv_gBvKLd-MutObCg0=.bf55c55e-b906-4d90-9493-e26ba2d87298@github.com> <9WxAeM7QHqfmAcs90-IeCy8zME-pe_HY4onDTFwJfMQ=.fb286084-6b1c-47b9-8151-1349e0d37a08@github.com> Message-ID: On Mon, 23 Sep 2024 11:37:38 GMT, Roland Westrelin wrote: >> I disagree: `integercon()` internally uses `checked_cast(l)` to prevent information loss during type conversion and asserts at runtime if the value is larger than what a `jint` can hold. However, such information loss is intended for integer arithmetic overflows.
(e.g., `Integer.MAX_VALUE * a + a` is extracted to `((jlong) Integer.MAX_VALUE + (jlong) 1) * a`. Here we want `Integer.MAX_VALUE + 1` to overflow to `(int) Integer.MIN_VALUE`). >> >> If I were to use `integercon()`, the best I could do is `integercon(bt == T_INT ? (jint) factor : factor)` which is rather pointless. > > Good catch. > It makes sense to add a comment about that in the source code. > Do you have a test case for that corner case? Added comments. > Do you have a test case for that corner case? Yes: `TestSerialAdditions::mulAndAddToOverflow` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1771808870 From sviswanathan at openjdk.org Mon Sep 23 18:27:45 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 23 Sep 2024 18:27:45 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes.
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Incorporating review and documentation suggestions. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2600: > 2598: assert ((vlen & (vlen -1)) == 0); > 2599: int twoVectorLenMask = (vlen << 1) - 1; > 2600: ByteVector wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask); This assert and the following AND forcing power of two vector length seems out of place in Java code. You could move the wrapping within the selectFromTwoVectorOp on similar lines as the PR #20634. 
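Editorial illustration: the masking step in the quoted `ByteVector.java` snippet can be modelled with plain scalar code. The sketch below is an assumption-laden model, not the Vector API implementation: the class `TwoVectorIndexWrap` and its methods are hypothetical, the two "vectors" are ordinary `int[]` arrays of equal length, and the mask trick is only valid when that length is a power of two (which is exactly what the quoted `assert` checks). It does not attempt to show how the wrapping would look once moved into `selectFromTwoVectorOp`.

```java
import java.util.Arrays;

// Scalar model of the index wrapping discussed above; names are hypothetical.
final class TwoVectorIndexWrap {
    // Wraps an index into [0, 2 * vlen) with a bit mask.
    // The mask form is only correct when vlen is a power of two.
    static int wrapIndex(int index, int vlen) {
        if (vlen <= 0 || (vlen & (vlen - 1)) != 0) {
            throw new IllegalArgumentException("vlen must be a power of two: " + vlen);
        }
        int twoVectorLenMask = (vlen << 1) - 1;
        return index & twoVectorLenMask;
    }

    // Treats v1 followed by v2 as one lookup table of length 2 * vlen
    // and selects one element per (wrapped) index.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("v1 and v2 must have the same length");
        }
        int vlen = v1.length;
        int[] result = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            int idx = wrapIndex(indexes[i], vlen);
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        // Index 9 is out of range for 2 * vlen = 8 and wraps to 9 & 7 = 1.
        int[] indexes = {0, 5, 7, 9};
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
        // prints [10, 21, 23, 11]
    }
}
```

For a vector length that is not a power of two, the `&` mask no longer maps every index into `[0, 2 * vlen)`, which is why the assert in the quoted snippet is tied to the masking approach.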
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1771898190 From swen at openjdk.org Mon Sep 23 23:23:12 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 23 Sep 2024 23:23:12 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v19] In-Reply-To: References: Message-ID: > After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. > > This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: fix build error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19626/files - new: https://git.openjdk.org/jdk/pull/19626/files/399c8ef5..ae054771 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19626&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19626&range=17-18 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19626.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19626/head:pull/19626 PR: https://git.openjdk.org/jdk/pull/19626 From dlong at openjdk.org Tue Sep 24 00:40:52 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 24 Sep 2024 00:40:52 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 Message-ID: This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. Making it general purpose is difficult because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==.
------------- Commit messages: - rename and restrict usage Changes: https://git.openjdk.org/jdk/pull/21148/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340141 Stats: 19 lines in 3 files changed: 15 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From duke at openjdk.org Tue Sep 24 01:32:35 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Tue, 24 Sep 2024 01:32:35 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: Message-ID: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com> On Mon, 23 Sep 2024 07:31:10 GMT, Tomáš Zezula wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. >> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8340398: Fixed EnableJVMCI handling. If I'm reading things correctly, the doc-string for `UseJVMCINativeLibrary` in `jvmci_globals.hpp` needs updating.
That [currently states](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.hpp#L140-L144) "Defaults to true if EnableJVMCIProduct is true and a JVMCI native library is available" but looks like it default to true if `EnableJVMCI` is true, regardless of the `EnableJVMCIProduct` setting. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2369922039 From duke at openjdk.org Tue Sep 24 05:13:35 2024 From: duke at openjdk.org (duke) Date: Tue, 24 Sep 2024 05:13:35 GMT Subject: RFR: 8340590: RISC-V: C2: Small improvement to vector gather load and scatter store [v2] In-Reply-To: <8n5tZswHsXFfTJa63f8EkLCau8wg1NSj5hKS67254A4=.5f642538-6e7e-47ed-9895-bbeb7191bcda@github.com> References: <8n5tZswHsXFfTJa63f8EkLCau8wg1NSj5hKS67254A4=.5f642538-6e7e-47ed-9895-bbeb7191bcda@github.com> Message-ID: On Mon, 23 Sep 2024 07:14:08 GMT, Gui Cao wrote: >> Hi, >> This is a small improvement for RISC-V C2 vector gather load and scatter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. >> >> >> ### Testing >> - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing @zifeihan Your change (at version 03f6e3a8d49665d5acba4acf5ce02e49d52d77b7) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21123#issuecomment-2370183969 From dqu at openjdk.org Tue Sep 24 05:23:13 2024 From: dqu at openjdk.org (Daohan Qu) Date: Tue, 24 Sep 2024 05:23:13 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v2] In-Reply-To: References: Message-ID: <2swshN9Ew9xf8p7g0KXHRRw-SHg1Q-X4LZtXy5roDU0=.9e5499d7-0ddf-4ee8-872c-ba00a62cee28@github.com> > # Description > > [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. > > But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. > > This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". > > Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with > > > make test \ > TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ > MICRO="FORK=1;WARMUP_ITER=2" \ > TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" > > > shows performance improvement after this PR applied. 
(`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table are run with this option.)
>
> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch |
> |---|---|---|
> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |
>
> # Reproduction > > Compile and run the reduced test case `Test.java` in the appendix below using > > > java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test > > > and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like > > > References: Message-ID: On Tue, 3 Sep 2024 09:30:32 GMT, Christian Hagedorn wrote: >> The computation of `final_correction` in `PhaseIdealLoop::is_counted_loop()` could overflow which is UB: >> >> https://github.com/openjdk/jdk/blob/dc4fd896289db1d2f6f7bbf5795fec533448a48c/src/hotspot/share/opto/loopnode.cpp#L1958-L1967 >> >> `canonicalized_correction` equals `max_int - 1` if stride is `max_int`. `limit_correction` is at most `max_int - 1` in that case. Adding both together will overflow. I don't think that any compiler would wrongly optimize this and we have not observed any issues with that. But we should still fix this UB. >> >> The fix I propose is to simply bail out with very large positive and negative strides such that we avoid an over- or underflow with the existing logic (see added comments for how the upper bound for the stride is determined). These large strides should be very uncommon in practice and even if we encounter these, the loop would only run for a few iterations. So, a bailout seems fine.
This bailout has the additional benefit that we avoid other possibly unknown issues or issues in the future with counted loops having large edge-case strides like `min_int`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Tobias Hartmann Thanks Tobias and Vladimir for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20828#issuecomment-2370330994 From chagedorn at openjdk.org Tue Sep 24 06:49:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 24 Sep 2024 06:49:40 GMT Subject: Integrated: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 07:35:10 GMT, Christian Hagedorn wrote: > The computation of `final_correction` in `PhaseIdealLoop::is_counted_loop()` could overflow which is UB: > > https://github.com/openjdk/jdk/blob/dc4fd896289db1d2f6f7bbf5795fec533448a48c/src/hotspot/share/opto/loopnode.cpp#L1958-L1967 > > `canonicalized_correction` equals `max_int - 1` if stride is `max_int`. `limit_correction` is at most `max_int - 1` in that case. Adding both together will overflow. I don't think that any compiler would wrongly optimize this and we have not observed any issues with that. But we should still fix this UB. > > The fix I propose is to simply bail out with very large positive and negative strides such that we avoid an over- or underflow with the existing logic (see added comments for how the upper bound for the stride is determined). These large strides should be very uncommon in practice and even if we encounter these, the loop would only run for a few iterations. So, a bailout seems fine. This bailout has the additional benefit that we avoid other possibly unknown issues or issues in the future with counted loops having large edge-case strides like `min_int`. 
> > Thanks, > Christian This pull request has now been integrated. Changeset: 1dd60b62 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/1dd60b62e384090b13a08d2afa62e49ef52bc46c Stats: 21 lines in 1 file changed: 20 ins; 0 del; 1 mod 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20828 From dnsimon at openjdk.org Tue Sep 24 06:54:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 06:54:37 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com> References: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com> Message-ID: On Tue, 24 Sep 2024 01:29:55 GMT, Todd V. Jonker wrote: > If I'm reading things correctly, the doc-string for `UseJVMCINativeLibrary` in `jvmci_globals.hpp` needs updating. That [currently states](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.hpp#L140-L144) "Defaults to true if EnableJVMCIProduct is true and a JVMCI native library is available" but looks like it default to true if `EnableJVMCI` is true, regardless of the `EnableJVMCIProduct` setting. That is correct and I'm making that fix [here](https://github.com/openjdk/jdk/pull/21120/files#diff-cba70430948d75c7d40424fbbc704e7d7c571d6862502e210630369d8800ec62L143). However, it wouldn't hurt to also fix it here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2370342202 From dqu at openjdk.org Tue Sep 24 07:02:37 2024 From: dqu at openjdk.org (Daohan Qu) Date: Tue, 24 Sep 2024 07:02:37 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v2] In-Reply-To: <2swshN9Ew9xf8p7g0KXHRRw-SHg1Q-X4LZtXy5roDU0=.9e5499d7-0ddf-4ee8-872c-ba00a62cee28@github.com> References: <2swshN9Ew9xf8p7g0KXHRRw-SHg1Q-X4LZtXy5roDU0=.9e5499d7-0ddf-4ee8-872c-ba00a62cee28@github.com> Message-ID: On Tue, 24 Sep 2024 05:23:13 GMT, Daohan Qu wrote: >> # Description >> >> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. >> >> But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. >> >> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". >> >> Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. 
For example, the following three micro-benchmarks running with >> >> >> make test \ >> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ >> MICRO="FORK=1;WARMUP_ITER=2" \ >> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" >> >> >> show a performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table are run with this option.)
>>
>> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch |
>> |---|---|---|
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |
>>
>> # Reproduction >> >> Compile and run the reduced test case `Test.java` in the appendix below using >> >> >> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test >> >> >> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and... > > Daohan Qu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Check duplicate split in case of base_is_phi Hi @vnkozlov , I noticed that you have fixed a similar bug in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). Could you please review this PR? Thanks a lot!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2370355269 From jbhateja at openjdk.org Tue Sep 24 07:10:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 24 Sep 2024 07:10:24 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/31a58642..42ca80c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=11-12 Stats: 225 lines in 41 files changed: 25 ins; 82 del; 118 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From gcao at openjdk.org Tue Sep 24 07:11:39 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 24 Sep 2024 07:11:39 GMT Subject: Integrated: 8340590: RISC-V: C2: Small improvement to vector gather load and scatter store In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 03:03:23 GMT, Gui Cao wrote: > Hi, > This is a small improvement for RISC-V C2 vector gather load and scatter store nodes. Currently, we emit whole vector register move (vmv1r.v) to move vector idx to a temp vector register. But a normal vector integer move (vmv.v.v) would do and is more reasonable as the vtype and vl are known here. > > > ### Testing > - [x] make test TEST="jdk_vector" JTREG="TIMEOUT_FACTOR=32" on qemu with UseRVV1.0 This pull request has now been integrated. 
Changeset: 88801cae Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/88801caef6ccdc5ba9ade2af830f3b3cd96e1467 Stats: 8 lines in 1 file changed: 0 ins; 4 del; 4 mod 8340590: RISC-V: C2: Small improvement to vector gather load and scatter store Reviewed-by: fyang, dzhang ------------- PR: https://git.openjdk.org/jdk/pull/21123 From duke at openjdk.org Tue Sep 24 07:15:57 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Tue, 24 Sep 2024 07:15:57 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v3] In-Reply-To: References: Message-ID: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: JDK-8340398: Fixed UseJVMCINativeLibrary doc string.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21069/files - new: https://git.openjdk.org/jdk/pull/21069/files/b7550463..28dbd932 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21069/head:pull/21069 PR: https://git.openjdk.org/jdk/pull/21069 From duke at openjdk.org Tue Sep 24 07:15:57 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Tue, 24 Sep 2024 07:15:57 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com> Message-ID: On Tue, 24 Sep 2024 06:51:52 GMT, Doug Simon wrote: > If I'm reading things correctly, the doc-string for `UseJVMCINativeLibrary` in `jvmci_globals.hpp` needs updating. Fixed in https://github.com/openjdk/jdk/pull/21069/commits/28dbd9329a4ad67a39e3ba19767aea2209313382 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2370379478 From duke at openjdk.org Tue Sep 24 07:21:37 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 07:21:37 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Thu, 19 Sep 2024 01:33:53 GMT, David Holmes wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs.
For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. >> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. >> >> Please consider this PR, and thanks! > > This affects all hotspot developers using UL so extending coverage: @dholmes-ora in addition to the socialising comments that Roberto and Johan have already responded, I'll try to clarify the PR a little (I've updated the title as well to make it clearer): - `Xlog:jit+inlining` is the same as `Xlog:inlining+jit`, and will get the same treatment - Wildcards have been chosen to be ignored as you could potentially match too many defaults. In the end, these defaults only attempt to offer a "help" to developers using the same specific `-Xlog` option and that don't want to specify the tagset they are interested on every time - I am not planning on changing the design idea of "decorators associated with output device". 
This PR enables having defaults for `-Xlog`-selected tagsets, but once the output device is configured we will end up with the same decorators throughout it (does not matter if it is stdout or a real file) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370395774 From duke at openjdk.org Tue Sep 24 07:27:36 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 07:27:36 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Thu, 19 Sep 2024 07:08:56 GMT, Roberto Castañeda Lozano wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. >> >> Please consider this PR, and thanks! > > Nice proposal, Antón! This will make it possible to migrate lots of debug/trace-level ad-hoc logging in the compiler code to the UL while preserving its current format (e.g. time decorators are hardly needed when examining the output of `-XX:+TraceLoopOpts`). > > Having said this, I find the following behavior unintuitive. If I run: > > > -Xlog:jit*=debug > > > I get the global default decorators, i.e. `uptime,level,tags`, which is what I expected. But if I run: > > > java -Xlog:jit+compilation=debug,jit+inlining=debug,jit+thread=debug > > > I would expect to get the same decorators, but instead I get the default decorators for `jit+inlining`, i.e. none. Is this intentional? > > In general, as a HotSpot developer the behavior I would find most natural is to select the union of all decorators for all chosen tags (regardless of whether the decorators for a tag have been chosen actively by the user, specified as default for the tag, or "inherited" from the global default), as in the first option (`-Xlog:jit*=debug`). @robcasloz Originally I had thought of these defaults "taking over" if there were no defaults for the rest, but I know what you mean and in a way the uptime-level-tags are some implicit defaults that should also be applied. Merging all default specification as suggested by @jdksjolen would automatically enable this behaviour ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370406970 From dnsimon at openjdk.org Tue Sep 24 07:31:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 07:31:46 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v3] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 07:15:57 GMT, Tomáš
Zezula wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. >> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8340398: Fixed UseJVMCINativeLibrary doc string. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21069#pullrequestreview-2324314959 From dholmes at openjdk.org Tue Sep 24 07:46:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 07:46:39 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Tue, 24 Sep 2024 07:19:21 GMT, Antón Seoane wrote: > I am not planning on changing the design idea of "decorators associated with output device". This PR enables having defaults for -Xlog-selected tagsets, but once the output device is configured we will end up with the same decorators throughout Sorry can you please clarify exactly how these compose. If I set one set of defaults for tags A+B and another set for C+D, then what happens if I specify `-Xlog:A+B,C+D`?
And what happens if I configure decorators for say stdout and in addition enable A+B on stdout - what is the resulting set of decorators? Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370444955 From dnsimon at openjdk.org Tue Sep 24 08:20:40 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 08:20:40 GMT Subject: RFR: 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 13:05:24 GMT, Yudi Zheng wrote: > Graal does not constant fold unaligned long/double reads from primitive stable arrays. Update UnsafeGetStableArrayElement.java accordingly. LGTM ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21136#pullrequestreview-2324486168 From yzheng at openjdk.org Tue Sep 24 08:27:50 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 24 Sep 2024 08:27:50 GMT Subject: RFR: 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 13:05:24 GMT, Yudi Zheng wrote: > Graal does not constant fold unaligned long/double reads from primitive stable arrays. Update UnsafeGetStableArrayElement.java accordingly. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21136#issuecomment-2370598286 From yzheng at openjdk.org Tue Sep 24 08:27:50 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 24 Sep 2024 08:27:50 GMT Subject: Integrated: 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 13:05:24 GMT, Yudi Zheng wrote: > Graal does not constant fold unaligned long/double reads from primitive stable arrays. Update UnsafeGetStableArrayElement.java accordingly. This pull request has now been integrated. 
Changeset: 44024826 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/44024826e52373d1613ec366e3f5a9d5bbaefa41 Stats: 12 lines in 1 file changed: 0 ins; 0 del; 12 mod 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21136 From duke at openjdk.org Tue Sep 24 08:38:37 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 08:38:37 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level.
> > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! These defaults are not meant to target a specific selected output, so nothing different would occur. With respect to the first question, right now we would not get any defaults applied as there is a "collision" between A+B and C+D. That was my original idea, where I assumed it might be unwanted to apply all the possible defaults one over another. However, now (a) I don't think we will have that many defaults to drive this to chaos, and (b) as per @robcasloz feedback I believe it would be useful to apply all. Going back to your question: right now we do not apply any defaults upon that `-Xlog:A+B,C+D`, but I am working on some changes I will push soon that will change this behaviour to "merge" the default decorators for A+B and C+D ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370628280 From adinn at openjdk.org Tue Sep 24 08:41:44 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 08:41:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 15:21:31 GMT, Quan Anh Mai wrote: >> If _lo <= x <= _hi, then I believe the dual is _hi <= x <= _lo >> >> If dual is really only needed for join, then it seems like we could remove the concept of dual and just implement join. > > @dean-long Thanks, that is really helpful. IIUC, the duality here refers to the set of all `TypeInt` with a set `a` considered higher than `b` if `a` is a subset of `b`. This leads to our notion of bottom type being the universe set and top type being the empty set. 
It still does not make sense for the concept of a dual `TypeInt`, though, since the concept of duality applies to the set of `TypeInt`, not the `TypeInt`s themselves. > >> My understanding is "join" means "union", "meet" means "intersection", and "dual" means "complement". > > You got it backward, "join" means intersection and "meet" means union. If you want to understand full details of how a (symmetrical) type lattice with duals supports a unified model for many different type flow analysis algorithms you can read up on it in Nielsen, Nielsen and Hankin's book Principles of Program Analysis. If it is new to you then a more simplified account of the use of (unqualified) TOP and BOTTOM types in type flow analysis can be found in Muchnick's book Advanced Compiler Design and Implementation. Note that Cliff Click goes against conventional mathematical terminology in making BOTTOM a universal type and TOP an empty (unrealizable) type. One detail that may not be obvious is that the sub-lattice for int and long sorts includes the hierarchy of single, continuous intervals. Individual integral values (on the lattice centre line) are modelled as singleton ranges i.e. [a,a]. Given the large cardinality of the set of continuous intervals this makes it necessary to place a bound on any fixed point iterations that widen interval ranges. The iteration is killed by widening to the maximum range (this is what Cliff refers to in the code as a 'death march'). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1772913965 From adinn at openjdk.org Tue Sep 24 10:00:16 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 10:00:16 GMT Subject: RFR: 8340793: Fix client build on AArch64 and arm after JDK-8337987 Message-ID: Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. 
------------- Commit messages: - 8340793: Fix client build on AArch64 and arm after JDK-8337987 Changes: https://git.openjdk.org/jdk/pull/21153/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21153&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340793 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21153/head:pull/21153 PR: https://git.openjdk.org/jdk/pull/21153 From duke at openjdk.org Tue Sep 24 10:18:43 2024 From: duke at openjdk.org (duke) Date: Tue, 24 Sep 2024 10:18:43 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v3] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 07:15:57 GMT, Tomáš Zezula wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. >> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8340398: Fixed UseJVMCINativeLibrary doc string. @tzezula Your change (at version 28dbd9329a4ad67a39e3ba19767aea2209313382) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2370858957 From duke at openjdk.org Tue Sep 24 10:22:43 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Tue, 24 Sep 2024 10:22:43 GMT Subject: Integrated: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:53:40 GMT, Tomáš Zezula wrote: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. This pull request has now been integrated. Changeset: 4cd8c75a Author: Tomas Zezula Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/4cd8c75a55163be33917b1fba9f360ea816f3aa9 Stats: 21 lines in 3 files changed: 12 ins; 3 del; 6 mod 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21069 From duke at openjdk.org Tue Sep 24 11:11:43 2024 From: duke at openjdk.org (kuaiwei) Date: Tue, 24 Sep 2024 11:11:43 GMT Subject: Integrated: 8339299: C1 will miss type profile when inline final method In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 07:25:59 GMT, kuaiwei wrote: > I found sometimes C1 will miss type profile and I tried to write a test to demonstrate it.
> It will happen with these conditions > 1 It's a virtual call and the callee is a final method. So c1 will think it's static bound. > 2 The interpreter never touch the callsite, so interpreter does not add type profile. > 3 In c1 compilation, it will be inlined based on CHA. > 4 In c2 compilation, the CHA is broken, but type profile is missing, so c2 can not inline it. This pull request has now been integrated. Changeset: e1c4d303 Author: Kuai Wei URL: https://git.openjdk.org/jdk/commit/e1c4d3039f6b5106ce3f65d50f607eacc2a8d168 Stats: 150 lines in 2 files changed: 148 ins; 0 del; 2 mod 8339299: C1 will miss type profile when inline final method Reviewed-by: lmesnik, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/20786 From shade at openjdk.org Tue Sep 24 11:15:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Sep 2024 11:15:35 GMT Subject: RFR: 8340793: Fix client build on AArch64 and arm after JDK-8337987 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 09:54:43 GMT, Andrew Dinn wrote: > Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. What's the build failure? I.e. what is the symbol that fails to be resolved? Otherwise looks good and trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21153#pullrequestreview-2324900701 From shade at openjdk.org Tue Sep 24 11:22:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Sep 2024 11:22:37 GMT Subject: RFR: 8340793: Fix client build on AArch64 and arm after JDK-8337987 In-Reply-To: References: Message-ID: <7KCHWE8SyZgQNAol_jCPrz0J_Q_ke-U7mOK0-A9CQoM=.232439ae-47ba-4e3c-b7b6-19ff2b6a63d4@github.com> On Tue, 24 Sep 2024 09:54:43 GMT, Andrew Dinn wrote: > Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. Please also link the issues together in JBS (I have problems logging in, otherwise I would have done it myself). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21153#issuecomment-2370976963 From fyang at openjdk.org Tue Sep 24 11:46:34 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 24 Sep 2024 11:46:34 GMT Subject: RFR: 8340793: Fix client build on AArch64 and arm after JDK-8337987 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 09:54:43 GMT, Andrew Dinn wrote: > Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. LGTM. BTW: Seems s390 bears a similar problem. I see use of `TraceTime` in file sharedRuntime_s390.cpp (by JDK-8337987) which does not include "runtime/timerTrace.hpp" either. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21153#pullrequestreview-2324976054 From adinn at openjdk.org Tue Sep 24 12:03:34 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 12:03:34 GMT Subject: RFR: 8340793: Fix client build on AArch64 and arm after JDK-8337987 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 09:54:43 GMT, Andrew Dinn wrote: > Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. Problem was originally notified [here](https://github.com/openjdk/jdk/pull/20566#issuecomment-2363857168) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21153#issuecomment-2371065870 From adinn at openjdk.org Tue Sep 24 12:09:08 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 12:09:08 GMT Subject: RFR: 8340793: Fix client build on AArch64 and arm after JDK-8337987 [v2] In-Reply-To: References: Message-ID: > Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. 
Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: also add header include on s390 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21153/files - new: https://git.openjdk.org/jdk/pull/21153/files/d065b04e..817cf85a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21153&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21153&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21153/head:pull/21153 PR: https://git.openjdk.org/jdk/pull/21153 From adinn at openjdk.org Tue Sep 24 12:09:08 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 12:09:08 GMT Subject: RFR: 8340793: Fix client build on AArch64 and arm after JDK-8337987 [v2] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 11:44:00 GMT, Fei Yang wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> also add header include on s390 > > LGTM. BTW: Seems s390 bears a similar problem. I see use of `TraceTime` in file sharedRuntime_s390.cpp (by JDK-8337987) which does not include "runtime/timerTrace.hpp" either. @RealFYang Thanks for spotting the s390 issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21153#issuecomment-2371081728 From shade at openjdk.org Tue Sep 24 12:11:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Sep 2024 12:11:34 GMT Subject: RFR: 8340793: Fix client builds after JDK-8337987 [v2] In-Reply-To: References: Message-ID: <0CLZeLoprJM8AlVVXGtkE5ljnZ9MUhC9z-8cnALOaTg=.a1fedbdc-6b88-44e2-adb1-bc143b083911@github.com> On Tue, 24 Sep 2024 12:09:08 GMT, Andrew Dinn wrote: >> Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. 
> > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > also add header include on s390 All right, trivial then. Consider changing the synopsis to just "8340793: Fix client builds after JDK-8337987", as it affects more platforms. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21153#pullrequestreview-2325038319 From shade at openjdk.org Tue Sep 24 12:21:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Sep 2024 12:21:36 GMT Subject: RFR: 8340793: Fix client builds after JDK-8337987 [v2] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 12:09:08 GMT, Andrew Dinn wrote: >> Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > also add header include on s390 Yeah, the bots should catch up with JBS update proverbially "soon". This is still trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21153#pullrequestreview-2325069990 From adinn at openjdk.org Tue Sep 24 12:21:37 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 12:21:37 GMT Subject: RFR: 8340793: Fix client builds after JDK-8337987 In-Reply-To: <7KCHWE8SyZgQNAol_jCPrz0J_Q_ke-U7mOK0-A9CQoM=.232439ae-47ba-4e3c-b7b6-19ff2b6a63d4@github.com> References: <7KCHWE8SyZgQNAol_jCPrz0J_Q_ke-U7mOK0-A9CQoM=.232439ae-47ba-4e3c-b7b6-19ff2b6a63d4@github.com> Message-ID: On Tue, 24 Sep 2024 11:19:30 GMT, Aleksey Shipilev wrote: >> Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. > > Please also link the issues together in JBS (I have problems logging in, otherwise I would have done it myself). @shipilev The description section of the JBS issue for this PR includes a link to the JBS issue whose PR caused the breakage. 
That PR includes the problem notification and a response which links back to this one's issue. So I think we are sufficiently cross-gartered. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21153#issuecomment-2371106452 From dholmes at openjdk.org Tue Sep 24 12:48:42 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 12:48:42 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Tue, 24 Sep 2024 08:36:12 GMT, Antón Seoane wrote: > These defaults are not meant to target a specific selected output, so nothing different would occur. Sorry but that doesn't make sense. When you set the tagset defaults they are output agnostic, but once you set A+B on the command-line then that is associated with a specific output and so the decorators apply to that output. > I will push soon that will change this behaviour to "merge" the default decorators for A+B and C+D What does "merge" mean? union? intersection? I can't see how you can come up with rules that will universally make sense here.
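
To make the union-versus-intersection question concrete, here is a minimal Java sketch. It is not HotSpot code and does not reflect how the PR implements merging: the class `DecoratorMergeSketch`, the `Decorator` enum, and the two default sets for A+B and C+D are hypothetical names chosen only for illustration. It simply shows how the two readings of "merge" would differ for one output device.

```java
import java.util.EnumSet;
import java.util.Set;

public class DecoratorMergeSketch {
    // Hypothetical subset of UL decorators, for illustration only.
    enum Decorator { UPTIME, LEVEL, TAGS, PID }

    // "Merge" read as union: keep every decorator that either default asks for.
    static Set<Decorator> union(Set<Decorator> a, Set<Decorator> b) {
        EnumSet<Decorator> result = EnumSet.noneOf(Decorator.class);
        result.addAll(a);
        result.addAll(b);
        return result;
    }

    // "Merge" read as intersection: keep only decorators both defaults agree on.
    static Set<Decorator> intersection(Set<Decorator> a, Set<Decorator> b) {
        EnumSet<Decorator> result = EnumSet.noneOf(Decorator.class);
        result.addAll(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Assumed defaults: A+B wants uptime and level, C+D wants level and pid.
        Set<Decorator> defaultsAB = EnumSet.of(Decorator.UPTIME, Decorator.LEVEL);
        Set<Decorator> defaultsCD = EnumSet.of(Decorator.LEVEL, Decorator.PID);

        // With -Xlog:A+B,C+D on a single output, the two readings diverge.
        System.out.println("union        = " + union(defaultsAB, defaultsCD));
        System.out.println("intersection = " + intersection(defaultsAB, defaultsCD));
    }
}
```

Under these assumed defaults the union decorates every line with uptime, level and pid, while the intersection leaves only level, which is why the choice of rule matters for a shared output.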
------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2371173732 From tholenstein at openjdk.org Tue Sep 24 13:21:39 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 24 Sep 2024 13:21:39 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v2] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 06:32:23 GMT, Emanuel Peter wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 2.0 : Add uncast in LibraryCallKit::classify_unsafe_addr > > test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java line 34: > >> 32: * -XX:+IgnoreUnrecognizedVMOptions >> 33: * -XX:TypeProfileLevel=222 >> 34: * -XX:+AlwaysIncrementalInline > > Could it make sense to have a run without all these extra flags? That would allow us to set different values from the outside - maybe that triggers some other (related?) bug down the line. Yes, makes sense. I added it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1773331642 From tholenstein at openjdk.org Tue Sep 24 13:33:14 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 24 Sep 2024 13:33:14 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v3] In-Reply-To: References: Message-ID: <6WTkb-ZgqH8dd-tpK9P8FHClUGXzEp__1tZ1Av45PNg=.5167b860-4036-43ea-b76d-4435c36423f3@github.com> > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. 
> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix (updated on 18th Sep 2024) > The fix modifies the `LibraryCallKit::classify_unsafe_addr()`... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: add another JTreg test with less flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20033/files - new: https://git.openjdk.org/jdk/pull/20033/files/2ab02f83..5ba2d9e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=01-02 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20033/head:pull/20033 PR: https://git.openjdk.org/jdk/pull/20033 From tholenstein at openjdk.org Tue Sep 24 13:33:14 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 24 Sep 2024 13:33:14 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v2] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 06:38:23 GMT, Emanuel Peter wrote: > Generally looks reasonable. > > Why does the `null` end up above the `CheckCastPP`, and why does the `CheckCastPP` not get constant folded to `null`? Maybe there is no `IGVN` happening since this pattern was created - and that is expected? `null` end up above `CheckCastPP` during incremental inlining (see https://github.com/openjdk/jdk/pull/20033#issuecomment-2291151246). Yes, there is no IGVN performed in between because it is too expensive. 
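The situation described above (a null constant hidden behind cast nodes because no IGVN ran in between) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyNode`, `toy_uncast` and `base_is_known_null` are hypothetical names, not HotSpot's `Node` API, and it assumes that "uncast" means looking through a chain of `CheckCastPP`-like nodes to the underlying input.

```cpp
#include <cassert>

// Toy stand-in for C2 IR nodes; NOT HotSpot's Node class.
enum class ToyKind { ConNull, CheckCastPP, Other };

struct ToyNode {
  ToyKind kind;
  const ToyNode* in;  // the single input of interest (nullptr if none)
};

// Peel off any chain of cast nodes and return the node underneath.
const ToyNode* toy_uncast(const ToyNode* n) {
  while (n != nullptr && n->kind == ToyKind::CheckCastPP && n->in != nullptr) {
    n = n->in;
  }
  return n;
}

// True if the base turns out to be the null constant once casts are removed.
bool base_is_known_null(const ToyNode* base) {
  const ToyNode* underlying = toy_uncast(base);
  return underlying != nullptr && underlying->kind == ToyKind::ConNull;
}
```

In this model a base that is a null constant behind two casts is recognized as null only after the casts are peeled away, which mirrors why a classification that inspects just the outermost node could miss it.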
------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2371277571 From roland at openjdk.org Tue Sep 24 13:39:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 24 Sep 2024 13:39:38 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v3] In-Reply-To: <6WTkb-ZgqH8dd-tpK9P8FHClUGXzEp__1tZ1Av45PNg=.5167b860-4036-43ea-b76d-4435c36423f3@github.com> References: <6WTkb-ZgqH8dd-tpK9P8FHClUGXzEp__1tZ1Av45PNg=.5167b860-4036-43ea-b76d-4435c36423f3@github.com> Message-ID: On Tue, 24 Sep 2024 13:33:14 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > add another JTreg test with less flags Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2325298187 From epeter at openjdk.org Tue Sep 24 13:43:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 24 Sep 2024 13:43:41 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v3] In-Reply-To: <6WTkb-ZgqH8dd-tpK9P8FHClUGXzEp__1tZ1Av45PNg=.5167b860-4036-43ea-b76d-4435c36423f3@github.com> References: <6WTkb-ZgqH8dd-tpK9P8FHClUGXzEp__1tZ1Av45PNg=.5167b860-4036-43ea-b76d-4435c36423f3@github.com> Message-ID: On Tue, 24 Sep 2024 13:33:14 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > add another JTreg test with less flags test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java line 39: > 37: * @run main/othervm > 38: * -Xbatch -XX:-TieredCompilation > 39: * compiler.parsing.TestUnsafeArrayAccessWithNullBase Not sure if there is a style-guide for writing JTREG commands... but I'd prefer if they were indented for every `@` command. With the second run, you could even drop `-Xbatch -XX:-TieredCompilation`, and just have `@run driver compiler.parsing.TestUnsafeArrayAccessWithNullBase`, because these 2 flags are also set from the ouside in some tiers, I think. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1773374032 From adinn at openjdk.org Tue Sep 24 14:54:40 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 14:54:40 GMT Subject: Integrated: 8340793: Fix client builds after JDK-8337987 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 09:54:43 GMT, Andrew Dinn wrote: > Trivial fix to include missing headers into sharedRuntime_aarch64/arm.cpp. This pull request has now been integrated. Changeset: 2669e22b Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/2669e22b76c99c1e41a324099154b561e0433b56 Stats: 3 lines in 3 files changed: 3 ins; 0 del; 0 mod 8340793: Fix client builds after JDK-8337987 Reviewed-by: shade, fyang ------------- PR: https://git.openjdk.org/jdk/pull/21153 From tholenstein at openjdk.org Tue Sep 24 15:08:04 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 24 Sep 2024 15:08:04 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v4] In-Reply-To: References: Message-ID: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. 
For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix (updated on 18th Sep 2024) > The fix modifies the `LibraryCallKit::classify_unsafe_addr()`... Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 18 additional commits since the last revision: - from JDK-8340707 from ProblemList - Merge remote-tracking branch 'origin/master' into JDK-8320308 - Merge remote-tracking branch 'origin/master' into JDK-8320308 - add another JTreg test with less flags - Fix 2.0 : Add uncast in LibraryCallKit::classify_unsafe_addr - less iterantions - update CompileCommand - Merge branch 'JDK-8320308' of github.com:tobiasholenstein/jdk into JDK-8320308 - Update UnsafeArrayAccess.java - move test - ... and 8 more: https://git.openjdk.org/jdk/compare/f9b5cc9c...79d8e96c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20033/files - new: https://git.openjdk.org/jdk/pull/20033/files/5ba2d9e6..79d8e96c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=02-03 Stats: 260761 lines in 3326 files changed: 210461 ins; 32393 del; 17907 mod Patch: https://git.openjdk.org/jdk/pull/20033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20033/head:pull/20033 PR: https://git.openjdk.org/jdk/pull/20033 From duke at openjdk.org Tue Sep 24 15:14:40 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 15:14:40 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: <0B6RQxjbSpVqb_VL-B_GFQUkwhIP5KmhgW2FP5DfBL4=.296cd768-d198-446d-8c06-d6e33a415e6f@github.com> On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a user who relies on particular tags has some predefined needs.
For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. In cases of equal specificity, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" defaults. > > Please consider this PR, and thanks! Oh, I think I misunderstood you there. In case we have some A+B defaults set, and we run something like `java -Xlog:A+B::uptime`, the "runtime"-specified decorators prevail and the defaults are not to be triggered (e.g. in this case we'd have only uptime decorators, as it has been set explicitly). By "merge" I mean the union.
It's what @robcasloz suggested above and what most people I've talked to feel to understand to be more logical ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2371590845 From tholenstein at openjdk.org Tue Sep 24 15:14:56 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 24 Sep 2024 15:14:56 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v5] In-Reply-To: References: Message-ID: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix (updated on 18th Sep 2024) > The fix modifies the `LibraryCallKit::classify_unsafe_addr()`... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: JTreg style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20033/files - new: https://git.openjdk.org/jdk/pull/20033/files/79d8e96c..2abec7d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=03-04 Stats: 10 lines in 1 file changed: 0 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20033/head:pull/20033 PR: https://git.openjdk.org/jdk/pull/20033 From chagedorn at openjdk.org Tue Sep 24 16:00:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 24 Sep 2024 16:00:20 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking Message-ID: This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. ### Predicate Interfaces and Implementing Classes - `Predicate` interface is implemented by four predicate classes: - `ParsePredicate` (existing class) - `RuntimePredicate` (existing and updated class) - `TemplateAssertionPredicate` (new class) - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) ### Predicate Iterator with Visitor classes There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. 
I've done this for a couple of places in the code by defining new visitors: - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. - Replaces the old now retired `ParsePredicateIterator`. - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. - Replaces the old now retired `PredicateEntryIterator`. - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. #### To Be Replaced soon There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. ### More Information More information about specific classes and changes can be found as code comments and PR comments. 
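The iterator-plus-visitor arrangement described above can be sketched in a few lines. This is a toy model under stated assumptions: all `Toy*` types are hypothetical stand-ins and not the classes added by this patch; the only things taken from the description are the idea of a visitor with default do-nothing methods and a walk that applies it to each predicate.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for two kinds of predicates (illustration only).
struct ToyParsePredicate { bool useful = false; };
struct ToyRuntimePredicate {};

// Visitor with default do-nothing implementations; users override what they need.
class ToyPredicateVisitor {
 public:
  virtual ~ToyPredicateVisitor() = default;
  virtual void visit(ToyParsePredicate&) {}
  virtual void visit(ToyRuntimePredicate&) {}
};

// Marks every parse predicate as useful and ignores all other kinds.
class ToyUsefulMarker : public ToyPredicateVisitor {
 public:
  using ToyPredicateVisitor::visit;  // keep the other overloads visible
  void visit(ToyParsePredicate& p) override { p.useful = true; }
};

// Counts how many predicates of each kind were visited.
class ToyPredicateCounter : public ToyPredicateVisitor {
 public:
  int parse_count = 0;
  int runtime_count = 0;
  void visit(ToyParsePredicate&) override { ++parse_count; }
  void visit(ToyRuntimePredicate&) override { ++runtime_count; }
};

// Stand-in for the predicates above one loop; for_each plays the iterator role.
struct ToyLoopPredicates {
  std::vector<ToyParsePredicate> parse_predicates;
  std::vector<ToyRuntimePredicate> runtime_predicates;

  void for_each(ToyPredicateVisitor& visitor) {
    for (ToyParsePredicate& p : parse_predicates) visitor.visit(p);
    for (ToyRuntimePredicate& p : runtime_predicates) visitor.visit(p);
  }
};
```

The point of the design in this sketch is that the walking logic lives in one place, while each use case only supplies the small visitor it needs.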
Thanks, Christian ------------- Commit messages: - update - 8340786: Introduce Predicate classes with predicate visitors for simplified walking Changes: https://git.openjdk.org/jdk/pull/21161/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21161&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340786 Stats: 565 lines in 4 files changed: 419 ins; 62 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/21161.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21161/head:pull/21161 PR: https://git.openjdk.org/jdk/pull/21161 From chagedorn at openjdk.org Tue Sep 24 16:00:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 24 Sep 2024 16:00:21 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 14:19:41 GMT, Christian Hagedorn wrote: > This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. > > ### Predicate Interfaces and Implementing Classes > - `Predicate` interface is implemented by four predicate classes: > - `ParsePredicate` (existing class) > - `RuntimePredicate` (existing and updated class) > - `TemplateAssertionPredicate` (new class) > - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) > > ### Predicate Iterator with Visitor classes > There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. 
I've done this for a couple of places in the code by defining new visitors: > - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. > - Replaces the old now retired `ParsePredicateIterator`. > - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. > - Replaces the old now retired `PredicateEntryIterator`. > - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. > > #### To Be Replaced soon > There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. > > ### More Information > More information about specific classes and changes can be found as code comments and PR comments. > > Thanks, > Christian src/hotspot/share/opto/loopTransform.cpp line 1467: > 1465: assert(assertion_predicate_has_loop_opaque_node(template_assertion_predicate), > 1466: "must find OpaqueLoop* nodes for Template Assertion Predicate"); > 1467: InitializedAssertionPredicateCreator initialized_assertion_predicate(template_assertion_predicate, new_init, Needed to rename `InitializedAssertionPredicate` -> `InitializedAssertionPredicateCreator` to distinguish from the newly added `InitializedAssertionPredicate` class: - `InitializedAssertionPredicate`: Represent an already existing Initialized Assertion Predicate in the IR. 
- `InitializedAssertionPredicateCreator`: Class to create the IR nodes to represent an Initialized Assertion Predicate (one could return an `InitializedAssertionPredicate` instance after the creation but that is not used/needed at the moment.) src/hotspot/share/opto/loopnode.cpp line 4325: > 4323: } > 4324: add_useless_parse_predicates_to_igvn_worklist(); > 4325: } `PredicateIterator` + `ParsePredicateUsefulMarker` combo to replace this class. src/hotspot/share/opto/loopnode.cpp line 6381: > 6379: while( early != legal ) { // While not at earliest legal > 6380: if (legal->is_Start() && !early->is_Root()) { > 6381: #ifdef ASSERT `PredicateIterator` + `DominatedPredicates` combo to replace this class. src/hotspot/share/opto/predicates.cpp line 171: > 169: // Walk over all Regular Predicates of this block (if any) and return the first node not belonging to the block > 170: // anymore (i.e. entry to the first Regular Predicate in this block if any or `regular_predicate_proj` otherwise). > 171: Node* PredicateBlock::skip_regular_predicates(Node* regular_predicate_proj, Deoptimization::DeoptReason deopt_reason) { Replaced by new `RegularPredicateBlock` class which does the skipping. src/hotspot/share/opto/predicates.hpp line 564: > 562: // Walk over all predicates of this block (if any) and apply the given 'predicate_visitor' to each predicate. > 563: // Returns the entry to the earliest predicate. > 564: Node* for_each(PredicateVisitor& predicate_visitor) const { The goal is eventually to only have this method which does the check for what kind of predicate we face in the graph and replace all other places where we do these kinds of checks together with visitors. src/hotspot/share/opto/predicates.hpp line 615: > 613: // Class to walk over all predicates starting at a node, which usually is the loop entry node, and following the inputs. > 614: // At each predicate, a PredicateVisitor is applied which the user can implement freely.
> 615: class PredicateIterator : public StackObj { Calling structure `PredicateIterator` -> `PredicateBlockIterator` -> `RegularPredicateBlockIterator`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1773455619 PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1773456683 PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1773457164 PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1773521651 PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1773528603 PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1773529729 From epeter at openjdk.org Tue Sep 24 16:07:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 24 Sep 2024 16:07:39 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v2] In-Reply-To: References: <2swshN9Ew9xf8p7g0KXHRRw-SHg1Q-X4LZtXy5roDU0=.9e5499d7-0ddf-4ee8-872c-ba00a62cee28@github.com> Message-ID: On Tue, 24 Sep 2024 06:59:47 GMT, Daohan Qu wrote: >> Daohan Qu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Check duplicate split in case of base_is_phi > > Hi @vnkozlov , I noticed that you have fixed a similar bug in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). Could you please review this PR? Thanks a lot! @quadhier just a drive-through comment: I think you need a regression test for this. Maybe using the whitebox api to see if the method got compiled, or if it was not compiled because not compilable? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2371721777

From dzhang at openjdk.org Tue Sep 24 16:10:14 2024
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 24 Sep 2024 16:10:14 GMT
Subject: RFR: 8320998: RISC-V: C2 RoundDoubleModeV
Message-ID:

Hi all,

This patch adds RoundDoubleModeV intrinsics for riscv64. The vector implementation is similar to the scalar version. Please take a look and review. Thanks a lot!

Just like https://github.com/openjdk/jdk/pull/17745, current testing shows that the intrinsic brings a performance gain when vlenb >= 32 (as on k1), but a regression when vlenb == 16 (as on k230). So I only enable the intrinsic when vlenb >= 32. Please compare the data below, thanks!

## Test

test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
test/jdk/java/lang/Math/RoundTests.java
test/jdk/jdk/incubator/vector/*
test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java

## Performance - with Intrinsic

### on k1

Benchmark on k1 (+intrinsic)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  58.973 ±  0.460  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  59.873 ±  0.054  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  59.460 ±  0.552  ops/ms

Benchmark on k1 (-intrinsic)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  51.335 ±  0.068  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  51.356 ±  0.062  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  51.387 ±  0.059  ops/ms

### on k230

Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  28.263 ±  0.837  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  28.130 ±  0.789  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  28.241 ±  0.868  ops/ms

Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  44.391 ±  1.249  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  44.423 ±  1.187  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  44.441 ±  1.218  ops/ms

## Performance - without Intrinsic

### on k1, intrinsic disabled due to -UseSuperWord

Benchmark on k1, -UseSuperWord (+intrinsic)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  51.249 ±  0.038  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  51.232 ±  0.021  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  51.110 ±  0.176  ops/ms

Benchmark on k1, -UseSuperWord (-intrinsic)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  51.287 ±  0.151  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  51.313 ±  0.107  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  51.350 ±  0.067  ops/ms

### on k230, intrinsic disabled due to -UseSuperWord

Benchmark on k230, -UseSuperWord (+intrinsic)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  44.375 ±  1.364  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  44.532 ±  1.221  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  44.675 ±  1.295  ops/ms

### on k230, intrinsic disabled due to vlenb == 16

Benchmark on k230, +UseSuperWord (+intrinsic)

Benchmark                       (TESTSIZE)   Mode  Cnt   Score    Error   Units
FpRoundingBenchmark.test_ceil         2048  thrpt   15  44.372 ±  1.357  ops/ms
FpRoundingBenchmark.test_floor        2048  thrpt   15  44.513 ±  1.278  ops/ms
FpRoundingBenchmark.test_rint         2048  thrpt   15  44.609 ±  1.151  ops/ms

-------------

Commit messages:
 - 8320998: RISC-V: C2 RoundDoubleModeV

Changes: https://git.openjdk.org/jdk/pull/21164/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21164&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8320998
  Stats: 75 lines in 4 files changed: 75 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21164.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21164/head:pull/21164

PR: https://git.openjdk.org/jdk/pull/21164

From epeter at openjdk.org Tue Sep 24 16:11:51 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 24 Sep 2024 16:11:51 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Sep 2024 17:08:47 GMT, Quan Anh Mai wrote:

>> What are your thoughts on how we are going to debug-print the new int-type? We probably do not always want to print the KnownBits, in general that is way too verbose. But in some alignment example, it would be nice to know that it is the `int/long` range, but the lowest 3 bits are always zero, hence 8 byte aligned.
>
> @eme64 Thanks to your suggestions, I have managed to come up with a (fairly) formal proof for the algorithm here!

@merykitty FYI: I'm going on vacation for 3 weeks, so I hope to come back to this afterward.
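
The alignment remark in the quoted question ("the lowest 3 bits are always zero, hence 8 byte aligned") can be illustrated with a small standalone sketch. This is not the PR's code: `ToyKnownBits` and `guaranteed_alignment` are hypothetical names, and the representation (one mask of known-zero bits, one mask of known-one bits) is an assumption made only for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical known-bits record (not the type used in the PR):
// a set bit in `zeros` means that bit of the value is known to be 0,
// a set bit in `ones` means that bit is known to be 1.
struct ToyKnownBits {
  uint64_t zeros;
  uint64_t ones;
};

// Returns the largest power of two (capped at 2^63) that is guaranteed to
// divide every value described by `bits`, derived from the run of
// known-zero bits starting at bit 0.
static uint64_t guaranteed_alignment(const ToyKnownBits& bits) {
  uint64_t alignment = 1;
  for (int i = 0; i < 63 && ((bits.zeros >> i) & 1) != 0; i++) {
    alignment <<= 1;
  }
  return alignment;
}
```

With the three lowest bits known to be zero, the helper reports an alignment of 8, which is the kind of fact a compact debug print could show instead of the full bit masks.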
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2371729922 From epeter at openjdk.org Tue Sep 24 16:12:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 24 Sep 2024 16:12:40 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v12] In-Reply-To: <6T8i0DZcooO3e9rS9cVk3r_WquXnm9I-fXW80qbg-Ck=.2883943a-d3e2-41a8-aa06-dcfd27b94125@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <6T8i0DZcooO3e9rS9cVk3r_WquXnm9I-fXW80qbg-Ck=.2883943a-d3e2-41a8-aa06-dcfd27b94125@github.com> Message-ID: <95xkpEUKqiV5WQtHiSH-50H2RSacCP10hc_T2bWWKI8=.aec44127-c1ec-40d4-9b46-2b3b8286e1f6@github.com> On Mon, 23 Sep 2024 16:55:18 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > omit source BasicType FYI going on vacation, so feel free to ask others to review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2371732867 From duke at openjdk.org Tue Sep 24 16:21:02 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Tue, 24 Sep 2024 16:21:02 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v2] In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! 
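
The inclusion and specificity rules in the description above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: `ToyDefault` and `pick_default_decorators` are hypothetical names, tags and decorators are modelled as plain strings, and the log-level targeting and the "different `LogSelection`s trigger defaults" rule are left out. It is not the PR's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using TagSet = std::set<std::string>;
using DecoratorSet = std::set<std::string>;

// Hypothetical table entry: "if a selection contains all of `tags`,
// use `decorators` as its default decorators".
struct ToyDefault {
  TagSet tags;
  DecoratorSet decorators;
};

// Computes the default decorators for one selection that came without
// user-supplied decorators. Returns false if no default matches.
static bool pick_default_decorators(const std::vector<ToyDefault>& table,
                                    const TagSet& selection,
                                    DecoratorSet& out) {
  out.clear();
  std::size_t best = 0;  // tag count of the most specific match so far
  for (const ToyDefault& d : table) {
    if (d.tags.empty()) continue;
    // Inclusion: every tag of the default must appear in the selection.
    bool included = std::includes(selection.begin(), selection.end(),
                                  d.tags.begin(), d.tags.end());
    if (!included) continue;
    if (d.tags.size() > best) {
      // Specificity: a default with more matching tags replaces earlier ones.
      best = d.tags.size();
      out = d.decorators;
    } else if (d.tags.size() == best) {
      // Equal specificity: both defaults apply, so take the union.
      out.insert(d.decorators.begin(), d.decorators.end());
    }
  }
  return best > 0;
}
```

For a table holding a `jit` default and a `jit+compilation` default, a `jit+compilation` selection picks only the latter, while a `jit+inlining` selection falls back to the `jit` default.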
Antón Seoane has updated the pull request incrementally with two additional commits since the last revision:

 - Test adaptations to new focus
 - Grouping all defaults together

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20988/files
  - new: https://git.openjdk.org/jdk/pull/20988/files/5e0b45bc..af0b27be

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=00-01

  Stats: 44 lines in 7 files changed: 7 ins; 9 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/20988.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20988/head:pull/20988

PR: https://git.openjdk.org/jdk/pull/20988

From tholenstein at openjdk.org Tue Sep 24 16:29:39 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Tue, 24 Sep 2024 16:29:39 GMT
Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v3]
In-Reply-To:
References: <6WTkb-ZgqH8dd-tpK9P8FHClUGXzEp__1tZ1Av45PNg=.5167b860-4036-43ea-b76d-4435c36423f3@github.com>
Message-ID:

On Tue, 24 Sep 2024 13:41:17 GMT, Emanuel Peter wrote:

>> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   add another JTreg test with less flags
>
> test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java line 39:
>
>> 37:  * @run main/othervm
>> 38:  *      -Xbatch -XX:-TieredCompilation
>> 39:  *      compiler.parsing.TestUnsafeArrayAccessWithNullBase
>
> Not sure if there is a style-guide for writing JTREG commands... but I'd prefer if they were indented for every `@` command. With the second run, you could even drop `-Xbatch -XX:-TieredCompilation`, and just have `@run driver compiler.parsing.TestUnsafeArrayAccessWithNullBase`, because these 2 flags are also set from the outside in some tiers, I think.

I changed the style and removed the flags.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1773677667 From epeter at openjdk.org Tue Sep 24 16:33:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 24 Sep 2024 16:33:38 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v5] In-Reply-To: References: Message-ID: <9cxCn8633F_V9qtpO2zWX6ypwDB_AUHxfGpyLZUUy1I=.324e7e77-5813-4883-a806-d4112fa4a64b@github.com> On Tue, 24 Sep 2024 15:14:56 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > JTreg style Looks good :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2325805313 From duke at openjdk.org Tue Sep 24 16:37:17 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Tue, 24 Sep 2024 16:37:17 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v3] In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: <-A9Xv_OvbHZLre0zN7Lsf_1pZVMvfnfruSo1XT-AGtA=.b67ca356-833e-47a4-8912-e141fc99afb3@github.com> > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. 
>
> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
>
> Please consider this PR, and thanks!

Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:

  Initialization of _decorators field in logDecorators

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20988/files
  - new: https://git.openjdk.org/jdk/pull/20988/files/af0b27be..ee24f637

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20988.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20988/head:pull/20988

PR: https://git.openjdk.org/jdk/pull/20988

From duke at openjdk.org Tue Sep 24 16:43:57 2024
From: duke at openjdk.org (Antón Seoane)
Date: Tue, 24 Sep 2024 16:43:57 GMT
Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4]
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

> Hi all,
>
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults.
> Such a match is based on the following:
>
> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied.
> - Additionally, defaults may target a specific log level.
>
> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>
> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
>
> Please consider this PR, and thanks!

Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:

  Removed whitespace

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20988/files
  - new: https://git.openjdk.org/jdk/pull/20988/files/ee24f637..aa47a627

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20988.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20988/head:pull/20988

PR: https://git.openjdk.org/jdk/pull/20988

From duke at openjdk.org Tue Sep 24 16:44:57 2024
From: duke at openjdk.org (Daniel Skantz)
Date: Tue, 24 Sep 2024 16:44:57 GMT
Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v11]
In-Reply-To:
References:
Message-ID:

> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376).
It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. > b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. Daniel Skantz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 42 commits: - remove stress seed initialization bool - merge - phrasing - nextInt - +1 case - words - Tweak CtwRunner.java; debug only bool flag - merge - +1 whitespace - tweak requires - ... and 32 more: https://git.openjdk.org/jdk/compare/9176f681...14f64753 ------------- Changes: https://git.openjdk.org/jdk/pull/19646/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=10 Stats: 194 lines in 17 files changed: 159 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/19646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19646/head:pull/19646 PR: https://git.openjdk.org/jdk/pull/19646 From duke at openjdk.org Tue Sep 24 16:53:51 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 24 Sep 2024 16:53:51 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v12] In-Reply-To: References: Message-ID: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> > This patch adds a diagnostic/stress flag for C2 bailouts. 
It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. > b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. 
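
The mechanism described above (randomly starting a bailout at calls to `failing()` with a user-given probability, and flagging a bailout that is recorded while one is already pending) can be sketched as a toy model. Everything here is an assumption for illustration: `ToyCompile`, its members, and the use of `<random>` are hypothetical and do not reflect the actual C2 code or flag name.

```cpp
#include <cassert>
#include <random>

// Toy stand-in for a compilation object; not HotSpot's Compile class.
class ToyCompile {
 public:
  ToyCompile(double stress_probability, unsigned seed)
      : _stress_probability(stress_probability), _rng(seed) {}

  // Invariant b): recording a bailout while one is pending means the
  // compiler kept optimizing for too long after the first failure.
  void record_failure(const char* reason) {
    assert(_failure_reason == nullptr && "bailout recorded while one is pending");
    _failure_reason = reason;
  }

  // Invariant a): any failing() check may become the start of a bailout.
  // With stress enabled, a check randomly injects a failure.
  bool failing() {
    if (_failure_reason != nullptr) {
      return true;
    }
    if (_stress_probability > 0.0) {
      std::bernoulli_distribution coin(_stress_probability);
      if (coin(_rng)) {
        record_failure("stress bailout");
        return true;
      }
    }
    return false;
  }

  const char* failure_reason() const { return _failure_reason; }

 private:
  const char* _failure_reason = nullptr;
  double _stress_probability;
  std::mt19937 _rng;
};
```

A caller that checks `failing()` after each phase and returns early exercises invariant a); any phase that calls `record_failure` after a pending failure trips the assertion that models invariant b).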
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision:

  left over

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19646/files
  - new: https://git.openjdk.org/jdk/pull/19646/files/14f64753..d91bc068

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=10-11

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19646.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19646/head:pull/19646

PR: https://git.openjdk.org/jdk/pull/19646

From kvn at openjdk.org Tue Sep 24 20:00:47 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 24 Sep 2024 20:00:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25]
In-Reply-To:
References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID:

On Mon, 23 Sep 2024 07:54:39 GMT, Roberto Castañeda Lozano wrote:

>> src/hotspot/share/opto/matcher.cpp line 1821:
>>
>>> 1819:     if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) {
>>> 1820:       assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf),
>>> 1821:              "duplicating node that's already been matched");
>>
>> Why was it removed?
>
> The assertion was failing due to it being too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched.
> There are other cases (all of them harmless as far as I can see) in which this assertion can fail.
I am investigating whether they can be avoided so that the assertion can be restored, and what would be the impact on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization. I think you can do that as follow up changes as separate RFE. I am fine with removal in this JEP code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1773999931 From psandoz at openjdk.org Tue Sep 24 20:08:44 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 24 Sep 2024 20:08:44 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v17] In-Reply-To: <-L7RYBQd-Q6zLkv5GKU0PDM2SZ-jdm1zAk1VRedDgyM=.c712848d-145b-4ecd-af2f-1a811832559d@github.com> References: <-L7RYBQd-Q6zLkv5GKU0PDM2SZ-jdm1zAk1VRedDgyM=.c712848d-145b-4ecd-af2f-1a811832559d@github.com> Message-ID: On Thu, 19 Sep 2024 06:53:15 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. 
>> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Tuning extra spaces. I sent a pull request to your branch https://github.com/jatin-bhateja/jdk/pull/5/files that moves the `VectorMath` test to the library area and updates it to be more like a library test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2372268735 From chagedorn at openjdk.org Tue Sep 24 21:20:51 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 24 Sep 2024 21:20:51 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v2] In-Reply-To: References: Message-ID: > This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. 
> > ### Predicate Interfaces and Implementing Classes > - `Predicate` interface is implemented by four predicate classes: > - `ParsePredicate` (existing class) > - `RuntimePredicate` (existing and updated class) > - `TemplateAssertionPredicate` (new class) > - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) > > ### Predicate Iterator with Visitor classes > There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: > - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. > - Replaces the old now retired `ParsePredicateIterator`. > - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. > - Replaces the old now retired `PredicateEntryIterator`. > - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. > > #### To Be Replaced soon > There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. > > ### More Information > More information about specific classes and changes can be found as code comments and PR comments. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix dump_for_loop() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21161/files - new: https://git.openjdk.org/jdk/pull/21161/files/1cc96f06..7fa77140 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21161&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21161&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21161.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21161/head:pull/21161 PR: https://git.openjdk.org/jdk/pull/21161 From chagedorn at openjdk.org Tue Sep 24 21:20:52 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 24 Sep 2024 21:20:52 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v2] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 21:18:11 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. 
The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix dump_for_loop() src/hotspot/share/opto/predicates.cpp line 98: > 96: return (deopt_reason == Deoptimization::Reason_loop_limit_check || > 97: deopt_reason == Deoptimization::Reason_predicate || > 98: deopt_reason == Deoptimization::Reason_profile_predicate); extracted to new method `has_valid_uncommon_trap()` src/hotspot/share/opto/predicates.cpp line 469: > 467: > 468: // Dumps all predicates from the loop to the earliest predicate in a pretty format. 
> 469: void Predicates::dump() const {

Example output:

239 OuterStripMinedLoop:
- Loop Limit Check Predicate Block:
  - Parse Predicate: 115 ParsePredicate
  - Runtime Predicate: 276 If
- Profiled Loop Predicate Block:
  - Parse Predicate: 104 ParsePredicate
- Loop Predicate Block:
  - Parse Predicate: 93 ParsePredicate
  - Template Assertion Predicate: 270 RangeCheck
  - Template Assertion Predicate: 260 RangeCheck
  - Runtime Predicate: 253 RangeCheck
  - Runtime Predicate: 283 RangeCheck

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1774091984
PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1774103468

From dnsimon at openjdk.org Tue Sep 24 23:05:09 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 24 Sep 2024 23:05:09 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
Message-ID:

[JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.
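
The "escape hatch for the duration of a scope" idea described above is commonly expressed as an RAII guard. The following is a minimal standalone sketch under that assumption, not the code added by the PR: `ToyCompilerThread` and `ToyCanCallJavaScope` are hypothetical names, and the real thread state in HotSpot is more involved than a single boolean.

```cpp
#include <cassert>

// Toy stand-in for a compiler thread with a "may make Java upcalls" flag.
struct ToyCompilerThread {
  bool can_call_java = false;
};

// Hypothetical RAII guard: sets the flag on entry and restores the previous
// value on exit, so nested scopes unwind correctly.
class ToyCanCallJavaScope {
 public:
  ToyCanCallJavaScope(ToyCompilerThread* thread, bool new_state)
      : _thread(thread), _saved(thread->can_call_java) {
    _thread->can_call_java = new_state;
  }
  ~ToyCanCallJavaScope() { _thread->can_call_java = _saved; }

  // Copying a guard would restore the flag twice, so forbid it.
  ToyCanCallJavaScope(const ToyCanCallJavaScope&) = delete;
  ToyCanCallJavaScope& operator=(const ToyCanCallJavaScope&) = delete;

 private:
  ToyCompilerThread* _thread;
  bool _saved;
};
```

Restoring the saved value rather than unconditionally clearing the flag keeps the guard safe to nest: an inner scope that turns upcalls off again hands control back to the outer scope with upcalls still allowed.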
------------- Commit messages: - added CompilerThreadCanCallJavaScope Changes: https://git.openjdk.org/jdk/pull/21171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340733 Stats: 116 lines in 5 files changed: 103 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21171/head:pull/21171 PR: https://git.openjdk.org/jdk/pull/21171 From dlong at openjdk.org Wed Sep 25 01:21:37 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 25 Sep 2024 01:21:37 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:35:47 GMT, Dean Long wrote: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. The alternative approach would look something like this: if (target->get_Method()->is_old() || cha_monomorphic_target->get_Method()->is_old()) { BAILOUT("redefined method"); } assert(!target->can_be_statically_bound() || target == cha_monomorphic_target, ""); @coleenp, @matias9927, please take a look at this PR too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2372675815 PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2372676983 From rcastanedalo at openjdk.org Wed Sep 25 04:22:25 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 25 Sep 2024 04:22:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v26] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision: - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/47c982ba..6fb36e50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24-25 Stats: 104 lines in 5 files changed: 4 ins; 30 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 25 04:26:43 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 25 Sep 2024 04:26:43 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Sat, 21 Sep 2024 06:44:21 GMT, Fei Yang wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Remove redundant comment > > src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: > >> 255: RegSet::of($res$$Register) /* no_preserve */); >> 256: __ mov($tmp1$$Register, $oldval$$Register); >> 257: __ mov($tmp2$$Register, $newval$$Register); > > Hi, I don't quite understand these two register-register moves here.
Seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly as `cmpxchg` won't modify them, which help us save these two moves. Did I miss anything? Thanks. Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary. Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774393587 From jbhateja at openjdk.org Wed Sep 25 04:39:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 25 Sep 2024 04:39:26 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v18] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. 
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #5 from PaulSandoz/JDK-8338201 Move and convert test - Move and convert test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/eb2960a9..28b29bc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=16-17 Stats: 523 lines in 2 files changed: 245 ins; 278 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From rcastanedalo at openjdk.org Wed Sep 25 04:58:45 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 25 Sep 2024 04:58:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> 
Message-ID: On Tue, 24 Sep 2024 19:57:29 GMT, Vladimir Kozlov wrote: > I think you can do that as follow up changes as separate RFE. I am fine with removal in this JEP code. Thanks, in the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exists at the same time a `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case: ![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45) Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared. This means that a few encode-and-store patterns that were matched by a single `g1EncodePAndStoreN` before are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example: ![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5) Luckily, this case is very infrequent so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467183 From rcastanedalo at openjdk.org Wed Sep 25 04:58:45 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 25 Sep 2024 04:58:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Wed, 25 Sep 2024 04:55:35 GMT, Roberto Castañeda Lozano wrote: >> I think you can do that as follow up changes as separate RFE.
I am fine with removal in this JEP code. > >> I think you can do that as follow up changes as separate RFE. I am fine with removal in this JEP code. > > Thanks, in the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exists at the same time a `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case: > > ![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45) > > Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared. This means that a few encode-and-store patterns that were matched by a single `g1EncodePAndStoreN` before are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example: > > ![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5) > > Luckily, this case is very infrequent so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved. @TheRealMDoerr: since there are now a few corner cases where we match a StoreN node with g1StoreN even though it stores the output of an EncodeP node, I had to remove the assertions in the x64 and ppc g1StoreN definitions, see above. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467652 From dnsimon at openjdk.org Wed Sep 25 06:02:13 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 06:02:13 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: References: Message-ID: <4IDkstmCYal4JfjNPQoRETzmvf97QVOkNFaewE6bacU=.d6a690b6-b0c1-4d16-ad76-8b219351e68a@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: added CompilerThreadCanCallJavaScope ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21171/files - new: https://git.openjdk.org/jdk/pull/21171/files/258492da..c3e23c0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21171/head:pull/21171 PR: https://git.openjdk.org/jdk/pull/21171 From dnsimon at openjdk.org Wed Sep 25 06:05:15 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 06:05:15 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21171/files - new: https://git.openjdk.org/jdk/pull/21171/files/c3e23c0e..882cec4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=01-02 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21171/head:pull/21171 PR: https://git.openjdk.org/jdk/pull/21171 From dqu at openjdk.org Wed Sep 25 07:11:18 2024 From: dqu at openjdk.org (Daohan Qu) Date: Wed, 25 Sep 2024 07:11:18 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v3] In-Reply-To: References: Message-ID: > # Description > > [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. > > But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. > > This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". 
>
> Since JDK-8217919 (in jdk13u), all later JDK versions are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks, run with
>
>
> make test \
>     TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \
>     MICRO="FORK=1;WARMUP_ITER=2" \
>     TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC"
>
>
> show a performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table were run with this option.)
>
> | benchmark (throughput, unit: ops/s) | jdk-before-this-patch | jdk-after-this-patch |
> |---|---|---|
> | org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
> | org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
> | org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |
>
> # Reproduction
>
> Compile and run the reduced test case `Test.java` in the appendix below using
>
>
> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test
>
>
> and you will find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like
>
>
>
>
>> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR.
> > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > omit source BasicType src/hotspot/share/opto/loopnode.cpp line 3934: > 3932: // } > 3933: // > 3934: // so that the loop can be eliminated given that `stride_con2 / stride_con` is In general, a loop is not eliminated after a parallel IV is found. Can you tweak the comment so it says that in this particular example the loop can be eliminated? src/hotspot/share/opto/loopnode.cpp line 4002: > 4000: // if stride_con2 is min_jint (or min_jlong, respectively) and > 4001: // stride_con is -1. > 4002: if (((stride_con2_bt == T_INT && stride_con2 == min_jint) || Can you use `min_signed_integer(BasicType bt)` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1774670503 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1774671519 From fyang at openjdk.org Wed Sep 25 07:36:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 07:36:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Wed, 25 Sep 2024 04:22:49 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: >> >>> 255: RegSet::of($res$$Register) /* no_preserve */); >>> 256: __ mov($tmp1$$Register, $oldval$$Register); >>> 257: __ mov($tmp2$$Register, $newval$$Register); >> >> Hi, I don't quite understand these two register-register moves here. Seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly as `cmpxchg` won't modify them, which helps us save these two moves. Did I miss anything? Thanks. > > Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary.
Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e). Thanks for the update. It now looks cleaner and easier to understand. BTW: Seems that RISC-V part bears a similar issue. I will discuss with @feilongjiang and hopefully we will come up with a similar fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774695093 From thartmann at openjdk.org Wed Sep 25 08:01:53 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 25 Sep 2024 08:01:53 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v5] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 15:14:56 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. 
Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > JTreg style Looks good to me too. > So it would make sense to go over the uses of Type*::BOTTOM/Type*::NOTNULL and check they are not tested with pointer equality What about this concern? Did anyone check yet or should be file a follow-up task? 
test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java line 31: > 29: * @modules java.base/jdk.internal.misc > 30: * @run main/othervm -Xbatch -XX:CompileCommand=quiet -XX:TypeProfileLevel=222 > 31: * -XX:+IgnoreUnrecognizedVMOptions -XX:+AlwaysIncrementalInline Suggestion: * -XX:+IgnoreUnrecognizedVMOptions -XX:+AlwaysIncrementalInline ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2327443242 PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1774730149 From tholenstein at openjdk.org Wed Sep 25 08:01:53 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 25 Sep 2024 08:01:53 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v6] In-Reply-To: References: Message-ID: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix (updated on 18th Sep 2024) > The fix modifies the `LibraryCallKit::classify_unsafe_addr()`... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20033/files - new: https://git.openjdk.org/jdk/pull/20033/files/2abec7d4..255c0de3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20033/head:pull/20033 PR: https://git.openjdk.org/jdk/pull/20033 From jsjolen at openjdk.org Wed Sep 25 09:23:39 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 25 Sep 2024 09:23:39 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4] In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: <_gHhzeVW-OT3gTMiLMjzeBEjOmOMMnveEfr79i8rQBs=.88da52a5-7b42-4e3a-8965-3f607e383c82@github.com> On Tue, 24 Sep 2024 16:43:57 GMT, Antón Seoane wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults.
Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. >> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. >> >> Please consider this PR, and thanks! > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Removed whitespace The main point of importance here is that anything user-specified (from `-Xlog` or `jcmd`) will take priority. We should probably add some information regarding this on the `-Xlog:help` page. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2373534931 From rrich at openjdk.org Wed Sep 25 09:29:04 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 25 Sep 2024 09:29:04 GMT Subject: RFR: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets Message-ID: With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. However, after startup, the instructions of the corresponding `InterpreterCodelet` are also printed whenever an interpreted frame is printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. I've tested with the reproducer from the JBS-Issue.
------------- Commit messages: - 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets Changes: https://git.openjdk.org/jdk/pull/21158/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21158&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340792 Stats: 11 lines in 3 files changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21158/head:pull/21158 PR: https://git.openjdk.org/jdk/pull/21158 From fyang at openjdk.org Wed Sep 25 09:32:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 09:32:35 GMT Subject: RFR: 8320998: RISC-V: C2 RoundDoubleModeV In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 16:01:47 GMT, Dingli Zhang wrote: > Hi all, > > This patch will add RoundDoubleModeV intrinsics for riscv64. The vector implementation is similar to the scalar version. Please take a look and have some reviews. Thanks a lot! > > Just like https://github.com/openjdk/jdk/pull/17745, current test shows that, it bring performance gain when vlenb >= 32 (which is on k1), but bring regression when vlenb == 16 (which is on k230). So I only enable the intrinsic when vlenb >= 32. > > Please compare the data below, thanks! > > ## Test > ### Test on k1 > test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java > test/hotspot/jtreg/compiler/floatingpoint/TestRound.java > test/jdk/java/lang/Math/RoundTests.java > test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java > ### Test on qemu(enable RVV1.0) > test/jdk/jdk/incubator/vector/* > > ## Performance - with Intrinsic > ### on k1 > Benchmark on k1 (+intrinsic) > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.test_ceil 2048 thrpt 15 58.973 ± 0.460 ops/ms > FpRoundingBenchmark.test_floor 2048 thrpt 15 59.873 ± 0.054 ops/ms > FpRoundingBenchmark.test_rint 2048 thrpt 15 59.460 ± 0.552 ops/ms
> > > Benchmark on k1 (-intrinsic) > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.test_ceil 2048 thrpt 15 51.335 ± 0.068 ops/ms > FpRoundingBenchmark.test_floor 2048 thrpt 15 51.356 ± 0.062 ops/ms > FpRoundingBenchmark.test_rint 2048 thrpt 15 51.387 ± 0.059 ops/ms > > ### on k230 > Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16) > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.test_ceil 2048 thrpt 15 28.263 ± 0.837 ops/ms > FpRoundingBenchmark.test_floor 2048 thrpt 15 28.130 ± 0.789 ops/ms > FpRoundingBenchmark.test_rint 2048 thrpt 15 28.241 ± 0.868 ops/ms > > > Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16) > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.test_ceil 2048 thrpt 15 44.391 ± 1.249 ops/ms > FpRoundingBenchmark.test_floor 2048 thrpt 15 44.423 ± 1.187 ops/ms > FpRoundingBenchmark.test_rint 2048 thrpt 15 44.441 ± 1.218 ops/ms > > > ## Performance - without Intrinsic > ### on k1, intrinsic disabled due to -Us... src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 3119: > 3117: break; > 3118: case RoundDoubleModeNode::rmode_rint: > 3119: csrwi(CSR_FRM, C2_MacroAssembler::rne); No need to set the CSR here as `FRM` has been set to Round to Nearest mode when enter Java code. Check [JDK-8330094](https://bugs.openjdk.org/browse/JDK-8330094) for more details. And if you set `FRM` to some other rounding modes, you will need to restore it to Round to Nearest mode after processing. But the problem is that modifying CSR on RISC-V is very costly. Guess that's one of the reasons why the JMH result is not obvious.
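For readers less familiar with the Java-level operations behind `RoundDoubleModeV`: the three benchmarked modes correspond to `Math.ceil`, `Math.floor` and `Math.rint`. The following is a minimal sketch, not taken from the patch or from `FpRoundingBenchmark`; the class and method names are hypothetical. It shows the kind of array loop a vectorizing compiler may turn into a vector rounding operation, and it relies on the documented fact that `Math.rint` rounds ties to the even neighbour, which matches the Round to Nearest mode mentioned in the review comment above.

```java
final class RoundModesSketch {
    // Hypothetical helper: element-wise rint (round to nearest, ties to even).
    static double[] rintAll(double[] in) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.rint(in[i]);
        }
        return out;
    }

    // Hypothetical helper: element-wise ceil (round toward positive infinity).
    static double[] ceilAll(double[] in) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.ceil(in[i]);
        }
        return out;
    }

    // Hypothetical helper: element-wise floor (round toward negative infinity).
    static double[] floorAll(double[] in) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.floor(in[i]);
        }
        return out;
    }
}
```

Only `rint` depends on a "nearest" rounding mode; `ceil` and `floor` are directed roundings, which is why the quoted assembler code treats the three cases separately.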
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21164#discussion_r1774893250 From chagedorn at openjdk.org Wed Sep 25 10:28:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 25 Sep 2024 10:28:39 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v3] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 07:11:18 GMT, Daohan Qu wrote: >> # Description >> >> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. >> >> But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. >> >> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". >> >> Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with >> >> >> make test \ >> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ >> MICRO="FORK=1;WARMUP_ITER=2" \ >> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" >> >> >> shows performance improvement after this PR applied. 
(`-XX:+UseParallelGC` is only for reproduce this bug, all the bms in the following table are run with this option.) >> >> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch | >> |---|---|---| >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s | >> >> # Reproduction >> >> Compiled and run the reduced test case `Test.java` in the appendix below using >> >> >> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test >> >> >> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and... > > Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: > > Add a jtreg test Can you explain in more detail how we fail to cut the infinite splitting and how your patch now solves this? I have some initial comments. src/hotspot/share/opto/memnode.cpp line 1257: > 1255: //----------------------is_instance_field_load_with_local_phi------------------ > 1256: bool LoadNode::is_instance_field_load_with_local_phi() { > 1257: if( in(Memory)->is_Phi() && in(Address)->is_AddP() ) { Was like that before but usually, we should fix the code style when touching old code like spacing ```suggestion if (in(Memory)->is_Phi() && in(Address)->is_AddP()) { or `*` placement (should be at type). There are some more places below which you could also fix with this patch. src/hotspot/share/opto/memnode.cpp line 1303: > 1301: } > 1302: > 1303: if (is_boxed_value_load_with_local_phi(phase)) { Does `t_oop->is_ptr_to_boxed_value()` hold here? Should we assert this?
src/hotspot/share/opto/memnode.cpp line 1305: > 1303: if (is_boxed_value_load_with_local_phi(phase)) { > 1304: intptr_t ignore = 0; > 1305: Node * base = AddPNode::Ideal_base_and_offset(in(Address), phase, ignore); There was a null check before for `base`. Is this not required anymore? test/hotspot/jtreg/compiler/loopopts/TestInfiniteSplitInCaseOfBaseIsPhi.java line 27: > 25: * @test > 26: * @bug 8340602 > 27: * @requires vm.compiler2.enabled Since you run with Parallel GC, you should add a requires here: Suggestion: * @requires vm.compiler2.enabled & & vm.gc.Parallel test/hotspot/jtreg/compiler/loopopts/TestInfiniteSplitInCaseOfBaseIsPhi.java line 38: > 36: import java.util.Random; > 37: > 38: public class TestInfiniteSplitInCaseOfBaseIsPhi { You should fix the indentation to 4 spaces for Java code instead of 2. ------------- PR Review: https://git.openjdk.org/jdk/pull/21134#pullrequestreview-2327402828 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774709937 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774968342 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774960562 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774705779 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774707120 From kxu at openjdk.org Wed Sep 25 14:59:52 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 25 Sep 2024 14:59:52 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v13] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <-rYUlWiTDc28OD0Jak4vpxGpQrCwQAGGdn3rlEcDq-g=.d95aa8f1-e817-48aa-be18-66f6c658fec9@github.com> > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. 
This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Update loopnode.cpp update comments and use min_signed_integer() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/64bf036e..2f053c5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=11-12 Stats: 8 lines in 1 file changed: 1 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Sep 25 14:59:53 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 25 Sep 2024 14:59:53 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v12] In-Reply-To: <7g7JhzAWgpzcq6-Cr8MLC2J_ORxdkFLaRwLIclgTEU0=.e234ae3c-f7f2-46a0-bee8-1dca2846d902@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <6T8i0DZcooO3e9rS9cVk3r_WquXnm9I-fXW80qbg-Ck=.2883943a-d3e2-41a8-aa06-dcfd27b94125@github.com> <7g7JhzAWgpzcq6-Cr8MLC2J_ORxdkFLaRwLIclgTEU0=.e234ae3c-f7f2-46a0-bee8-1dca2846d902@github.com> Message-ID: On Wed, 25 Sep 2024 07:19:18 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> omit source BasicType > > src/hotspot/share/opto/loopnode.cpp line 4002: > >> 4000: // if stride_con2 is min_jint (or min_jlong, respectively) and >> 4001: // stride_con is -1. 
>> 4002: if (((stride_con2_bt == T_INT && stride_con2 == min_jint) || > > Can you use `min_signed_integer(BasicType bt)` here? Good point! I wasn't aware of this helper. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1775402024 From kxu at openjdk.org Wed Sep 25 15:52:14 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 25 Sep 2024 15:52:14 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v14] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
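As an illustration of the loop shape the quoted description refers to, here is a minimal Java sketch (hypothetical class and method names, not taken from the PR's tests): an `int` counted loop carrying a `long` variable that advances by a constant stride in lockstep with the counter. Such a variable can be expressed as `init + trips * stride`, which is what replacing a parallel iv amounts to. Whether C2 actually performs the replacement for a given loop is not something this sketch demonstrates.

```java
final class ParallelIvSketch {
    // An int counted loop (i) with a long-typed variable (a) that is
    // incremented by a constant stride on every iteration.
    static long loopForm(int stop, long init) {
        long a = init;
        for (int i = 0; i < stop; i++) {
            a += 2L; // constant stride of the long-typed variable
        }
        return a;
    }

    // The same value computed without the loop-carried long variable:
    // the loop runs max(stop, 0) times, each adding the stride once.
    static long closedForm(int stop, long init) {
        long trips = Math.max(stop, 0);
        return init + 2L * trips;
    }
}
```

Both methods return the same value for any `stop`, which is the equivalence the optimization relies on.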
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: better comment formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/2f053c5a..4094231e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=12-13 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Sep 25 15:56:10 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 25 Sep 2024 15:56:10 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v15] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 34 commits: - Merge branch 'openjdk:master' into long-typed-parallel-iv - better comment formatting - Update loopnode.cpp update comments and use min_signed_integer() - omit source BasicType - Merge branch 'openjdk:master' into long-typed-parallel-iv - refactor I/L conversion nodes - update tests and comments as requested - Merge branch 'master' into long-typed-parallel-iv - use @run driver and Argument.RANDOM_ONCE - Merge branch 'master' into long-typed-parallel-iv - ... and 24 more: https://git.openjdk.org/jdk/compare/fb703258...5925761f ------------- Changes: https://git.openjdk.org/jdk/pull/18489/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=14 Stats: 427 lines in 3 files changed: 414 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Sep 25 16:04:54 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 25 Sep 2024 16:04:54 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v16] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <6zr-O1Pj6yXbpDB7wfDJxJA8VJqU23ZfiM59ED9yE88=.5d051ddd-8dc7-4e89-9ca2-1c0a05f25282@github.com> > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: make jcheck happy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/5925761f..66b04622 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=14-15 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From duke at openjdk.org Wed Sep 25 17:09:38 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Wed, 25 Sep 2024 17:09:38 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v2] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 21:20:51 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. 
The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix dump_for_loop() Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). 
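The iterator-plus-visitor design described in the quoted PR text can be sketched in Java as follows. This is only an analogy under stated assumptions: the real classes are C++ code in HotSpot, only two of the four predicate kinds are modelled, and every type name below (all carrying a `Sketch` suffix) is hypothetical. The visitor interface has default do-nothing methods, so a concrete visitor overrides only the predicate kinds it cares about.

```java
import java.util.List;

// Visitor with do-nothing defaults; implementors override what they need.
interface PredicateVisitorSketch {
    default void visitParse(ParsePredicateSketch p) {}
    default void visitRuntime(RuntimePredicateSketch p) {}
}

// Common interface of all predicate kinds in this sketch.
interface PredicateSketch {
    void accept(PredicateVisitorSketch visitor);
}

final class ParsePredicateSketch implements PredicateSketch {
    boolean useful = false;
    public void accept(PredicateVisitorSketch visitor) { visitor.visitParse(this); }
}

final class RuntimePredicateSketch implements PredicateSketch {
    public void accept(PredicateVisitorSketch visitor) { visitor.visitRuntime(this); }
}

// Walks the predicates of one (imaginary) loop and applies a visitor to each.
final class PredicateIteratorSketch {
    private final List<PredicateSketch> predicates;

    PredicateIteratorSketch(List<PredicateSketch> predicates) {
        this.predicates = predicates;
    }

    void forEach(PredicateVisitorSketch visitor) {
        for (PredicateSketch p : predicates) {
            p.accept(visitor);
        }
    }
}

// Loose analogue of a "mark Parse Predicates as useful" visitor:
// it overrides only visitParse and ignores every other kind.
final class UsefulMarkerSketch implements PredicateVisitorSketch {
    int marked = 0;

    @Override
    public void visitParse(ParsePredicateSketch p) {
        p.useful = true;
        marked++;
    }
}
```

The design choice this illustrates is that the walking logic lives in one place (the iterator), while each use site supplies only a small visitor.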
------------- PR Review: https://git.openjdk.org/jdk/pull/21161#pullrequestreview-2328944019 From kvn at openjdk.org Wed Sep 25 17:13:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 25 Sep 2024 17:13:40 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v6] In-Reply-To: References: Message-ID: <2gx5pEpUCQJmu3XXhW-AaXh2JVWlV9DTnFBScxx2dHQ=.97cdc92d-f1fd-40a6-bdc6-8e4ab36a5a02@github.com> On Wed, 25 Sep 2024 08:01:53 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java > > Co-authored-by: Tobias Hartmann src/hotspot/share/opto/library_call.cpp line 2367: > 2365: assert(!stopped(), "Inlining of unsafe access failed: address construction stopped unexpectedly"); > 2366: > 2367: if (_gvn.type(base)->isa_ptr() == TypePtr::NULL_PTR) { Why not `uncast` here? 
test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java line 34: > 32: * -XX:CompileCommand=compileonly,compiler.parsing.TestUnsafeArrayAccessWithNullBase::test* > 33: * -XX:-TieredCompilation compiler.parsing.TestUnsafeArrayAccessWithNullBase > 34: * @run main/othervm compiler.parsing.TestUnsafeArrayAccessWithNullBase You don't need `/othervm` if no VM's flags are specified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1775637977 PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1775636351 From duke at openjdk.org Wed Sep 25 21:12:52 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 25 Sep 2024 21:12:52 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions Message-ID: Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
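The rule stated in the paragraph above can be sketched as a small decision function. This is a minimal Java sketch under stated assumptions, not code from the test generator or from HotSpot's assembler: register numbers 0 to 15 stand for the legacy GPRs and 16 to 31 for the extended GPRs, and the class and method names are hypothetical. It only decides whether an extended encoding (REX2 or extended EVEX) is needed at all; it does not emit any prefix bytes.

```java
final class ApxEncodingSketch {
    static final int LEGACY_GPR_COUNT = 16; // registers usable with the unchanged encodings
    static final int TOTAL_GPR_COUNT = 32;  // APX doubles the GPR count from 16 to 32

    // True if the register number denotes one of the extended GPRs.
    static boolean isExtendedGpr(int reg) {
        if (reg < 0 || reg >= TOTAL_GPR_COUNT) {
            throw new IllegalArgumentException("not a GPR number: " + reg);
        }
        return reg >= LEGACY_GPR_COUNT;
    }

    // An instruction needs one of the new encodings only if at least one
    // operand references an extended GPR; otherwise its encoding is unchanged.
    static boolean needsExtendedEncoding(int... operandRegs) {
        for (int reg : operandRegs) {
            if (isExtendedGpr(reg)) {
                return true;
            }
        }
        return false;
    }
}
```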
### Generate test instructions With `binutils = 2.43` * `python3 x86-asmtest.py > asmtest.out.h` ### Run test * `make test TEST="gtest:AssemblerX86"` ------------- Commit messages: - Remove tab - Remove whitespace - Replace whitespace with tab - Add flag before testing - Fix assertion error on MacOS - Add _LP64 flag - Add missing header - Remove unused tests - The Shift count must be less than 32 - Add ResourceMark to avoid memory leak - ... and 3 more: https://git.openjdk.org/jdk/compare/c3711dc9...2f258ba9 Changes: https://git.openjdk.org/jdk/pull/20857/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339507 Stats: 46725 lines in 3 files changed: 46725 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From sviswanathan at openjdk.org Wed Sep 25 21:12:52 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 25 Sep 2024 21:12:52 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 16:44:57 GMT, hanklo6 wrote: > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` Hank Lo is part of Intel Java Team and is contributing under Intel OCA. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20857#issuecomment-2332441463 From kvn at openjdk.org Wed Sep 25 21:53:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 25 Sep 2024 21:53:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: <_-tMiGVnR8hdJVcKVNI5XP6haPJahfAEDBx8sXdZcHA=.f87096ab-0f94-4389-8906-40ab5e870678@github.com> On Wed, 25 Sep 2024 01:19:34 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > @coleenp, @matias9927, please take a look at this PR too. @dean-long why then you choose this change instead of alternative approach which you think is better? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2375326383 From kvn at openjdk.org Wed Sep 25 22:09:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 25 Sep 2024 22:09:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:35:47 GMT, Dean Long wrote: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. Instead of bailout in alternative approach we can change `cha_monomorphic_target` to `nullptr` in code which is looking for it in previous lines. `target` will be used for call and we will lose a little performance when JVMTI is used instead of skipping compilation. Am I missing something? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2375350501 From dlong at openjdk.org Wed Sep 25 22:54:34 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 25 Sep 2024 22:54:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 22:07:18 GMT, Vladimir Kozlov wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version().
>> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Instead of bailout in alternative approach we can change `cha_monomorphic_target` to `nullptr` in code which is looking for it in previous lines. `target` will be used for call and we will lose a little performance when JVMTI is used instead of skipping compilation. Am I missing something? @vnkozlov, I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2375409856 From kvn at openjdk.org Thu Sep 26 00:09:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 26 Sep 2024 00:09:33 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 22:52:18 GMT, Dean Long wrote: >Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. Good point.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2375478186 From lucy at openjdk.org Thu Sep 26 06:38:09 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 26 Sep 2024 06:38:09 GMT Subject: RFR: 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails Message-ID: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> On systems with large page sizes, the minimum size required for non-method code heap may be larger than the value preset in the test. This effect is triggered by JDK-8334564, which increases the required minimum size such that it will not be rounded down to zero. SAP-internal testsuite completed with no related errors. In particular, CheckSegmentedCodeCache.java succeeds. ------------- Commit messages: - adjust the correct CodeHeap size - 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails Changes: https://git.openjdk.org/jdk/pull/21179/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21179&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339542 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21179/head:pull/21179 PR: https://git.openjdk.org/jdk/pull/21179 From mdoerr at openjdk.org Thu Sep 26 06:38:09 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 26 Sep 2024 06:38:09 GMT Subject: RFR: 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails In-Reply-To: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> References: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> Message-ID: On Wed, 25 Sep 2024 10:37:59 GMT, Lutz Schmidt wrote: > On systems with large page sizes, the minimum size required for non-method code heap may be larger than the value preset in the test.
This effect is triggered by JDK-8334564, which increases the required minimum size such that it will not be rounded down to zero. > > SAP-internal testsuite completed with no related errors. In particular, CheckSegmentedCodeCache.jave succeeds. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21179#pullrequestreview-2329366658 From lucy at openjdk.org Thu Sep 26 06:38:09 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 26 Sep 2024 06:38:09 GMT Subject: RFR: 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails In-Reply-To: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> References: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> Message-ID: <-LHZDGOSBE9zQPutJB-QOosqZ-zDULCoIAIb5h_gFOI=.40fb46be-3370-4f3c-9647-74d4b642f597@github.com> On Wed, 25 Sep 2024 10:37:59 GMT, Lutz Schmidt wrote: > On systems with large page sizes, the minimum size required for non-method code heap may be larger than the value preset in the test. This effect is triggered by JDK-8334564, which increases the required minimum size such that it will not be rounded down to zero. > > SAP-internal testsuite completed with no related errors. In particular, CheckSegmentedCodeCache.jave succeeds. @shipilev You mentioned you saw this error on multiple platforms. Could you please retest with this fix? @SAP, we encountered the issue only on linuxppc64le. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21179#issuecomment-2376053566 From chagedorn at openjdk.org Thu Sep 26 07:42:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 26 Sep 2024 07:42:54 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: > This patch introduces new predicate classes which implement a new `Predicate` interface. 
These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. > > ### Predicate Interfaces and Implementing Classes > - `Predicate` interface is implemented by four predicate classes: > - `ParsePredicate` (existing class) > - `RuntimePredicate` (existing and updated class) > - `TemplateAssertionPredicate` (new class) > - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) > > ### Predicate Iterator with Visitor classes > There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: > - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. > - Replaces the old now retired `ParsePredicateIterator`. > - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. > - Replaces the old now retired `PredicateEntryIterator`. > - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. > > #### To Be Replaced soon > There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. 
> > ### More Information > More information about specific classes and changes can be found as code comments and PR comments. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Add missing public for UnifiedPredicateVisitor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21161/files - new: https://git.openjdk.org/jdk/pull/21161/files/7fa77140..4385fa78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21161&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21161&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21161.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21161/head:pull/21161 PR: https://git.openjdk.org/jdk/pull/21161 From shade at openjdk.org Thu Sep 26 09:01:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 09:01:36 GMT Subject: RFR: 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails In-Reply-To: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> References: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> Message-ID: On Wed, 25 Sep 2024 10:37:59 GMT, Lutz Schmidt wrote: > On systems with large page sizes, the minimum size required for non-method code heap may be larger than the value preset in the test. This effect is triggered by JDK-8334564, which increases the required minimum size such that it will not be rounded down to zero. > > SAP-internal testsuite completed with no related errors. In particular, CheckSegmentedCodeCache.jave succeeds. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21179#pullrequestreview-2330581605 From shade at openjdk.org Thu Sep 26 09:01:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 09:01:37 GMT Subject: RFR: 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails In-Reply-To: <-LHZDGOSBE9zQPutJB-QOosqZ-zDULCoIAIb5h_gFOI=.40fb46be-3370-4f3c-9647-74d4b642f597@github.com> References: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> <-LHZDGOSBE9zQPutJB-QOosqZ-zDULCoIAIb5h_gFOI=.40fb46be-3370-4f3c-9647-74d4b642f597@github.com> Message-ID: On Thu, 26 Sep 2024 06:34:43 GMT, Lutz Schmidt wrote: > @shipilev You mentioned you saw this error on multiple platforms. Could you please retest with this fix? @SAP, we encountered the issue only on linuxppc64le. Thanks. Yup, I have seen it on my AArch64 and x86_64 hosts. Tested with the patch and 100 repetitions, does not fail now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21179#issuecomment-2376363386 From duke at openjdk.org Thu Sep 26 09:20:50 2024 From: duke at openjdk.org (Yagmur Eren) Date: Thu, 26 Sep 2024 09:20:50 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp Message-ID: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. Therefore, removing it will have no impact on functionality. Alternatively, the memory could be zapped with `memset_s`, but the benefit of zapping seems negligible in this case. The change passes tier1 tests. See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679).
------------- Commit messages: - 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp Changes: https://git.openjdk.org/jdk/pull/21200/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21200&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337679 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21200/head:pull/21200 PR: https://git.openjdk.org/jdk/pull/21200 From stefank at openjdk.org Thu Sep 26 09:26:36 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 26 Sep 2024 09:26:36 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 09:16:11 GMT, Yagmur Eren wrote: > The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. Therefore, removing it will no impact on functionality. Or it could be zapped with `memset_s` but zapping seems negligible in this case. It passes tier1 tests. See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679). Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21200#pullrequestreview-2330642630 From thartmann at openjdk.org Thu Sep 26 10:22:34 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 26 Sep 2024 10:22:34 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: <4aguthnpeOVqmV5p52E_k7fp9dvgIj4TKnjaZdTo5Jw=.590b8cf8-2fdc-438a-a263-89eca85e2f28@github.com> On Thu, 26 Sep 2024 09:16:11 GMT, Yagmur Eren wrote: > The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. Therefore, removing it will no impact on functionality. Or it could be zapped with `memset_s` but zapping seems negligible in this case. It passes tier1 tests. See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679). That looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21200#pullrequestreview-2330774259 From duke at openjdk.org Thu Sep 26 10:28:35 2024 From: duke at openjdk.org (Yagmur Eren) Date: Thu, 26 Sep 2024 10:28:35 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 09:24:14 GMT, Stefan Karlsson wrote: >> The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. Therefore, removing it will no impact on functionality. Or it could be zapped with `memset_s` but zapping seems negligible in this case. It passes tier1 tests. 
See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679). > > Marked as reviewed by stefank (Reviewer). Thanks for the review @stefank and @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21200#issuecomment-2376551590 From duke at openjdk.org Thu Sep 26 10:28:36 2024 From: duke at openjdk.org (duke) Date: Thu, 26 Sep 2024 10:28:36 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 09:16:11 GMT, Yagmur Eren wrote: > The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. Therefore, removing it will no impact on functionality. Or it could be zapped with `memset_s` but zapping seems negligible in this case. It passes tier1 tests. See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679). @nelanbu Your change (at version 2187ff1f14a8ac0e73b939101e7511dd55d8982b) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21200#issuecomment-2376554699 From jwaters at openjdk.org Thu Sep 26 10:37:36 2024 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 26 Sep 2024 10:37:36 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 09:16:11 GMT, Yagmur Eren wrote: > The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. 
Therefore, removing it will no impact on functionality. Or it could be zapped with `memset_s` but zapping seems negligible in this case. It passes tier1 tests. See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679). Hmm, wouldn't this make detecting bugs where memory is incorrectly accessed harder? (Seeing that's the purpose of this particular memset) Not sure if I'm missing anything here. Does the warning trigger if the memset is commented out? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21200#issuecomment-2376573780 From tholenstein at openjdk.org Thu Sep 26 11:22:51 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 26 Sep 2024 11:22:51 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v7] In-Reply-To: References: Message-ID: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix (updated on 18th Sep 2024) > The fix modifies the `LibraryCallKit::classify_unsafe_addr()`... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: remove othervm in TestUnsafeArrayAccessWithNullBase.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20033/files - new: https://git.openjdk.org/jdk/pull/20033/files/255c0de3..85f0b328 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20033/head:pull/20033 PR: https://git.openjdk.org/jdk/pull/20033 From tholenstein at openjdk.org Thu Sep 26 11:22:51 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 26 Sep 2024 11:22:51 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v6] In-Reply-To: <2gx5pEpUCQJmu3XXhW-AaXh2JVWlV9DTnFBScxx2dHQ=.97cdc92d-f1fd-40a6-bdc6-8e4ab36a5a02@github.com> References: <2gx5pEpUCQJmu3XXhW-AaXh2JVWlV9DTnFBScxx2dHQ=.97cdc92d-f1fd-40a6-bdc6-8e4ab36a5a02@github.com> Message-ID: On Wed, 25 Sep 2024 17:09:55 GMT, Vladimir Kozlov wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java >> >> Co-authored-by: Tobias Hartmann > > test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java line 34: > >> 32: * -XX:CompileCommand=compileonly,compiler.parsing.TestUnsafeArrayAccessWithNullBase::test* >> 33: * -XX:-TieredCompilation compiler.parsing.TestUnsafeArrayAccessWithNullBase >> 34: * @run main/othervm compiler.parsing.TestUnsafeArrayAccessWithNullBase > > You don't need `/othervm` if no VM's flags are specified. 
done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1776863398 From jsjolen at openjdk.org Thu Sep 26 11:45:38 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 26 Sep 2024 11:45:38 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 10:24:04 GMT, Yagmur Eren wrote: >> Marked as reviewed by stefank (Reviewer). > > Thanks for the review @stefank and @TobiHartmann! Hi! @nelanbu, please wait 24 hours before integrating a change. This rule is in place so that everyone around the globe gets a chance to look at and comment on a PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21200#issuecomment-2376707401 From lucy at openjdk.org Thu Sep 26 11:47:39 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 26 Sep 2024 11:47:39 GMT Subject: RFR: 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails In-Reply-To: References: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> <-LHZDGOSBE9zQPutJB-QOosqZ-zDULCoIAIb5h_gFOI=.40fb46be-3370-4f3c-9647-74d4b642f597@github.com> Message-ID: On Thu, 26 Sep 2024 08:58:55 GMT, Aleksey Shipilev wrote: >> @shipilev You mentioned you saw this error on multiple platforms. Could you please retest with this fix? @SAP, we encountered the issue only on linuxppc64le. Thanks. > >> @shipilev You mentioned you saw this error on multiple platforms. Could you please retest with this fix? @SAP, we encountered the issue only on linuxppc64le. Thanks. > > Yup, I have seen it on my AArch64 and x86_64 hosts. Tested with the patch and 100 repetitions, does not fail now. @shipilev @TheRealMDoerr Thanks for the prompt reviews and additional testing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21179#issuecomment-2376709534 From lucy at openjdk.org Thu Sep 26 11:47:40 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 26 Sep 2024 11:47:40 GMT Subject: Integrated: 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails In-Reply-To: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> References: <9g7gld5X_2UaKyzwbd2FtMMDWUuIl7FAn8YfxFXls2E=.1b62dd3a-e619-4fe0-850e-ebc54789d5e3@github.com> Message-ID: On Wed, 25 Sep 2024 10:37:59 GMT, Lutz Schmidt wrote: > On systems with large page sizes, the minimum size required for the non-method code heap may be larger than the value preset in the test. This effect is triggered by JDK-8334564, which increases the required minimum size such that it will not be rounded down to zero. > > The SAP-internal test suite completed with no related errors. In particular, CheckSegmentedCodeCache.java succeeds. This pull request has now been integrated. Changeset: 777c20cb Author: Lutz Schmidt URL: https://git.openjdk.org/jdk/commit/777c20cb14010b6726834246ae4c61bc4ccb3f9b Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails Reviewed-by: mdoerr, shade ------------- PR: https://git.openjdk.org/jdk/pull/21179 From tholenstein at openjdk.org Thu Sep 26 11:57:36 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 26 Sep 2024 11:57:36 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v5] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 07:54:47 GMT, Tobias Hartmann wrote: > Looks good to me too. > > > So it would make sense to go over the uses of Type*::BOTTOM/Type*::NOTNULL and check they are not tested with pointer equality > > What about this concern? Did anyone check yet or should we file a follow-up task?
I filed a follow-up task: https://bugs.openjdk.org/browse/JDK-8341023 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2376729311 From duke at openjdk.org Thu Sep 26 12:40:35 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Thu, 26 Sep 2024 12:40:35 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides an escape hatch so that Truffle-specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerThreadCanCallJavaScope.java line 81: > 79: > 80: if (vm != null) { > 81: vm.updateCompilerThreadCanCallJava(!state); This is not correct. The scope has to capture the original `_can_call_java` value in the constructor and restore it here in `close`.
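The capture-and-restore pattern requested in this review comment can be illustrated with a minimal, self-contained sketch. It does not use the JVMCI API: `UpcallFlag` and `CanCallJavaScope` are hypothetical names, and a `ThreadLocal` merely stands in for the per-thread `_can_call_java` flag. The point is only that `close` restores the value saved in the constructor rather than writing `!state`, which would be wrong whenever the flag already had the requested value (for example in nested scopes).

```java
// Hypothetical stand-in for the per-thread `_can_call_java` flag (not the JVMCI API).
final class UpcallFlag {
    private static final ThreadLocal<Boolean> CAN_CALL_JAVA =
            ThreadLocal.withInitial(() -> false);

    static boolean get() {
        return CAN_CALL_JAVA.get();
    }

    // Sets the flag and returns the value it had before the update.
    static boolean update(boolean newState) {
        boolean previous = CAN_CALL_JAVA.get();
        CAN_CALL_JAVA.set(newState);
        return previous;
    }
}

// Hypothetical scope: captures the original value on entry, restores it on exit.
final class CanCallJavaScope implements AutoCloseable {
    private final boolean previous;

    CanCallJavaScope(boolean newState) {
        // Remember what the flag was before this scope changed it.
        this.previous = UpcallFlag.update(newState);
    }

    @Override
    public void close() {
        // Restore the captured value instead of blindly writing !newState.
        UpcallFlag.update(previous);
    }
}
```

Used with try-with-resources, an inner scope that requests the same state as an enclosing one leaves the outer state intact when it closes.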
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21171#discussion_r1776989493 From roland at openjdk.org Thu Sep 26 12:52:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 26 Sep 2024 12:52:37 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v16] In-Reply-To: <6zr-O1Pj6yXbpDB7wfDJxJA8VJqU23ZfiM59ED9yE88=.5d051ddd-8dc7-4e89-9ca2-1c0a05f25282@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <6zr-O1Pj6yXbpDB7wfDJxJA8VJqU23ZfiM59ED9yE88=.5d051ddd-8dc7-4e89-9ca2-1c0a05f25282@github.com> Message-ID: On Wed, 25 Sep 2024 16:04:54 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > make jcheck happy Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2331142629 From dlunden at openjdk.org Thu Sep 26 14:06:18 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 26 Sep 2024 14:06:18 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v6] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2.
In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). 
The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Remove leftover debug var - Update - Merge tag 'jdk-24+16' into HEAD Added tag jdk-24+16 for changeset c58fbef0 - Formatting updates - Update - Update after Roberto's comments and suggestions - Add can_represent asserts - Remove leftover CHUNK_SIZE reference - Support methods with many arguments in C2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/36f9dabf..1bec1692 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=04-05 Stats: 229736 lines in 2710 files changed: 190198 ins; 24729 del; 14809 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Thu Sep 26 14:06:18 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 26 Sep 2024 14:06:18 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v5] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:15:50 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. 
The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: > > - Formatting updates > - Update I just pushed another update. Change summary: - Merge with jdk-24+16 - Add assert to new rare bailout in `PhaseChaitin::Select` that should not happen in practice - Refactor `OptoRegPair` slightly and add asserts checking for overflow (@dean-long). - Improve the static asserts in `regmask.hpp` (@dean-long) - Performance optimization: add a function to trim watermarks in `regmask.hpp` and apply it in the `SUBTRACT*` methods. - Update some tests related to dynamically generating and testing methods with big arities. After this changeset, and in combination with `-Xcomp`, we compile a lot of methods in these tests that we previously bailed out on. I had to adjust some limits to allow bailing out in the register allocator when things get out of hand. It seems like this is only a problem in these very specific (and artificial) tests, but if it surfaces elsewhere in the future we may need to reevaluate the bailouts we have during register allocation.
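As a rough illustration of the design described in the changeset (a fixed "base" part that stays fast in the common case, plus dynamically allocated storage only when a mask needs more bits), here is a small Java sketch. It is not HotSpot's `RegMask`, which is C++ and considerably more involved; the class name `GrowableMask`, the constant `BASE_WORDS`, and the particular early exit in `overlaps` are assumptions made purely for illustration.

```java
import java.util.Arrays;

// Illustrative only: a bit mask with a fixed-size base and a lazily grown extension.
final class GrowableMask {
    private static final int BASE_WORDS = 2;           // assumed size of the fixed part (128 bits)
    private final long[] base = new long[BASE_WORDS];  // always present, covers the common case
    private long[] ext = null;                          // allocated only for bits beyond the base

    void set(int bit) {
        if (bit < 0) throw new IllegalArgumentException("negative bit index");
        int word = bit >>> 6;
        long m = 1L << (bit & 63);
        if (word < BASE_WORDS) {
            base[word] |= m;
            return;
        }
        int idx = word - BASE_WORDS;
        if (ext == null) {
            ext = new long[idx + 1];
        } else if (idx >= ext.length) {
            ext = Arrays.copyOf(ext, idx + 1);          // grow, preserving existing bits
        }
        ext[idx] |= m;
    }

    boolean get(int bit) {
        if (bit < 0) return false;
        int word = bit >>> 6;
        long m = 1L << (bit & 63);
        if (word < BASE_WORDS) return (base[word] & m) != 0;
        int idx = word - BASE_WORDS;
        return ext != null && idx < ext.length && (ext[idx] & m) != 0;
    }

    // Returns true if the two masks share at least one set bit.
    boolean overlaps(GrowableMask other) {
        for (int i = 0; i < BASE_WORDS; i++) {
            if ((base[i] & other.base[i]) != 0) return true;   // early exit on first hit
        }
        if (ext == null || other.ext == null) return false;    // common case: no extension to scan
        int n = Math.min(ext.length, other.ext.length);
        for (int i = 0; i < n; i++) {
            if ((ext[i] & other.ext[i]) != 0) return true;
        }
        return false;
    }

    boolean usesExtension() {
        return ext != null;
    }
}
```

In this sketch, masks that never touch a bit beyond the base never allocate, so the cost of supporting large masks is paid only by the rare methods that need them.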
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2377064661 From tholenstein at openjdk.org Thu Sep 26 14:45:01 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 26 Sep 2024 14:45:01 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v8] In-Reply-To: References: Message-ID: > We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` > https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 > and insert a null check > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 > In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ > > https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ > > ### Fix (updated on 18th Sep 2024) > The fix modifies the `LibraryCallKit::classify_unsafe_addr()`... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: add second uncast (Vladimirs suggestion) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20033/files - new: https://git.openjdk.org/jdk/pull/20033/files/85f0b328..4abc6f13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20033&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20033/head:pull/20033 PR: https://git.openjdk.org/jdk/pull/20033 From tholenstein at openjdk.org Thu Sep 26 14:45:01 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 26 Sep 2024 14:45:01 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v6] In-Reply-To: <2gx5pEpUCQJmu3XXhW-AaXh2JVWlV9DTnFBScxx2dHQ=.97cdc92d-f1fd-40a6-bdc6-8e4ab36a5a02@github.com> References: <2gx5pEpUCQJmu3XXhW-AaXh2JVWlV9DTnFBScxx2dHQ=.97cdc92d-f1fd-40a6-bdc6-8e4ab36a5a02@github.com> Message-ID: <4nKypaGj2OJlOYqjoW7UEym8Kv8Gcz5p0x0wawaGCN8=.3df11cbc-e037-4e5d-9e3e-1c1c9d74172c@github.com> On Wed, 25 Sep 2024 17:11:14 GMT, Vladimir Kozlov wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/opto/library_call.cpp line 2367: > >> 2365: assert(!stopped(), "Inlining of unsafe access failed: address construction stopped unexpectedly"); >> 2366: >> 2367: if (_gvn.type(base)->isa_ptr() == TypePtr::NULL_PTR) { > > Why not `uncast` here? makes sense to add here as well. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20033#discussion_r1777211788 From kvn at openjdk.org Thu Sep 26 15:52:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 26 Sep 2024 15:52:38 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v8] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 14:45:01 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > add second uncast (Vladimirs suggestion) Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2331720413 From kvn at openjdk.org Thu Sep 26 16:11:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 26 Sep 2024 16:11:38 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 10:34:30 GMT, Julian Waters wrote: > Hmm, wouldn't this make detecting bugs where memory is incorrectly accessed harder? (Seeing that's the purpose of this particular memset) Not sure if I'm missing anything here. Does the warning trigger if the memset is commented out? We can use VM's variants `os::malloc()` and `os::free()` to catch dangling pointers as we do in other places in VM. This will avoid warning I think. On other hand `adlc` is used only during VM build and using system's `free()` will not help to catch such issues anyway. So I am fine with this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21200#issuecomment-2377386136 From kxu at openjdk.org Thu Sep 26 16:19:27 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 26 Sep 2024 16:19:27 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v17] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix wrong expect values in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/66b04622..de6b7e20 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=15-16 Stats: 5 lines in 1 file changed: 2 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Thu Sep 26 16:25:40 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 26 Sep 2024 16:25:40 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v17] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Thu, 26 Sep 2024 16:19:27 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix wrong expect values in test Random tests were unstable. The latest commit addressed wrong expected values being computed if the loop variable overflows (which the formula used for computing expected values are not accounted for). Stricter upper and lower bounds for RNG were added to avoid such an overflow. 
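The overflow concern described above can be sketched in a few lines. This is a minimal illustration in C++ rather than the Java of the actual test, and it rests on assumptions: that the quantity at risk is a difference `i - init`, that `i` is non-negative, and that the RNG lower bound has the form `INT32_MIN + i + 1`. The helper names `difference_fits_int32` and `lowest_bounded_init` are hypothetical and do not appear in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if (i - init) is representable as a 32-bit signed int.
// The subtraction is carried out in 64 bits so the check itself cannot overflow.
bool difference_fits_int32(std::int32_t i, std::int32_t init) {
  const std::int64_t diff =
      static_cast<std::int64_t>(i) - static_cast<std::int64_t>(init);
  return diff >= std::numeric_limits<std::int32_t>::min() &&
         diff <= std::numeric_limits<std::int32_t>::max();
}

// Smallest init permitted by the assumed bound [INT32_MIN + i + 1, i).
// Assumption: i >= 0, so the sum below stays within the int32 range.
std::int32_t lowest_bounded_init(std::int32_t i) {
  assert(i >= 0);
  return std::numeric_limits<std::int32_t>::min() + i + 1;
}
```

With an unrestricted draw, `init` may be `INT32_MIN`, and `i - init` then exceeds the int range for any `i >= 0`. With the bounded draw, the worst case `i - lowest_bounded_init(i)` is exactly `INT32_MAX`, so a formula that does not model wraparound stays valid.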
------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2377415901 From jbhateja at openjdk.org Thu Sep 26 17:30:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 26 Sep 2024 17:30:38 GMT Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v4] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 16:51:36 GMT, Smita Kamath wrote: >> Hi, >> I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you. >> >> Benchmark | Datasize | BaseJDK (ops/s) | Patch(ops/s) | %Gain >> -- | -- | -- | -- | -- >> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67 >> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72 >> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73 >> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34 >> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66 >> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75 >> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42 >> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6 >> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45 >> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39 >> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26 >> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52 >> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08 >> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94 >> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91 >> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81 >> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597 > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments Thanks @smita-kamath for addressing comments. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17515#pullrequestreview-2331930179 From svkamath at openjdk.org Thu Sep 26 22:09:42 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 26 Sep 2024 22:09:42 GMT Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 19:12:52 GMT, Anthony Scarpino wrote: >> Hi, >> I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you. 
>> >> Benchmark | Datasize | BaseJDK (ops/s) | Patch(ops/s) | %Gain >> -- | -- | -- | -- | -- >> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67 >> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72 >> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73 >> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34 >> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66 >> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75 >> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42 >> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6 >> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54 >> ? | ? | ? | ? | ? >> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45 >> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39 >> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26 >> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52 >> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08 >> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94 >> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91 >> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81 >> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597 > > That's a good performance increase for such a small code change. I reviewed the simple java code change. I'll let a hotspot reviewer handle the rest of the code. @ascarpino I have two approvals for this PR. Would it be possible for you to run this through your testing? Thanks a lot! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17515#issuecomment-2378020801 From dlong at openjdk.org Thu Sep 26 23:35:42 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 26 Sep 2024 23:35:42 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v6] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 14:06:18 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. 
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Remove leftover debug var > - Update > - Merge tag 'jdk-24+16' into HEAD > > Added tag jdk-24+16 for changeset c58fbef0 > - Formatting updates > - Update > - Update after Roberto's comments and suggestions > - Add can_represent asserts > - Remove leftover CHUNK_SIZE reference > - Support methods with many arguments in C2 src/hotspot/share/opto/optoreg.hpp line 195: > 193: static constexpr bool can_fit(OptoReg::Name n) { > 194: return n <= std::numeric_limits::max(); > 195: } At first I thought this will always return true, but now I see it is checking OptoReg::Name against OptoRegPair::Name. 
src/hotspot/share/opto/optoreg.hpp line 208: > 206: assert(can_fit(n + 1), "overflow"); > 207: assert(can_fit(n), "overflow"); > 208: _second = n + 1; I think we could just use checked_cast<>() here and then we don't need can_fit. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1777842616 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1777843018 From dlong at openjdk.org Thu Sep 26 23:35:42 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 26 Sep 2024 23:35:42 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v6] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 23:31:52 GMT, Dean Long wrote: >> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: >> >> - Remove leftover debug var >> - Update >> - Merge tag 'jdk-24+16' into HEAD >> >> Added tag jdk-24+16 for changeset c58fbef0 >> - Formatting updates >> - Update >> - Update after Roberto's comments and suggestions >> - Add can_represent asserts >> - Remove leftover CHUNK_SIZE reference >> - Support methods with many arguments in C2 > > src/hotspot/share/opto/optoreg.hpp line 208: > >> 206: assert(can_fit(n + 1), "overflow"); >> 207: assert(can_fit(n), "overflow"); >> 208: _second = n + 1; > > I think we could just use checked_cast<>() here and then we don't need can_fit. What do you think? I see we still need can_fit() for the static asserts, so nevermind. 
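The point about static asserts can be made concrete with a small sketch: a `constexpr` range predicate works in `static_assert`, which a runtime-only checked cast cannot cover. Everything below is hypothetical and is not the HotSpot code: `WideName`, `NarrowName`, and `RegPair` are stand-ins, and the 32-bit and 16-bit widths are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins; the widths are assumptions, not the HotSpot types.
using WideName = std::uint32_t;    // plays the role of the wide register name
using NarrowName = std::uint16_t;  // plays the role of the pair's storage type

// True if n can be stored in NarrowName without truncation.
// Being constexpr, it can be evaluated inside static_assert.
constexpr bool can_fit(WideName n) {
  return n <= std::numeric_limits<NarrowName>::max();
}

struct RegPair {
  NarrowName first;
  NarrowName second;

  // Stores n and n + 1 after checking that both are representable.
  void set_pair(WideName n) {
    assert(n < std::numeric_limits<WideName>::max() && "n + 1 would wrap");
    assert(can_fit(n + 1) && "overflow");  // also implies can_fit(n)
    first = static_cast<NarrowName>(n);
    second = static_cast<NarrowName>(n + 1);
  }
};

// Compile-time checks: this is where a constexpr predicate is still required.
static_assert(can_fit(65535), "largest narrow value must fit");
static_assert(!can_fit(65536), "one past the narrow maximum must not fit");
```

Naming the narrow type explicitly in `std::numeric_limits<NarrowName>` also makes it obvious which of the two types is being checked against.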
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1777843867 From dqu at openjdk.org Fri Sep 27 02:47:03 2024 From: dqu at openjdk.org (Daohan Qu) Date: Fri, 27 Sep 2024 02:47:03 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v4] In-Reply-To: References: Message-ID: > # Description > > [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. > > But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. > > This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". > > Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with > > > make test \ > TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ > MICRO="FORK=1;WARMUP_ITER=2" \ > TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" > > > shows performance improvement after this PR applied. 
(`-XX:+UseParallelGC` is only for reproduce this bug, all the bms in the following table are run with this option.) > > |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch | > |---|---|---| > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s | > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s | > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s | > > # Reproduction > > Compiled and run the reduced test case `Test.java` in the appendix below using > > > java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test > > > and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like > > > src/hotspot/share/opto/memnode.cpp line 1257: > >> 1255: //----------------------is_instance_field_load_with_local_phi------------------ >> 1256: bool LoadNode::is_instance_field_load_with_local_phi() { >> 1257: if( in(Memory)->is_Phi() && in(Address)->is_AddP() ) { > > Was like that before but usually, we should fix the code style when touching old code like spacing > ```suggestion > if (in(Memory)->is_Phi() && in(Address)->is_AddP()) { > > or `*` placement (should be at type). There are some more places below which you could also fix with this patch. Thanks for highlighting that practice! > src/hotspot/share/opto/memnode.cpp line 1303: > >> 1301: } >> 1302: >> 1303: if (is_boxed_value_load_with_local_phi(phase)) { > > Does `t_oop->is_ptr_to_boxed_value()` hold here? Should we assert this? Since there is not too much code after `t_oop->is_ptr_to_boxed_value()` is checked `is_boxed_value_load_with_local_phi()`, I think this assert might be unnecessary. 
https://github.com/openjdk/jdk/pull/21134/files#diff-ec810a1df7150822244e55ee309c86d6cbffe108ae9c72b6d258ea5758677c28R1276-R1280 bool base_is_phi = (base != NULL) && base->is_Phi(); bool load_boxed_value = t_oop != NULL && t_oop->is_ptr_to_boxed_value() && C->aggressive_unboxing() && (base != NULL) && (base == address->in(AddPNode::Base)) && phase->type(base)->higher_equal(TypePtr::NOTNULL); return base_is_phi && load_boxed_value; ``` > src/hotspot/share/opto/memnode.cpp line 1305: > >> 1303: if (is_boxed_value_load_with_local_phi(phase)) { >> 1304: intptr_t ignore = 0; >> 1305: Node * base = AddPNode::Ideal_base_and_offset(in(Address), phase, ignore); > > There was a null check before for `base`. Is this not required anymore? I just checked it in `is_boxed_value_load_with_local_phi()` as shown below. If `base` is `nullptr`, this code won't be executed. https://github.com/openjdk/jdk/pull/21134/files#diff-ec810a1df7150822244e55ee309c86d6cbffe108ae9c72b6d258ea5758677c28R1276-R1280 bool base_is_phi = (base != NULL) && base->is_Phi(); bool load_boxed_value = t_oop != NULL && t_oop->is_ptr_to_boxed_value() && C->aggressive_unboxing() && (base != NULL) && (base == address->in(AddPNode::Base)) && phase->type(base)->higher_equal(TypePtr::NOTNULL); return base_is_phi && load_boxed_value; > test/hotspot/jtreg/compiler/loopopts/TestInfiniteSplitInCaseOfBaseIsPhi.java line 27: > >> 25: * @test >> 26: * @bug 8340602 >> 27: * @requires vm.compiler2.enabled > > Since you run with Parallel GC, you should add a requires here: > Suggestion: > > * @requires vm.compiler2.enabled & & vm.gc.Parallel Thanks for reminding! I will update it. (Is there a redundant "&" in the suggested change?:P ) > test/hotspot/jtreg/compiler/loopopts/TestInfiniteSplitInCaseOfBaseIsPhi.java line 38: > >> 36: import java.util.Random; >> 37: >> 38: public class TestInfiniteSplitInCaseOfBaseIsPhi { > > You should fix the indentation to 4 spaces for Java code instead of 2. Indeed. I'll also update it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2378301426 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1777941250 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1777941439 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1777941352 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1777939803 PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1777940261 From dlunden at openjdk.org Fri Sep 27 06:41:47 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 27 Sep 2024 06:41:47 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v6] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 23:31:00 GMT, Dean Long wrote: >> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: >> >> - Remove leftover debug var >> - Update >> - Merge tag 'jdk-24+16' into HEAD >> >> Added tag jdk-24+16 for changeset c58fbef0 >> - Formatting updates >> - Update >> - Update after Roberto's comments and suggestions >> - Add can_represent asserts >> - Remove leftover CHUNK_SIZE reference >> - Support methods with many arguments in C2 > > src/hotspot/share/opto/optoreg.hpp line 195: > >> 193: static constexpr bool can_fit(OptoReg::Name n) { >> 194: return n <= std::numeric_limits::max(); >> 195: } > > At first I thought this will always return true, but now I see it is checking OptoReg::Name against OptoRegPair::Name. I'll make it explicit and write `return n <= std::numeric_limits::max();` to avoid confusion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1778083391 From jwaters at openjdk.org Fri Sep 27 07:11:41 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 27 Sep 2024 07:11:41 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 09:16:11 GMT, Yagmur Eren wrote: > The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. Therefore, removing it will no impact on functionality. Or it could be zapped with `memset_s` but zapping seems negligible in this case. It passes tier1 tests. See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679). Marked as reviewed by jwaters (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21200#pullrequestreview-2332917730 From jwaters at openjdk.org Fri Sep 27 07:11:42 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 27 Sep 2024 07:11:42 GMT Subject: RFR: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: On Thu, 26 Sep 2024 16:08:39 GMT, Vladimir Kozlov wrote: > > Hmm, wouldn't this make detecting bugs where memory is incorrectly accessed harder? (Seeing that's the purpose of this particular memset) Not sure if I'm missing anything here. Does the warning trigger if the memset is commented out? > > We can use VM's variants `os::malloc()` and `os::free()` to catch dangling pointers as we do in other places in VM. This will avoid warning I think. 
On other hand `adlc` is used only during VM build and using system's `free()` will not help to catch such issues anyway. So I am fine with this change. I see. I would prefer if it was commented, but as everyone else is fine with this change I will approve as well ------------- PR Comment: https://git.openjdk.org/jdk/pull/21200#issuecomment-2378555257 From roland at openjdk.org Fri Sep 27 11:57:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 27 Sep 2024 11:57:40 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v17] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <1rInK7y-uUfBurDSCoiYX_0NlgdXUB8KF49UbHekHVo=.95539c0e-d917-4e68-87bf-38a9fd9305f6@github.com> On Thu, 26 Sep 2024 16:19:27 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix wrong expect values in test test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 349: > 347: int init1 = rng.nextInt(); > 348: int init2 = rng.nextInt(Integer.MIN_VALUE + i + 1, i); > 349: long init1L = rng.nextLong(Long.MIN_VALUE + i + 1, i); I understand the init2 computation I think (i - init2 should not overflow max signed int value) but I don't understand the `init1L` one. As far as I can tell, `init1L` is used the same way `init1` is used but one is for a test with a long variable and the other for a int variable. 
Why don't they use the same initialization pattern then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1778500313 From roland at openjdk.org Fri Sep 27 12:27:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 27 Sep 2024 12:27:38 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v17] In-Reply-To: <1rInK7y-uUfBurDSCoiYX_0NlgdXUB8KF49UbHekHVo=.95539c0e-d917-4e68-87bf-38a9fd9305f6@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <1rInK7y-uUfBurDSCoiYX_0NlgdXUB8KF49UbHekHVo=.95539c0e-d917-4e68-87bf-38a9fd9305f6@github.com> Message-ID: On Fri, 27 Sep 2024 11:55:02 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix wrong expect values in test > > test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 349: > >> 347: int init1 = rng.nextInt(); >> 348: int init2 = rng.nextInt(Integer.MIN_VALUE + i + 1, i); >> 349: long init1L = rng.nextLong(Long.MIN_VALUE + i + 1, i); > > I understand the init2 computation I think (i - init2 should not overflow max signed int value) but I don't understand the `init1L` one. As far as I can tell, `init1L` is used the same way `init1` is used but one is for a test with a long variable and the other for a int variable. Why don't they use the same initialization pattern then? Can you add a comment as well? 
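The bound arithmetic behind the quoted `init2` range can be checked mechanically. The following is only a sketch and not part of the test: the class and method names are hypothetical, and it assumes `i` is non-negative (the quoted lines do not show how `i` is defined). It evaluates `i - init2` at the two extremes of the half-open range `[Integer.MIN_VALUE + i + 1, i)` with the overflow-checked `Math` helpers, which throw instead of wrapping.

```java
// Hypothetical helper, not part of TestParallelIvInIntCountedLoop.java.
class ParallelIvInitBounds {
    // Smallest init2 the quoted range allows (the origin is inclusive).
    // Assumes i >= 0; addExact would throw if the sum left the int range.
    static int lowestInit2(int i) {
        return Math.addExact(Math.addExact(Integer.MIN_VALUE, i), 1);
    }

    // i - init2 when init2 is as small as possible; throws on int overflow.
    static int largestDifference(int i) {
        return Math.subtractExact(i, lowestInit2(i));
    }

    // i - init2 when init2 is as large as possible (the bound i is exclusive).
    static int smallestDifference(int i) {
        return Math.subtractExact(i, Math.subtractExact(i, 1));
    }

    public static void main(String[] args) {
        for (int i : new int[] {0, 1, 1000, Integer.MAX_VALUE}) {
            System.out.println(i + ": " + smallestDifference(i) + " .. " + largestDifference(i));
        }
    }
}
```

Under that assumption the difference stays within `1 .. Integer.MAX_VALUE` for every sampled `i`, which is consistent with the reading of `init2` given above. It says nothing about why `init1L` uses the same range, which remains the open question.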
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1778534073 From roland at openjdk.org Fri Sep 27 12:55:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 27 Sep 2024 12:55:38 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. 
The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add missing public for UnifiedPredicateVisitor That looks reasonable to me. src/hotspot/share/opto/loopnode.cpp line 4325: > 4323: class ParsePredicateUsefulMarker : public PredicateVisitor { > 4324: public: > 4325: using PredicateVisitor::visit; Why is this needed? ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21161#pullrequestreview-2333671087 PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1778576842 From tholenstein at openjdk.org Fri Sep 27 13:14:37 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 27 Sep 2024 13:14:37 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v5] In-Reply-To: References: Message-ID: <7hkOe5MPAQGcp97dTz-9rbk4TlmPQSUvas46Fpejnic=.95ad3fba-1c20-48de-9919-1e21cf540ef8@github.com> On Wed, 25 Sep 2024 07:54:47 GMT, Tobias Hartmann wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> JTreg style > > Looks good to me too. > >> So it would make sense to go over the uses of Type*::BOTTOM/Type*::NOTNULL and check they are not tested with pointer equality > > What about this concern? Did anyone check yet or should we file a follow-up task? Thanks @TobiHartmann , @rwestrel , @eme64, @vnkozlov and @iwanowww for the reviews! If @iwanowww is ok with the changes, this PR is ready to integrate. I will delegate since I am out of office the next 3 weeks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20033#issuecomment-2379251776 From duke at openjdk.org Fri Sep 27 13:29:37 2024 From: duke at openjdk.org (Antón Seoane) Date: Fri, 27 Sep 2024 13:29:37 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4] In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Tue, 24 Sep 2024 16:43:57 GMT, Antón Seoane wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs.
For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. >> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. >> >> Please consider this PR, and thanks! > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Removed whitespace Upon further comments and consideration, I am keeping this PR on hold and opening up to further discussion via the hotspot-dev mailing list: https://mail.openjdk.org/pipermail/hotspot-dev/2024-September/094810.html ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2379285507 From roland at openjdk.org Fri Sep 27 13:44:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 27 Sep 2024 13:44:52 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v2] In-Reply-To: References: Message-ID: > The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena.
When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object. > > This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way if `hascons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons` then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. > > I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4GB of memory at peak. With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. > > When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - type interfaces footprint - Revert "fix" This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21163/files - new: https://git.openjdk.org/jdk/pull/21163/files/3598dc08..43e2e91c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21163&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21163&range=00-01 Stats: 109 lines in 3 files changed: 22 ins; 22 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/21163.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21163/head:pull/21163 PR: https://git.openjdk.org/jdk/pull/21163 From roland at openjdk.org Fri Sep 27 13:44:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 27 Sep 2024 13:44:52 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 15:53:06 GMT, Roland Westrelin wrote: > The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object. > > This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way if `hascons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons` then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. > > I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. 
With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4GB of memory at peak. With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. > > When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. Converted back to draft while I investigate gtest failures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21163#issuecomment-2373908023 From tholenstein at openjdk.org Fri Sep 27 15:07:43 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 27 Sep 2024 15:07:43 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v21] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 13:30:14 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. 
In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Nice framework! Looks good to me so far. Could you add an example of how to use the framework with VM flags?
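For readers who want to picture the generate / compile / invoke flow described in the quoted text, here is a minimal sketch using only the standard `javax.tools` and reflection APIs. It is not the CompileFramework implementation and does not use its API; the class `CompileAndInvokeSketch`, its methods, and the generated source are hypothetical, and the sketch assumes it runs on a full JDK with a writable temporary directory. It does not cover jasm sources, IR rules, or VM flags.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical stand-in built on javax.tools; NOT the CompileFramework from the PR.
class CompileAndInvokeSketch {
    // "Generate" step: the test source is just a String, here parameterized by a constant.
    static String generateSource(int addend) {
        return "public class XYZ { public static int test(int x) { return x + " + addend + "; } }";
    }

    // "Compile" and "invoke" steps: write the source, run javac, load XYZ, call XYZ.test(arg).
    static int invokeGenerated(int addend, int arg) {
        try {
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("no system compiler available (JDK required)");
            }
            Path dir = Files.createTempDirectory("compile-sketch");
            Path file = dir.resolve("XYZ.java");
            Files.write(file, generateSource(addend).getBytes(StandardCharsets.UTF_8));
            int status = javac.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalStateException("compilation failed with status " + status);
            }
            // Load the freshly compiled class from the output directory and call it reflectively.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> klass = loader.loadClass("XYZ");
                Method test = klass.getMethod("test", int.class);
                return (Integer) test.invoke(null, arg);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeGenerated(37, 5));
    }
}
```

Because the source is produced at run time, a different constant (or a randomly sampled one) can be baked in on every run, which is the "mini-fuzzer" property the description asks for.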
------------- Marked as reviewed by tholenstein (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2333985713 From iveresov at openjdk.org Fri Sep 27 16:08:12 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 27 Sep 2024 16:08:12 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result Message-ID: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> This is essentially a defensive forward port of a solution for an issue discovered in 11, where the use of a pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependencies()`. `insert_anti_dependencies()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations, but with pinned loads we don't. Compare the good graph shape (with control dependency set to `UnknownControl`): [image: Good graph] with the graph produced with the nodes pinned: [image: Bad graph] With the "bad graph", the loads and the store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependencies()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there.
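The proposed walk ("skip MergeMems until the real root of the slice is reached") can be pictured with a toy model. This is a sketch only: the classes below are hypothetical, written in Java for illustration, and do not mirror HotSpot's C++ node hierarchy or the actual change in `gcm.cpp`. The alias indices 5 and 7 are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; hypothetical classes, not HotSpot's Node/MergeMemNode.
class MemoryChainSketch {
    // Any node that produces a memory state (a store, the initial memory, ...).
    static class MemState {
        final String name;
        MemState(String name) { this.name = name; }
    }

    // A merge point that may carry a different memory state per alias index.
    static class MergeMem extends MemState {
        final MemState base;
        final Map<Integer, MemState> slices = new HashMap<>();
        MergeMem(String name, MemState base) { super(name); this.base = base; }
        // State for one alias slice, falling back to the base memory.
        MemState memoryAt(int aliasIdx) { return slices.getOrDefault(aliasIdx, base); }
    }

    // Follow the slice for aliasIdx through nested merges until a non-merge state is found.
    static MemState sliceRoot(MemState mem, int aliasIdx) {
        while (mem instanceof MergeMem) {
            mem = ((MergeMem) mem).memoryAt(aliasIdx);
        }
        return mem;
    }

    // Builds: load memory input -> OuterMerge -> InnerMerge -> StoreB (on slice 5 only).
    static String example(int aliasIdx) {
        MemState initial = new MemState("InitialMemory");
        MemState store = new MemState("StoreB");
        MergeMem inner = new MergeMem("InnerMerge", initial);
        inner.slices.put(5, store);
        MergeMem outer = new MergeMem("OuterMerge", initial);
        outer.slices.put(5, inner);
        return sliceRoot(outer, aliasIdx).name;
    }

    public static void main(String[] args) {
        System.out.println(example(5));
        System.out.println(example(7));
    }
}
```

In this model a load whose memory input is `OuterMerge` only finds the store on its slice after the merges are skipped, which is the situation the description attributes to pinned loads.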
------------- Commit messages: - 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result Changes: https://git.openjdk.org/jdk/pull/21222/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21222&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337066 Stats: 59 lines in 2 files changed: 59 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21222/head:pull/21222 PR: https://git.openjdk.org/jdk/pull/21222 From ascarpino at openjdk.org Fri Sep 27 16:21:41 2024 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Fri, 27 Sep 2024 16:21:41 GMT Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 In-Reply-To: References: Message-ID: <4-_w9biw_gYMGyapdnjakw4aPvCq8EzwFt5RRdNe9UM=.1d88e032-7939-4bfd-b8d7-cb80ac105d0f@github.com> On Thu, 26 Sep 2024 22:07:18 GMT, Smita Kamath wrote: >> That's a good performance increase for such a small code change. I reviewed the simple java code change. I'll let a hotspot reviewer handle the rest of the code. > > @ascarpino I have two approvals for this PR. Would it be possible for you to run this through your testing? Thanks a lot! @smita-kamath ok. I'll let you know when I have results. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17515#issuecomment-2379641516 From azvegint at openjdk.org Fri Sep 27 16:29:49 2024 From: azvegint at openjdk.org (Alexander Zvegintsev) Date: Fri, 27 Sep 2024 16:29:49 GMT Subject: Integrated: 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 16:20:44 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode. Marked as reviewed by azvegint (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21224#pullrequestreview-2334162942 From dcubed at openjdk.org Fri Sep 27 16:29:48 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 27 Sep 2024 16:29:48 GMT Subject: Integrated: 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode Message-ID: A trivial fix to ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode. ------------- Commit messages: - 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode Changes: https://git.openjdk.org/jdk/pull/21224/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21224&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341096 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21224/head:pull/21224 PR: https://git.openjdk.org/jdk/pull/21224 From dcubed at openjdk.org Fri Sep 27 16:29:49 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 27 Sep 2024 16:29:49 GMT Subject: Integrated: 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 16:22:46 GMT, Alexander Zvegintsev wrote: >> A trivial fix to ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode. > > Marked as reviewed by azvegint (Reviewer). @azvegint - Thanks for the lightning fast review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21224#issuecomment-2379649409 From dcubed at openjdk.org Fri Sep 27 16:29:49 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 27 Sep 2024 16:29:49 GMT Subject: Integrated: 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 16:20:44 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode. This pull request has now been integrated. 
Changeset: 5aae3d40 Author: Daniel D. Daugherty URL: https://git.openjdk.org/jdk/commit/5aae3d40856d92e1e0ff744cb1a0d3421c3dfd5b Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode Reviewed-by: azvegint ------------- PR: https://git.openjdk.org/jdk/pull/21224 From kvn at openjdk.org Fri Sep 27 17:34:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 27 Sep 2024 17:34:38 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result In-Reply-To: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: <9kF42tnA1qVGkpW1hRaObBtueDCnO0wgZfhpOArNLnI=.11131d42-f12f-4559-9c47-24aa4e527782@github.com> On Fri, 27 Sep 2024 16:02:29 GMT, Igor Veresov wrote: > This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependencies()` > > The `insert_anti_dependencies()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. > > Compare the good graph shape (with control dependency set to `UnknownControl`): > Good graph > > With the graph produce with the nodes pinned: > Bad graph > > With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependencies()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. 
src/hotspot/share/opto/gcm.cpp line 753: > 751: // root of our search tree through the corresponding slices of MergeMem nodes to > 752: // get to the node that really creates the memory state for this slice. > 753: if (load_alias_idx >= Compile::AliasIdxRaw) { This will be executed for all loads and not only pinned. Is this okay? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1778943508 From duke at openjdk.org Fri Sep 27 17:39:40 2024 From: duke at openjdk.org (Yagmur Eren) Date: Fri, 27 Sep 2024 17:39:40 GMT Subject: Integrated: 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp In-Reply-To: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> References: <9yDUB3FF_AsaKfYueiUWd-npcSrx9Iy7RPBaHqEDu4Q=.696a57cb-1880-4068-b7f9-f7de7dee4b5e@github.com> Message-ID: <9KixSwM0ffyfGF-Tl9qawpqcf5k5AmPlkX9L64amE9A=.41c5b024-0aeb-4f26-837d-bbebe1c3ff20@github.com> On Thu, 26 Sep 2024 09:16:11 GMT, Yagmur Eren wrote: > The purpose of this `memset` was to overwrite memory with garbage data before freeing it, helping detect bugs where the freed memory is accessed afterward. Therefore, removing it will have no impact on functionality. Or it could be zapped with `memset_s` but zapping seems negligible in this case. It passes tier1 tests. See [JDK-8337679](https://bugs.openjdk.org/browse/JDK-8337679). This pull request has now been integrated.
Changeset: a7bfced6 Author: Yagmur Eren Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/a7bfced60540fe8d4fa7360bff512337ea47b890 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8337679: Memset warning in src/hotspot/share/adlc/adlArena.cpp Reviewed-by: stefank, thartmann, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/21200 From qamai at openjdk.org Fri Sep 27 18:12:53 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 27 Sep 2024 18:12:53 GMT Subject: RFR: 8341102: Add element type information to vector types Message-ID: Hi, This patch adds the type information of each element in a `TypeVect`. This helps constant folding vectors as well as strength reduction of several complex operations such as `Rearrange`. Some notable points: - I only implement `ConV` rule on x86, looking at other architectures it seems that I would not only need to implement the `ConV` implementations, but several other rules that match `ReplicateNode` of a constant. - I changed the implementation of an array constant in `constanttable`, I think working with `jbyte` is easier as it allows `memcpy` and at this point, we are close to the metal anyway. - Constant folding for a `VectorUnboxNode`, this is special because an element of a normal stable array is only constant if it is non-zero, so implementing constant folding on a load node seems less trivial. - Memory fences because `Vector::payload` is a final field and we should respect that. - Several places expect a `const Type*` when in reality it expects a `BasicType`, I refactor that so that the intent is clearer and there is less room for possible errors, this is needed because `byte`, `short` and `int` share the same kind of `const Type*`. Please take a look and leave your reviews, thanks a lot. 
------------- Commit messages: - build error - add element types to vector types Changes: https://git.openjdk.org/jdk/pull/21229/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341102 Stats: 1401 lines in 39 files changed: 863 ins; 332 del; 206 mod Patch: https://git.openjdk.org/jdk/pull/21229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21229/head:pull/21229 PR: https://git.openjdk.org/jdk/pull/21229 From qamai at openjdk.org Fri Sep 27 18:31:35 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 27 Sep 2024 18:31:35 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 13:44:52 GMT, Roland Westrelin wrote: >> The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object. >> >> This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way if `hascons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons` then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. >> >> I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. 
With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4GB of memory at peak. With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. >> >> When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - type interfaces footprint > - Revert "fix" > > This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. src/hotspot/share/opto/type.cpp line 3270: > 3268: } > 3269: > 3270: const TypeInterfaces* TypeInterfaces::make(const GrowableArray* interfaces) { I think you can make `_interface` a `ciInstanceKlass**` and do this: void* ptr = Type::operator new(sizeof(TypeInterfaces) + sizeof(ciInstanceKlass*) * interfaces->length()) Then `delete ptr` should drop the whole thing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21163#discussion_r1779001214 From qamai at openjdk.org Fri Sep 27 18:54:36 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 27 Sep 2024 18:54:36 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 18:28:37 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - type interfaces footprint >> - Revert "fix" >> >> This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. 
> > src/hotspot/share/opto/type.cpp line 3270: > >> 3268: } >> 3269: >> 3270: const TypeInterfaces* TypeInterfaces::make(const GrowableArray* interfaces) { > > I think you can make `_interface` a `ciInstanceKlass**` and do this: > > void* ptr = Type::operator new(sizeof(TypeInterfaces) + sizeof(ciInstanceKlass*) * interfaces->length()) > > Then `delete ptr` should drop the whole thing. A `GrowableArrayFromArray` would be mostly compatible with the interface of `GrowableArray`, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21163#discussion_r1779023497 From iveresov at openjdk.org Fri Sep 27 19:00:40 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 27 Sep 2024 19:00:40 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result In-Reply-To: <9kF42tnA1qVGkpW1hRaObBtueDCnO0wgZfhpOArNLnI=.11131d42-f12f-4559-9c47-24aa4e527782@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> <9kF42tnA1qVGkpW1hRaObBtueDCnO0wgZfhpOArNLnI=.11131d42-f12f-4559-9c47-24aa4e527782@github.com> Message-ID: On Fri, 27 Sep 2024 17:32:27 GMT, Vladimir Kozlov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependencies()` >> >> The `insert_anti_dependencies()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. 
>> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produce with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependencies()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > src/hotspot/share/opto/gcm.cpp line 753: > >> 751: // root of our search tree through the corresponding slices of MergeMem nodes to >> 752: // get to the node that really creates the memory state for this slice. >> 753: if (load_alias_idx >= Compile::AliasIdxRaw) { > > This will be executed for all loads and not only pinned. Is this okay? Normally loads would not have a MergeMem as their memory input at this stage. `Ideal()` splits the memory, so that the memory input of a load is as precise as possible. So I think, that loop is benign for a regular load. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1779031708 From kvn at openjdk.org Fri Sep 27 19:17:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 27 Sep 2024 19:17:34 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result In-Reply-To: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Fri, 27 Sep 2024 16:02:29 GMT, Igor Veresov wrote: > This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependencies()` > > The `insert_anti_dependencies()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. > > Compare the good graph shape (with control dependency set to `UnknownControl`): > Good graph > > With the graph produce with the nodes pinned: > Bad graph > > With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependencies()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. Please, run additional testing I asked in JBS. 
------------- PR Review: https://git.openjdk.org/jdk/pull/21222#pullrequestreview-2334472584 From iveresov at openjdk.org Fri Sep 27 19:17:35 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 27 Sep 2024 19:17:35 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> <9kF42tnA1qVGkpW1hRaObBtueDCnO0wgZfhpOArNLnI=.11131d42-f12f-4559-9c47-24aa4e527782@github.com> Message-ID: On Fri, 27 Sep 2024 18:57:55 GMT, Igor Veresov wrote: >> src/hotspot/share/opto/gcm.cpp line 753: >> >>> 751: // root of our search tree through the corresponding slices of MergeMem nodes to >>> 752: // get to the node that really creates the memory state for this slice. >>> 753: if (load_alias_idx >= Compile::AliasIdxRaw) { >> >> This will be executed for all loads and not only pinned. Is this okay? > > Normally loads would not have a MergeMem as their memory input at this stage. `Ideal()` splits the memory, so that the memory input of a load is as precise as possible. So I think, that loop is benign for a regular load. Also if there is a MergeMem as a root for some weird reason then `insert_anti_dependencies()` may very well miss an interfering store. So we'd have to do this loop for correctness. 
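To make the walk discussed in this thread concrete, here is a toy Java model. It is not HotSpot code: `Node`, `MergeMem`, `memoryAt`, `findSliceRoot` and the alias index `5` are all hypothetical stand-ins, and the real loop in `gcm.cpp` operates on C2 IR nodes. The sketch only shows the idea of skipping MergeMems along the load's alias slice until a node that actually produces the memory state is reached.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeMemWalkSketch {
    /** Toy stand-in for a node that produces a memory state (hypothetical, not the C2 class). */
    static class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** Toy MergeMem: maps an alias index to the memory state of that slice. */
    static class MergeMem extends Node {
        final Node baseMemory;
        final Map<Integer, Node> slices = new HashMap<>();

        MergeMem(String name, Node baseMemory) {
            super(name);
            this.baseMemory = baseMemory;
        }

        /** Returns the state for the given slice, falling back to the base memory. */
        Node memoryAt(int aliasIdx) {
            return slices.getOrDefault(aliasIdx, baseMemory);
        }
    }

    /** Walks up through MergeMems along one alias slice until a non-MergeMem node is found. */
    static Node findSliceRoot(Node mem, int aliasIdx) {
        while (mem instanceof MergeMem) {
            mem = ((MergeMem) mem).memoryAt(aliasIdx);
        }
        return mem;
    }

    public static void main(String[] args) {
        Node store = new Node("StoreB");
        MergeMem inner = new MergeMem("inner", new Node("initial memory"));
        inner.slices.put(5, store);                  // slice 5 is defined by the store
        MergeMem outer = new MergeMem("outer", inner);
        System.out.println(findSliceRoot(outer, 5).name); // prints "StoreB"
    }
}
```

For a load whose memory input is already a non-MergeMem node, the loop body never runs, which matches the remark above that the walk is benign for regular loads.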
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1779049266 From kvn at openjdk.org Fri Sep 27 19:17:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 27 Sep 2024 19:17:35 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> <9kF42tnA1qVGkpW1hRaObBtueDCnO0wgZfhpOArNLnI=.11131d42-f12f-4559-9c47-24aa4e527782@github.com> Message-ID: On Fri, 27 Sep 2024 19:12:02 GMT, Igor Veresov wrote: >> Normally loads would not have a MergeMem as their memory input at this stage. `Ideal()` splits the memory, so that the memory input of a load is as precise as possible. So I think, that loop is benign for a regular load. > > Also if there is a MergeMem as a root for some weird reason then `insert_anti_dependencies()` may very well miss an interfering store. So we'd have to do this loop for correctness. 
Okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1779051056 From iveresov at openjdk.org Fri Sep 27 19:46:15 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 27 Sep 2024 19:46:15 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: > This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependencies()` > > The `insert_anti_dependencies()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. > > Compare the good graph shape (with control dependency set to `UnknownControl`): > Good graph > > With the graph produce with the nodes pinned: > Bad graph > > With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependencies()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit:

  8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result

  Summary: Make sure insert_anti_dependencies() starts from the right root

-------------

Changes: https://git.openjdk.org/jdk/pull/21222/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21222&range=01
  Stats: 59 lines in 2 files changed: 59 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21222.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21222/head:pull/21222

PR: https://git.openjdk.org/jdk/pull/21222

From ascarpino at openjdk.org Fri Sep 27 21:07:37 2024
From: ascarpino at openjdk.org (Anthony Scarpino)
Date: Fri, 27 Sep 2024 21:07:37 GMT
Subject: RFR: 8337632: AES-GCM Algorithm optimization for x86_64 [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 23 Sep 2024 16:51:36 GMT, Smita Kamath wrote:

>> Hi,
>> I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you.
>>
>> Benchmark | Datasize | BaseJDK (ops/s) | Patch(ops/s) | %Gain
>> -- | -- | -- | -- | --
>> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67
>> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72
>> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73
>> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34
>> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66
>> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75
>> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42
>> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6
>> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54
>> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45
>> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39
>> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26
>> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52
>> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08
>> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94
>> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91
>> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81
>> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>
> Addressed review comments

All tests passed. Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17515#issuecomment-2380058215

From kxu at openjdk.org Fri Sep 27 21:55:26 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Fri, 27 Sep 2024 21:55:26 GMT
Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v18]
In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com>
References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com>
Message-ID:

> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized.
>
> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR.
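As a rough illustration of the loop shape this description refers to, the sketch below shows an int counted loop that carries a long-typed parallel induction variable, next to the closed form such a variable reduces to. The class and method names are hypothetical, and the real transformation happens on C2 IR nodes rather than on source code, so this only demonstrates the arithmetic equivalence under that assumption.

```java
public class ParallelIvSketch {
    /** Int counted loop (counter i) with a long-typed parallel induction variable acc. */
    static long loopForm(int n, long init, int stride) {
        long acc = init;
        for (int i = 0; i < n; i++) {
            acc += stride; // acc advances in lockstep with i
        }
        return acc;
    }

    /** The same value expressed directly in terms of the trip count. */
    static long closedForm(int n, long init, int stride) {
        // For n <= 0 the loop body never runs, so the trip count is clamped at zero.
        return init + (long) Math.max(n, 0) * stride;
    }

    public static void main(String[] args) {
        System.out.println(loopForm(10, 5L, 3));   // prints 35
        System.out.println(closedForm(10, 5L, 3)); // prints 35
    }
}
```

Both forms wrap around identically on overflow because `long` addition and multiplication are performed modulo 2^64.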
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Update TestParallelIvInIntCountedLoop.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/de6b7e20..5d1ee27a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=16-17 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Fri Sep 27 21:55:26 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 27 Sep 2024 21:55:26 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v17] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <1rInK7y-uUfBurDSCoiYX_0NlgdXUB8KF49UbHekHVo=.95539c0e-d917-4e68-87bf-38a9fd9305f6@github.com> Message-ID: On Fri, 27 Sep 2024 12:24:32 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 349: >> >>> 347: int init1 = rng.nextInt(); >>> 348: int init2 = rng.nextInt(Integer.MIN_VALUE + i + 1, i); >>> 349: long init1L = rng.nextLong(Long.MIN_VALUE + i + 1, i); >> >> I understand the init2 computation I think (i - init2 should not overflow max signed int value) but I don't understand the `init1L` one. As far as I can tell, `init1L` is used the same way `init1` is used but one is for a test with a long variable and the other for a int variable. Why don't they use the same initialization pattern then? > > Can you add a comment as well? Good point. You're right. I don't know why I did that. Added a comment about it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1779209829 From iveresov at openjdk.org Fri Sep 27 23:23:35 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 27 Sep 2024 23:23:35 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Fri, 27 Sep 2024 19:15:17 GMT, Vladimir Kozlov wrote: > Please, run additional testing I asked in JBS. Good call. This showed a failure in one of the vectorization tests that is probably caused by this change. I need to take a further look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2380275052 From qamai at openjdk.org Sat Sep 28 01:44:54 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 28 Sep 2024 01:44:54 GMT Subject: RFR: 8341102: Add element type information to vector types [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch adds the type information of each element in a `TypeVect`. This helps constant folding vectors as well as strength reduction of several complex operations such as `Rearrange`. Some notable points: > > - I only implement `ConV` rule on x86, looking at other architectures it seems that I would not only need to implement the `ConV` implementations, but several other rules that match `ReplicateNode` of a constant. > - I changed the implementation of an array constant in `constanttable`, I think working with `jbyte` is easier as it allows `memcpy` and at this point, we are close to the metal anyway. > - Constant folding for a `VectorUnboxNode`, this is special because an element of a normal stable array is only constant if it is non-zero, so implementing constant folding on a load node seems less trivial. > - Memory fences because `Vector::payload` is a final field and we should respect that. 
> - Several places expect a `const Type*` when in reality they expect a `BasicType`; I refactor that so that the intent is clearer and there is less room for possible errors. This is needed because `byte`, `short` and `int` share the same kind of `const Type*`.
>
> Please take a look and leave your reviews, thanks a lot.

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  add mask test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21229/files
  - new: https://git.openjdk.org/jdk/pull/21229/files/c9bc1c4f..cd5123f3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=00-01

  Stats: 20 lines in 1 file changed: 19 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21229.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21229/head:pull/21229

PR: https://git.openjdk.org/jdk/pull/21229

From fjiang at openjdk.org Sat Sep 28 11:55:45 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Sat, 28 Sep 2024 11:55:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23]
In-Reply-To:
References:
Message-ID:

On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:
>>
>> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
>> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
>> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>> - Restore some asserts
>> - Default values for tmp regs of G1PostBarrierStubC2
>> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1
>> - 8330685: [arm32] share barrier spilling logic
>
> Thanks for the arm 32-bits port @snazarkin!
> Merged in commit 3957c03f.
> Besides the arm 32-bits port, @snazarkin's changeset includes adding the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.

Hi @robcasloz, the riscv port cleanup is available at https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7. Would you please help to apply it?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2380614984

From jbhateja at openjdk.org Sun Sep 29 04:26:19 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sun, 29 Sep 2024 04:26:19 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction
Message-ID: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com>

This patch optimizes LongVector multiplication by inferring the VPMULUDQ instruction for the following IR patterns:

  MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF)
  MulL (URShift SRC1, 32) (URShift SRC2, 32)
  MulL (URShift SRC1, 32) (And SRC2, 0xFFFFFFFF)
  MulL (And SRC1, 0xFFFFFFFF) (URShift SRC2, 32)

A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower double words of the multiplier with the multiplicand and assembling the partial products to compute the full width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
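The IR patterns listed above all multiply two operands whose upper 32 bits are known to be zero. A scalar Java sketch of two of them is shown below, assuming each vector lane behaves like this scalar computation. The class and method names are hypothetical, and `unsignedMul32` serves only as a reference for an unsigned 32x32 to 64 bit multiply; it is not a model of the instruction's lane selection.

```java
public class LowDwordMultiplySketch {
    static final long LOW_MASK = 0xFFFFFFFFL;

    /** Scalar form of "MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF)". */
    static long mulLowDwords(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    /** Scalar form of "MulL (URShift SRC1, 32) (URShift SRC2, 32)". */
    static long mulHighDwords(long a, long b) {
        return (a >>> 32) * (b >>> 32);
    }

    /** Reference: unsigned 32x32 -> 64 bit multiply of two doublewords. */
    static long unsignedMul32(int x, int y) {
        return Integer.toUnsignedLong(x) * Integer.toUnsignedLong(y);
    }

    public static void main(String[] args) {
        long a = 0x123456789ABCDEF0L;
        long b = 0x0FEDCBA987654321L;
        // Both comparisons print true: with zeroed upper halves, the 64-bit
        // product depends only on one 32x32 partial product.
        System.out.println(mulLowDwords(a, b) == unsignedMul32((int) a, (int) b));
        System.out.println(mulHighDwords(a, b) == unsignedMul32((int) (a >>> 32), (int) (b >>> 32)));
    }
}
```

Because both operands are below 2^32, their product fits in 64 bits, so no other partial product contributes to the result.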
If the upper 32 bits of the quadword multiplier and multiplicand are always set to zero, then the result of the multiplication depends only on the partial product of their lower double words and can be performed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target dependent manner without introducing a new IR node.

The VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces a 64 bit result. It has much lower latency compared to the full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, thus we can save redundant partial products for the zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons.

Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:

Sierra Forest:
============
Baseline:
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2   806.228          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   403.044          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   200.641          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   100.664          ops/ms

With Optimization:
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  1299.407          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   504.995          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   327.544          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   160.963          ops/ms

Granite Rapids:
=============
Baseline:
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  2279.099          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  1148.609          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   570.848          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   268.872          ops/ms

With Optimization:
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  2612.484          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  1308.187          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   653.375          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   316.182          ops/ms

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction

Changes: https://git.openjdk.org/jdk/pull/21244/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341137
  Stats: 355 lines in 12 files changed: 343 ins; 0 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/21244.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244

PR: https://git.openjdk.org/jdk/pull/21244

From fjiang at openjdk.org Sun Sep 29 10:57:06 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Sun, 29 Sep 2024 10:57:06 GMT
Subject: RFR: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter
Message-ID: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com>

Hi, please consider.

RISC-V does not currently have plain load and store opcodes with aq or rl annotations, so load-acquire and store-release operations are implemented using fences instead. Initially, we followed the RISC-V spec and placed a FENCE RW,RW in front of each load-acquire operation when porting the template interpreter. The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW between the store-release and load-acquire). But it turns out these fences are unnecessary for our use cases in the template interpreter.
In fact, we only need to do a single FENCE R,RW after a normal memory load in order to implement a load-acquire operation. We should remove those unnecessary fences for both performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT).

Testing:
- [x] JCstress
- [x] hs-tier1 - hs-tier4
- [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC)

-------------

Commit messages:
 - remove unnecessary membar in template interpreter

Changes: https://git.openjdk.org/jdk/pull/21248/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21248&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341146
  Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21248.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21248/head:pull/21248

PR: https://git.openjdk.org/jdk/pull/21248

From iveresov at openjdk.org Sun Sep 29 20:05:35 2024
From: iveresov at openjdk.org (Igor Veresov)
Date: Sun, 29 Sep 2024 20:05:35 GMT
Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2]
In-Reply-To:
References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com>
Message-ID:

On Fri, 27 Sep 2024 23:21:05 GMT, Igor Veresov wrote:

> > Please, run additional testing I asked in JBS.
>
> Good call. This showed a failure in one of the vectorization tests that is probably caused by this change. I need to take a further look.

I ran TestAlignVectorFuzzer a bunch of times during the weekend and this failure mode is preexisting. It is not caused by my change. So everything is clean.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2381584987 From fyang at openjdk.org Mon Sep 30 03:18:40 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 30 Sep 2024 03:18:40 GMT Subject: RFR: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter In-Reply-To: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> References: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> Message-ID: On Sun, 29 Sep 2024 10:52:25 GMT, Feilong Jiang wrote: > Hi, please consider. > > RISC-V does not currently have plain load and store opcodes with aq or rl annotations, load-acquire and > store-release operations are implemented using fences instead. Initially, we followed the RISC-V spec > and placed FENCE RW,RW fence in front of load-acquire operation when porting the template interpreter. > The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW > between the store-release and load-acquire). But it turns out these fences are unnecessary for our use > cases in the template interpreter. In fact, we only need to do a single FENCE R,RW after a normal memory > load in order to implement a load-acquire operation. We should remove those unnecessary fences for both > performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT). > > Testing: > - [x] JCstress > - [x] hs-tier1 - hs-tier4 > - [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC) Looks reasonable. Thanks for the cleanup! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21248#pullrequestreview-2336258064

From rcastanedalo at openjdk.org Mon Sep 30 05:02:12 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 30 Sep 2024 05:02:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>
> ## Summary of the Changes
>
> ### Platform-Independent Changes (`src/hotspot/share`)
>
> These consist mainly of:
>
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
>
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>
> #### ADL Changes
>
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>
> #### `G1BarrierSetAssembler` Changes
>
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 53 additional commits since the last revision: - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion - riscv port refactor - Remove temporary support code - Merge jdk-24+17 - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes - Merge jdk-24+16 - Ensure that detected encode-and-store patterns are matched - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - ... and 43 more: https://git.openjdk.org/jdk/compare/8ee5f762...14483b83 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/6fb36e50..14483b83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25-26 Stats: 19042 lines in 408 files changed: 13042 ins; 3680 del; 2320 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From thartmann at openjdk.org Mon Sep 30 05:30:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Sep 2024 05:30:37 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v8] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 14:45:01 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. 
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type:: OopPtr` >> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078 >> and insert a null check >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090 >> In both cases we return call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top` >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_ >> >> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_ >> >> ### Fix (updated on 18th Sep 2024) >> T... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > add second uncast (Vladimirs suggestion) Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2336419292 From kxu at openjdk.org Mon Sep 30 06:21:51 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 06:21:51 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v5] In-Reply-To: References: Message-ID: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... 
+ a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `(a << C) + a` into `(2^C + 1) * a` (for a constant `C`). This is actually a side effect of IGVN being iterative: when converting the `i`-th addition, the previous `i-1` additions have already been optimized to a multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2` > > See included IR unit tests for more. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Arithmetic canonicalization v3 (#3) * 8340144: C1: remove unused Compilation::_max_spills Reviewed-by: thartmann, shade * 8340176: Replace usage of -noclassgc with -Xnoclassgc in test/jdk/java/lang/management/MemoryMXBean/LowMemoryTest2.java Reviewed-by: kevinw, lmesnik * 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX Reviewed-by: kvn, thartmann, sviswanathan * 8340184: Bug in CompressedKlassPointers::is_in_encodable_range Reviewed-by: coleenp, rkennke, jsjolen * 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() Reviewed-by: roland, chagedorn, jkarthikeyan * 8340119: Remove oopDesc::size_might_change() Reviewed-by: stefank, iwalulya * 8340009: Improve the output from assert_different_registers Reviewed-by: aboldtch, dholmes, shade, mli * 8340273: Remove CounterHalfLifeTime Reviewed-by: chagedorn, dholmes * 8338566: Lazy creation of exception instances is not thread safe Reviewed-by: shade, kvn, dlong * 8339648: ZGC: Division by zero in rule_major_allocation_rate Reviewed-by: aboldtch, lucy, tschatzl * 8329816: Add SLEEF version 3.6.1 Reviewed-by: erikj, mli, luhenry * 8315273: (fs) Path.toRealPath(LinkOption.NOFOLLOW_LINKS) fails when 
"../../" follows a link (win) Reviewed-by: djelinski * 8339574: Behavior of File.is{Directory,File,Hidden} is not documented with respect to symlinks Reviewed-by: djelinski, alanb * 8340200: Misspelled constant `AttributesProcessingOption.DROP_UNSTABLE_ATRIBUTES` Reviewed-by: liach * 8339934: Simplify Math.scalb(double) method Reviewed-by: darcy * 8339790: Support Intel APX setzucc instruction Reviewed-by: sviswanathan, jkarthikeyan, kvn * 8340323: Test jdk/classfile/OptionsTest.java fails after JDK-8340200 Reviewed-by: alanb * 8338686: App classpath mismatch if a jar from the Class-Path attribute is on the classpath Reviewed-by: dholmes, iklam * 8337563: NMT: rename MEMFLAGS to MemTag Reviewed-by: dholmes, coleenp, jsjolen * 8340210: Add positionTestUI() to PassFailJFrame.Builder Co-authored-by: Alexey Ivanov Reviewed-by: aivanov, azvegint * 8340132: Remove internal CpException for reading malformed utf8 Reviewed-by: asotona * 8340213: jcmd VM.events ignores max argument Reviewed-by: szaldana, cjplummer, amenkov, mli * 8340015: Open source several AWT focus tests - series 7 Reviewed-by: honkar * 8340280: Avoid calling MT.invokerType() when creating LambdaForms Reviewed-by: liach, jvernee * 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() Reviewed-by: kvn, epeter * 8340230: Tests crash: assert(is_in_encoding_range || k->is_interface() || k->is_abstract()) failed: sanity Reviewed-by: thartmann, kvn * 8319873: Add windows implementation for jcmd System.map and System.dump_map Co-authored-by: Simon Tooke Reviewed-by: stuefe, kevinw, szaldana * 8339845: Update color.org and wapforum.org links to use HTTPS instead of HTTP Reviewed-by: prr, honkar, aivanov * 8340113: Remove JULONG as a Diagnostic Command argument type (jcmd JFR.view) Reviewed-by: lmesnik, egahlin * 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization Reviewed-by: kvn, jkarthikeyan * 8337302: Undefined type variable results in null Reviewed-by: liach * 8339738: RISC-V: 
Vectorize crc32 intrinsic Reviewed-by: fyang, luhenry * 8340368: windows-x64-slowdebug build fails after JDK-8319873 Reviewed-by: jpai, kevinw, aboldtch, eosterlund * 8339992: RISC-V: some minor improvements of base64_vector_decode_round Reviewed-by: fyang, luhenry * 8340233: Missed ThreadWXEnable in jfrNativeLibraryLoadEvent.cpp Reviewed-by: mgronlun * 8340391: Windows jcmd System.map and System.dump_map tests failing Reviewed-by: cjplummer * 8339962: Open source AWT TextField tests - Set1 Reviewed-by: jdv, dnguyen, prr * 8340078: Open source several 2D tests Reviewed-by: honkar * 8340360: Update -mx to -Xmx in UnninstallUIMemoryLeaks test Reviewed-by: serb, prr * 8339980: [s390x] ProblemList jdk/java/util/zip/CloseInflaterDeflaterTest.java Reviewed-by: lucy * 8339416: [s390x] Provide implementation for resolve_global_jobject Reviewed-by: mdoerr, lucy * 8286851: Deprecate for removal several of the undocumented java launcher options Reviewed-by: dholmes * 8340276: Test java/lang/management/ThreadMXBean/Locks.java failed with NullPointerException Reviewed-by: cjplummer, lmesnik * 8338759: Add extra diagnostic to java/net/InetAddress/ptr/Lookup.java Reviewed-by: dfuchs, shade * 8337674: ZGC: Consistent style for naming private static constants Reviewed-by: stefank, aboldtch, mli * 8340007: Refactor KeyEvent/FunctionKeyTest.java Reviewed-by: azvegint * 8340306: Add border around instructions in PassFailJFrame Reviewed-by: honkar, prr * 8339787: Add some additional diagnostic output to java/net/ipv6tests/UdpTest.java Reviewed-by: dfuchs * 8338995: New Object to ObjectMonitor mapping: PPC64 implementation Reviewed-by: rrich, lucy * 8331391: Enhance the keytool code by invoking the buildTrustedCerts method for essential options Reviewed-by: coffeys, mullan * 8298614: Support CDS heap dumping for SerialGC and ParallelGC Reviewed-by: dholmes, lmesnik, iklam * 8338693: assert(Atomic::add(&ik->_shared_class_load_count, 1) == 1) failed: shared class loaded more than once 
Reviewed-by: iklam, dholmes * 8340329: (fs) Message of NotLinkException thrown by FIles.readSymbolicLink does not include file name (win) Reviewed-by: alanb * 8339735: Remove references to Applet in core-libs/security APIs Reviewed-by: coffeys, naoto, iris, rriggs, lancea, mullan * 8340271: Open source several AWT Robot tests Reviewed-by: abhiscxk, honkar * 8340308: PassFailJFrame: Make rows default to number of lines in instructions Reviewed-by: honkar, azvegint * 8340399: Update comment in SourceVersion for language evolution history Reviewed-by: iris * 8340166: [REDO] CDS: Trim down minimum GC region alignment Reviewed-by: ccheung, iklam * 8340400: Shenandoah: Whitebox breakpoint GC requests may cause assertions Reviewed-by: shade * 8339902: Open source couple TextField related tests Reviewed-by: honkar * 8340353: Remove CompressedOops::ptrs_base Reviewed-by: stefank, coleenp, shade, mli * 8340480: Bad copyright notices in changes from JDK-8339902 Reviewed-by: kcr, bpb, kizune * 8339192: Native annotation parsing code of deprecated annotations causes crash Reviewed-by: jrose, mgronlun * 8339895: Open source several AWT focus tests - series 3 Reviewed-by: prr * 8340436: Remove unused CompressedOops::AnyNarrowOopMode Reviewed-by: haosun, dholmes * 8339984: Open source AWT MenuItem related tests Reviewed-by: aivanov * 8339906: Open source several AWT focus tests - series 4 Reviewed-by: abhiscxk, prr * 8340418: GHA: MacOS AArch64 bundles can be removed prematurely Reviewed-by: erikj * 8340439: AArch64: Extra entry declaration for assember test Reviewed-by: haosun, lmesnik, mli * 8340456: Reduce overhead of proxying Object methods in ProxyGenerator Reviewed-by: liach * 8340438: RISC-V: minor improvement in base64 Reviewed-by: fyang * 8340008: KeyEvent/KeyTyped/Numpad1KeyTyped.java has 15 seconds timeout Reviewed-by: azvegint, prr * 8339972: Make a few fields in SortingFocusTraversalPolicy static Reviewed-by: azvegint, aivanov * 8340540: Problemlist 
DcmdMBeanPermissionsTest.java and SystemDumpMapTest.java Reviewed-by: kevinw * 8338658: New Object to ObjectMonitor mapping: s390x implementation Reviewed-by: lucy, mdoerr * 8340269: [s390x] TestLargeStub.java failure after 8338123 Reviewed-by: mdoerr, lucy * 8340537: Typo in javadoc of java.util.jar.JarFile Reviewed-by: mullan, lancea, iris * 8339198: Remove tag field from AbstractPoolEntry Reviewed-by: asotona, redestad * 8340232: Optimize DataInputStream::readUTF Reviewed-by: liach, bpb * 8338471: Assert deleted methods not returned by CallInfo Reviewed-by: shade, jwaters, dholmes * 8340092: [Linux] containers/systemd/SystemdMemoryAwarenessTest.java failing on some systems Reviewed-by: mbaesken * 8339781: Better use of Javadoc tags in javax.lang.model Reviewed-by: jjg * 8339217: Optimize ClassFile API loadConstant Reviewed-by: liach, redestad, asotona * 8340544: Optimize setLocalsFromArg Reviewed-by: redestad, liach * 8340524: Remove NarrowPtrStruct Reviewed-by: shade, jwaters * 8340387: Update OS detection code to recognize Windows Server 2025 Reviewed-by: mdoerr, jwaters, dholmes * 8340171: CDS: Enhance bitmap truncation Reviewed-by: matsaave, iklam * 8340392: Handle OopStorage in location decoder Reviewed-by: kbarrett, dholmes * 8340573: Remove unused G1ParScanThreadState::_partial_objarray_chunk_size Reviewed-by: tschatzl * 8340084: Open source AWT Frame related tests Reviewed-by: psadhukhan, honkar * 8339852: Fix typos in java.compiler documentation Reviewed-by: liach, darcy * 8325949: Create an internal utility method for creating VarHandle instances Reviewed-by: rriggs * 8339161: ZGC: Remove unused remembered sets Reviewed-by: aboldtch, stefank * 8335334: Stress mode to randomly execute unstable if traps Reviewed-by: chagedorn, kvn * 8340393: Open source closed choice tests #2 Reviewed-by: psadhukhan * 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node Reviewed-by: roland, rkennke * 8336025: Improve ZipOutputSream validation of 
MAX CEN Header field limits Reviewed-by: alanb * 8319332: Security properties files inclusion Co-authored-by: Francisco Ferrari Bihurriet Co-authored-by: Martin Balao Reviewed-by: weijun, mullan, kdriver * 8340461: Amend description for logArea Reviewed-by: azvegint, prr * 8340411: open source several 2D imaging tests Reviewed-by: azvegint * 8340365: Position the first window of a window list Reviewed-by: azvegint, prr * WIP: v3 * 8338918: Remove non translated file name from WinResources resource bundle Reviewed-by: jlu, almatvee * 8340707: ProblemList applications/ctw/modules/java_base.java due to JDK-8340683 Reviewed-by: darcy * 8340114: Remove outdated SelectVersion() function from the launcher and update the code comments explaining the code flow Reviewed-by: dholmes, alanb * 8339995: Open source several AWT focus tests - series 6 Reviewed-by: prr * 8340596: Remove dead code from RequiresSetenv function in java.base/unix/native/libjli/java_md.c Reviewed-by: dholmes * 8340367: Opensource few AWT image tests Reviewed-by: prr * 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages Reviewed-by: tschatzl, stefank * 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop() Reviewed-by: thartmann, kvn * 8340590: RISC-V: C2: Small improvement to vector gather load and scatter store Reviewed-by: fyang, dzhang * 8340623: Remove outdated PROCESSOR_ARCHITECTURE_IA64 from Windows coding Reviewed-by: alanb, dholmes * 8335167: Test runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved Reviewed-by: stuefe, dholmes, gziemski * 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers Reviewed-by: dnsimon * 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option Reviewed-by: dnsimon * 8340680: Fix typos in javax.lang.model.SourceVersion Reviewed-by: darcy, iris * 8339299: C1 will miss type profile when inline final method 
Reviewed-by: lmesnik, vlivanov * 8340657: [PPC64] SA determines wrong unextendedSP Reviewed-by: ysuenaga, mbaesken * 8340383: VM issues warning failure to find kernel32.dll on Windows nanoserver Reviewed-by: dholmes, jwaters * 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue Reviewed-by: shade, wkemper * 8338546: Speed up ConstantPoolBuilder::classEntry(ClassDesc) Reviewed-by: asotona, redestad * 8338405: JFR: Use FILE type for dcmds Reviewed-by: egahlin, lmesnik * 8340793: Fix client builds after JDK-8337987 Reviewed-by: shade, fyang * 8338694: x86_64 intrinsic for tanh using libm Reviewed-by: kvn, jbhateja, sgibbons, sviswanathan * 8340143: Open source several Java2D rendering loop tests. Reviewed-by: psadhukhan * 8340433: Open source closed choice tests #3 Reviewed-by: honkar, prr * 8340670: Policy.UNSUPPORTED_EMPTY_COLLECTION.isReadOnly does not return true Reviewed-by: mullan * 8340804: doc/building.md update Xcode instructions to note that full install is required Reviewed-by: erikj, jwaters * 8338525: Leading and trailing code blocks by indentation Reviewed-by: hannesw, prappo * 8340717: Remove unused function declarations from java.c/java.h of the launcher Reviewed-by: alanb, dholmes, shade, jwaters * 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines Reviewed-by: fyang, luhenry * 8340708: Optimize StackMapGenerator::processMethod Reviewed-by: liach * 8340587: Optimize StackMapGenerator$Frame::checkAssignableTo Reviewed-by: liach * 8340710: Optimize DirectClassBuilder::build Reviewed-by: liach * 8339935: Open source several AWT focus tests - series 5 Reviewed-by: prr * 8339771: RISC-V: Reduce icache flushes Reviewed-by: fyang, mli, luhenry * 8340808: RISC-V: Client build fails after JDK-8339738 Reviewed-by: fyang * 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 Reviewed-by: mbaesken, amitkumar * 8339541: CSS rule is not specific 
enough Reviewed-by: jjg * 8340885: Desugar ZipCoder.Comparison Reviewed-by: lancea, eirbjo * 8340568: Incorrect escaping of single quotes when pretty-printing character literals Reviewed-by: mcimadamore * 8338583: NMT: Malloc overhead is calculated incorrectly Reviewed-by: azafari, yan, gziemski * WIP: use UseNewCode * 8340815: Add SECURITY.md file Reviewed-by: mr, jwaters, erikj * 8340946: Add vmTestbase/gc/memory/Nio/Nio.java and java/nio/Buffer/LimitDirectMemory.java to problem list Reviewed-by: liach, dcubed, alanb * 8340684: Reading from an input stream backed by a closed ZipFile has no test coverage Reviewed-by: lancea * 8340228: Open source couple more miscellaneous AWT tests Reviewed-by: prr * 8340956: ProblemList 4 java/nio/channels/DatagramChannel tests on macosx-all Reviewed-by: liach, alanb, darcy, dfuchs * 8340838: Clean up MutableCallSite to use explicit release fence instead of AtomicInteger Reviewed-by: jrose, redestad, shade * 8340831: Simplify simple validation for class definition in MethodHandles.Lookup Reviewed-by: redestad * 8340864: Remove unused lines related to vmClasses Reviewed-by: shade, kvn * WIP: fixed lshift base term matching * WIP: removed UseNewCode * 8339271: giflib attribution correction Reviewed-by: dnguyen, prr * 8340812: LambdaForm customization via MethodHandle::updateForm is not thread safe Reviewed-by: liach, shade, jvernee * 8339260: Move rarely used constants out of ClassFile Reviewed-by: asotona * 8340923: The class LogSelection copies uninitialized memory Reviewed-by: mbaesken, jwaters, stefank * 8340899: Remove wildcard bound in PositionWindows.positionTestWindows Reviewed-by: azvegint, prr * 8340466: Add description for PassFailJFrame constructors Reviewed-by: prr, honkar * 8339542: compiler/codecache/CheckSegmentedCodeCache.java fails Reviewed-by: mdoerr, shade * 8340687: Open source closed frame tests #1 Reviewed-by: aivanov * 8339560: Unaddressed comments during code review of JDK-8337664 Reviewed-by: mullan * 
8336942: Improve test coverage for class loading elements with annotations of different retentions Reviewed-by: vromero * 8336468: Reflection and MethodHandles should use more precise initializer checks Reviewed-by: liach, coleenp * 8336895: BufferedReader doesn't read full \r\n line ending when it doesn't fit in buffer Reviewed-by: jpai, alanb * 8339460: CDS error when module is located in a directory with space in the name Reviewed-by: ccheung, iklam * 8340981: Update citations to "Hacker's Delight" Reviewed-by: bpb, iris, liach, jwaters * 8340983: Use index and definition tags in Object and Double Reviewed-by: bpb, liach * 8333403: Write a test to check various components events are triggered properly Reviewed-by: aivanov * revert changing AddI/LNodeIdealizationTests * 8339261: Logs truncated in test javax/net/ssl/DTLS/DTLSRehandshakeTest.java Reviewed-by: rhalade, hchao * fixed power-of-2 multiplication detection * fixed power-of-2 multiplication detection * refactor lshift multiplier calculation. updated comments --------- Co-authored-by: Denghui Dong Co-authored-by: Jaikiran Pai Co-authored-by: Jatin Bhateja Co-authored-by: Thomas Stuefe Co-authored-by: Thomas Schatzl Co-authored-by: Stefan Karlsson Co-authored-by: Daniel Lundén Co-authored-by: Tobias Hartmann Co-authored-by: Matthias Baesken Co-authored-by: Magnus Ihse Bursie Co-authored-by: Brian Burkhalter Co-authored-by: David M. 
Lloyd Co-authored-by: Raffaello Giulietti Co-authored-by: Chen Liang Co-authored-by: Calvin Cheung Co-authored-by: Gerard Ziemski Co-authored-by: Harshitha Onkar Co-authored-by: Alexey Ivanov Co-authored-by: Leonid Mesnik Co-authored-by: Prasanta Sadhukhan Co-authored-by: Claes Redestad Co-authored-by: Roland Westrelin Co-authored-by: Martin Doerr Co-authored-by: Simon Tooke Co-authored-by: Nizar Benalla Co-authored-by: Kevin Walls Co-authored-by: Emanuel Peter Co-authored-by: Rafael Winterhalter Co-authored-by: Hamlin Li Co-authored-by: Phil Race Co-authored-by: Amit Kumar Co-authored-by: Serhiy Sachkov Co-authored-by: Joel Sikström Co-authored-by: Prasadrao Koppula Co-authored-by: Matias Saavedra Silva Co-authored-by: Justin Lu Co-authored-by: Joe Darcy Co-authored-by: Aleksey Shipilev Co-authored-by: William Kemper Co-authored-by: Alexander Zuev Co-authored-by: Kim Barrett Co-authored-by: David Holmes Co-authored-by: Abhishek Kumar Co-authored-by: SendaoYan Co-authored-by: Andrey Turbanov Co-authored-by: Shaojin Wen Co-authored-by: Coleen Phillimore Co-authored-by: Severin Gehwolf Co-authored-by: Pavel Rappo Co-authored-by: Per Minborg Co-authored-by: Alexander Zvegintsev Co-authored-by: Lance Andersen Co-authored-by: Francisco Ferrari Bihurriet Co-authored-by: Martin Balao Co-authored-by: Alexey Semenyuk Co-authored-by: Axel Boldt-Christmas Co-authored-by: Christian Hagedorn Co-authored-by: Gui Cao Co-authored-by: Afshin Zafari Co-authored-by: Yudi Zheng Co-authored-by: Tomas Zezula Co-authored-by: Kuai Wei Co-authored-by: George Adams Co-authored-by: Zhengyu Gu Co-authored-by: Sonia Zaldana Calles Co-authored-by: Andrew Dinn Co-authored-by: vamsi-parasa Co-authored-by: Artur Barashev Co-authored-by: Jonathan Gibbons Co-authored-by: Robbin Ehn Co-authored-by: Hannes Wallnöfer Co-authored-by: Liam Miller-Cushon Co-authored-by: Leonov Kirill <91743110+kirleo2 at users.noreply.github.com> Co-authored-by: Eirik Bjørsnøs Co-authored-by: Daniel D. 
Daugherty Co-authored-by: Ioi Lam Co-authored-by: Alisen Chung Co-authored-by: Johan Sjölen Co-authored-by: Lutz Schmidt Co-authored-by: Fernando Guallini Co-authored-by: Maxim Kartashev Co-authored-by: Ravi Gupta ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/f9ca1124..b767a772 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=03-04 Stats: 158225 lines in 838 files changed: 150795 ins; 3459 del; 3971 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Mon Sep 30 06:22:17 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 06:22:17 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v6] In-Reply-To: References: Message-ID: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `(a << C) + a` into `(2^C + 1) * a` (for a constant `C`). This is actually a side effect of IGVN being iterative: when converting the `i`-th addition, the previous `i-1` additions have already been optimized to a multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2` > > See included IR unit tests for more. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. 
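For readers who want to sanity-check the identities quoted in the description above, here is a minimal Java-level sketch. It only verifies that the source pattern and the rewritten form compute the same `int` value, including when the sums overflow; it does not run or inspect the C2 transformation. The class and method names (`AddSeriesIdentities`, `addThree`, and so on) are hypothetical and do not come from the pull request. The shift form is written with explicit parentheses because Java parses `a << C + a` as `a << (C + a)`.

```java
// Illustrative only: checks the algebraic identities at the Java level.
// It does not exercise or inspect the C2 IGVN transformation itself.
class AddSeriesIdentities {

    // a + a + a, the shape the description says becomes a*3.
    static int addThree(int a) {
        return a + a + a;
    }

    // a + a + a + a, described as a*4 and then a<<2.
    static int addFour(int a) {
        return a + a + a + a;
    }

    // C * a + a with C = 3, described as (C+1) * a.
    static int mulThenAdd(int a) {
        return 3 * a + a;
    }

    // (a << C) + a with C = 3, described as (2^C + 1) * a.
    // The parentheses are required: a << 3 + a would parse as a << (3 + a).
    static int shiftThenAdd(int a) {
        return (a << 3) + a;
    }

    // True if every identity holds for the given input.
    static boolean identitiesHold(int a) {
        return addThree(a) == a * 3
            && addThree(a) == (a << 1) + a
            && addFour(a) == a * 4
            && addFour(a) == a << 2
            && mulThenAdd(a) == 4 * a
            && shiftThenAdd(a) == 9 * a;
    }

    public static void main(String[] args) {
        // Extremes are included on purpose: int arithmetic wraps modulo 2^32,
        // so the identities also hold when the additions overflow.
        int[] samples = {0, 1, -1, 7, 123_456_789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (!identitiesHold(a)) {
                throw new AssertionError("identity failed for a = " + a);
            }
        }
        System.out.println("all identities hold");
    }
}
```

Because Java `int` addition, multiplication, and left shift all wrap modulo 2^32, the rewritten forms agree with the original sums for every input, which is why such a rewrite needs no overflow guard.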
The pull request now contains 33 commits: - resolve conflicts - resolve conflicts - Arithmetic canonicalization v3 (#3) [...] * 8333403: Write a test to check various components events are triggered properly
Reviewed-by: aivanov * revert changing AddI/LNodeIdealizationTests * 8339261: Logs truncated in test javax/net/ssl/DTLS/DTLSRehandshakeTest.java Reviewed-by: rhalade, hchao * fixed power-of-2 multiplication detection * fixed power-of-2 multiplication detection * refactor lshift multiplier calculation. updated comments --------- Co-authored-by: Denghui Dong Co-authored-by: Jaikiran Pai Co-authored-by: Jatin Bhateja Co-authored-by: Thomas Stuefe Co-authored-by: Thomas Schatzl Co-authored-by: Stefan Karlsson Co-authored-by: Daniel Lundén Co-authored-by: Tobias Hartmann Co-authored-by: Matthias Baesken Co-authored-by: Magnus Ihse Bursie Co-authored-by: Brian Burkhalter Co-authored-by: David M. Lloyd Co-authored-by: Raffaello Giulietti Co-authored-by: Chen Liang Co-authored-by: Calvin Cheung Co-authored-by: Gerard Ziemski Co-authored-by: Harshitha Onkar Co-authored-by: Alexey Ivanov Co-authored-by: Leonid Mesnik Co-authored-by: Prasanta Sadhukhan Co-authored-by: Claes Redestad Co-authored-by: Roland Westrelin Co-authored-by: Martin Doerr Co-authored-by: Simon Tooke Co-authored-by: Nizar Benalla Co-authored-by: Kevin Walls Co-authored-by: Emanuel Peter Co-authored-by: Rafael Winterhalter Co-authored-by: Hamlin Li Co-authored-by: Phil Race Co-authored-by: Amit Kumar Co-authored-by: Serhiy Sachkov Co-authored-by: Joel Sikström Co-authored-by: Prasadrao Koppula Co-authored-by: Matias Saavedra Silva Co-authored-by: Justin Lu Co-authored-by: Joe Darcy Co-authored-by: Aleksey Shipilev Co-authored-by: William Kemper Co-authored-by: Alexander Zuev Co-authored-by: Kim Barrett Co-authored-by: David Holmes Co-authored-by: Abhishek Kumar Co-authored-by: SendaoYan Co-authored-by: Andrey Turbanov Co-authored-by: Shaojin Wen Co-authored-by: Coleen Phillimore Co-authored-by: Severin Gehwolf Co-authored-by: Pavel Rappo Co-authored-by: Per Minborg Co-authored-by: Alexander Zvegintsev Co-authored-by: Lance Andersen Co-authored-by: Francisco Ferrari Bihurriet Co-authored-by: Martin Balao
Co-authored-by: Alexey Semenyuk Co-authored-by: Axel Boldt-Christmas Co-authored-by: Christian Hagedorn Co-authored-by: Gui Cao Co-authored-by: Afshin Zafari Co-authored-by: Yudi Zheng Co-authored-by: Tomas Zezula Co-authored-by: Kuai Wei Co-authored-by: George Adams Co-authored-by: Zhengyu Gu Co-authored-by: Sonia Zaldana Calles Co-authored-by: Andrew Dinn Co-authored-by: vamsi-parasa Co-authored-by: Artur Barashev Co-authored-by: Jonathan Gibbons Co-authored-by: Robbin Ehn Co-authored-by: Hannes Wallnöfer Co-authored-by: Liam Miller-Cushon Co-authored-by: Leonov Kirill <91743110+kirleo2 at users.noreply.github.com> Co-authored-by: Eirik Bjørsnøs Co-authored-by: Daniel D. Daugherty Co-authored-by: Ioi Lam Co-authored-by: Alisen Chung Co-authored-by: Johan Sjölen Co-authored-by: Lutz Schmidt Co-authored-by: Fernando Guallini Co-authored-by: Maxim Kartashev Co-authored-by: Ravi Gupta - WIP: v3 - add comments about intentional type narrowing - Merge pull request #2 from tabjy/arithmetic-canonicalization-v2 Arithmetic canonicalization v2 - remove unused variables - remove debug printfs - fix detecting optimized power-of-2 multiplication - revert usage of integercon(): truncation during jlong to jint is intended - ...
and 23 more: https://git.openjdk.org/jdk/compare/ae4d2f15...0de4feea ------------- Changes: https://git.openjdk.org/jdk/pull/20754/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=05 Stats: 450 lines in 3 files changed: 450 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Mon Sep 30 06:46:38 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 06:46:38 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: On Tue, 17 Sep 2024 09:33:38 GMT, Christian Hagedorn wrote: >>> If this is your intention, then please ignore this message. >> >> Yes, this is my intention. >> >> --- >> >> My previous approach of identifying optimized `Mul->shift + add/sub` (e.g., `a*6` becomes `(a<<1) + (a<<2)` by `MulNode::Ideal()`) was inherently flawed. I was solely determining this with the number of terms. It is not reliable. In the `TestLargeTreeOfSubNodes` example, it replaces already optimized Mul nodes and a new Mul node and repeats the process, causing performance regression (and timeouts). >> >> The new approach matches the exact patterns of optimized `MulNode`s. Additionally, a recursion depth limit of 5 (a rather arbitrary number) is in effect during *iterative* GVN to mitigate the risk of exhausting resources. Untransformed nodes are added to the worklist and will be eventually transformed. >> >> Please note, in the case of `TestLargeTreeOfSubNodes` with flags mentioned above, the compilation is skipped without a large enough `-XX:MaxLabelRootDepth`. This is the same behaviour as the current master. >> >> Please re-review once GHA is confirmed passing. Thanks! 
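The `a*6` example quoted above can be checked arithmetically. The following is a minimal standalone sketch, not HotSpot code: `mul_by_6_as_shifts` and `is_sum_of_two_powers_of_2` are hypothetical helpers, and unsigned arithmetic is assumed here so that wrap-around is well defined in C++. It only illustrates why `a*6` and `(a<<1) + (a<<2)` are interchangeable, and how a multiplier of that shape could be recognized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the shift/add form that a*6 may be rewritten into.
// a*2 + a*4 == a*6, also under 32-bit wrap-around.
inline uint32_t mul_by_6_as_shifts(uint32_t a) {
  return (a << 1) + (a << 2);
}

// Hypothetical helper: true if c has exactly two bits set, that is,
// c == 2^i + 2^j with i != j (for example 6 == 2 + 4).
inline bool is_sum_of_two_powers_of_2(uint32_t c) {
  if (c == 0) {
    return false;
  }
  uint32_t rest = c & (c - 1);                   // clear the lowest set bit
  return rest != 0 && (rest & (rest - 1)) == 0;  // exactly one bit must remain
}
```

A pattern matcher built on this idea only has to look at a fixed, small shape (two shifts of the same base feeding one add), which is why it does not need to recurse through the expression tree.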
> >> Please note, in the case of TestLargeTreeOfSubNodes with flags mentioned above, the compilation is skipped without a large enough -XX:MaxLabelRootDepth. This is the same behaviour as the current master. > > Have you found out why this is the case? I thought that the original fix which added `TestLargeTreeOfSubNodes` wanted to fix the problem of running out of nodes. > > I gave your patch another spin. We still see various failures and timeouts. For example: > > `compiler/intrinsics/sha/TestDigest.java` times out with various flag combinations (for example `-server -Xmixed`). Here is the stack at the timeout: > > > Thread 7 (Thread 0x7fc808490700 (LWP 22433)): > #0 0x00007fc80d648051 in Node::find_integer_type(BasicType) const () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #1 0x00007fc80c793214 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #2 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > ... 
> #90 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #91 0x00007fc80c793082 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #92 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #93 0x00007fc80c793351 in AddNode::convert_serial_additions(PhaseGVN*, bool, BasicType) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #94 0x00007fc80c7937c5 in AddNode...

@chhagedorn

> Have you found out why this is the case? I thought that the original fix which added TestLargeTreeOfSubNodes wanted to fix the problem of running out of nodes.

I think this is intended. The original fix does prevent running out of nodes during optimization. However, the compilation is skipped during code generation, not during IGVN. The optimization, after all, correctly unrolls into a large number of nodes, and the default value of 1100 for `MaxLabelRootDepth` then prevents compilation.

> We still see various failures and timeouts. For example: `compiler/intrinsics/sha/TestDigest.java`

This happened because `node->is_Add()` is deceptively true when `node` is an `Xor[I/L]Node`. I adopted Roland's suggestions, and the code now compares opcodes directly.
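The difference between a class-based check and an opcode comparison can be illustrated with a small standalone model. This is a simplified sketch with made-up types: `Node`, `AddNode`, `AddINode`, `XorINode` and the `Op` enum are stand-ins, not the real HotSpot definitions. It assumes, in line with the description above, that the Xor node type shares the Add base class and therefore inherits its `is_Add()` answer.

```cpp
#include <cassert>

// Stand-in opcodes (hypothetical, not HotSpot's opcode table).
enum class Op { AddI, XorI, Other };

struct Node {
  virtual ~Node() = default;
  virtual Op opcode() const { return Op::Other; }
  virtual bool is_Add() const { return false; }  // class-based query
};

struct AddNode : Node {
  bool is_Add() const override { return true; }  // inherited by every subclass
};

struct AddINode : AddNode {
  Op opcode() const override { return Op::AddI; }
};

// Assumption for this sketch: Xor derives from the Add base class.
struct XorINode : AddNode {
  Op opcode() const override { return Op::XorI; }
};

// Class-based check: also accepts XorINode, which is the trap described above.
inline bool looks_like_add_by_class(const Node& n) {
  return n.is_Add();
}

// Opcode-based check: accepts only real integer additions.
inline bool is_int_add_by_opcode(const Node& n) {
  return n.opcode() == Op::AddI;
}
```

With these definitions, an `XorINode` passes `looks_like_add_by_class` but is rejected by `is_int_add_by_opcode`, which is the behavior the opcode comparison is meant to give.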
> I'm also seeing the live node limit assert with test `applications/ctw/modules/java_desktop.java`

There was a problem detecting optimized power-of-2 multiplication where the base term itself is also a `LShiftNode`. It's fixed now.

---

After some discussion with Roland, I decided not to use recursion at all, to avoid the risk of exhausting resources. The latest version matches the exact patterns of multiplications optimized into power-of-2 additions. This should be much safer. It passes all `TEST="tier1 hotspot_compiler tier1_compiler tier2_compiler tier3_compiler tier1_compiler_not_xcomp"` tests. Please give it another run. Thank you very much!

@rwestrel Could you please give it a quick look as well? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2382240422

From kxu at openjdk.org Mon Sep 30 06:46:39 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Mon, 30 Sep 2024 06:46:39 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2]
In-Reply-To: <9WxAeM7QHqfmAcs90-IeCy8zME-pe_HY4onDTFwJfMQ=.fb286084-6b1c-47b9-8151-1349e0d37a08@github.com>
References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> <8jdKFn_Bln3lPK1vO8UZyUakbwv_gBvKLd-MutObCg0=.bf55c55e-b906-4d90-9493-e26ba2d87298@github.com> <9WxAeM7QHqfmAcs90-IeCy8zME-pe_HY4onDTFwJfMQ=.fb286084-6b1c-47b9-8151-1349e0d37a08@github.com>
Message-ID:

On Mon, 23 Sep 2024 11:39:41 GMT, Roland Westrelin wrote:

>> I believe so. Consider the case `(a + a) * 3`. Recurse here allows us to extract `a` and factor `2 * 3 => 6`
>
> That case would be better handled in 2 steps, I think:
> `a+a` into `a*2` with a `AddNode` transformation
> `(a*2)*3` into `a*6` with a `MulNode` (or `LShift`) transformation. Can you check if it already exists?

Done.
The new version doesn't use recursion at all.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1780520892

From chagedorn at openjdk.org Mon Sep 30 06:48:38 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 30 Sep 2024 06:48:38 GMT
Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 02:47:03 GMT, Daohan Qu wrote:

>> # Description
>>
>> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u.
>>
>> But it seems that JDK-6934604 forgot to check for duplicate `PhiNode`s generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More precisely, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case.
>>
>> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and makes the method "not compilable".
>>
>> Since JDK-8217919 (in jdk13u), all later JDK versions are affected by this bug when the expected optimization pattern appears in the code.
>> For example, the following three micro-benchmarks running with
>>
>>
>> make test \
>> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \
>> MICRO="FORK=1;WARMUP_ITER=2" \
>> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC"
>>
>>
>> show a performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table were run with this option.)
>>
>> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch |
>> |---|---|---|
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |
>>
>> # Reproduction
>>
>> Compile and run the reduced test case `Test.java` in the appendix below using
>>
>>
>> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test
>>
>>
>> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and...
>
> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision:
>
> Add jtreg requirements and fix some format issues

Thanks for the updated description! I will have a closer look at it later this week. I gave your patch a spin in our testing over the weekend and found some timeouts for tests `compiler/eliminateAutobox/Test*Boxing.java` with various flag combos.
For example, `compiler/eliminateAutobox/TestIntBoxing.java` on linux-aarch64-debug with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` is stuck in `phi_or_self()`. Here is the stacktrace at the timeout: Thread 15 (Thread 0xffff95e001d0 (LWP 2990058)): #0 0x0000ffffb9b13130 in LoadNode::phi_or_self(PhaseGVN*) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-27-2011498.christian.hagedorn.jdk-test/linux-aarch64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #1 0x0000ffffb9b1db4c in LoadNode::split_through_phi(PhaseGVN*, bool) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-27-2011498.christian.hagedorn.jdk-test/linux-aarch64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #2 0x0000ffffb9b27ea8 in LoadNode::Ideal(PhaseGVN*, bool) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-27-2011498.christian.hagedorn.jdk-test/linux-aarch64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #3 0x0000ffffb9cc3d2c in PhaseIterGVN::transform_old(Node*) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-27-2011498.christian.hagedorn.jdk-test/linux-aarch64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #4 0x0000ffffb9cba950 in PhaseIterGVN::optimize() () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-27-2011498.christian.hagedorn.jdk-test/linux-aarch64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #5 0x0000ffffb9204364 in Compile::Optimize() () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-27-2011498.christian.hagedorn.jdk-test/linux-aarch64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so #6 0x0000ffffb9207504 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-27-2011498.christian.hagedorn.jdk-test/linux-aarch64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so .... Can also be observed on other platforms like linux-x64-debug or Windows. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2382244739 From thartmann at openjdk.org Mon Sep 30 07:04:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Sep 2024 07:04:36 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert In-Reply-To: References: Message-ID: <85C55cSsG9xUcs6GV3nCiq-idMJhEOywFN3atCFOO78=.53e95ff8-e9f4-4c86-a97d-5bff2cb009d3@github.com> On Mon, 16 Sep 2024 08:34:44 GMT, Roland Westrelin wrote: > When converting a `LongCountedLoop` into a loop nest, c2 needs jvm > state to add predicates to the inner loop. For that, it peels an > iteration of the loop and uses the state of the safepoint at the end > of the loop. That's only legal if there's no side effect between the > safepoint and the backedge that goes back into the loop. The assert > failure here happens in code that checks that. > > That code compares the memory states at the safepoint and at the > backedge. If they are the same then there's no side effect. To check > consistency, the `MergeMem` at the safepoint is cloned. As the logic > iterates over the backedge state, it clears every component of the > state it encounters from the `MergeMem`. Once done, the cloned > `MergeMem` should be "empty". In the case of this failure, no side > effect is found but the cloned `MergeMem` is not empty. That happens > because of EA: it adds edges to the `MergeMem` at the safepoint that > it doesn't add to the backedge `Phis`. > > So it's the verification code that fails and I propose dealing with > this by ignoring memory state added by EA in the verification code. src/hotspot/share/opto/loopnode.cpp line 708: > 706: for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) { > 707: // Loop invariant memory state won't be reset by no_side_effect_since_safepoint(). Do it here. 
> 708: // Escape Analysis can add state to mm that it doesn't add to the backedge memory Phis, breaking verification Where exactly does that happen in EA? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1780540892 From dqu at openjdk.org Mon Sep 30 07:14:35 2024 From: dqu at openjdk.org (Daohan Qu) Date: Mon, 30 Sep 2024 07:14:35 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v4] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 06:45:44 GMT, Christian Hagedorn wrote: > Thanks for the updated description! I will have a closer look at it later this week. I gave your patch a spinning in our testing over the weekend and found some timeouts for tests compiler/eliminateAutobox/Test*Boxing.java with various flag combos. Hi Christian, thanks so much for your testing and feedback! I'll look into it ASAP. I had only tested in `release` mode before, which might explain why `jtreg:test/hotspot/jtreg/compiler` and `jtreg:test/hotspot/jtreg/vmTestbase` passed on my end. Just a heads-up: I'll be taking a week off for the National Day holiday starting tomorrow, so my replies might be a bit delayed. I apologize in advance :P ------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2382289611 From chagedorn at openjdk.org Mon Sep 30 07:25:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 30 Sep 2024 07:25:37 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. 
>> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. 
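The iterator-plus-visitor arrangement described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: the class names are borrowed from the description, but the bodies, the `accept()` double dispatch, the `PredicateList` container and `walk_example()` are invented for this sketch. The real `PredicateIterator` walks C2 IR nodes rather than a vector.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct PredicateVisitor;  // defined below

// Simplified stand-ins for the predicate classes (hypothetical bodies).
struct Predicate {
  virtual ~Predicate() = default;
  virtual void accept(PredicateVisitor& visitor) const = 0;
};
struct ParsePredicate : Predicate {
  void accept(PredicateVisitor& visitor) const override;
};
struct RuntimePredicate : Predicate {
  void accept(PredicateVisitor& visitor) const override;
};

// Visitor with do-nothing defaults; users override only what they need.
struct PredicateVisitor {
  virtual ~PredicateVisitor() = default;
  virtual void visit(const ParsePredicate&) {}
  virtual void visit(const RuntimePredicate&) {}
};

inline void ParsePredicate::accept(PredicateVisitor& visitor) const { visitor.visit(*this); }
inline void RuntimePredicate::accept(PredicateVisitor& visitor) const { visitor.visit(*this); }

using PredicateList = std::vector<std::unique_ptr<Predicate>>;

// Stand-in iterator: applies one visitor to every predicate, in order.
class PredicateIterator {
 public:
  explicit PredicateIterator(const PredicateList& predicates) : _predicates(predicates) {}
  void for_each(PredicateVisitor& visitor) const {
    for (const auto& predicate : _predicates) {
      predicate->accept(visitor);
    }
  }
 private:
  const PredicateList& _predicates;
};

// Example visitor in the spirit of a printer: records each predicate kind.
struct PredicatePrinter : PredicateVisitor {
  std::string output;
  void visit(const ParsePredicate&) override { output += "Parse;"; }
  void visit(const RuntimePredicate&) override { output += "Runtime;"; }
};

// Builds a two-element list and walks it with the printer.
inline std::string walk_example() {
  PredicateList predicates;
  predicates.push_back(std::make_unique<RuntimePredicate>());
  predicates.push_back(std::make_unique<ParsePredicate>());
  PredicatePrinter printer;
  PredicateIterator iterator(predicates);
  iterator.for_each(printer);
  return printer.output;
}
```

The point of the design is that the walking logic lives in one place, while each use site supplies only a small visitor with the overrides it cares about.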
>>
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missing public for UnifiedPredicateVisitor

Thanks for your review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21161#issuecomment-2382310492

From chagedorn at openjdk.org Mon Sep 30 07:25:38 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 30 Sep 2024 07:25:38 GMT
Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 12:53:13 GMT, Roland Westrelin wrote:

>> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add missing public for UnifiedPredicateVisitor
>
> src/hotspot/share/opto/loopnode.cpp line 4325:
>
>> 4323: class ParsePredicateUsefulMarker : public PredicateVisitor {
>> 4324: public:
>> 4325: using PredicateVisitor::visit;
>
> Why is this needed?

I forgot to explain the reason behind this. Without it, compilation fails with:

/home/christian/jdk3/open/src/hotspot/share/opto/predicates.hpp:240:16: error: 'virtual void PredicateVisitor::visit(const InitializedAssertionPredicate&)' was hidden [-Werror=overloaded-virtual=]
  240 | virtual void visit(const InitializedAssertionPredicate& initialized_assertion_predicate) {}
      | ^~~~~
/home/christian/jdk3/open/src/hotspot/share/opto/loopnode.cpp:4327:8: note: by 'virtual void ParsePredicateUsefulMarker::visit(const ParsePredicate&)'
 4327 | void visit(const ParsePredicate& parse_predicate) override {
      |

I was not quite sure what this means, so I looked it up and found this: https://stackoverflow.com/questions/18515183/c-overloaded-virtual-function-warning-by-clang

It looks like the warning is here to prevent accidental hiding of overloads.
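The situation behind this error can be reproduced outside HotSpot. Below is a minimal sketch, assuming heavily simplified stand-ins for the classes named in the compiler output: the empty predicate structs and the `marked` and `other_visits` counters are invented for illustration. It shows that with the using-declaration in place, both `visit()` overloads remain callable on the derived visitor.

```cpp
#include <cassert>

// Empty stand-ins for the predicate types (hypothetical).
struct ParsePredicate {};
struct InitializedAssertionPredicate {};

struct PredicateVisitor {
  virtual ~PredicateVisitor() = default;
  int other_visits = 0;  // counter added only for this sketch
  virtual void visit(const ParsePredicate&) {}
  virtual void visit(const InitializedAssertionPredicate&) { ++other_visits; }
};

struct ParsePredicateUsefulMarker : PredicateVisitor {
  // Re-expose all base-class visit() overloads. Without this line, the
  // override below hides visit(const InitializedAssertionPredicate&) for
  // calls made on a ParsePredicateUsefulMarker, which is what GCC's
  // -Woverloaded-virtual reports.
  using PredicateVisitor::visit;

  int marked = 0;  // counter added only for this sketch
  void visit(const ParsePredicate&) override { ++marked; }
};
```

Removing the `using` line makes a direct call such as `marker.visit(InitializedAssertionPredicate{})` fail to compile, because name lookup stops at the single `visit` declared in the derived class.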
Here is another example:

#include <cstdio>

struct Base {
  virtual void foo(double e) { printf("Base"); }
};

struct Derived: public Base {
  virtual void foo(int e) { printf("Derived"); }
};

int main() {
  Derived derived;
  derived.foo(3.4f);
  return 0;
}

You would expect that this prints `Base`. But since the lookup happens on `Derived`, it finds a matching `Derived::foo(int e)` and applies that one (we print "Derived") even though we intended to call `Base::foo()`. By adding `using Base::foo` to `Derived`, we make the overloaded version available in `Derived` and pick that one instead when running `main()` (we print "Base", as anticipated).

In my code, this would not be a problem. But I guess the warning is there to make the user check whether the hiding was intended or is just a typo resulting in unexpected behavior.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21161#discussion_r1780565499

From chagedorn at openjdk.org Mon Sep 30 07:31:35 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 30 Sep 2024 07:31:35 GMT
Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 02:47:03 GMT, Daohan Qu wrote:

>> # Description
>>
>> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u.
>>
>> But it seems that JDK-6934604 forgot to check for duplicate `PhiNode`s generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More precisely, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case.
>>
>> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and makes the method "not compilable".
>>
>> Since JDK-8217919 (in jdk13u), all later JDK versions are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with
>>
>>
>> make test \
>> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \
>> MICRO="FORK=1;WARMUP_ITER=2" \
>> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC"
>>
>>
>> show a performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table were run with this option.)
>>
>> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch |
>> |---|---|---|
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |
>>
>> # Reproduction
>>
>> Compile and run the reduced test case `Test.java` in the appendix below using
>>
>>
>> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test
>>
>>
>> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and...
>
> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision:
>
> Add jtreg requirements and fix some format issues

No worries! Take your time, there is no rush :-) Thanks for letting me know.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2382322943

From rcastanedalo at openjdk.org Mon Sep 30 07:59:45 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 30 Sep 2024 07:59:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23]
In-Reply-To:
References:
Message-ID:

On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:
>>
>> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
>> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
>> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>> - Restore some asserts
>> - Default values for tmp regs of G1PostBarrierStubC2
>> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1
>> - 8330685: [arm32] share barrier spilling logic
>
> Thanks for the arm 32-bits port @snazarkin! Merged in commit 3957c03f.
> Besides the arm 32-bits port, @snazarkin's changeset includes adding the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.

> Hi @robcasloz, riscv port cleanup is available at [feilongjiang at 1297f60](https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7), would you please help to apply it?
Done (commit 14483b83), thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382377364 From rcastanedalo at openjdk.org Mon Sep 30 08:24:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 08:24:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ...
and 43 more: https://git.openjdk.org/jdk/compare/60c13deb...14483b83 I just updated to jdk-24+17 (commit bda4ab21) and removed the temporary support code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` (commit 55a1f621). The current changeset passes all tests specified in the pull request [description](https://github.com/openjdk/jdk/pull/19746#issue-2356905813) and yields benchmark results similar to those of the original submission. @albertnetymk @vnkozlov @tschatzl @kimbarrett could you please re-review? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382431347 From tschatzl at openjdk.org Mon Sep 30 10:04:51 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Sep 2024 10:04:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com> On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/55c0ecf8...14483b83 Still seems good. Mostly only looked at the changes in the GC directory and the barrier code themselves as I do not feel enabled to comment too much on other (compiler) changes. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2336972915 From rcastanedalo at openjdk.org Mon Sep 30 11:33:47 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 11:33:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com> References: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com> Message-ID: On Mon, 30 Sep 2024 10:02:17 GMT, Thomas Schatzl wrote: > Still seems good.
> > Mostly only looked at the changes in the GC directory and the barrier code themselves as I do not feel enabled to comment too much on other (compiler) changes. Thanks for re-reviewing, Thomas! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382930857 From fyang at openjdk.org Mon Sep 30 11:53:52 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 30 Sep 2024 11:53:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/dede1992...14483b83 Updated RISC-V part of the change looks good to me. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2337279856 From rcastanedalo at openjdk.org Mon Sep 30 12:06:48 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 12:06:48 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 11:51:02 GMT, Fei Yang wrote: > Updated RISC-V part of the change looks good to me. Thanks, Fei!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382997964 From roland at openjdk.org Mon Sep 30 12:24:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 30 Sep 2024 12:24:45 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v18] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <1bSYs3IfVqcG9AsMDNuorBU0V-bTNTu_F8vtLlKoqs0=.2dd80dc2-b415-47bd-b886-abf52adb6efa@github.com> On Fri, 27 Sep 2024 21:55:26 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Update TestParallelIvInIntCountedLoop.java test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 349: > 347: // also test with random init and init2 > 348: int init1 = rng.nextInt(); > 349: int init2 = rng.nextInt(Integer.MIN_VALUE + i + 1, i); // Similarly, avoid (i - init2) from overflowing. What does the "similarly" refer to? To comment line 347 or line 321? It's unclear to me. I would remove it. 
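For reference, a minimal sketch of why the range on the quoted line 349 keeps `i - init2` from overflowing. It assumes `i >= 0` (as for a loop counter), so that the origin expression itself does not overflow, and Java 17 or later for `Random.nextInt(int, int)`. The class and method names are hypothetical and not part of the test under review.

```java
import java.util.Random;

// Hypothetical helper class for illustration; not part of the test under review.
final class Init2Range {
    // Draws init2 from [Integer.MIN_VALUE + i + 1, i), the range used on the quoted line 349.
    // Assumes i >= 0 so that the origin expression does not overflow.
    static int drawInit2(Random rng, int i) {
        return rng.nextInt(Integer.MIN_VALUE + i + 1, i);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 1_000; i++) {
            int init2 = drawInit2(rng, i);
            // Math.subtractExact throws ArithmeticException on int overflow.
            int diff = Math.subtractExact(i, init2);
            if (diff < 1) {
                throw new AssertionError("unexpected difference " + diff);
            }
        }
        System.out.println("no overflow observed");
    }
}
```

Since `init2 < i`, the difference is at least 1, and since `init2 >= Integer.MIN_VALUE + i + 1`, it is at most `Integer.MAX_VALUE`.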
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1781010905 From kxu at openjdk.org Mon Sep 30 13:36:19 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 13:36:19 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: update comments in TestParallelIvInIntCountedLoop.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/5d1ee27a..6cad8c19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From roland at openjdk.org Mon Sep 30 13:36:19 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 30 Sep 2024 13:36:19 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 30 Sep 2024 
13:32:38 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments in TestParallelIvInIntCountedLoop.java Marked as reviewed by roland (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2337560562 From kxu at openjdk.org Mon Sep 30 13:36:19 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 13:36:19 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v18] In-Reply-To: <1bSYs3IfVqcG9AsMDNuorBU0V-bTNTu_F8vtLlKoqs0=.2dd80dc2-b415-47bd-b886-abf52adb6efa@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <1bSYs3IfVqcG9AsMDNuorBU0V-bTNTu_F8vtLlKoqs0=.2dd80dc2-b415-47bd-b886-abf52adb6efa@github.com> Message-ID: On Mon, 30 Sep 2024 12:21:49 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> Update TestParallelIvInIntCountedLoop.java > > test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 349: > >> 347: // also test with random init and init2 >> 348: int init1 = rng.nextInt(); >> 349: int init2 = rng.nextInt(Integer.MIN_VALUE + i + 1, i); // Similarly, avoid (i - init2) from overflowing. > > What does the "similarly" refer to? To comment line 347 or line 321? It's unclear to me. I would remove it. I was referring to line 321. I have updated the comments.
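As an illustration of the loop shape the PR description refers to, here is a minimal sketch, assuming a long-typed variable that advances by a constant stride alongside the int loop counter. The names are hypothetical, and the closed form only shows the arithmetic relationship between the two variables; it does not claim to be what C2 emits.

```java
// Hypothetical example for illustration; not taken from the PR or its tests.
final class ParallelIvSketch {
    // An int counted loop (counter i) that carries a long-typed parallel
    // induction variable a, advanced by a constant stride on every iteration.
    static long withLoop(int stop) {
        long a = 0;
        for (int i = 0; i < stop; i++) {
            a += 2;
        }
        return a;
    }

    // The value of a expressed through the trip count alone:
    // the loop runs max(stop, 0) times and adds 2 each time.
    static long closedForm(int stop) {
        return 2L * Math.max(stop, 0);
    }

    public static void main(String[] args) {
        int[] stops = {-5, 0, 1, 10, 1_000};
        for (int stop : stops) {
            if (withLoop(stop) != closedForm(stop)) {
                throw new AssertionError("mismatch for stop=" + stop);
            }
        }
        System.out.println("loop and closed form agree");
    }
}
```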
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1781132739 From roland at openjdk.org Mon Sep 30 13:43:43 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 30 Sep 2024 13:43:43 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 06:22:17 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting a series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `(a << C) + a` into `(2^C + 1) * a` (for a constant `C`). This is actually a side effect of IGVN being iterative: when converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 33 commits: > > - resolve conflicts > - resolve conflicts > - Arithmetic canonicalization v3 (#3) > > * 8340144: C1: remove unused Compilation::_max_spills > > Reviewed-by: thartmann, shade > > * 8340176: Replace usage of -noclassgc with -Xnoclassgc in test/jdk/java/lang/management/MemoryMXBean/LowMemoryTest2.java > > Reviewed-by: kevinw, lmesnik > > * 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX > > Reviewed-by: kvn, thartmann, sviswanathan > > * 8340184: Bug in CompressedKlassPointers::is_in_encodable_range > > Reviewed-by: coleenp, rkennke, jsjolen > > * 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() > > Reviewed-by: roland, chagedorn, jkarthikeyan > > * 8340119: Remove oopDesc::size_might_change() > > Reviewed-by: stefank, iwalulya > > * 8340009: Improve the output from assert_different_registers > > Reviewed-by: aboldtch, dholmes, shade, mli > > * 8340273: Remove CounterHalfLifeTime > > Reviewed-by: chagedorn, dholmes > > * 8338566: Lazy creation of exception instances is not thread safe > > Reviewed-by: shade, kvn, dlong > > * 8339648: ZGC: Division by zero in rule_major_allocation_rate > > Reviewed-by: aboldtch, lucy, tschatzl > > * 8329816: Add SLEEF version 3.6.1 > > Reviewed-by: erikj, mli, luhenry > > * 8315273: (fs) Path.toRealPath(LinkOption.NOFOLLOW_LINKS) fails when "../../" follows a link (win) > > Reviewed-by: djelinski > > * 8339574: Behavior of File.is{Directory,File,Hidden} is not documented with respect to symlinks > > Reviewed-by: djelinski, alanb > > * 8340200: Misspelled constant `AttributesProcessingOption.DROP_UNSTABLE_ATRIBUTES` > > Reviewed-by: liach > > * 8339934: Simplify Math.scalb(double) method > > Reviewed-by: darcy > > * 8339790: Support Intel APX setzucc instruction > > Reviewed-by: sviswanathan, jkarthikeyan, kvn > > * 8340323: Test jdk/classfile/OptionsTest.java fails after JDK-8340200 > > Reviewed-by: alanb > > * 8338686: App classpath mismatch if a 
jar from the Class-Path attribute is on the classpath > > Reviewed-by: dholmes, iklam > > * 8337563: NMT: rename MEMFLAGS to MemTag > > Reviewed-by: dholmes, coleenp, jsjolen > > * 8340210: Add positionTestUI() to Pass... src/hotspot/share/opto/addnode.cpp line 409: > 407: // Convert a + a + ... + a into a*n > 408: Node* AddNode::convert_serial_additions(PhaseGVN* phase, bool can_reshape, BasicType bt) { > 409: if (is_optimized_multiplication()) { What would happen if that `if` is removed? Would matching in the rest of the method accept something that's not a valid pattern? src/hotspot/share/opto/addnode.cpp line 422: > 420: // Convert (a + a) + a to 3 * a > 421: // Look for LHS pattern: AddNode(a, a) > 422: if (in1_op == Op_Add(bt) && in1->in(1) == in1->in(2)) { It seems each of the if blocks in this method could be its own method that returns true and `multiplier` (passed by reference, I suppose) if pattern matching succeeds. src/hotspot/share/opto/addnode.cpp line 442: > 440: Node* con = in1->in(2); > 441: BasicType con_bt = phase->type(con)->basic_type(); > 442: if (con_bt == T_VOID) { // const could potentially be void type Does that happen when con is top? In that case I think it's better to use `con->is_top()`. Or maybe `find_integer_as_long` if there's a value we know for sure that con can't take (it can't be negative, right?). src/hotspot/share/opto/addnode.cpp line 487: > 485: // AddNode(LShiftNode(a, CON1), LShiftNode(a, CON2)/a) > 486: // AddNode(LShiftNode(a, CON1)/a, LShiftNode(a, CON2)) > 487: for (int i = 0; i < 2; i++) { I wouldn't use a loop here. I would put the loop body into its own method and call it twice, once with `lhs`, `lhs_base` as arguments, once with `rhs`, `rhs_base`. src/hotspot/share/opto/addnode.cpp line 540: > 538: > 539: PhaseIterGVN* igvn = phase->is_IterGVN(); > 540: if (igvn != nullptr) { Why do you need that? I think it's fine to return a new node from Ideal. 
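The rewrites named in the PR description (`a + a + a` => `a*3` => `(a<<1) + a`, `a + a + a + a` => `a<<2`) rely on identities that hold under Java's wrapping int arithmetic. A minimal sketch that checks them at the source level follows; the class and method names are hypothetical, and the sketch says nothing about how C2 matches these patterns in its IR.

```java
// Hypothetical helper class for illustration; not part of the PR.
final class SerialAdditions {
    static int sumOfThree(int a) { return a + a + a; }

    static int sumOfFour(int a) { return a + a + a + a; }

    // (a << c) + a equals ((1 << c) + 1) * a. The parentheses matter:
    // in Java, '+' binds tighter than '<<', so a << c + a means a << (c + a).
    static int shiftPlusSelf(int a, int c) { return (a << c) + a; }

    static void check(boolean ok) {
        if (!ok) {
            throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        // Includes the extremes, where the additions wrap around.
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            check(sumOfThree(a) == 3 * a);
            check(sumOfThree(a) == (a << 1) + a);
            check(sumOfFour(a) == 4 * a);
            check(sumOfFour(a) == (a << 2));
            check(3 * a + a == (a << 2));
            check(shiftPlusSelf(a, 3) == 9 * a);
        }
        System.out.println("all identities hold");
    }
}
```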
src/hotspot/share/opto/addnode.cpp line 562: > 560: // swap LShiftNode to lhs for easier matching > 561: if (!lhs->is_LShift()) { > 562: Node* tmp = lhs; You can use swap() here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781157596 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781144432 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781152047 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781146767 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781155594 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781137244 From duke at openjdk.org Mon Sep 30 14:25:39 2024 From: duke at openjdk.org (duke) Date: Mon, 30 Sep 2024 14:25:39 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 30 Sep 2024 13:36:19 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments in TestParallelIvInIntCountedLoop.java @tabjy Your change (at version 6cad8c19663c6023a7b5afea5a5b831e760c2acc) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2383357269 From kxu at openjdk.org Mon Sep 30 14:35:40 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 14:35:40 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 30 Sep 2024 13:36:19 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments in TestParallelIvInIntCountedLoop.java We might need a second pair of eyes for the review. Please feel free to do so, and thanks very much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2383383464 From kxu at openjdk.org Mon Sep 30 15:04:51 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 15:04:51 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 13:40:59 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 33 commits: >> >> - resolve conflicts >> - resolve conflicts >> - Arithmetic canonicalization v3 (#3) >> >> * 8340144: C1: remove unused Compilation::_max_spills >> >> Reviewed-by: thartmann, shade >> >> * 8340176: Replace usage of -noclassgc with -Xnoclassgc in test/jdk/java/lang/management/MemoryMXBean/LowMemoryTest2.java >> >> Reviewed-by: kevinw, lmesnik >> >> * 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX >> >> Reviewed-by: kvn, thartmann, sviswanathan >> >> * 8340184: Bug in CompressedKlassPointers::is_in_encodable_range >> >> Reviewed-by: coleenp, rkennke, jsjolen >> >> * 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() >> >> Reviewed-by: roland, chagedorn, jkarthikeyan >> >> * 8340119: Remove oopDesc::size_might_change() >> >> Reviewed-by: stefank, iwalulya >> >> * 8340009: Improve the output from assert_different_registers >> >> Reviewed-by: aboldtch, dholmes, shade, mli >> >> * 8340273: Remove CounterHalfLifeTime >> >> Reviewed-by: chagedorn, dholmes >> >> * 8338566: Lazy creation of exception instances is not thread safe >> >> Reviewed-by: shade, kvn, dlong >> >> * 8339648: ZGC: Division by zero in rule_major_allocation_rate >> >> Reviewed-by: aboldtch, lucy, tschatzl >> >> * 8329816: Add SLEEF version 3.6.1 >> >> Reviewed-by: erikj, mli, luhenry >> >> * 8315273: (fs) Path.toRealPath(LinkOption.NOFOLLOW_LINKS) fails when "../../" follows a link (win) >> >> Reviewed-by: djelinski >> >> * 8339574: Behavior of File.is{Directory,File,Hidden} is not documented with respect to symlinks >> >> Reviewed-by: djelinski, alanb >> >> * 8340200: Misspelled constant `AttributesProcessingOption.DROP_UNSTABLE_ATRIBUTES` >> >> Reviewed-by: liach >> >> * 8339934: Simplify Math.scalb(double) method >> >> Reviewed-by: darcy >> >> * 8339790: Support Intel APX setzucc instruction >> >> Reviewed-by: sviswanathan, jkarthikeyan, kvn >> >> * 8340323: Test jdk/classfile/OptionsTest.java fails after 
JDK-8340200 >> >> Reviewed-by: alanb >> >> * 8338686: App classpath mismatch if a jar from the Class-Path attribute is on the classpath >> >> Reviewed-by: dholmes, iklam >> >> * 8337563: NMT: rename MEMFLAGS to MemTag >> >> ... > > src/hotspot/share/opto/addnode.cpp line 409: > >> 407: // Convert a + a + ... + a into a*n >> 408: Node* AddNode::convert_serial_additions(PhaseGVN* phase, bool can_reshape, BasicType bt) { >> 409: if (is_optimized_multiplication()) { > > What would happen if that `if` is removed? Would matching in the rest of the method accept something that's not a valid pattern? It wouldn't accept invalid patterns. It would still make a (semantically) correct transformation, but there is a risk that it can't make progress. The `if` here guards against repeatedly re-adding the terms of an already optimized mul node. Otherwise we would eventually hit a timeout or the live node limit. (E.g., `3 * a` => `(a << 1) + a` => `3 * a` => ...) > src/hotspot/share/opto/addnode.cpp line 562: > >> 560: // swap LShiftNode to lhs for easier matching >> 561: if (!lhs->is_LShift()) { >> 562: Node* tmp = lhs; > > You can use swap() here. Good point. That is more idiomatic C++. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781309529 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781309628 From qamai at openjdk.org Mon Sep 30 15:06:46 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 30 Sep 2024 15:06:46 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 08:38:48 GMT, Andrew Dinn wrote: >> @dean-long Thanks, that is really helpful. IIUC, the duality here refers to the set of all `TypeInt` with a set `a` considered higher than `b` if `a` is a subset of `b`. This leads to our notion of bottom type being the universe set and top type being the empty set.
It still does not make sense for the concept of a dual `TypeInt`, though, since the concept of duality applies to the set of `TypeInt`, not the `TypeInt`s themselves. >> >>> My understanding is "join" means "union", "meet" means "intersection", and "dual" means "complement". >> >> You got it backward, "join" means intersection and "meet" means union. > > If you want to understand full details of how a (symmetrical) type lattice with duals supports a unified model for many different type flow analysis algorithms you can read up on it in Nielsen, Nielsen and Hankin's book Principles of Program Analysis. If it is new to you then a more simplified account of the use of (unqualified) TOP and BOTTOM types in type flow analysis can be found in Muchnick's book Advanced Compiler Design and Implementation. Note that Cliff Click goes against conventional mathematical terminology in making BOTTOM a universal type and TOP an empty (unrealizable) type. > > One detail that may not be obvious is that the sub-lattice for int and long sorts includes the hierarchy of single, continuous intervals. Individual integral values (on the lattice centre line) are modelled as singleton ranges i.e. [a,a]. Given the large cardinality of the set of continuous intervals this makes it necessary to place a bound on any fixed point iterations that widen interval ranges. The iteration is killed by widening to the maximum range (this is what Cliff refers to in the code as a 'death march'). @adinn Thanks a lot for your direction, it is really interesting and took me a while to read through. Although, I think that in practice, currently C2 only uses `dual` to compute the join of 2 types, which is rather confusing. 
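
To make the join/meet terminology above concrete, here is a minimal sketch over plain signed intervals. It is a simplified model and not HotSpot's `TypeInt`: the names `IntervalLattice`, `Interval`, `TOP` and `BOTTOM` are hypothetical and chosen only for illustration. It follows the convention described in this thread, where BOTTOM is the universe, TOP is the empty set, meet widens toward BOTTOM and join narrows toward TOP.

```java
// Simplified model of a lattice of signed int intervals (not HotSpot's TypeInt).
final class IntervalLattice {
    // Hypothetical interval type; lo > hi encodes the empty set.
    record Interval(int lo, int hi) {
        boolean isEmpty() { return lo > hi; }
    }

    // BOTTOM is the universe of all ints, TOP is the empty (unrealizable) set.
    static final Interval BOTTOM = new Interval(Integer.MIN_VALUE, Integer.MAX_VALUE);
    static final Interval TOP = new Interval(Integer.MAX_VALUE, Integer.MIN_VALUE);

    // meet: the "union" direction. A single interval cannot represent an
    // arbitrary union, so this returns the smallest interval covering both.
    static Interval meet(Interval a, Interval b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return new Interval(Math.min(a.lo(), b.lo()), Math.max(a.hi(), b.hi()));
    }

    // join: the "intersection" direction. Disjoint inputs give TOP.
    static Interval join(Interval a, Interval b) {
        Interval r = new Interval(Math.max(a.lo(), b.lo()), Math.min(a.hi(), b.hi()));
        return r.isEmpty() ? TOP : r;
    }

    public static void main(String[] args) {
        Interval x = new Interval(0, 10);
        Interval y = new Interval(5, 20);
        System.out.println(meet(x, y));
        System.out.println(join(x, y));
        System.out.println(join(new Interval(0, 1), new Interval(5, 6)).isEmpty());
    }
}
```

In this model, meeting anything with `BOTTOM` gives `BOTTOM` and joining anything with `TOP` gives `TOP`, which matches the orientation of the lattice discussed above.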
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1781311784 From qamai at openjdk.org Mon Sep 30 15:06:46 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 30 Sep 2024 15:06:46 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v18] In-Reply-To: References: <5rE5jUqKmzecH6jMAXpaObv9xYRz3Xi1SCvCKhAQJ9o=.010bec0b-856d-4d71-94c8-7e02f0402a4e@github.com> Message-ID: <2q6SqLFUQWESgeFkpNt77OUu0iqpMfzZMWrejNEU0uM=.4994b29b-c768-462d-b120-7736d62ced69@github.com> On Wed, 18 Sep 2024 19:00:12 GMT, Dean Long wrote: >> No we can't, consider `TypeInt::NON_ZERO`. It would have `_lo = min_jint`, `_hi = max_jint`, `_zeros = 0`, `_ones = 0`. Which make it impossible to distinguish from `TypeInt::INT` without unsigned bounds. > > Ignoring the unsigned issue for a moment, and going back to Dual, if we had the concept of Complement, we could represent NON_ZERO as the complement of 0 <= x <= 0, which would be x > 0 || x < 0. In general, the complement of lo <= x <= hi would be x > hi || x < lo, in contrast to the dual, which I believe is defined as the non-intuitive hi <= x <= lo. I think complement would allow us to represent more complicated expressions, such as !(x>=lo && x<=hi). > > If both dual and complement can be used to map between join and meet, then of the two complement seems more attractive and intuitive. But maybe there is another reason we need dual than I'm missing. The issue then would be the mixture of complementary and non-complementary types. The bounds of a `TypeInt` is the union of 2 intervals `[lo, uhi]` and `[ulo, hi]`, while those of a complementary would be 4, `[min_value, lo]`, `[uhi, -1]`, `[0, ulo]` and `[hi, max_value]`. Additionally, while we only need to compute the union of 2 normal types or 2 dual types. Incorporating complementary would mean that we would need to compute the union of a non-complementary type and a complementary type, which is an entire different beast. 
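
As an illustration of the point about `NON_ZERO`, the sketch below models a type that carries both signed and unsigned bounds. It is not the PR's implementation: the names `SignedUnsignedBounds` and `Bounds` are hypothetical, and the unsigned bounds chosen for `NON_ZERO` (`[1, 0xFFFFFFFF]`) are an assumption that is not spelled out in this thread. The second method shows the same set written as the two intervals `[lo, uhi]` and `[ulo, hi]`, which only holds under the sign assumptions stated in its comment.

```java
// Simplified model of an int type with signed and unsigned bounds (not HotSpot's TypeInt).
final class SignedUnsignedBounds {
    // Hypothetical type: signed bounds [lo, hi] and unsigned bounds [ulo, uhi].
    record Bounds(int lo, int hi, int ulo, int uhi) {
        // A value belongs to the type only if it satisfies both pairs of bounds.
        boolean contains(int x) {
            boolean inSigned = lo <= x && x <= hi;
            boolean inUnsigned = Integer.compareUnsigned(ulo, x) <= 0
                    && Integer.compareUnsigned(x, uhi) <= 0;
            return inSigned && inUnsigned;
        }

        // The same set as the union of [lo, uhi] and [ulo, hi], using signed compares.
        // Assumes lo < 0 <= hi, ulo >= 0 and uhi < 0 when read as signed ints.
        boolean containsAsTwoIntervals(int x) {
            return (lo <= x && x <= uhi) || (ulo <= x && x <= hi);
        }
    }

    // Assumed encoding: full signed range, unsigned range [1, 0xFFFFFFFF].
    static final Bounds NON_ZERO = new Bounds(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, -1);
    // Full signed range and full unsigned range.
    static final Bounds INT = new Bounds(Integer.MIN_VALUE, Integer.MAX_VALUE, 0, -1);

    public static void main(String[] args) {
        // The signed bounds of the two types are identical; only the unsigned
        // lower bound tells them apart.
        System.out.println(NON_ZERO.contains(0));
        System.out.println(INT.contains(0));
        System.out.println(NON_ZERO.contains(-1));
        System.out.println(NON_ZERO.containsAsTwoIntervals(0));
    }
}
```

With these values the two intervals of `NON_ZERO` are `[min_jint, -1]` and `[1, max_jint]`, so zero is the only excluded value.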
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1781309006 From kxu at openjdk.org Mon Sep 30 15:37:43 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 30 Sep 2024 15:37:43 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 13:38:03 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits: >> >> - resolve conflicts >> - resolve conflicts >> - Arithmetic canonicalization v3 (#3) >> >> * 8340144: C1: remove unused Compilation::_max_spills >> >> Reviewed-by: thartmann, shade >> >> * 8340176: Replace usage of -noclassgc with -Xnoclassgc in test/jdk/java/lang/management/MemoryMXBean/LowMemoryTest2.java >> >> Reviewed-by: kevinw, lmesnik >> >> * 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX >> >> Reviewed-by: kvn, thartmann, sviswanathan >> >> * 8340184: Bug in CompressedKlassPointers::is_in_encodable_range >> >> Reviewed-by: coleenp, rkennke, jsjolen >> >> * 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() >> >> Reviewed-by: roland, chagedorn, jkarthikeyan >> >> * 8340119: Remove oopDesc::size_might_change() >> >> Reviewed-by: stefank, iwalulya >> >> * 8340009: Improve the output from assert_different_registers >> >> Reviewed-by: aboldtch, dholmes, shade, mli >> >> * 8340273: Remove CounterHalfLifeTime >> >> Reviewed-by: chagedorn, dholmes >> >> * 8338566: Lazy creation of exception instances is not thread safe >> >> Reviewed-by: shade, kvn, dlong >> >> * 8339648: ZGC: Division by zero in rule_major_allocation_rate >> >> Reviewed-by: aboldtch, lucy, tschatzl >> >> * 8329816: Add SLEEF version 3.6.1 >> >> Reviewed-by: erikj, mli, luhenry >> >> * 8315273: (fs) Path.toRealPath(LinkOption.NOFOLLOW_LINKS) fails when "../../" follows a link (win) >> >> Reviewed-by: djelinski >> >> * 
8339574: Behavior of File.is{Directory,File,Hidden} is not documented with respect to symlinks >> >> Reviewed-by: djelinski, alanb >> >> * 8340200: Misspelled constant `AttributesProcessingOption.DROP_UNSTABLE_ATRIBUTES` >> >> Reviewed-by: liach >> >> * 8339934: Simplify Math.scalb(double) method >> >> Reviewed-by: darcy >> >> * 8339790: Support Intel APX setzucc instruction >> >> Reviewed-by: sviswanathan, jkarthikeyan, kvn >> >> * 8340323: Test jdk/classfile/OptionsTest.java fails after JDK-8340200 >> >> Reviewed-by: alanb >> >> * 8338686: App classpath mismatch if a jar from the Class-Path attribute is on the classpath >> >> Reviewed-by: dholmes, iklam >> >> * 8337563: NMT: rename MEMFLAGS to MemTag >> >> ... > > src/hotspot/share/opto/addnode.cpp line 442: > >> 440: Node* con = in1->in(2); >> 441: BasicType con_bt = phase->type(con)->basic_type(); >> 442: if (con_bt == T_VOID) { // const could potentially be void type > > Does that happen when con is top? In that case I think it's better to use `con->is_top()`. Or maybe `find_integer_as_long` if there's a value we know for sure that con can't take (it can't be negative, right?). Yes it is better. Will use`is_top()` instead. > if there's a value we know for sure that con can't take (it can't be negative, right?) I thought about `find_integer_as_long(-1)`, but it could be anything, even negative. This transformation accounts for example `a*-43 + a` => `a*-42`. 
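
The arithmetic behind the `a*-43 + a` => `a*-42` example can be checked with a small sketch. This only demonstrates the identity in Java `int` arithmetic and is not the C2 transformation itself; `SerialAdditions`, `mulThenAdd` and `collapsed` are hypothetical names. Since `int` arithmetic wraps modulo 2^32, the two shapes agree for every `a` and every constant, including negative ones, which is why no constant value is free to serve as a "not found" sentinel.

```java
// Checks that a*c + a and a*(c + 1) agree in wrapping int arithmetic.
final class SerialAdditions {
    // Shape before the rewrite: a multiplication by a constant plus one more 'a'.
    static int mulThenAdd(int a, int c) {
        return a * c + a;
    }

    // Shape after the rewrite: a single multiplication by (c + 1).
    static int collapsed(int a, int c) {
        return a * (c + 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        int[] constants = {-43, -1, 0, 3, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int c : constants) {
                if (mulThenAdd(a, c) != collapsed(a, c)) {
                    throw new AssertionError("mismatch for a=" + a + ", c=" + c);
                }
            }
        }
        System.out.println("all samples agree");
    }
}
```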
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1781357979 From kvn at openjdk.org Mon Sep 30 16:39:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Sep 2024 16:39:35 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Fri, 27 Sep 2024 19:46:15 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. >> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produce with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result > Summary: Make sure insert_anti_dependencies() starts from the right root Good. 
You need a second review for this. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21222#pullrequestreview-2338074157 PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2383677031 From kvn at openjdk.org Mon Sep 30 16:59:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Sep 2024 16:59:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/ae84aa47...14483b83 Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2338111198 From svkamath at openjdk.org Mon Sep 30 17:02:44 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 30 Sep 2024 17:02:44 GMT Subject: Integrated: 8337632: AES-GCM Algorithm optimization for x86_64 In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 09:38:25 GMT, Smita Kamath wrote: > Hi, > I want to submit an AES-GCM algorithm optimization. This implementation is using AVX512/VAES Instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are as below. Kindly review the code. Thank you. 
>
> Benchmark | Datasize | BaseJDK (ops/s) | Patch (ops/s) | %Gain
> -- | -- | -- | -- | --
> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67
> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72
> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73
> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34
> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66
> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75
> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42
> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6
> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54
> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45
> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39
> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26
> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52
> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08
> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94
> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91
> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81
> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.260301597

This pull request has now been integrated.
Changeset: a6b31886 Author: Smita Kamath URL: https://git.openjdk.org/jdk/commit/a6b318863fa2775b6381977875b4f466af47beb8 Stats: 1047 lines in 8 files changed: 541 ins; 268 del; 238 mod 8337632: AES-GCM Algorithm optimization for x86_64 Reviewed-by: jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/17515 From sviswanathan at openjdk.org Mon Sep 30 21:02:42 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 30 Sep 2024 21:02:42 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. src/hotspot/share/opto/vectorIntrinsics.cpp line 2698: > 2696: int cast_vopc = VectorCastNode::opcode(-1, elem_bt, true); > 2697: if (is_floating_point_type(elem_bt)) { > 2698: index_elem_bt = elem_bt == T_FLOAT ? T_INT : T_LONG; index_elem_bt is already assigned at line 2676 and 2678 so this line could be removed. 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551: > 549: return ((ByteVector)src1).vectorFactory(res); > 550: } > 551: This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1777839817 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1779722306 From psandoz at openjdk.org Mon Sep 30 21:30:42 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 30 Sep 2024 21:30:42 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sat, 28 Sep 2024 17:37:10 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551: > >> 549: return ((ByteVector)src1).vectorFactory(res); >> 550: } >> 551: > > This could instead be: > src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); > Or even simplified to: > src1.rearrange(this.toShuffle(), src2); I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs. 

jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0);
indexes ==> [0, 1, 8, 9, 16, 17, 24, 25]

jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1)
$19 ==> [0, 1, 8, 9, 0, 1, 8, 9]

jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle()
$20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7]

jshell> indexes.toShuffle()
$21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7]

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781753872 From sviswanathan at openjdk.org Mon Sep 30 22:42:41 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 30 Sep 2024 22:42:41 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes.
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. src/hotspot/cpu/x86/x86.ad line 10490: > 10488: %{ > 10489: match(Set index (SelectFromTwoVector (Binary index src1) src2)); > 10490: effect(TEMP index); Just curious, why do we need TEMP index effect? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781742786 From sviswanathan at openjdk.org Mon Sep 30 22:42:42 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 30 Sep 2024 22:42:42 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 30 Sep 2024 21:28:22 GMT, Paul Sandoz wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551: >> >>> 549: return ((ByteVector)src1).vectorFactory(res); >>> 550: } >>> 551: >> >> This could instead be: >> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); >> Or even simplified to: >> src1.rearrange(this.toShuffle(), src2); > > I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs.
>
>
> jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0);
> indexes ==> [0, 1, 8, 9, 16, 17, 24, 25]
>
> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1)
> $19 ==> [0, 1, 8, 9, 0, 1, 8, 9]
>
> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle()
> $20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7]
>
> jshell> indexes.toShuffle()
> $21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7]

Thanks for the example. Yes, you have a point there.
So we would need to do: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781859166 From sviswanathan at openjdk.org Mon Sep 30 23:17:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 30 Sep 2024 23:17:43 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. src/hotspot/share/opto/vectorIntrinsics.cpp line 2689: > 2687: !arch_supports_vector(cast_vopc, num_elem, T_BYTE, VecMaskNotUsed) || > 2688: !arch_supports_vector(Op_VectorLoadShuffle, num_elem, index_elem_bt, VecMaskNotUsed) || > 2689: !arch_supports_vector(Op_Replicate, num_elem, T_BYTE, VecMaskNotUsed)) { Where SelectFromTwoVector is not supported, the alternate implementation is as part of SelectFromTwoVectorNode::Ideal() instead of right here. A comment both here as well as in the Ideal() implementation is needed to keep these checks in sync. 
src/hotspot/share/opto/vectornode.cpp line 2120: > 2118: // are held in a byte vector which are later transformed to target specific permutation > 2119: // index format by subsequent VectorLoadShuffle. > 2120: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); Good to use -1 when we are not sending an actual opcode: int cast_vopc = VectorCastNode::opcode(-1, index_elem_bt, true); src/hotspot/share/opto/vectornode.cpp line 2126: > 2124: Node* bcast_lane_cnt_m1_vec = phase->transform(VectorNode::scalar2vector(lane_cnt_m1, num_elem, Type::get_const_basic_type(T_BYTE), false)); > 2125: > 2126: // Compute the blend mask for merging two indipendently permututed vectors Typo indipendently -> independently ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781867326 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781873682 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781888912 From sviswanathan at openjdk.org Mon Sep 30 23:17:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 30 Sep 2024 23:17:43 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <6NYy2NcP98xm3QRYdBWaAkkrvTdquMhhWnm-svxQjwE=.955f6dc8-c74c-472b-8c32-10228bb68d99@github.com> On Mon, 30 Sep 2024 22:51:57 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. 
> > src/hotspot/share/opto/vectorIntrinsics.cpp line 2689: > >> 2687: !arch_supports_vector(cast_vopc, num_elem, T_BYTE, VecMaskNotUsed) || >> 2688: !arch_supports_vector(Op_VectorLoadShuffle, num_elem, index_elem_bt, VecMaskNotUsed) || >> 2689: !arch_supports_vector(Op_Replicate, num_elem, T_BYTE, VecMaskNotUsed)) { > > Where SelectFromTwoVector is not supported, the alternate implementation is as part of SelectFromTwoVectorNode::Ideal() instead of right here. A comment both here as well as in the Ideal() implementation is needed to keep these checks in sync. We need to add VectorMaskCast here in the checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781886783
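
On the earlier question in this thread of whether masking the indexes before `toShuffle()` can be dropped, the following sketch reproduces the jshell results with plain arrays. It does not use the `jdk.incubator.vector` API. The wrapping rule in `toShuffleModel` is an assumption inferred from the jshell output quoted in this thread (in-range indexes are kept, out-of-range indexes are wrapped into the vector length and offset by `-vlen`), and `ShuffleIndexModel`, `toShuffleModel` and `andMask` are hypothetical names.

```java
import java.util.Arrays;

// Array-based model of the masking question; not the Vector API implementation.
final class ShuffleIndexModel {
    // Assumed model of index-to-shuffle conversion: keep indexes in [0, vlen),
    // map every other index to (index mod vlen) - vlen to mark it as exceptional.
    static int[] toShuffleModel(int[] indexes, int vlen) {
        int[] out = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            int v = indexes[i];
            out[i] = (v >= 0 && v < vlen) ? v : Math.floorMod(v, vlen) - vlen;
        }
        return out;
    }

    // Lane-wise AND with a constant mask, as in lanewise(VectorOperators.AND, mask).
    static int[] andMask(int[] indexes, int mask) {
        int[] out = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = indexes[i] & mask;
        }
        return out;
    }

    public static void main(String[] args) {
        int vlen = 8;
        int[] indexes = {0, 1, 8, 9, 16, 17, 24, 25};
        int[] masked = andMask(indexes, 2 * vlen - 1);
        // Masking first keeps indexes 16 and 17 selecting lanes 0 and 1 of the table.
        System.out.println(Arrays.toString(toShuffleModel(masked, vlen)));
        // Without masking they are treated as exceptional indexes instead.
        System.out.println(Arrays.toString(toShuffleModel(indexes, vlen)));
    }
}
```

Under this model the two conversions differ in lanes 4 and 5, which is the same difference visible between `$20` and `$21` in the jshell session.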